Good afternoon, colleagues! When developing my web server, I encountered the following problem: I want to perform lexical analysis of the http request using state machines, and for this I need to know how to build an NCA based on the existing RFN 2616 BNF in the description of the HTTP protocol . For example, the request-line is described by products that include the output of the method, space, and URI and HTTP version and is set as follows:

 Request-Line = Method SP Request-URI SP HTTP-Version CRLF 

But the products for the description of `Method``:

 Method = "OPTIONS" | "GET" | "HEAD" | "POST" | "PUT" | "DELETE" | "TRACE" | "CONNECT" | extension-method 

Well, respectively, further it would be possible to give a description of the other products, but, I think, the meaning is clear. That is, we have products that consist of other products, which consist of other products, etc.

In general, how to build a finite state machine on the basis of this, I roughly imagine (the “frontal” solution). But still I would like to know about the existing classical solutions and algorithms for constructing a finite state machine (which will perform the lexer function) on the basis of a BNF or some kind of KS grammar.

PS Please do not offer solutions from the "box", I am interested in how to do it yourself.

  • Do you want a lexical analyzer or a grammar analyzer? - Mikhail Vaysman
  • I need a lexical analyzer - dischain
  • Just in case, I would add that under the “globally” solution I meant a hierarchical finite automaton. - dischain

0