


Start Conditions

Tokens other than keywords often need some form of conversion: strings of digits are converted into integers, and so on. This conversion often requires scanning the token a second time, for instance to convert escapes such as \n into the characters they denote. Writing this second scanner by hand is easy, but frustrating.

Sometimes one is limited by the theory itself: imagine your language supports nested comments. It is easily proven that a language of balanced parentheses (1) cannot be described by regular expressions. Indeed, this would imply the existence of an FSR with, say, q states. If we feed it more than q opening parentheses, two different nesting depths must lead to the same state, so it loses count and can no longer tell how many closing parentheses to expect. Therefore there cannot be such an FSR, hence no regular expression, and we are stuck! Nevertheless, it would have been very easy to write a scanner that only tracks /* and */ and throws away everything else.

Our scanners are nothing but automata, such as the one in example 6.9. We could easily solve both problems above if we could attach the corresponding FSRs to a new initial state whose transitions are guarded by conditions:

                   if (in_body)   ,-----------------.
                ,---------------->|  body  scanner  |
               /                  `-----------------'
              /
        ,---./   if (in_comment)  ,-----------------.
     -->|   |-------------------->| comment scanner |
        `---'\                    `-----------------'
              \
               \ if (in_string)   ,-----------------.
                `---------------->| string  scanner |
                                  `-----------------'
     
     Example 6.13: A Condition Driven FSR Combination
     

These are called start conditions. They allow small scanners to be combined into a bigger one. The default start condition is named INITIAL; others are declared with the Flex directive %x start-condition.... To set the current start condition, i.e., to select the branch that will be eligible the next time the automaton runs, use BEGIN start-condition. This is not a form of return or goto: execution proceeds normally in the current action.
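
The condition-driven combination of the figure above can be imitated by hand in C. The sketch below is only an analogy, not what Flex actually generates: the enum values IN_COMMENT and IN_STRING, the macro BEGIN_CONDITION, the struct counts and the function scan are all names invented for this example. It shows the two points made above: the current condition selects which sub-scanner examines the input, and switching condition is a plain assignment after which the action keeps running.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical start conditions, mirroring the figure. */
enum start_condition { INITIAL, IN_COMMENT, IN_STRING };

/* What this toy scanner reports. */
struct counts { int comments; int strings; int body_chars; };

/* Imitates the effect of Flex's BEGIN: an assignment, not a jump. */
#define BEGIN_CONDITION(c) (current = (c))

static struct counts scan(const char *p)
{
    enum start_condition current = INITIAL;
    struct counts n = { 0, 0, 0 };

    assert(p != NULL);
    while (*p) {
        switch (current) {          /* the new initial state */
        case INITIAL:               /* body scanner */
            if (p[0] == '/' && p[1] == '*') {
                BEGIN_CONDITION(IN_COMMENT);
                n.comments++;       /* still executed after the switch */
                p += 2;
            } else if (*p == '"') {
                BEGIN_CONDITION(IN_STRING);
                n.strings++;
                p++;
            } else {
                n.body_chars++;
                p++;
            }
            break;
        case IN_COMMENT:            /* comment scanner */
            if (p[0] == '*' && p[1] == '/') {
                BEGIN_CONDITION(INITIAL);
                p += 2;
            } else {
                p++;
            }
            break;
        case IN_STRING:             /* string scanner */
            if (*p == '"')
                BEGIN_CONDITION(INITIAL);
            p++;
            break;
        }
    }
    return n;
}
```

Because each sub-scanner only sees the input while its condition is current, a double quote inside a comment does not open a string, and /* inside a string does not open a comment.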

Finally, to restrict rules to the start conditions in which they apply, prefix them with the conditions, using either

     <start-condition, ...>pattern action
     

or

     <start-condition, ...>{
       pattern-1 action-1
       pattern-2 action-2
     }
     

Footnotes

  1. "Balanced parentheses" is to be understood in its broadest sense, including begin/end, /* and */, etc.