Noeud:Bison as a Grammar Checker, Noeud « Next »:, Noeud « Previous »:What is Bison, Noeud « Up »:Parsing



Bison as a Grammar Checker

Bison is dedicated to artificial languages, which, contrary to natural languages, must be absolutely unambiguous. More precisely it accepts only LALR(1) grammars, which formal definition amounts exactly to "parsable with a Bison parser". Although there are unambiguous grammars that are not LALR(1), we will henceforth use "ambiguous", or "insane", to mean "not parsable with Bison".

While the main purpose of Bison is to generate parsers, since writing good grammars is not an easy task, it proves itself very useful as a grammar checker (e.g., checking it is unambiguous). Again, a plain "error: grammar is insane" would be an information close to useless, fortunately Bison then provides means (i) to understand the insanity of a grammar, (ii) to solve typical ambiguities conveniently.

How exactly does an ambiguity reveal itself to Bison? We saw that Bison parsers are simple machines shifting tokens and reducing rules. As long as these machines know exactly what step to perform, the grammar is obviously sane. In other words, on an insane grammar it will sometimes hesitate between different actions: should I shift, or should I reduce? And if I reduce, which rule should I reduce?

In Looking for Arithmetics, we demonstrated that the naive implementation arithmetics is ambiguous, even if reduced to subtraction. The following Bison file focuses on it. The mark %% is needed for Bison to know where the grammar starts:

     %%
     expr: expr '-' expr | "number";
     
     Example 7.8: arith-1.y -- A Shift/Reduce Conflict
     

Running bison on it, as expected, ends sadly:

     $ bison arith-1.y
     error-->arith-1.y contains 1 shift/reduce conflict.
     

As announced, ambiguities lead Bison to hesitate between several actions, here 1 shift/reduce conflict stands for "there is a state from which I could either shift another token, or reduce a rule". But which? You may ask bison for additional details:

     $ bison --verbose arith-1.y
     error-->arith-1.y contains 1 shift/reduce conflict.
     

which will output the file arith-1.output1:

     State 4 contains 1 shift/reduce conflict.
     
     Grammar
         rule 0    $axiom -> expr $
         rule 1    expr   -> expr '-' expr
         rule 2    expr   -> "number"
     
     Terminals, with rules where they appear
         $         (-1)
         '-'       (45) 1
         error    (256)
         "number" (257) 2
     
     Nonterminals, with rules where they appear
         expr (5) on left: 1 2, on right: 1
     
     State 0    $axion -> . expr $           (rule 0)
                expr   -> . expr '-' expr    (rule 1)
                expr   -> . "number"         (rule 2)
                "number"    shift, and go to state 1
                expr        go to state 2
     
     State 1    expr  ->  "number" .         (rule 2)
                $default    reduce using rule 2 (expr)
     
     State 2    $axiom -> expr . $           (rule 0)
                expr   -> expr . '-' expr    (rule 1)
                $           go to state 5
                '-'         shift, and go to state 3
     
     State 3    expr -> expr '-' . expr      (rule 1)
                "number"    shift, and go to state 1
                expr        go to state 4
     
     State 4    expr -> expr . '-' expr      (rule 1)
                expr -> expr '-' expr .      (rule 1)
                '-'         shift, and go to state 3
                '-'         [reduce using rule 1 (expr)]
                $default    reduce using rule 1 (expr)
     
     State 5    $axiom -> expr . $           (rule 0)
                $           shift, and go to state 6
     
     State 6    $axiom -> expr $ .           (rule 0)
                $default    accept
     

You shouldn't be frightened by the contents: aside from being applied to a different grammar, this is nothing but the FSR we presented in example 7.2. Instead of being presented graphically, it is described by the list of its states and the actions that can be performed from each state. Another difference is that each state is represented by the degree of recognition of the various rules, instead of regular expressions.

For instance the state 3 is characterized by the fact that we recognized the first two symbols of expr '-' expr, which is denoted expr -> expr '-' . expr. In that state, if we see a number, we shift it, and go to state 1. If we see an expr, we go to state 4. The reader is suggested to draw this automaton.

Bison draws our attention on the state 4, for "State 4 contains 1 shift/reduce conflict". Indeed, there are two possible actions when the lookahead, the next token in the input, is -:

     State 4    expr  ->  expr . '-' expr    (rule 1)
                expr  ->  expr '-' expr .    (rule 1)
                '-'         shift, and go to state 3
                '-'         [reduce using rule 1 (expr)]
                $default    reduce using rule 1 (expr)
     
     Example 7.9: State 4 contains 1 shift/reduce conflict
     

We find again the ambiguity of our grammar: when reading number - number - number, which leads to the stack being expr '-' expr, when finding another -, it doesn't know if it should:

Well meaning, it even chose an alternative for us, shifting, which Bison signifies with the square brackets: possibility [reduce using rule 1 (expr)] will not be followed. Too bad, that's precisely not the choice we need...

The cure is well known: implement the grammar of the example 7.6. The reader is invited to implement it in Bison, and check the output file. In fact, the only difference between the two output files is that the state 4 is now:

     State 4    expr  ->  expr '-' "number" .    (rule 1)
                $default    reduce using rule 1 (expr)
     

With some practice, you will soon be able to understand how to react to conflicts, spotting where they happen, and find another grammar which is unambiguous. Unfortunately...


Notes de bas de page

  1. In addition to minor formatting changes, this output has been modified. In particular the rule 0, which Bison hides, is made explicit.