Programming with GNU Software

Noeud:Resolving Conflicts, Noeud Ť Next ť:Simple Uses of Bison, Noeud Ť Previous ť:Bison as a Grammar Checker, Noeud Ť Up ť:Parsing

Resolving Conflicts

Unfortunately, naive grammars are often ambiguous. The naive grammar for arithmetics, see example 7.4, is well-known for its ambiguities, unfortunately finding an unambiguous equivalent grammar is quite a delicate task and requires some expertise:

     expr: expr '+' term
         | expr '-' term
         | term;
     term: term '*' factor
         | term '/' factor
         | factor;
     factor: '(' expr ')'
           | "number";

     Example 7.10: An Unambiguous Grammar for Arithmetics

Worse yet: it makes the grammar more intricate to write. For instance, a famous ambiguity in the naive grammar for the C programming language is the "dangling else". Consider the following grammar excerpt:

     %%
     stmt: "if" "expr" stmt "else" stmt
         | "if" "expr" stmt
         | "stmt";

     Example 7.11: ifelse-1.y -- Ambiguous grammar for if/else in C

Although innocent looking, it is ambiguous and leads to two different interpretations of:

     if   (expr-1)
     if   (expr-2)
          statement-1
     else statement-2

depending whether statement-2 is bound to the first or second if. The C standard clearly mandates the second interpretation: a "dangling else" is bound to the closest if. How can one write a grammar for such a case? Again, the answer is not obvious, and requires some experience with grammars. It requires the distinction between "closed statements", i.e., those which if are saturated (they have their pending else), and non closed statements:

     %%
     stmt: closed_stmt
         | non_closed_stmt;
     closed_stmt: "if" "expr" closed_stmt "else" closed_stmt
                | "stmt";
     non_closed_stmt: "if" "expr" stmt
                    | "if" "expr" closed_stmt "else" non_closed_stmt;

     Example 7.12: ifelse-2.y -- Unambiguous Grammar for if/else in C

And finally, since we introduce new rules, new symbols, our grammar gets bigger, hence the automaton gets fat, therefore it becomes less efficient (since modern processors cannot make full use of their caches with big tables).

Rather, let's have a closer look at example 7.9. There are two actions fighting to get into state 4: reducing the rule 1, or shifting the token -. We have a match between two opponents: if the rule wins, then - is right associative, if the token wins, then - is left associative.

What we would like to say is that "shifting - has priority over reducing expr: expr '-' expr". Bison is nice to us, and allows us to inform it of the associativity of the operator, it will handle the rest by itself:

     %left '-'
     %%
     expr: expr '-' expr | "number";

     Example 7.13: arith-3.y -- Using %left to Solve Shift/Reduce Conflicts

     $ bison --verbose arith-3.y

This is encouraging, but won't be enough for us to solve the ambiguity of the example 7.5:

     %left '+'
     %%
     expr: expr '*' expr | expr '+' expr | "number";

     Example 7.14: arith-4.y -- An Ambiguous Grammar for *
     vs. +

vs. +

     $ bison --verbose arith-4.y
     error-->arith-4.y contains 3 shift/reduce conflicts.

Diving into arith-4.output will help us understand the problem:

     Conflict in state 5 between rule 2 and token '+' resolved as reduce.
     State 5 contains 1 shift/reduce conflict.
     State 6 contains 2 shift/reduce conflicts.
     ...
     State 5     expr  ->  expr . '*' expr   (rule 1)
                 expr  ->  expr '*' expr .   (rule 1)
                 expr  ->  expr . '+' expr   (rule 2)
                 '*'         shift, and go to state 3
                 '*'         [reduce using rule 1 (expr)]
                 $default    reduce using rule 1 (expr)
     
     State 6     expr  ->  expr . '*' expr   (rule 1)
                 expr  ->  expr '*' expr .   (rule 1)
                 expr  ->  expr . '+' expr   (rule 2)
                 '+'         shift, and go to state 3
                 '*'         shift, and go to state 4
                 '+'         [reduce using rule 1 (expr)]
                 '*'         [reduce using rule 1 (expr)]
                 $default    reduce using rule 1 (expr)

First note that it reported its understanding of %left '+': it is a match opposing token + against rule 2, and the rule is the winner.

State 6, without any surprise, has a shift/reduce conflict because we didn't define the associativity of *, but states 5 and 6 have a new problem. In state 5, after having read expr + expr, in front of a *, should it shift it, or reduce expr + expr? Obviously it should reduce: the rule containing * should have precedence over the token +. Similarly, in state 6, the token * should have precedence over the rule containing +. Both cases can be summarized as "* has precedence over +".

Bison allows you to specify precedences simply by listing the %left and %right from the lowest precedence, to the highest. Some operators have equal precedence, for instance series of + and - must always be read from left to right: + and - have the same precedence. In this case, just put the two operators under the same %left.

Finally, we are now able to express our complete grammar of arithmetics:

     %left '+' '-'
     %left '*' '/'
     %%
     expr: expr '*' expr | expr '/' expr
         | expr '+' expr | expr '-' expr
         | '(' expr ')'
         | "number";

     Example 7.15: arith-5.y -- Using Precedence to Solve
     Shift/Reduce Conflicts

Shift/Reduce Conflicts

which is happily accepted by bison:

     $ bison --verbose arith-5.y

The reader is invited to:

Implement the grammar of the example 7.10

Read the output file, and see that indeed it requires a bigger automaton than the grammar of the example 7.15.

Play with the option --graph

A complement of --verbose is --graph, which outputs a VCG graph, to be viewed with xvcg. View the automata of the various grammars presented.

Explain the Failure of Factored Arithmetics

Bison refuses the following grammar:

          %left '+' '-'
          %left '*' '/'
          %%
          expr: expr op expr | "number";
          op: '+' | '-' | '*' | '/';

          Example 7.16: arith-6.y -- Failing to Use Precedence

          $ bison arith-6.y
          error-->arith-6.y contains 4 shift/reduce conflicts.

Can you explain why? Bear in mind the nature of the opponents in a shift/reduce match.

Solve the dangling else

The precedence of a rule is actually that of its last token. With this in mind, propose a simple implementation of the grammar of example 7.10 in Bison.

We are now fully equipped to implement real parsers.