Noeud:Resolving Conflicts, Noeud « Next »:Simple Uses of Bison, Noeud « Previous »:Bison as a Grammar Checker, Noeud « Up »:Parsing
Unfortunately, naive grammars are often ambiguous. The naive grammar for arithmetics, see example 7.4, is well-known for its ambiguities, unfortunately finding an unambiguous equivalent grammar is quite a delicate task and requires some expertise:
expr: expr '+' term | expr '-' term | term; term: term '*' factor | term '/' factor | factor; factor: '(' expr ')' | "number";
Example 7.10: An Unambiguous Grammar for Arithmetics
Worse yet: it makes the grammar more intricate to write. For instance,
a famous ambiguity in the naive grammar for the C programming language
is the "dangling else
". Consider the following grammar
excerpt:
%% stmt: "if" "expr" stmt "else" stmt | "if" "expr" stmt | "stmt";
Example 7.11:ifelse-1.y
-- Ambiguous grammar forif
/else
in C
Although innocent looking, it is ambiguous and leads to two different interpretations of:
if (expr-1) if (expr-2) statement-1 else statement-2
depending whether statement-2 is bound to the first or second
if
. The C standard clearly mandates the second interpretation: a
"dangling else" is bound to the closest if
. How can one write
a grammar for such a case? Again, the answer is not obvious, and
requires some experience with grammars. It requires the distinction
between "closed statements", i.e., those which if
are saturated
(they have their pending else
), and non closed statements:
%% stmt: closed_stmt | non_closed_stmt; closed_stmt: "if" "expr" closed_stmt "else" closed_stmt | "stmt"; non_closed_stmt: "if" "expr" stmt | "if" "expr" closed_stmt "else" non_closed_stmt;
Example 7.12:ifelse-2.y
-- Unambiguous Grammar forif
/else
in C
And finally, since we introduce new rules, new symbols, our grammar gets
bigger, hence the automaton gets fat, therefore it becomes less
efficient (since modern processors cannot make full use of their caches
with big tables).
Rather, let's have a closer look at example 7.9. There are two actions fighting to get into
state 4: reducing the rule 1, or shifting the token -
. We have a
match between two opponents: if the rule wins, then -
is right
associative, if the token wins, then -
is left associative.
What we would like to say is that "shifting -
has priority over
reducing expr: expr '-' expr
". Bison is nice to us, and allows
us to inform it of the associativity of the operator, it will handle the
rest by itself:
%left '-' %% expr: expr '-' expr | "number";
Example 7.13:arith-3.y
-- Using%left
to Solve Shift/Reduce Conflicts
$ bison --verbose arith-3.y
This is encouraging, but won't be enough for us to solve the ambiguity of the example 7.5:
%left '+' %% expr: expr '*' expr | expr '+' expr | "number";
Example 7.14:vs.arith-4.y
-- An Ambiguous Grammar for*
vs.+
+
$ bison --verbose arith-4.y error-->arith-4.y contains 3 shift/reduce conflicts.
Diving into arith-4.output
will help us understand the problem:
Conflict in state 5 between rule 2 and token '+' resolved as reduce. State 5 contains 1 shift/reduce conflict. State 6 contains 2 shift/reduce conflicts. ... State 5 expr -> expr . '*' expr (rule 1) expr -> expr '*' expr . (rule 1) expr -> expr . '+' expr (rule 2) '*' shift, and go to state 3 '*' [reduce using rule 1 (expr)] $default reduce using rule 1 (expr) State 6 expr -> expr . '*' expr (rule 1) expr -> expr '*' expr . (rule 1) expr -> expr . '+' expr (rule 2) '+' shift, and go to state 3 '*' shift, and go to state 4 '+' [reduce using rule 1 (expr)] '*' [reduce using rule 1 (expr)] $default reduce using rule 1 (expr)
First note that it reported its understanding of %left '+'
: it is
a match opposing token +
against rule 2, and the rule is the
winner.
State 6, without any surprise, has a shift/reduce conflict because we
didn't define the associativity of *
, but states 5 and 6 have a
new problem. In state 5, after having read expr + expr
, in front
of a *
, should it shift it, or reduce expr + expr
?
Obviously it should reduce: the rule containing *
should have
precedence over the token +
. Similarly, in state 6, the token
*
should have precedence over the rule containing +
. Both
cases can be summarized as "*
has precedence over +
".
Bison allows you to specify precedences simply by listing the
%left
and %right
from the lowest precedence, to the
highest. Some operators have equal precedence, for instance series of
+
and -
must always be read from left to right: +
and -
have the same precedence. In this case, just put the two
operators under the same %left
.
Finally, we are now able to express our complete grammar of arithmetics:
%left '+' '-' %left '*' '/' %% expr: expr '*' expr | expr '/' expr | expr '+' expr | expr '-' expr | '(' expr ')' | "number";
Example 7.15: arith-5.y
-- Using Precedence to Solve
Shift/Reduce Conflicts
Shift/Reduce Conflicts
which is happily accepted by bison
:
$ bison --verbose arith-5.y
The reader is invited to:
--graph
--verbose
is --graph
, which outputs a
VCG graph, to be viewed with xvcg
. View the automata of
the various grammars presented.
%left '+' '-' %left '*' '/' %% expr: expr op expr | "number"; op: '+' | '-' | '*' | '/';
Example 7.16: arith-6.y
-- Failing to Use Precedence
$ bison arith-6.y error-->arith-6.y contains 4 shift/reduce conflicts.
Can you explain why? Bear in mind the nature of the opponents in a
shift/reduce match.
else
We are now fully equipped to implement real parsers.