Noeud:Bison as a Grammar Checker, Noeud « Next »:Resolving Conflicts, Noeud « Previous »:What is Bison, Noeud « Up »:Parsing
Bison is dedicated to artificial languages, which, contrary to natural languages, must be absolutely unambiguous. More precisely it accepts only LALR(1) grammars, which formal definition amounts exactly to "parsable with a Bison parser". Although there are unambiguous grammars that are not LALR(1), we will henceforth use "ambiguous", or "insane", to mean "not parsable with Bison".
While the main purpose of Bison is to generate parsers, since writing good grammars is not an easy task, it proves itself very useful as a grammar checker (e.g., checking it is unambiguous). Again, a plain "error: grammar is insane" would be an information close to useless, fortunately Bison then provides means (i) to understand the insanity of a grammar, (ii) to solve typical ambiguities conveniently.
How exactly does an ambiguity reveal itself to Bison? We saw that Bison parsers are simple machines shifting tokens and reducing rules. As long as these machines know exactly what step to perform, the grammar is obviously sane. In other words, on an insane grammar it will sometimes hesitate between different actions: should I shift, or should I reduce? And if I reduce, which rule should I reduce?
In Looking for Arithmetics, we demonstrated that the naive
implementation arithmetics is ambiguous, even if reduced to subtraction.
The following Bison file focuses on it. The mark %%
is needed
for Bison to know where the grammar starts:
%% expr: expr '-' expr | "number";
Example 7.8: arith-1.y
-- A Shift/Reduce Conflict
Running bison
on it, as expected, ends sadly:
$ bison arith-1.y error-->arith-1.y contains 1 shift/reduce conflict.
As announced, ambiguities lead Bison to hesitate between several
actions, here 1 shift/reduce conflict
stands for "there is a
state from which I could either shift another token, or reduce a rule".
But which? You may ask bison
for additional details:
$ bison --verbose arith-1.y error-->arith-1.y contains 1 shift/reduce conflict.
which will output the file arith-1.output
1:
State 4 contains 1 shift/reduce conflict. Grammar rule 0 $axiom -> expr $ rule 1 expr -> expr '-' expr rule 2 expr -> "number" Terminals, with rules where they appear $ (-1) '-' (45) 1 error (256) "number" (257) 2 Nonterminals, with rules where they appear expr (5) on left: 1 2, on right: 1 State 0 $axion -> . expr $ (rule 0) expr -> . expr '-' expr (rule 1) expr -> . "number" (rule 2) "number" shift, and go to state 1 expr go to state 2 State 1 expr -> "number" . (rule 2) $default reduce using rule 2 (expr) State 2 $axiom -> expr . $ (rule 0) expr -> expr . '-' expr (rule 1) $ go to state 5 '-' shift, and go to state 3 State 3 expr -> expr '-' . expr (rule 1) "number" shift, and go to state 1 expr go to state 4 State 4 expr -> expr . '-' expr (rule 1) expr -> expr '-' expr . (rule 1) '-' shift, and go to state 3 '-' [reduce using rule 1 (expr)] $default reduce using rule 1 (expr) State 5 $axiom -> expr . $ (rule 0) $ shift, and go to state 6 State 6 $axiom -> expr $ . (rule 0) $default accept
You shouldn't be frightened by the contents: aside from being applied to a different grammar, this is nothing but the FSR we presented in example 7.2. Instead of being presented graphically, it is described by the list of its states and the actions that can be performed from each state. Another difference is that each state is represented by the degree of recognition of the various rules, instead of regular expressions.
For instance the state 3 is characterized by the fact that we recognized
the first two symbols of expr '-' expr
, which is denoted
expr -> expr '-' . expr
. In that state, if we see a
number
, we shift it, and go to state 1. If we see an
expr
, we go to state 4. The reader is suggested to draw this
automaton.
Bison draws our attention on the state 4, for "State 4 contains 1
shift/reduce conflict". Indeed, there are two possible actions when
the lookahead, the next token in the input, is -
:
State 4 expr -> expr . '-' expr (rule 1) expr -> expr '-' expr . (rule 1) '-' shift, and go to state 3 '-' [reduce using rule 1 (expr)] $default reduce using rule 1 (expr)
Example 7.9: State 4 contains 1 shift/reduce conflict
We find again the ambiguity of our grammar: when reading number -
number - number
, which leads to the stack being expr '-' expr
,
when finding another -
, it doesn't know if it should:
shift, and go to state 3
), which results in
Stack | Input | shift 3
| |
0 expr 2 - 3 expr 4 | - number $ | shift 1
| |
0 expr 2 - 3 expr 4 - 3 | number $ | shift 1
| |
0 expr 2 - 3 expr 4 - 3 number 1 | $ | reduce 2
| |
0 expr 2 - 3 expr 4 - 3 expr | $ | go to 4
| |
0 expr 2 - 3 expr 4 - 3 expr 4 | $ | reduce 1
| |
0 expr 2 - 3 expr | $ | go to 4
| |
0 expr 2 - 3 expr 4 | $ | reduce 1
| |
0 expr | $ | go to 5
| |
0 expr 5 | $ | shift 6
| |
0 expr 5 $ 6 | | accept
|
i.e., grouping the last two expressions, in other words understanding
number - (number - number)
, or
[reduce using rule 1 (expr)]
), which results in
0 expr 2 - 3 expr 4 | - number $ | reduce 1
| |
0 expr | - number $ | go to 2
| |
0 expr 2 | - number $ | shift 3
| |
0 expr 2 - 3 | number $ | shift 1
| |
0 expr 2 - 3 number 1 | $ | reduce 2
| |
0 expr 2 - 3 expr | $ | goto 4
| |
0 expr 2 - 3 expr 4 | $ | reduce 1
| |
0 expr | $ | go to 5
| |
0 expr 5 | $ | shift 6
| |
0 expr 5 $ 6 | | accept
|
i.e., grouping the first two expressions, in other words understanding
(number - number) - number
.
Well meaning, it even chose an alternative for us, shifting, which Bison
signifies with the square brackets: possibility [reduce using rule
1 (expr)]
will not be followed. Too bad, that's precisely not the
choice we need...
The cure is well known: implement the grammar of the example 7.6. The reader is invited to implement it in Bison, and check the output file. In fact, the only difference between the two output files is that the state 4 is now:
State 4 expr -> expr '-' "number" . (rule 1) $default reduce using rule 1 (expr)
With some practice, you will soon be able to understand how to react to conflicts, spotting where they happen, and find another grammar which is unambiguous. Unfortunately...