Noeud:Simple Uses of Bison, Noeud « Next »:Using Actions, Noeud « Previous »:Resolving Conflicts, Noeud « Up »:Parsing
Bison is a source generator, just as Gperf, Flex, and others. It takes a list of rules and actions, and produces a function triggering the action associated to the reduced rule. The input syntax allows for the traditional prologue, containing Bison directives and possibly some user declarations and initializations, and the epilogue, typically additional functions:
%{ user-prologue %} bison-directives %% /* Comments. */ lfs: rhs-1 { action-1 } | rhs-2 { action-2 } | ... ; ... %% user-epilogue
Example 7.17: Structure of a Bison Input File
When run on a file foo.y
, bison
produces a file named
foo.tab.c
and possibly foo.tab.h
, foo.output
,
etc. This atrocious naming convention is a mongrel between the
POSIX standards on the one hand requiring the output to be
always named y.tab.c
, and on the other hand the more
logical foo.c
. You may set the output file name thanks to the
%output="
parser-filename"
directive.
This file is a C source including, in addition to your user-prologue and user-epilogue, one function:
int yyparse () | Fonction |
Parse the whole stream of tokens delivered by successive calls to
yylex . Trigger the action associated to the reduced rule.
Return 0 for success, and nonzero upon failures (such as parse errors). |
You must provide yylex
, but also yyerror
, used to report
parse errors.
For a start, we will implement the example 7.1. Because writing a scanner, which splits the input into tokens, is a job in itself, see Scanning with Gperf and Flex, we will simply rely on the command line arguments. In other words, we use the shell as a scanner (or rather, we let the user fight with its shells to explain where start and end the tokens).
%{ /* -*- C -*- */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <error.h> static int yylex (void); static void yyerror (const char *); %} %output="brackens-1.c" %% expr: '(' expr ')' | 'N' ; %% /* The current argument. */ static const char **args = NULL; static int yylex (void) { /* No token stands for end of file. */ if (!*++args) return 0; /* Parens stand for themselves. `N' denotes a number. */ if (strlen (*args) == 1 && strchr ("()N", **args)) return **args; /* An error. */ error (EXIT_FAILURE, 0, "invalid argument: %s", *args); /* Pacify GCC that knows ERROR may return. */ return -1; } void yyerror (const char *msg) { error (EXIT_FAILURE, 0, "%s", msg); } int main (int argc, const char *argv[]) { args = argv; return yyparse (); }
Example 7.18: brackens-1.y
-- Nested Parentheses Checker
Which we compile and run:
$ bison brackens-1.y $ gcc -Wall brackens-1.c -o brackens-1 $ ./brackens-1 '(' 'N' ')' $ ./brackens-1 '(' 'n' ')' error-->./brackens-1: invalid argument: n $ ./brackens-1 '(' '(' 'N' ')' ')' $ ./brackens-1 '(' '(' 'N' ')' ')' ')' error-->./brackens-1: parse error
It works quite well! Unfortunately, when given an invalid input is it
quite hard to find out where the actual error is. Fortunately you can
ask Bison parsers to provide more precise information by defining
YYERROR_VERBOSE
to 1. You can also use the Bison directive
%debug
which introduces a variable, yydebug
, to make your
parser verbose:
%{ /* -*- C -*- */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <error.h> static int yylex (void); static void yyyerror (const char *msg); #define YYERROR_VERBOSE 1 %} %debug %output="brackens-2.c" %% expr: '(' expr ')' | 'N' ; %% int main (int argc, const char *argv[]) { yydebug = getenv ("YYDEBUG") ? 1 : 0; args = argv; return yyparse (); }
Example 7.19: brackens-2.y
-- Nested Parentheses Checker
which, when run, produces:
$ ./brackens-2 '(' '(' 'N' ')' ')' ')' ./brackens-2: parse error, unexpected ')', expecting $
Wow! The error message is much better! But it is still not clear which one of the arguments is provoking the error. Asking traces from the parser gives some indication:
$ YYDEBUG=1 ./brackens-2 '(' '(' 'N' ')' ')' ')' Starting parse Entering state 0 Reading a token: Next token is 40 ('(') Shifting token 40 ('('), Entering state 1 Reading a token: Next token is 40 ('(') Shifting token 40 ('('), Entering state 1 Reading a token: Next token is 78 ('N') Shifting token 78 ('N'), Entering state 2 Reducing via rule 2 (line 32), 'N' -> expr state stack now 0 1 1 Entering state 3 Reading a token: Next token is 41 (')') Shifting token 41 (')'), Entering state 4 Reducing via rule 1 (line 32), '(' expr ')' -> expr state stack now 0 1 Entering state 3 Reading a token: Next token is 41 (')') Shifting token 41 (')'), Entering state 4 Reducing via rule 1 (line 32), '(' expr ')' -> expr state stack now 0 Entering state 5 Reading a token: Next token is 41 (')') error-->./brackens-2: parse error, unexpected ')', expecting $
The reader is invited to compare these traces with the "execution" by hand of the example 7.3: it is just the same!
Two things are worth remembering. First, always try to produce precise
error messages, since once an error is diagnosed, the user still has to
locate it in order to fix it. I, for one, don't like tools that merely
report the line where the error occurred, because often several
very similar expressions within that line can be responsible for the
error. In that case, I often split my line into severals, recompile,
find the culprit, educate it, and join again these lines... And second,
never write dummy printf
to check that your parser works: it is a
pure waste of time, since Bison does this on command, and produces
extremely readable traces. Take some time to read the example above,
and in particular spot on the one hand side the tokens and the shifts
and on the other hand the reductions and the rules. But forget about
"states", for the time being.