Programming with GNU Software

Noeud:Simple Uses of Bison, Noeud Ť Next ť:Using Actions, Noeud Ť Previous ť:Resolving Conflicts, Noeud Ť Up ť:Parsing

Simple Uses of Bison

Bison is a source generator, just as Gperf, Flex, and others. It takes a list of rules and actions, and produces a function triggering the action associated to the reduced rule. The input syntax allows for the traditional prologue, containing Bison directives and possibly some user declarations and initializations, and the epilogue, typically additional functions:

     %{
       user-prologue
     %}
     bison-directives
     %%
     /* Comments. */
     lfs:
         rhs-1 { action-1 }
       | rhs-2 { action-2 }
       | ...
       ;
     ...
     %%
     user-epilogue

     Example 7.17: Structure of a Bison Input File

When run on a file foo.y, bison produces a file named foo.tab.c and possibly foo.tab.h, foo.output, etc. This atrocious naming convention is a mongrel between the POSIX standards on the one hand requiring the output to be always named y.tab.c, and on the other hand the more logical foo.c. You may set the output file name thanks to the %output="parser-filename" directive.

This file is a C source including, in addition to your user-prologue and user-epilogue, one function:

int yyparse () Fonction

Parse the whole stream of tokens delivered by successive calls to yylex. Trigger the action associated to the reduced rule.
Return 0 for success, and nonzero upon failures (such as parse errors).

You must provide yylex, but also yyerror, used to report parse errors.

For a start, we will implement the example 7.1. Because writing a scanner, which splits the input into tokens, is a job in itself, see Scanning with Gperf and Flex, we will simply rely on the command line arguments. In other words, we use the shell as a scanner (or rather, we let the user fight with its shells to explain where start and end the tokens).

     %{ /* -*- C -*- */
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <error.h>
     
     static int yylex (void);
     static void yyerror (const char *);
     %}
     %output="brackens-1.c"
     %%
     expr: '(' expr ')'
         | 'N'
         ;
     %%
     /* The current argument. */
     static const char **args = NULL;
     
     static int
     yylex (void)
     {
       /* No token stands for end of file. */
       if (!*++args)
         return 0;
       /* Parens stand for themselves.  `N' denotes a number. */
       if (strlen (*args) == 1 && strchr ("()N", **args))
         return **args;
       /* An error. */
       error (EXIT_FAILURE, 0, "invalid argument: %s", *args);
       /* Pacify GCC that knows ERROR may return. */
       return -1;
     }
     
     void
     yyerror (const char *msg)
     {
       error (EXIT_FAILURE, 0, "%s", msg);
     }
     
     int
     main (int argc, const char *argv[])
     {
       args = argv;
       return yyparse ();
     }

     Example 7.18: brackens-1.y -- Nested Parentheses Checker

Which we compile and run:

     $ bison brackens-1.y
     $ gcc -Wall brackens-1.c -o brackens-1
     $ ./brackens-1 '(' 'N' ')'
     $ ./brackens-1 '(' 'n' ')'
     error-->./brackens-1: invalid argument: n
     $ ./brackens-1 '(' '(' 'N' ')' ')'
     $ ./brackens-1 '(' '(' 'N' ')' ')' ')'
     error-->./brackens-1: parse error

It works quite well! Unfortunately, when given an invalid input is it quite hard to find out where the actual error is. Fortunately you can ask Bison parsers to provide more precise information by defining YYERROR_VERBOSE to 1. You can also use the Bison directive %debug which introduces a variable, yydebug, to make your parser verbose:

     %{ /* -*- C -*- */
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <error.h>
     
     static int yylex (void);
     static void yyyerror (const char *msg);
     #define YYERROR_VERBOSE 1
     %}
     %debug
     %output="brackens-2.c"
     %%
     expr: '(' expr ')'
         | 'N'
         ;
     %%
     int
     main (int argc, const char *argv[])
     {
       yydebug = getenv ("YYDEBUG") ? 1 : 0;
       args = argv;
       return yyparse ();
     }

     Example 7.19: brackens-2.y -- Nested Parentheses Checker

which, when run, produces:

     $ ./brackens-2 '(' '(' 'N' ')' ')' ')'
     ./brackens-2: parse error, unexpected ')', expecting $

Wow! The error message is much better! But it is still not clear which one of the arguments is provoking the error. Asking traces from the parser gives some indication:

     $ YYDEBUG=1 ./brackens-2 '(' '(' 'N' ')' ')' ')'
     Starting parse
     Entering state 0
     Reading a token: Next token is 40 ('(')
     Shifting token 40 ('('), Entering state 1
     Reading a token: Next token is 40 ('(')
     Shifting token 40 ('('), Entering state 1
     Reading a token: Next token is 78 ('N')
     Shifting token 78 ('N'), Entering state 2
     Reducing via rule 2 (line 32), 'N'  -> expr
     state stack now 0 1 1
     Entering state 3
     Reading a token: Next token is 41 (')')
     Shifting token 41 (')'), Entering state 4
     Reducing via rule 1 (line 32), '(' expr ')'  -> expr
     state stack now 0 1
     Entering state 3
     Reading a token: Next token is 41 (')')
     Shifting token 41 (')'), Entering state 4
     Reducing via rule 1 (line 32), '(' expr ')'  -> expr
     state stack now 0
     Entering state 5
     Reading a token: Next token is 41 (')')
     error-->./brackens-2: parse error, unexpected ')', expecting $

The reader is invited to compare these traces with the "execution" by hand of the example 7.3: it is just the same!

Two things are worth remembering. First, always try to produce precise error messages, since once an error is diagnosed, the user still has to locate it in order to fix it. I, for one, don't like tools that merely report the line where the error occurred, because often several very similar expressions within that line can be responsible for the error. In that case, I often split my line into severals, recompile, find the culprit, educate it, and join again these lines... And second, never write dummy printf to check that your parser works: it is a pure waste of time, since Bison does this on command, and produces extremely readable traces. Take some time to read the example above, and in particular spot on the one hand side the tokens and the shifts and on the other hand the reductions and the rules. But forget about "states", for the time being.