Noeud:Using Actions, Noeud « Next »:, Noeud « Previous »:Simple Uses of Bison, Noeud « Up »:Parsing



Using Actions

Of course real applications need to process the input, not merely to validate it as in the previous section. For instance, when processing arithmetics, one wants to compute the result, not just say "this is a valid/invalid arithmetical expression".

We stem on several problems. First of all, up to now we managed to use single characters as tokens: the code of the character is used as token number. But all the integers must be mapped to a single token type, say INTEGER.

Bison provide the %token directive to declare token types:

     %token INTEGER FLOAT STRING
     

Secondly, tokens such as numbers have a value. To this end, Bison provides a global variable, yylval, which the scanner must fill with the appropriate value. But imagine our application also had to deal with strings, or floats etc.: we need to be able to specify several value types, and associate a value type to a token type.

To declare the different value types to Bison, use the %union directive. For instance:

     %union
     {
       int   ival;
       float fval;
       char *sval;
     }
     

This results in the definition of the type "token value": yystype1. The scanner needs the token types and token value types: if given --defines bison creates foo.h which contains their definition. Alternatively, you may use the %defines directive.

Then map token types to value types:

     %token <ival> INTEGER
     %token <fval> FLOAT
     %token <sval> STRING
     

But if tokens have a value, then so have some nonterminals! For instance, if iexpr denotes an integer expression, then we must also specify that (i) it has a value, (ii) of type INTEGER. To this end, use %type:

     %type <ival> iexpr
     %type <fval> fexpr
     %type <sval> sexpr
     

We already emphasized the importance of traces: together with the report produced thanks to the option --verbose, they are your best partners to debug a parser. Bison lets you improve the traces of tokens, for instance to output token values in addition to token types. To use this feature, just define the following macro:

YYPRINT (File, Type, Value) Macro
Output on File a description of the Value of a token of Type.

For various technical reasons, I suggest the following pattern of use:

     %{
     ...
     #define YYPRINT(File, Type, Value) yyprint (File, Type, &Value)
     static void yyprint (FILE *file, int type, const yystype *value);
     ...
     %}
     %%
     ...
     %%
     static void
     yyprint (FILE *file, int type, const yystype *value)
     {
       switch (type)
         {
         case INTEGER:
           fprintf (file, " = %d", value->ival);
           break;
         case FLOAT:
           fprintf (file, " = %f", value->fval);
           break;
         case STRING:
           fprintf (file, " = \"%s\"", value->sval);
           break;
         }
     }
     

The most important difference between a mere validation of the input and an actual computation, is that we must be able to equip the parser with actions. For instance, once we have recognized that 1 + 2 is a INTEGER + INTEGER, we must be able to (i) compute the sum, which requires fetching the value of the first and third tokens, and (ii) "return" 3 as the result.

To associate an action with a rule, just put the action after the rule, between braces. To access the value of the various symbols of the right hand side of a rule, Bison provides pseudo-variables: $1 is the value of the first symbol, $2 is the second etc. Finally to "return" a value, or rather, to set the value of the left hand side symbol of the rule, use $$.

Therefore, a calculator typically includes:

     iexpr: iexpr '+' iexpr { $$ = $1 + $3 }
          | iexpr '-' iexpr { $$ = $1 - $3 }
          | iexpr '*' iexpr { $$ = $1 * $3 }
          | iexpr '/' iexpr { $$ = $1 / $3 }
          | INTEGER         { $$ = $1 }
          ;
     

Please, note that we used $3 to denote the value of the third symbol, even though the second, the operator, has no value.

Putting this all together leads to the following implementation of the example 7.15:

     %{ /* -*- C -*- */
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <error.h>
     #include "calc.h"
     
     #define YYERROR_VERBOSE 1
     #define yyerror(Msg) error (EXIT_FAILURE, 0, "%s", Msg)
     
     #define YYPRINT(File, Type, Value) yyprint (File, Type, &Value)
     static void yyprint (FILE *file, int type, const yystype *value);
     
     static int yylex (void);
     %}
     %debug
     %defines
     %output="calc.c"
     %union
     {
       int ival;
     }
     %token <ival> NUMBER
     %type  <ival> expr input
     
     %left '+' '-'
     %left '*' '/'
     %%
     input: expr         { printf ("%d\n", $1) }
          ;
     
     expr: expr '*' expr { $$ = $1 * $3 }
         | expr '/' expr { $$ = $1 / $3 }
         | expr '+' expr { $$ = $1 + $3 }
         | expr '-' expr { $$ = $1 - $3 }
         | '(' expr ')'  { $$ = $2 }
         | NUMBER        { $$ = $1 }
         ;
     %%
     /* The current argument. */
     static const char **args = NULL;
     
     static void
     yyprint (FILE *file, int type, const yystype *value)
     {
       switch (type)
         {
         case NUMBER:
           fprintf (file, " = %d", value->ival);
           break;
         }
     }
     
     static int
     yylex (void)
     {
       /* No token stands for end of file. */
       if (!*++args)
         return 0;
       /* Parens and operators stand for themselves. */
       if (strlen (*args) == 1 && strchr ("()+-*/", **args))
         return **args;
       /* Integers have a value. */
       if (strspn (*args, "0123456789") == strlen (*args))
         {
           yylval.ival = strtol (*args, NULL, 10);
           return NUMBER;
         }
       /* An error. */
       error (EXIT_FAILURE, 0, "invalid argument: %s", *args);
       /* Pacify GCC which knows ERROR may return. */
       return -1;
     }
     
     int
     main (int argc, const char *argv[])
     {
       yydebug = getenv ("YYDEBUG") ? 1 : 0;
       args = argv;
       return yyparse ();
     }
     
     Example 7.20: calc.y -- A Simple Integer Calculator
     

Its execution is satisfying:

     $ bison calc.y
     $ gcc calc.c -o calc
     $ ./calc 1 + 2 \* 3
     7
     $ ./calc 1 + 2 \* 3 \*
     error-->./calc: parse error, unexpected $, expecting NUMBER or '('
     $ YYDEBUG=1 ./calc 51
     Starting parse
     Entering state 0
     Reading a token: Next token is 257 (NUMBER = 51)
     Shifting token 257 (NUMBER), Entering state 1
     Reducing via rule 7 (line 56), NUMBER  -> expr
     state stack now 0
     Entering state 3
     Reading a token: Now at end of input.
     Reducing via rule 1 (line 48), expr  -> input
     51
     state stack now 0
     Entering state 14
     Now at end of input.
     Shifting token 0 ($), Entering state 15
     Now at end of input.
     

well, almost

     $ ./calc 1 / 0
     floating point exception  ./calc 1 / 0
     

Notes de bas de page

  1. Because of the compatibility with POSIX Yacc, the type YYSTYPE is also defined. For sake of inter Yacc portability, use the latter only. But for sake of your mental balance, use Bison only.