Noeud:Using Actions, Noeud « Next »:Advanced Use of Bison, Noeud « Previous »:Simple Uses of Bison, Noeud « Up »:Parsing
Of course real applications need to process the input, not merely to validate it as in the previous section. For instance, when processing arithmetics, one wants to compute the result, not just say "this is a valid/invalid arithmetical expression".
We stem on several problems. First of all, up to now we managed to use
single characters as tokens: the code of the character is used as token
number. But all the integers must be mapped to a single token type, say
INTEGER
.
Bison provide the %token
directive to declare token types:
%token INTEGER FLOAT STRING
Secondly, tokens such as numbers have a value. To this end, Bison
provides a global variable, yylval
, which the scanner must fill
with the appropriate value. But imagine our application also had to
deal with strings, or floats etc.: we need to be able to specify several
value types, and associate a value type to a token type.
To declare the different value types to Bison, use the %union
directive. For instance:
%union { int ival; float fval; char *sval; }
This results in the definition of the type "token value":
yystype
1. The scanner needs the token types and token value types: if given
--defines
bison
creates foo.h
which contains
their definition. Alternatively, you may use the %defines
directive.
Then map token types to value types:
%token <ival> INTEGER %token <fval> FLOAT %token <sval> STRING
But if tokens have a value, then so have some nonterminals! For
instance, if iexpr
denotes an integer expression, then we must
also specify that (i) it has a value, (ii) of type INTEGER
. To
this end, use %type
:
%type <ival> iexpr %type <fval> fexpr %type <sval> sexpr
We already emphasized the importance of traces: together with the report
produced thanks to the option --verbose
, they are your best
partners to debug a parser. Bison lets you improve the traces of
tokens, for instance to output token values in addition to token types.
To use this feature, just define the following macro:
YYPRINT (File, Type, Value) | Macro |
Output on File a description of the Value of a token of Type. |
For various technical reasons, I suggest the following pattern of use:
%{ ... #define YYPRINT(File, Type, Value) yyprint (File, Type, &Value) static void yyprint (FILE *file, int type, const yystype *value); ... %} %% ... %% static void yyprint (FILE *file, int type, const yystype *value) { switch (type) { case INTEGER: fprintf (file, " = %d", value->ival); break; case FLOAT: fprintf (file, " = %f", value->fval); break; case STRING: fprintf (file, " = \"%s\"", value->sval); break; } }
The most important difference between a mere validation of the input and
an actual computation, is that we must be able to equip the parser with
actions. For instance, once we have recognized that 1 + 2
is a
INTEGER + INTEGER
, we must be able to (i) compute the sum, which
requires fetching the value of the first and third tokens, and (ii)
"return" 3 as the result.
To associate an action with a rule, just put the action after the rule,
between braces. To access the value of the various symbols of the right
hand side of a rule, Bison provides pseudo-variables: $1
is the
value of the first symbol, $2
is the second etc. Finally to
"return" a value, or rather, to set the value of the left hand side
symbol of the rule, use $$
.
Therefore, a calculator typically includes:
iexpr: iexpr '+' iexpr { $$ = $1 + $3 } | iexpr '-' iexpr { $$ = $1 - $3 } | iexpr '*' iexpr { $$ = $1 * $3 } | iexpr '/' iexpr { $$ = $1 / $3 } | INTEGER { $$ = $1 } ;
Please, note that we used $3
to denote the value of the third
symbol, even though the second, the operator, has no value.
Putting this all together leads to the following implementation of the example 7.15:
%{ /* -*- C -*- */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <error.h> #include "calc.h" #define YYERROR_VERBOSE 1 #define yyerror(Msg) error (EXIT_FAILURE, 0, "%s", Msg) #define YYPRINT(File, Type, Value) yyprint (File, Type, &Value) static void yyprint (FILE *file, int type, const yystype *value); static int yylex (void); %} %debug %defines %output="calc.c" %union { int ival; } %token <ival> NUMBER %type <ival> expr input %left '+' '-' %left '*' '/' %% input: expr { printf ("%d\n", $1) } ; expr: expr '*' expr { $$ = $1 * $3 } | expr '/' expr { $$ = $1 / $3 } | expr '+' expr { $$ = $1 + $3 } | expr '-' expr { $$ = $1 - $3 } | '(' expr ')' { $$ = $2 } | NUMBER { $$ = $1 } ; %% /* The current argument. */ static const char **args = NULL; static void yyprint (FILE *file, int type, const yystype *value) { switch (type) { case NUMBER: fprintf (file, " = %d", value->ival); break; } } static int yylex (void) { /* No token stands for end of file. */ if (!*++args) return 0; /* Parens and operators stand for themselves. */ if (strlen (*args) == 1 && strchr ("()+-*/", **args)) return **args; /* Integers have a value. */ if (strspn (*args, "0123456789") == strlen (*args)) { yylval.ival = strtol (*args, NULL, 10); return NUMBER; } /* An error. */ error (EXIT_FAILURE, 0, "invalid argument: %s", *args); /* Pacify GCC which knows ERROR may return. */ return -1; } int main (int argc, const char *argv[]) { yydebug = getenv ("YYDEBUG") ? 1 : 0; args = argv; return yyparse (); }
Example 7.20: calc.y
-- A Simple Integer Calculator
Its execution is satisfying:
$ bison calc.y $ gcc calc.c -o calc $ ./calc 1 + 2 \* 3 7 $ ./calc 1 + 2 \* 3 \* error-->./calc: parse error, unexpected $, expecting NUMBER or '(' $ YYDEBUG=1 ./calc 51 Starting parse Entering state 0 Reading a token: Next token is 257 (NUMBER = 51) Shifting token 257 (NUMBER), Entering state 1 Reducing via rule 7 (line 56), NUMBER -> expr state stack now 0 Entering state 3 Reading a token: Now at end of input. Reducing via rule 1 (line 48), expr -> input 51 state stack now 0 Entering state 14 Now at end of input. Shifting token 0 ($), Entering state 15 Now at end of input.
well, almost
$ ./calc 1 / 0 floating point exception ./calc 1 / 0