


The yleval Module

Our job is almost done: all that remains is to create the leader, the M4 module which receives the expression from M4, launches the parser on it, and returns the result or an error to M4. The most relevant bits of this file, yleval.c, are included below. The reader is also referred to yleval.h and ylscan.l, see Advanced Use of Flex.

     /*-----------------------------------------------------------------.
     | Record an error occurring at location LOC and described by MSG.  |
     `-----------------------------------------------------------------*/
     
     void
     yleval_error (const yyltype *loc, yleval_control_t *control,
                   const char *msg, ...)
     {
       va_list ap;
       /* Separate different error messages with a new line. */
       fflush (control->err);
       if (control->err_size)
         putc ('\n', control->err);
       LOCATION_PRINT (control->err, *loc);
       fprintf (control->err, ": ");
       va_start (ap, msg);
       vfprintf (control->err, msg, ap);
       va_end (ap);
     }
     

This first function, yleval_error, reports errors. Since a single expression can contain several errors, it relies on a facility provided by the GNU C Library: pseudo streams which are in fact hooked onto plain char * buffers, grown on demand. Note the call to fflush: the size recorded in err_size is only brought up to date when the stream is flushed or closed. We will see below how to initialize this stream, err, which is part of our control structure, defined as:

     typedef struct yleval_control_s
     {
       /* To store the result. */
       number result;
     
       /* A string stream.  */
       FILE *err;
       char *err_str;
       size_t err_size;
     } yleval_control_t;
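
To make the behaviour of such a string stream concrete, here is a minimal, self-contained sketch. It is not taken from yleval.c: demo_control_t, demo_error and demo_collect are hypothetical names, and the structure keeps only the three stream-related members shown above. It assumes a C library that provides open_memstream (the GNU C Library does, as does any POSIX.1-2008 system).

```c
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L   /* For open_memstream under strict modes. */
#endif
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the control structure: stream part only. */
typedef struct
{
  FILE *err;
  char *err_str;
  size_t err_size;
} demo_control_t;

/* Append one formatted message, separating messages with a new line. */
static void
demo_error (demo_control_t *control, const char *msg, ...)
{
  va_list ap;
  /* ERR_SIZE is only brought up to date by fflush or fclose. */
  fflush (control->err);
  if (control->err_size)
    putc ('\n', control->err);
  va_start (ap, msg);
  vfprintf (control->err, msg, ap);
  va_end (ap);
}

/* Record two errors and return the accumulated text, which the caller
   must free, or NULL if the stream could not be opened. */
static char *
demo_collect (void)
{
  demo_control_t control = { NULL, NULL, 0 };
  control.err = open_memstream (&control.err_str, &control.err_size);
  if (!control.err)
    return NULL;
  demo_error (&control, "%s by zero", "Division");
  demo_error (&control, "exponent %d is negative", -1);
  /* Closing the stream finalizes ERR_STR and ERR_SIZE. */
  fclose (control.err);
  return control.err_str;
}
```

Calling demo_collect yields a single malloc'ed string holding both messages, one per line. The buffer belongs to the caller once the stream is closed, which is why it has to be released with free.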
     

The following function performs the division, along with some semantic checking: it rejects a zero denominator. ylmod and ylpow are similar.

     /*------------------------------------------------------------------.
     | Compute NUMERATOR / DENOMINATOR, and store in RESULT.  On errors, |
     | produce an error message, and return FALSE.                       |
     `------------------------------------------------------------------*/
     
     boolean
     yldiv (yleval_control_t *control, const yyltype *loc,
            number *result, number numerator, number denominator)
     {
       if (!denominator)
         {
           yleval_error (loc, control, "Division by zero");
           *result = 0;
           return FALSE;
         }
     
       *result = numerator / denominator;
       return TRUE;
     }
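
Since ylpow is not reproduced here, the following is only a guess at the shape such a function might take, written as a self-contained sketch. Everything in it is an assumption made for illustration: the name sketch_pow, the definition of number as long (the real type lives in yleval.h), the choice to reject negative exponents, and the use of a plain FILE * instead of the control structure. Overflow is deliberately not checked.

```c
#include <stdio.h>

/* Assumptions for this sketch only; the real definitions are elsewhere. */
typedef long number;
typedef int boolean;
#define TRUE 1
#define FALSE 0

/* Compute BASE ** EXPONENT by repeated squaring, and store in RESULT.
   On a negative exponent, print a message on ERR and return FALSE. */
static boolean
sketch_pow (FILE *err, number *result, number base, number exponent)
{
  number acc = 1;

  if (exponent < 0)
    {
      fprintf (err, "Negative exponent\n");
      *result = 0;
      return FALSE;
    }

  while (exponent)
    {
      /* Multiply in the current power of BASE when its bit is set. */
      if (exponent & 1)
        acc *= base;
      exponent >>= 1;
      /* Square only if another bit remains, to avoid a useless step. */
      if (exponent)
        base *= base;
    }

  *result = acc;
  return TRUE;
}
```

As with yldiv, the result is reset to 0 on failure so that the caller never reads an uninitialized value, and the return value tells the parser whether the semantic check passed.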
     

Now the conductor of the whole band, the builtin yleval, is defined as follows. Its first part is dedicated to initialization and to decoding the optional arguments passed to the builtin:

     /*-----------------------------------.
     | yleval(EXPRESSION, [RADIX], [MIN]) |
     `-----------------------------------*/
     
     M4BUILTIN_HANDLER (yleval)
     {
       int radix = 10;
       int min = 1;
       yleval_control_t yleval_control;
     
       /* Decode RADIX, reject invalid values. */
       if (argc >= 3 && !m4_numeric_arg (argc, argv, 2, &radix))
         return;
     
       if (radix <= 1 || radix > 36)
         {
           M4ERROR ((warning_status, 0,
                     _("Warning: %s: radix out of range: %d"),
                     M4ARG(0), radix));
           return;
         }
     
       /* Decode MIN, reject invalid values. */
       if (argc >= 4 && !m4_numeric_arg (argc, argv, 3, &min))
         return;
     
       if (min <= 0)
         {
           M4ERROR ((warning_status, 0,
                     _("Warning: %s: width must be positive: %d"),
                     M4ARG(0), min));
           return;
         }
     
       /* Initialize the parser control structure, in particular
          open the string stream ERR.  Do this only once the arguments
          are known to be valid, so that the early returns above
          cannot leak the stream. */
       yleval_control.err =
         open_memstream (&yleval_control.err_str,
                         &yleval_control.err_size);
     

Then it parses the expression, and outputs its result.

       /* Feed the scanner with the EXPRESSION. */
       yleval__scan_string (M4ARG (1));
       /* Launch the parser. */
       yleval_parse (&yleval_control);
     
       /* End the ERR stream.  If it is empty, the parsing is
          successful and we return the value, otherwise, we report
          the error messages.  The buffer is ours to free in both
          cases once the stream is closed. */
       fclose (yleval_control.err);
       if (!yleval_control.err_size)
         {
           numb_obstack (obs, yleval_control.result, radix, min);
         }
       else
         {
           M4ERROR ((warning_status, 0,
                     _("Warning: %s: %s: %s"),
                     M4ARG (0), yleval_control.err_str, M4ARG (1)));
         }
       free (yleval_control.err_str);
     }
     
     Example 7.21: yleval.y -- Builtin yleval (continued)
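
The call to numb_obstack above is what turns the result into text, given RADIX and MIN. Its implementation is not shown here, so the following self-contained sketch only illustrates the kind of conversion involved. The function format_radix is hypothetical, it writes into a plain buffer rather than an obstack, and the zero padding up to MIN digits is an assumption borrowed from the behaviour of M4's eval builtin.

```c
#include <limits.h>
#include <stddef.h>

/* Write VALUE in base RADIX (2 to 36) into BUF, zero-padded to at
   least MIN digits.  Return 0 on success, -1 if RADIX is out of range
   or BUF, of SIZE bytes, is too small. */
static int
format_radix (char *buf, size_t size, long value, int radix, int min)
{
  static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
  char tmp[CHAR_BIT * sizeof (long)];   /* Enough digits for base 2. */
  size_t n = 0, pos = 0, width;
  unsigned long magnitude;

  if (radix < 2 || radix > 36)
    return -1;

  /* Work on the magnitude so that the most negative value is safe. */
  magnitude = value < 0 ? -(unsigned long) value : (unsigned long) value;

  /* Collect the digits, least significant first. */
  do
    {
      tmp[n++] = digits[magnitude % (unsigned long) radix];
      magnitude /= (unsigned long) radix;
    }
  while (magnitude);

  /* Room needed: optional sign, padded digits, terminating NUL. */
  width = (min > 0 && (size_t) min > n) ? (size_t) min : n;
  if ((size_t) (value < 0) + width + 1 > size)
    return -1;

  if (value < 0)
    buf[pos++] = '-';
  for (; width > n; --width)
    buf[pos++] = '0';
  while (n)
    buf[pos++] = tmp[--n];
  buf[pos] = '\0';
  return 0;
}
```

With this sketch, formatting 65535 in radix 2 gives sixteen 1 digits, and formatting 255 in radix 16 with a minimum width of 4 gives 00ff.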
     


It is high time to play with our module! Here are a few sample runs:

     $ echo "yleval(1)" | m4 -M . -m yleval
     1
     

Good news, we seem to agree on the definition of 1,

     $ echo "yleval(1 + 2 * 3)" | m4 -M . -m yleval
     7
     

and on the precedence of multiplication over addition. Let's exercise big numbers, and the radix:

     $ echo "yleval(2 ** 2 ** 2 ** 2 - 1)" | m4 -M . -m yleval
     65535
     $ echo "yleval(2 ** 2 ** 2 ** 2 - 1, 2)" | m4 -M . -m yleval
     1111111111111111
     

How about tickling the parser with a few parse errors:

     $ echo "yleval(2 *** 2)" | m4 -M . -m yleval
     error-->m4: stdin: 1: Warning: yleval: 1.5: parse error, unexpected "*": 2 *** 2
     
     

Wow! Now, that's what I call an error message: the fifth character, in other words the third *, should not be there. Nevertheless, at this point you may wonder how the stars were grouped [1]. Because you never know what the future holds, I strongly suggest that you always equip your scanners and parsers with runtime switches to enable/disable tracing. This is the point of the following additional builtin, yldebugmode:

     /*-----------------------------.
     | yldebugmode([REQUEST1], ...) |
     `-----------------------------*/
     
     M4BUILTIN_HANDLER (yldebugmode)
     {
       /* Without arguments, return the current debug mode. */
       if (argc == 1)
         {
           m4_shipout_string (obs,
                              yleval__flex_debug ? "+scanner" : "-scanner", 0, TRUE);
           obstack_1grow (obs, ',');
           m4_shipout_string (obs,
                              yleval_debug ? "+parser" : "-parser", 0, TRUE);
         }
       else
         {
           int arg;
           for (arg = 1; arg < argc; ++arg)
             if (!strcmp (M4ARG (arg), "+scanner"))
               yleval__flex_debug = 1;
             else if (!strcmp (M4ARG (arg), "-scanner"))
               yleval__flex_debug = 0;
             else if (!strcmp (M4ARG (arg), "+parser"))
               yleval_debug = 1;
             else if (!strcmp (M4ARG (arg), "-parser"))
               yleval_debug = 0;
             else
               M4ERROR ((warning_status, 0,
                         _("%s: invalid debug flags: %s"),
                        M4ARG (0), M4ARG (arg)));
         }
     }
     

Applied to our previous example, it clearly demonstrates how 2 *** 2 is parsed [2]:

     $ echo "yldebugmode(+parser)yleval(2 *** 2)" | m4 -M . -m yleval
     error-->Starting parse
     error-->Entering state 0
     error-->Reducing via rule 1 (line 100),  -> @1
     error-->state stack now 0
     error-->Entering state 2
     error-->Reading a token: Next token is 257 ("number" (1.1) = 2)
     error-->Shifting token 257 ("number"), Entering state 4
     error-->Reducing via rule 3 (line 102), "number"  -> exp
     error-->state stack now 0 2
     error-->Entering state 10
     error-->Reading a token: Next token is 279 ("**" (1.3-4))
     error-->Shifting token 279 ("**"), Entering state 34
     error-->Reading a token: Next token is 275 ("*" (1.5))
     error-->Error: state stack now 0 2 10
     error-->Error: state stack now 0 2
     error-->Error: state stack now 0
     error-->m4: stdin: 1: Warning: yleval: 1.5: parse error, unexpected "*": 2 *** 2
     
     

which confirms that *** is split into ** and then *.
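
The grouping seen in this trace follows from the longest-match rule. As a rough illustration of that rule, and not of the code Flex actually generates, here is a hand-written sketch of a scanner that knows only the two operators involved. The names next_token and TOK_* are hypothetical, and anything that is not a star is lumped into a single TOK_OTHER token.

```c
enum token { TOK_END, TOK_STAR, TOK_POW, TOK_OTHER };

/* Return the token starting at *P and advance *P past it, always
   preferring the longest operator that matches at this position. */
static enum token
next_token (const char **p)
{
  /* Skip blanks between tokens. */
  while (**p == ' ')
    ++*p;

  if (**p == '\0')
    return TOK_END;

  /* Try the two-character operator before the one-character one. */
  if ((*p)[0] == '*' && (*p)[1] == '*')
    {
      *p += 2;
      return TOK_POW;
    }
  if (**p == '*')
    {
      ++*p;
      return TOK_STAR;
    }

  /* Any other single character. */
  ++*p;
  return TOK_OTHER;
}
```

Fed with 2 *** 2, this sketch returns TOK_OTHER, TOK_POW, TOK_STAR, TOK_OTHER and then TOK_END: the first two stars are consumed together, leaving the third one on its own, exactly the token that the parser rejects at position 1.5.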


Footnotes

  1. Actually you should already know that Flex chooses the longest match first, therefore it returned ** and then *. See Looking for Tokens.

  2. The same attentive reader who was shocked by the concept of mid-rule actions, see Advanced Use of Bison, will notice the reduction of the invisible @1 symbol below.