


Simple Uses of Gperf

Gperf is a source generator, just like Flex, Bison, and others. It takes the list of your keywords as input, and produces a fast function that recognizes them. As with Flex and Bison, the input syntax allows for a prologue, containing directives for gperf and possibly some user declarations and initializations, and an epilogue, typically additional functions:

     %{
       user-prologue
     %}
     gperf-directives
     %%
     keywords
     %%
     user-epilogue
     
     Example 6.2: Structure of a Gperf Input File
     

All the keywords are listed on separate lines. They do not need to be enclosed in double quotes, but if you intend to include special characters or commas, you may use the usual C string syntax. When run, gperf produces a C program on the standard output, including, in addition to your user-prologue and user-epilogue, two functions:

static unsigned int hash (char *string, unsigned int length) Function
Return an integer, named the key, that is characteristic of the C string string, which is length characters long.

const char * in_word_set (const char *string, unsigned int length) Function
If the C string string, length characters long, is one of the keywords, return a pointer to that keyword (i.e., not string itself, but the stored keyword with the same content as string); otherwise return NULL.
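
To make the contract of in_word_set concrete, here is a minimal sketch in plain C. It is not what gperf generates: the name in_word_set_sketch and the wordlist table are hypothetical stand-ins, and a linear search replaces the hash lookup. It only illustrates the two points above, namely that the length is part of the query and that the pointer returned belongs to the table rather than to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the generated table; gperf emits its own
   word list, laid out according to the hash values.  */
static const char *const wordlist[] = { "sh*t", "f*k", "win*ows" };

/* Linear-search sketch of the in_word_set contract.  The generated
   function would instead call hash (string, length) and probe the
   table with the resulting key.  */
static const char *
in_word_set_sketch (const char *string, unsigned int length)
{
  size_t i;
  for (i = 0; i < sizeof wordlist / sizeof wordlist[0]; ++i)
    if (strlen (wordlist[i]) == length
        && memcmp (wordlist[i], string, length) == 0)
      return wordlist[i];   /* the table's copy, not the caller's string */
  return NULL;              /* not a keyword */
}
```

Calling in_word_set_sketch (buf, 4) on a buffer holding sh*t yields a non-NULL pointer different from buf, while passing a length of 2 for the same buffer yields NULL.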

For instance, the following simple Gperf input file is meant to recognize rude words, and to express its surprise at unknown words:

     %{  /* -*- C -*- */
     #include <stdio.h>
     #include <stdlib.h>
     %}
     %%
     sh*t
     f*k
     win*ows
     Huh? What the f*?
     %%
     int
     main (int argc, const char** argv)
     {
       for (--argc, ++argv; argc; --argc, ++argv)
         if (in_word_set (*argv))
           printf ("I don't like you saying `%s'.\n", *argv);
         else
           printf ("Huh? What the f* `%s'?\n", *argv);
       return 0;
     }
     
     Example 6.3: rude-1.gperf -- Recognizing Rude Words With Gperf
     

which we can try now:

     $ gperf rude-1.gperf >rude.c
     $ gcc -Wall -o rude rude.c
     $ ./rude 'Huh? What the f*?'
     Huh? What the f* `Huh? What the f*?'?
     

Huh? What the f* `Huh? What the f*?'?? It was supposed to recognize it!

You just fell into K&R, and it hurts. Our invocation of in_word_set above is wrong: we forgot to pass the length of the string, and since by default gperf produces K&R C, the compiler notices nothing. As a consequence, never forget to pass --language=ANSI-C to gperf. Just to check the result on our broken source:

     $ gperf --language=ANSI-C rude-1.gperf >rude.c
     $ gcc -Wall -o rude rude.c
     error-->rude.c: In function `main':
     error-->rude.c:91: too few arguments to function `in_word_set'
     

If we fix our invocation to in_word_set (*argv, strlen (*argv)), which also requires including <string.h> for strlen, then:

     $ gperf --language=ANSI-C rude-2.gperf >rude.c
     $ gcc -Wall -o rude rude.c
     $ ./rude 'Huh? What the f*?'
     I don't like you saying `Huh? What the f*?'.
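
The corrected call can be sketched in isolation as follows. This is an illustration under assumptions, not the contents of rude-2.gperf: report is a hypothetical helper holding the body of the loop in main, and the in_word_set defined here is a hand-written stand-in so that the fragment compiles on its own. In the real rude.c, in_word_set is the function generated by gperf --language=ANSI-C.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in so the sketch links on its own; in rude.c this definition
   comes from gperf, with the same prototype.  */
static const char *
in_word_set (const char *string, unsigned int length)
{
  static const char *const words[] =
    { "sh*t", "f*k", "win*ows", "Huh? What the f*?" };
  size_t i;
  for (i = 0; i < sizeof words / sizeof words[0]; ++i)
    if (strlen (words[i]) == length
        && memcmp (words[i], string, length) == 0)
      return words[i];
  return NULL;
}

/* Hypothetical helper: the body of the loop in main, writing to OUT.
   Return 1 if WORD is a keyword, 0 otherwise.  */
static int
report (FILE *out, const char *word)
{
  /* With the prototype in scope, in_word_set (word) alone is rejected
     at compile time: the length must be passed too.  */
  if (in_word_set (word, (unsigned int) strlen (word)))
    {
      fprintf (out, "I don't like you saying `%s'.\n", word);
      return 1;
    }
  fprintf (out, "Huh? What the f* `%s'?\n", word);
  return 0;
}
```

In main, the loop then reduces to calling report (stdout, *argv) for each argument.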
     

To exercise it further:

     $ ./rude 'sh*t' 'dear' '%%'
     I don't like you saying `sh*t'.
     Huh? What the f* `dear'?
     I don't like you saying `%%'.
     

Huh? What the f* %%? You just ran into a bug in Gperf 2.7.2, which is a bit weak at parsing its input when the prologue includes solely user declarations but no actual Gperf directive. You are unlikely to be bitten, but be aware of the problem.

But before going on to a more elaborate example, let's browse some other features of Gperf.