
Using Gperf

This section presents the most significant options of Gperf; the reader is invited to read the Gperf documentation, Perfect Hash Function Generator, for all the details.

Instead of just returning a unique representative of each keyword, Gperf can be used to efficiently retrieve data associated with it.

-t
Specify that the gperf-prologue include the definition of a structure, and that each line of the keywords section is an instance of this structure. The values of the members are separated by commas. The function in_word_set will then return a pointer to the corresponding struct. The first member of this struct should store the keyword and be named name.
-F initializers
When it fails to produce a minimal table of keywords and therefore falls back to a near-minimal table (see What Gperf is), gperf introduces empty entries of this struct, leaving the non-name part unspecified. This can trigger spurious compiler warnings, in particular with gcc's option -W. Quoting the GCC documentation:
Print extra warning messages for these events: ... An aggregate has an initializer which does not initialize all members. For example, the following code would cause such a warning, because x.h would be implicitly initialized to zero:
                    struct s { int f, g, h; };
                    struct s x = { 3, 4 };

Use this option to specify the initializers for the empty structure members following the keyword. The initializers should start with a comma.

-T
Don't output the definition of the keywords' struct; it has been given only to describe it to Gperf. Use this option if your structure is defined elsewhere for the compiler.
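
As a rough illustration of the struct-based interface described above, here is a hand-written C sketch that mimics the shape of a struct-typed keyword table. It is not actual gperf output: the struct tag keyword, its member token, the table contents, and the function lookup_keyword are all assumptions made for the example, and the lookup scans the table linearly where generated code would hash the word. The empty entry spells out every member, which is the effect one would aim for with initializers such as `, 0'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword struct: the first member holds the keyword and
   is called `name'; `token' is the associated data.  */
struct keyword
{
  const char *name;
  int token;
};

/* Hand-written stand-in for a generated word list.  The empty entry
   initializes every member, so the compiler has no missing
   initializer to report.  */
static const struct keyword wordlist[] =
{
  { "if", 1 },
  { "", 0 },        /* empty slot, fully initialized */
  { "else", 2 },
  { "while", 3 }
};

/* Stand-in for in_word_set: return a pointer to the matching struct,
   or NULL.  This sketch scans the table instead of hashing.  */
static const struct keyword *
lookup_keyword (const char *word)
{
  size_t i;

  assert (word != NULL);
  for (i = 0; i < sizeof wordlist / sizeof wordlist[0]; ++i)
    if (wordlist[i].name[0] != '\0'
        && strcmp (word, wordlist[i].name) == 0)
      return &wordlist[i];
  return NULL;
}
```

A caller would test the returned pointer against NULL and then read the associated data, for instance lookup_keyword ("else")->token.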

By default gperf produces portable code, too portable actually: modern and useful features are avoided. The following options bring Gperf forward into the 21st century [1].

-L ANSI-C
Output ANSI C instead of some old forgotten dialects of the previous century. You may also output C++, but decency prevents us from mentioning the other options.
-C
Don't be afraid to use const for internal tables. This is not only better style, it also helps some compilers to perform better optimization.
-E
Output more C code than Cpp directives. In other words, use local enums for Gperf internal constants instead of global #defines.
-S total-switch-statements
Instead of using internal tables, let the compiler do the best job it can for your architecture by letting it face a gigantic switch. To assist compilers which are bad at long switches, you may specify the depth of nested switches via total-switch-statements. Using 1 is fine with GCC.

You may want to include several Gperf outputs within a single application or even a single compilation unit. Therefore you need to avoid multiple uses of the same symbols. We already described -E (--enum) and its usefulness in avoiding global #defines.

-H name
Specify the name of the hash function.
-N name
Specify the name of the in_word_set function.
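
To make the point about symbol clashes concrete, the following C sketch places two hand-written stand-ins for generated lookups in one compilation unit. The names color_lookup and unit_lookup, the keyword sets, and the simplified one-argument signature are all hypothetical; the sketch only shows that two lookups can coexist once each has its own name, which is what passing distinct names to -H and -N is meant to achieve.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a first generated lookup.  With the default name it
   would be called in_word_set, just like the second one below, and
   the two definitions would clash.  */
static const char *
color_lookup (const char *word)
{
  static const char *const colors[] = { "red", "green", "blue" };
  size_t i;

  for (i = 0; i < sizeof colors / sizeof colors[0]; ++i)
    if (strcmp (word, colors[i]) == 0)
      return colors[i];
  return NULL;
}

/* Stand-in for a second generated lookup, given its own name.  */
static const char *
unit_lookup (const char *word)
{
  static const char *const units[] = { "meter", "gram", "second" };
  size_t i;

  for (i = 0; i < sizeof units / sizeof units[0]; ++i)
    if (strcmp (word, units[i]) == 0)
      return units[i];
  return NULL;
}
```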

Finally, many options allow you to tune the hash function; see Options for changing the Algorithms employed by gperf for the exhaustive list. The most important options are:

-c
Use strncmp rather than strcmp. Using strcmp is the default because it is faster: it only has to work on two items, the two strings, while strncmp additionally needs to keep track of the length.
-D
Handle collisions. When several keywords share the same hash value, they are said to collide. By default gperf fails if it cannot find a collision-free table. With this option, the word is compared to all the keywords that share its hash value, which results in degraded performance. Helping Gperf to avoid the collisions is a better solution if speed is your concern.
-k positions
By default Gperf peeks only at the first and last characters of the keywords; override this default with a comma-separated list of positions. Each position may be a number, an interval, $ to designate the last character of each keyword, or * to consider all the characters.

This option helps resolve collisions.
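
The effect of the selected positions can be sketched in C. The code below is only an illustration of position selection, not gperf's hashing code: the functions select_key and same_key, the use of 0 to stand for $, and the sample words are assumptions made for the example. Two words of the same length whose selected characters are identical cannot be told apart by a hash built from those characters alone, so adding a position that differs separates them.

```c
#include <stddef.h>
#include <string.h>

/* Build the "key" of WORD from the selected character positions.
   Positions are 1-based; 0 stands for `$', the last character.
   Positions beyond the end of the word are skipped.  OUT must have
   room for NPOS + 1 bytes.  */
static void
select_key (const char *word, const int *pos, size_t npos, char *out)
{
  size_t len = strlen (word);
  size_t i, n = 0;

  for (i = 0; i < npos; ++i)
    {
      if (pos[i] == 0 && len > 0)
        out[n++] = word[len - 1];
      else if (pos[i] > 0 && (size_t) pos[i] <= len)
        out[n++] = word[pos[i] - 1];
    }
  out[n] = '\0';
}

/* Return nonzero when A and B yield the same key for the given
   positions.  NPOS must be smaller than 16 in this sketch.  */
static int
same_key (const char *a, const char *b, const int *pos, size_t npos)
{
  char ka[16], kb[16];

  select_key (a, pos, npos, ka);
  select_key (b, pos, npos, kb);
  return strcmp (ka, kb) == 0;
}
```

With the positions 1 and $ (the default described above), "seat" and "salt" select the same characters; adding position 2 distinguishes them, which corresponds to passing something like -k 1,2,$.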

Footnotes

  1. Yet, in an effort to modernize, 8 bit characters are handled by default...