
TUGboat, Volume 0 (9999), No. 0 draft: August 24, 2024 13:52 ?5
At the time of this writing, the results of the validation process
are as follows. 79016 fonts are inspected. 2983 fonts are skipped
because tfm doesn't support OFM or JFM yet. 770 fonts are found to be
non-compliant, which may seem quite a lot. On the other hand, there
are only 4 kinds of problems, 3 of which are considered warnings, and
only a single one a truly unrecoverable error.
4.2 File overflow
By far, the most common issue that tfm-validate finds is file
overflows, affecting 628 fonts. The TFM standard mandates that the
first two bytes of a TFM file encode the file's length. A "file
overflow" warning is signalled if the actual file's length is greater
than expected. Note that tfm knows about the special values 0, 9, and
11, denoting extended TFM files (OFM or JFM), which are not supported
yet.
Of course, when the declared file size disagrees with the actual,
there is no way to tell for sure which (if any) is correct. However,
absent any other problem during parsing, the file containing a tail of
junk is much more likely than the first two bytes (only) being
corrupted, hence a warning.
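The length check described above can be sketched as follows. This is a minimal illustration, not the code of tfm or tfm-validate: the function name and the returned labels are hypothetical, and the sketch assumes, following the usual description of the TFM format, that the declared length is a big-endian 16-bit count of 4-byte words.

```python
import struct

# Values of the first two bytes that, per the text, denote extended
# formats (OFM or JFM), which are not supported.
EXTENDED_MARKERS = {0, 9, 11}


def check_file_length(data: bytes) -> str:
    """Compare the declared length of a TFM byte string with its actual length.

    The returned labels are hypothetical names chosen for this sketch.
    """
    if len(data) < 2:
        return "too short"
    (lf,) = struct.unpack(">H", data[:2])  # big-endian unsigned 16-bit value
    if lf in EXTENDED_MARKERS:
        return "extended (unsupported)"
    declared = 4 * lf  # assumption: the length is counted in 4-byte words
    if len(data) > declared:
        # More bytes than declared: treated as a warning, the tail is ignored.
        return "file overflow"
    if len(data) < declared:
        # Not discussed in the text; included only for completeness.
        return "file underflow"
    return "ok"
```

Returning a label instead of raising an exception mirrors the choice made in the text: an overflow is reported, but loading can continue with the tail discarded.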
A quick test on a couple of such files seems to confirm that
hypothesis. We compiled a sample document with them, and it appears
that not only does TeX have no problem loading the fonts, but the
outputs look normal as well. On top of that, let us mention that
tftopl adopts the same posture: it signals the problem but otherwise
just discards the junk (Section 20 of tftopl).
Further investigation on the tails was inconclusive. In particular,
we couldn't figure out whether some tails contain meaningful
information rather than just junk (a possible cause for file overflows
could be padding to storage blocks). As a consequence, the signalled
warnings do not include the tails' content.
4.3 String overflow
The situation is slightly different with the next kind of problem we
encountered, namely, padded string overflows, currently affecting 74
fonts.
A TFM file may contain two optional strings in its header. The first
one, 40 bytes long, identifies the character coding scheme. The second
one, 20 bytes long, is the font identifier (font family name). These
strings are supposed to be in BCPL format. In particular, the first
byte must contain the actual length of the string.
tfm signals a "padded string overflow" warning when a BCPL string is
not padded with zeros. Doug McKenna suggested⁴ that padding a BCPL
string with zeros may not have always been a requirement, as it was
only added to pltotf in April 1983, for version 1.3, that is, two
years after its initial release (Section 87 of pltotf). On the other
hand, David Fuchs mentioned padding with zeros as early as in February
1981 [3].
Anyway, the decision as to whether a padded string overflow should be
a warning or an error is even simpler to make than in the case of a
file overflow. Those strings are purely informative: they have no
impact on the font's usability, so it does not hurt to continue
loading the font.
Besides, the padding area seems to have been intentionally abused in
the majority of the cases: a lot of fonts contain "Y&Y Inc" in there,
making their origin quite clear. Because of that (and contrary to file
overflows), the content of the padding area is included in the
warnings.
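The padded string check can be illustrated with a short sketch. It is not taken from tfm: the function names and the warning text are hypothetical, and the only assumptions are those stated in this section (a fixed-size field whose first byte gives the string length and whose remaining bytes should be zeros).

```python
from typing import Optional, Tuple


def read_padded_string(field: bytes) -> Tuple[str, bytes]:
    """Split a fixed-size BCPL field into its string and its padding area."""
    if not field:
        raise ValueError("empty field")
    length = field[0]  # the first byte holds the actual string length
    if length > len(field) - 1:
        raise ValueError("declared length exceeds the field size")
    text = field[1:1 + length].decode("ascii", errors="replace")
    padding = field[1 + length:]
    return text, padding


def padding_warning(field: bytes) -> Optional[str]:
    """Return a warning message if the padding area is not all zeros."""
    text, padding = read_padded_string(field)
    if any(padding):
        # The padding content is reported, since it is often informative.
        return f"padded string overflow after {text!r}: {padding!r}"
    return None
```

For example, a 40-byte coding scheme field made of the length byte 4, the string "TEXT", the bytes of "Y&Y Inc", and zeros up to the end would produce a warning quoting the padding area, whereas the same field with only zeros after "TEXT" would produce none.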
4.4 Spurious char info
The next problem we encountered (also a warning, affecting 66 files)
is a more obscure matter. TFM files have a so-called "char info table"
providing the actual character metrics of the font. The table contains
4-byte entries for the full range of characters from the minimum
character code (bc) to the maximum one (ec). However, a font may also
have "holes" in this range, that is, undefined characters for some
codes between bc and ec.
Undefined characters must have a width of 0, materialized by a width
table index of 0 as well. The spurious char info warning indicates
that an entry for a non-existent character is not completely zeroed
out. In the problematic char info entries that we found, the third
byte usually has a value of 1 (indicating an index into a ligature or
kerning program), and sometimes a non-zero fourth byte (the actual
index).
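To make the shape of such an entry concrete, here is a minimal sketch of decoding a 4-byte char info entry and detecting a spurious one. The function and field names are hypothetical. The text only states the roles of the width index and of the third and fourth bytes; the remaining bit fields follow the usual description of the TFM format and should be read as an assumption of this sketch.

```python
def decode_char_info(entry: bytes) -> dict:
    """Decode a 4-byte char info entry into its fields (assumed layout)."""
    if len(entry) != 4:
        raise ValueError("a char info entry is exactly 4 bytes long")
    b0, b1, b2, b3 = entry
    return {
        "width_index": b0,       # 0 means the character does not exist
        "height_index": b1 >> 4,
        "depth_index": b1 & 0x0F,
        "italic_index": b2 >> 2,
        "tag": b2 & 0x03,        # 1: remainder indexes a lig/kern program
        "remainder": b3,
    }


def is_spurious(entry: bytes) -> bool:
    """True if the entry denotes a non-existent character (width index 0)
    but is not completely zeroed out."""
    return entry[0] == 0 and any(entry[1:])
```

With this layout, an entry made of the bytes 0, 0, 1, 5 decodes to a width index of 0, a tag of 1, and a remainder of 5, which is the pattern described above; `is_spurious` flags it, whereas an all-zero entry is an ordinary hole.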
A possible explanation would have been the
existence of a so-called “boundary character” (also
an obscure matter in TFM) which is not required
to exist for real in the font, but upon inspection of
several problematic ones, this appears not to be the
case.
tftopl completely ignores characters with a width index of 0 (Section
78 of tftopl), and pltotf zeroes out non-existent characters (Section
74 of pltotf). All the more reason not to consider this problem a
showstopper.
4.5 Fix word overflow
Finally, this one is the only true error we encountered, and it only
affects two fonts: ArevSans-Bold, and
⁴ reference lost; could have been in a thread on texhax…
A large-scale format compliance checker for TeX Font Metrics