
TUGboat, Volume 0 (9999), No. 0 draft: July 25, 2023 10:00 903
area (the bottom half of the window). In addition to the characters themselves, the paragraph’s bounding box is drawn. The small upward-pointing arrows between characters represent the hyphenation points at which the algorithm has decided not to break lines. Finally, the unfilled triangle to the right of line 2 indicates an intentionally overstretched line: the algorithm has chosen a glue scaling ratio which exceeds 1. Indeed, the Knuth-Plass algorithm ran twice here, using a tolerance threshold of 200 the second time. One can also observe that the third line had to be hyphenated, which confirms that this is not the result of pass 1 of the algorithm (pass 1 never breaks at hyphenation points).
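To make the “scaling ratio exceeds 1” criterion concrete, here is a minimal Python sketch of how such a ratio and the corresponding badness can be computed. It assumes TeX’s conventions (glue stretches or shrinks in proportion to its stretchability or shrinkability, and badness is approximated by $100|r|^3$) and plain TeX’s default thresholds (\pretolerance=100 for pass 1, \tolerance=200 for pass 2). The function names are hypothetical and do not come from ETAP.

```python
import math

# Hypothetical sketch, not ETAP's code. Thresholds are plain TeX's defaults.
PRETOLERANCE = 100  # pass 1 (no hyphenation)
TOLERANCE = 200     # pass 2


def adjustment_ratio(natural, stretch, shrink, target):
    """Glue scaling ratio r that justifies a line to `target`.

    r > 0 means stretching, r < 0 shrinking. Returns math.inf when the line
    is too short but has no stretchability, and None when it is too long
    even with all glue fully shrunk (an overfull line).
    """
    delta = target - natural
    if delta == 0:
        return 0.0
    if delta > 0:
        return delta / stretch if stretch > 0 else math.inf
    if shrink <= 0 or -delta > shrink:
        return None  # glue never shrinks beyond its shrinkability
    return delta / shrink


def badness(r):
    """Approximation of TeX's badness: 100 * |r|^3, capped at 10000."""
    if r is None or math.isinf(r):
        return 10000
    return min(10000, round(100 * abs(r) ** 3))


def is_overstretched(r):
    """An overstretched line uses a ratio beyond the glue's nominal stretch."""
    return r is not None and r > 1


def first_acceptable_pass(r):
    """Return 1 or 2 for the first pass whose threshold accepts the line, else None."""
    if r is None:
        return None
    b = badness(r)
    if b <= PRETOLERANCE:
        return 1
    if b <= TOLERANCE:
        return 2
    return None


# A 250pt line with 25pt of stretch, justified to 280pt:
r = adjustment_ratio(250, 25, 10, 280)
print(r, badness(r), is_overstretched(r), first_acceptable_pass(r))
```

The example prints `1.2 173 True 2`: a ratio above 1 yields a badness above 100, so such a line can only be accepted by the second, more tolerant pass, consistent with the behavior described above.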
Lastly, one can see a small pop-up window near the bottom-right corner of the typeset paragraph. This is a “properties tooltip”, which pops up when the mouse hovers over a line and provides feedback on that line. In this particular case, it indicates that the line is 280pt wide (the paragraph’s width, since the line is properly justified) and is stretched by a scaling factor of approximately 0.475. Also, because the selected algorithm is Knuth-Plass, the tooltip reports the line’s fitness class, badness, and local demerits. Moving the mouse over the paragraph’s left margin instead would display a number of global paragraph properties, such as the total demerits, the algorithm’s pass number, and the number of remaining active nodes at the end of execution.
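The per-line values reported by the tooltip can be sketched as follows. This Python fragment follows the rules TeX itself uses for fitness classes and demerits, with plain TeX’s defaults (\linepenalty=10, \adjdemerits=10000, \doublehyphendemerits=10000); it is a hypothetical illustration, and ETAP’s own definitions may differ in detail.

```python
from enum import IntEnum

# Plain TeX defaults; hypothetical sketch, not ETAP's code.
LINE_PENALTY = 10
ADJ_DEMERITS = 10000
DOUBLE_HYPHEN_DEMERITS = 10000


class Fitness(IntEnum):
    VERY_LOOSE = 0
    LOOSE = 1
    DECENT = 2
    TIGHT = 3


def fitness_class(r, b):
    """Classify a line from its scaling ratio r and badness b, as TeX does."""
    if r > 0:  # stretched line
        if b > 99:
            return Fitness.VERY_LOOSE
        if b > 12:
            return Fitness.LOOSE
        return Fitness.DECENT
    return Fitness.TIGHT if b > 12 else Fitness.DECENT  # shrunk or natural line


def local_demerits(b, penalty=0, fitness=None, prev_fitness=None,
                   hyphenated=False, prev_hyphenated=False):
    """Demerits for a single line ending at a break with the given penalty."""
    d = LINE_PENALTY + b
    d = 100_000_000 if abs(d) >= 10000 else d * d
    if penalty > 0:
        d += penalty * penalty
    elif -10000 < penalty < 0:
        d -= penalty * penalty  # encouraged break: fewer demerits
    if hyphenated and prev_hyphenated:
        d += DOUBLE_HYPHEN_DEMERITS  # two consecutive hyphenated lines
    if (fitness is not None and prev_fitness is not None
            and abs(fitness - prev_fitness) > 1):
        d += ADJ_DEMERITS  # visually incompatible adjacent lines
    return d
```

For instance, a decent line with badness 11 costs $(10+11)^2 = 441$ demerits, while a very loose line with badness 173, ending on a hyphen (penalty 50, plain TeX’s \hyphenpenalty) right after a decent line, costs $183^2 + 50^2 + 10000 = 45989$.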
Since we are talking about the Knuth-Plass algorithm, note that this project does not aim to provide an exact replica of its original implementation, nor of the implementations of other currently available line-breaking algorithms (notably Barnett [2] and Duncan [6]), nor of any future ones.
In fact, in our opinion, what is called the “Knuth-Plass algorithm” is not really an algorithm per se, but rather the combination of a standard shortest-path algorithm with a particular cost function whose properties make it suitable for dynamic programming optimization, all of it written in a relatively low-level imperative language, with the performance concerns of the time (the 1980s) in mind.
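To illustrate this view, the following Python sketch treats every possible break as a node and every acceptable line as an edge weighted by its demerits, then finds the cheapest path with dynamic programming. It is deliberately simplified and hypothetical: fixed word widths, a single kind of interword glue (the `space`, `stretch`, and `shrink` values are made up), no hyphenation, no fitness classes, and no active-list pruning, so it runs in quadratic time rather than reproducing TeX’s optimizations.

```python
import math


def break_paragraph(widths, line_width, space=1.0, stretch=0.5, shrink=0.3,
                    tolerance=200, line_penalty=10):
    """Optimal line breaking as a shortest path over break positions.

    Node k means "a line ends just before word k" (node 0 is the paragraph
    start, node n its end). Returns (total_demerits, [k1, k2, ..., n]) or
    None if no breaking satisfies the tolerance.
    """
    n = len(widths)
    best = [math.inf] * (n + 1)  # best[k]: minimal demerits to reach node k
    prev = [None] * (n + 1)      # predecessor of k on that cheapest path
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j - 1, -1, -1):  # candidate line: words i .. j-1
            gaps = j - i - 1
            natural = sum(widths[i:j]) + gaps * space
            if natural - gaps * shrink > line_width:
                break  # too long even fully shrunk; earlier starts are longer
            delta = line_width - natural
            if delta == 0 or (delta > 0 and j == n):
                r = 0.0  # exact fit, or last line (infinite fill stretch)
            elif delta > 0:
                r = delta / (gaps * stretch) if gaps * stretch > 0 else math.inf
            else:
                r = delta / (gaps * shrink)
            b = 10000 if math.isinf(r) else min(10000, round(100 * abs(r) ** 3))
            if b > tolerance:
                continue  # not an edge of the graph
            cost = best[i] + (line_penalty + b) ** 2
            if cost < best[j]:
                best[j], prev[j] = cost, i
    if math.isinf(best[n]):
        return None
    path, k = [], n
    while k > 0:
        path.append(k)
        k = prev[k]
    return best[n], path[::-1]
```

With four words of widths 4, 5, 4, 5 and a 10-unit line, the only acceptable solution breaks after the second word: `break_paragraph([4, 5, 4, 5], 10)` returns `(200.0, [2, 4])`. The optimization lies entirely in keeping only `best[k]` per node, which is valid because the total cost is a sum of per-line costs.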
On the other hand, what we are interested in is providing an exact replica of the algorithm’s logic. Common Lisp is a much higher-level programming language, and most performance concerns of the time have long been rendered obsolete by the continuously increasing computing power at hand (besides, performance is rarely a top priority for an experimentation platform). Consequently, our design and choice of data structures diverge from the original. For example, we actually provide two different implementations of the Knuth-Plass algorithm: one close to the original, equipped with the same dynamic programming optimization, and another one based on the exploration of a complete graph of solutions (not the brute-force, exhaustive $2^n$ one, though!).
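By contrast with keeping only the cheapest path, a graph-exploring variant can enumerate every acceptable breaking. The Python sketch below is hypothetical and not ETAP’s code; it only follows edges that correspond to acceptable lines, so it explores far fewer than all possible choices of break points. Here, “acceptable” is a made-up criterion: the line fits once shrunk, and its stretch ratio does not exceed `max_ratio`.

```python
from functools import lru_cache


def all_solutions(widths, line_width, space=1.0, stretch=0.5, shrink=0.3,
                  max_ratio=2.0):
    """Enumerate every acceptable breaking as a list of end-of-line positions.

    Position k means a line ends just before word k; the last position is
    always len(widths).
    """
    n = len(widths)

    def acceptable(i, j):  # can words i .. j-1 form a line?
        gaps = j - i - 1
        natural = sum(widths[i:j]) + gaps * space
        delta = line_width - natural
        if delta < 0:  # must shrink
            return -delta <= gaps * shrink
        if delta == 0 or j == n:  # exact fit, or last line
            return True
        return gaps * stretch > 0 and delta / (gaps * stretch) <= max_ratio

    @lru_cache(maxsize=None)
    def paths_from(i):  # all paths from node i to node n, as tuples
        if i == n:
            return ((),)
        return tuple((j,) + rest
                     for j in range(i + 1, n + 1) if acceptable(i, j)
                     for rest in paths_from(j))

    return [list(p) for p in paths_from(0)]
```

With four 2-unit words on an 8-unit line and `max_ratio=10.0`, `all_solutions` returns `[[2, 4], [3, 4]]`, whereas there are $2^3 = 8$ ways of choosing break points among the three interword spaces. Memoizing `paths_from` shares common suffixes between solutions, but the number of solutions returned can still grow quickly with paragraph length.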
Another difference from the original is, again, motivated by demonstration and experimentation. In the original Knuth-Plass, pass 1 of the algorithm works on non-hyphenated text (hyphenation was considered too costly at the time). Only if that fails does TeX hyphenate the text and try a second pass (also with a different tolerance threshold). In our case, we want to be able to display the hyphenation points every time, if so requested. Hence, hyphenation is implemented as a global option (independent of the selected typesetting algorithm), and pass 1 of the Knuth-Plass algorithm may therefore run on already hyphenated text, in which case it simply disregards the hyphenation points as potential break points.
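The following hypothetical Python sketch shows one way to decouple hyphenation from the passes: the text is hyphenated once (here, words simply arrive pre-split into syllables), every break opportunity is recorded with a flag telling whether it is a hyphenation point, and pass 1 just filters those out. The names `BreakPoint`, `break_points`, and `pass_candidates` are made up for illustration, and the tolerances are plain TeX’s defaults (\pretolerance=100, \tolerance=200), not necessarily ETAP’s.

```python
from dataclasses import dataclass

# Plain TeX defaults: pass 1 uses \pretolerance, pass 2 uses \tolerance.
PASS_TOLERANCE = {1: 100, 2: 200}


@dataclass(frozen=True)
class BreakPoint:
    word: int     # index of the word in the paragraph
    after: int    # the break follows this syllable of the word
    hyphen: bool  # True: hyphenation point; False: interword space


def break_points(words):
    """List every break opportunity in already hyphenated text.

    `words` is a list of words, each given as a list of syllables.
    """
    points = []
    for w, syllables in enumerate(words):
        for s in range(len(syllables) - 1):
            points.append(BreakPoint(w, s, hyphen=True))
        if w < len(words) - 1:  # no break after the last word
            points.append(BreakPoint(w, len(syllables) - 1, hyphen=False))
    return points


def pass_candidates(points, pass_number):
    """Pass 1 disregards hyphenation points, even if they are present."""
    if pass_number == 1:
        return [p for p in points if not p.hyphen]
    return list(points)


words = [["hy", "phen", "ation"], ["is"], ["use", "ful"]]
points = break_points(words)
for n in (1, 2):
    print(n, PASS_TOLERANCE[n], len(pass_candidates(points, n)))
```

Here, pass 1 sees only the 2 interword spaces, while pass 2 sees all 5 break opportunities. Since both passes share the same hyphenated text, the hyphenation points remain available for display whichever pass produces the final result.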
3 Software engineering
Given that TeX is still one of the best typesetting systems out there, but also one of the oldest, we deem it important to say a word about software engineering. It is well known that the science of programming languages and paradigms has evolved considerably over the years. Some authors have advocated a purely functional approach to paragraph breaking in the past [4, 13]. We, on the other hand, favor a pragmatic rather than theoretical approach. In particular, instead of having a single paradigm (e.g., functional programming) imposed on us, we prefer the freedom and flexibility provided by a multi-paradigm language [15].
Virtually every programming paradigm aims at increasing both the clarity and the concision of code. Table 1 provides a rough estimate of the project’s size in LoC (lines of code), and illustrates the benefits of being multi-paradigm for concision. Liang’s hyphenation algorithm [11] amounts to 150 LoC. The 500 lines of “lineup” correspond to the pre-processing of the source text, including hyphenation, kerning, ligaturing, and gluing. The currently available paragraph formatting algorithms comprise between 150 and 450 LoC (each variant
Table 1: Rough estimate of ETAP’s size

Component      LoC
GUI            800
Hyphenation    150
Lineup         500
Algorithms     150–450
Knuth-Plass    350 per variant
Interactive and real-time typesetting for demonstration and experimentation: ETAP