
DocEng ’24, August 20–23, 2024, San Jose, CA, USA Didier Verna
follows. The first batch was typeset with TeX’s default settings, thus
not handling similarities at all. The second batch was typeset by
maximizing the cost of similarity (similar demerits set to 10 000).
Finally, the third batch was typeset by not only penalizing similarities,
but also disregarding adjacency problems (adjacent demerits set to 0;
see Section 3).
When we set the similar demerits to the maximum value, 48% (first
experiment) to 50% (second one) of the problematic paragraphs are
“corrected”, in the sense that no similarities remain. In the cases
where a paragraph contains multiple similarities (up to four in
the Moby Dick experiment), the algorithm can sometimes only
reduce their number. Counting those paragraphs as well, the proportion
of “improved” paragraphs rises to 50% and 63% respectively. If we not
only penalize similarity, but also disregard adjacency problems, 53% to
66% of the problematic paragraphs are corrected, and a total of 57% to
73% are globally improved. These figures clearly demonstrate that
similarity can be treated in an automated fashion up to a notable
proportion.
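The distinction between “corrected” and “improved” paragraphs can be made concrete with a small sketch. It assumes that each problematic paragraph is summarized by two similarity counts, one before and one after re-typesetting. The function names and the sample counts are hypothetical and are not taken from the experiments.

```python
from collections import Counter


def classify(before: int, after: int) -> str:
    """Classify one problematic paragraph from its similarity counts."""
    if before <= 0:
        raise ValueError("a problematic paragraph has at least one similarity")
    if after == 0:
        return "corrected"      # no similarity remains
    if after < before:
        return "reduced"        # fewer similarities, but some remain
    return "not improved"


def summarize(pairs):
    """Return the corrected and improved ratios for (before, after) pairs."""
    counts = Counter(classify(before, after) for before, after in pairs)
    total = len(pairs)
    corrected = counts["corrected"]
    # "Improved" includes the corrected paragraphs.
    improved = corrected + counts["reduced"]
    return {"corrected": corrected / total, "improved": improved / total}


# Hypothetical sample, for illustration only.
sample = [(1, 0), (2, 1), (4, 2), (1, 1)]
print(summarize(sample))
```

With this convention, the “improved” ratio can never be lower than the “corrected” one, which matches the way the two series of figures are reported.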
Note that completely discarding adjacent demerits is nonsensical
from an aesthetic point of view. More generally, just because the
algorithm finds a similarity-free layout does not mean that it will
necessarily look better to a professional typographer. Further
experimentation is planned to address that (see Section 7). The purpose
of these two experiments is rather to evaluate the leeway we have
in similarity handling by studying extreme conditions, and again,
the figures clearly demonstrate that the problem can be addressed
in the vast majority of the cases.
6 CONCLUSION
We have proposed an extension to the KP algorithm that is able to
address similarity problems in paragraph justification. This extension
is implemented in our open source platform² for typesetting
experimentation and demonstration [18, 19] and could be incorporated
into any TeX-based or TeX-inspired system (alternative *TeX engines,
Boxes and Glue³, Typeset⁴, InDesign [9], etc.). This extension
is both simple and lightweight, so it is expected to have a negligible
impact on performance, should it be used in production. In fact, a
recent conversation with two people involved in LuaTeX confirms
that paragraph breaking is not a performance bottleneck today. It is
also worth noting that this extension is backward-compatible with
TeX, in the sense that setting the similar demerits to 0 effectively
deactivates it.
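The backward-compatibility property can be illustrated with a minimal sketch. It assumes that similar demerits are simply added to a line's demerits when that line is similar to the previous one, in the same way as TeX's other additional demerits. The per-line formula is the one given in The TeXbook, with plain TeX's default line penalty of 10. The `is_similar` flag and the function names are hypothetical, and the detection of similarities is left out.

```python
def line_demerits(badness: int, penalty: int, line_penalty: int = 10) -> int:
    """Per-line demerits as defined in The TeXbook (without additional demerits)."""
    base = (line_penalty + badness) ** 2
    if penalty >= 0:
        return base + penalty ** 2
    if penalty > -10000:
        return base - penalty ** 2
    return base  # forced break: the penalty is not counted


def demerits_with_similarity(badness: int, penalty: int, *, is_similar: bool,
                             similar_demerits: int = 0,
                             line_penalty: int = 10) -> int:
    """Hypothetical extension: add similar demerits when the line is similar
    to the previous one (same beginning or same end)."""
    total = line_demerits(badness, penalty, line_penalty)
    if is_similar:
        total += similar_demerits
    return total


plain = line_demerits(20, 50)
# With similar demerits set to 0, a similar line costs the same as in plain TeX.
assert demerits_with_similarity(20, 50, is_similar=True) == plain
# With the maximum value used in the experiments, the same line is penalized.
print(demerits_with_similarity(20, 50, is_similar=True, similar_demerits=10000))
```

Because the extra term is additive and conditional, the only overhead in this sketch is one test and one addition per candidate break.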
7 PERSPECTIVES
Experimentation has demonstrated that treating similarity automatically
is both a worthy and achievable goal; Figure 1 illustrates
that. On the other hand, getting rid of similarities implies a
necessary trade-off with other aesthetic criteria [7, 14], for a result
the quality of which is ultimately in the eye of the typographer. In
the near future, we intend to work hand in hand with professional
typographers in order to find a suitable default value for our similar
demerits, and also to figure out the acceptable trade-offs with the
other adjustable penalties and demerits in TeX.
² https://github.com/didierverna/etap
³ https://boxesandglue.dev/
⁴ https://github.com/bramstein/typeset/
Another area of further experimentation is to not limit ourselves
to a constant (albeit adjustable) amount for similar demerits. We
could for example weight homeoarchies and homeoteleutons differently,
penalize similarities proportionally to their length, or even
increase the cost of consecutive similarities in a non-linear fashion,
so as to penalize ladders more heavily. As a matter of fact, this idea
is also applicable to TeX’s original demerits (adjacent and
double-hyphen ones in particular), and we are already investigating that
direction.
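As an illustration of these ideas only, the sketch below combines the three suggestions into one hypothetical cost function: a weight per kind of similarity, a term proportional to the similarity length, and a geometric growth along a ladder. The base amount, the weights, the growth factor and the function name are all assumptions made for the example. None of them is a value or a formula proposed in this paper.

```python
# Hypothetical weights: here, homeoteleutons are made costlier than homeoarchies.
DEFAULT_WEIGHTS = {"homeoarchy": 1.0, "homeoteleuton": 1.5}


def variable_similar_demerits(kind: str, length: int, run: int, *,
                              base: int = 1000,
                              weights: dict = DEFAULT_WEIGHTS,
                              growth: float = 2.0) -> int:
    """Non-constant similar demerits (illustrative sketch).

    kind   -- "homeoarchy" (same line beginning) or "homeoteleuton" (same line end)
    length -- number of characters shared by the two lines
    run    -- position of this similarity in a ladder (1 for an isolated one)
    """
    if length < 1 or run < 1:
        raise ValueError("length and run must be at least 1")
    # Proportional to the length, multiplied by `growth` at each ladder step.
    return round(base * weights[kind] * length * growth ** (run - 1))


print(variable_similar_demerits("homeoarchy", 3, 1))   # isolated similarity
print(variable_similar_demerits("homeoarchy", 3, 3))   # third step of a ladder
```

A constant amount is recovered by setting every weight and the growth factor to 1 and by ignoring the length, so such a scheme would remain a superset of the current behavior.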
ACKNOWLEDGMENTS
The author would like to thank Thomas Savary, Hans Hagen, and
Mikael Sundqvist for some fruitful exchanges.
REFERENCES
[1] Michael P. Barnett. Computer Typesetting: Experiments and Prospects. M.I.T. Press,
Cambridge, Massachusetts, USA, 1965.
[2] Richard Bellman. The theory of dynamic programming. Bulletin of the American
Mathematical Society, 60(6):503–516, 1954.
[3] Edsger W. Dijkstra. A note on two problems in connexion with graphs.
Numerische Mathematik, 1(1):269–271, December 1959. ISSN 0029-599X.
doi: 10.1007/BF01386390. URL https://doi.org/10.1007/BF01386390.
[4] Paul E. Justus. There is more to typesetting than setting type. IEEE Transactions
on Professional Communication, PC-15(1):13–16, 1972. doi: 10.1109/TPC.1972.6591969.
[5] Peter Hart, Nils Nilsson, and Bertram Raphael. A formal basis for the heuristic
determination of minimum cost paths. IEEE Transactions on Systems Science and
Cybernetics, 4(2):100–107, 1968.
[6] Tamir Hassan and Andrew Hunter. Knuth-Plass revisited: Flexible line-breaking
for automatic document layout. In Proceedings of the 2015 ACM Symposium
on Document Engineering, DocEng’15, pages 17–20, Lausanne, Switzerland, 2015.
Association for Computing Machinery. ISBN 9781450333078.
doi: 10.1145/2682571.2797091.
[7] Alex Holkner. Global Multiple Objective Line Breaking. PhD thesis, School of
Computer Science and Information Technology, RMIT University, Melbourne,
Australia, October 2006.
[8] Nathan Hurst, Wilmot Li, and Kim Marriott. Review of automatic document
formatting. In Proceedings of the 2009 ACM Symposium on Document Engineering,
DocEng’09, pages 99–108, Munich, Germany, 2009. Association for Computing
Machinery. ISBN 9781605585758. doi: 10.1145/1600193.1600217.
[9] Eric A. Kenninga. Optimal line break determination. US Patent 6,510,441, January
2003.
[10] Donald E. Knuth. The TeXbook, volume A of Computers and Typesetting.
Addison-Wesley, MA, USA, 1986. ISBN 0201134470.
[11] Donald E. Knuth. TeX: the Program, volume B of Computers and Typesetting.
Addison-Wesley, MA, USA, 1986. ISBN 0201134373.
[12] Donald E. Knuth and Michael F. Plass. Breaking paragraphs into lines. Software:
Practice and Experience, 11(11):1119–1184, 1981. doi: 10.1002/spe.4380111102.
[13] Frank Mittelbach. E-TeX: Guidelines for future TeX extensions – revisited.
TUGboat, 34(1), 2013.
[14] Peter Moulder and Kim Marriott. Learning how to trade off aesthetic criteria in
layout. In Proceedings of the 2012 ACM Symposium on Document Engineering,
DocEng’12, pages 33–36, Paris, France, 2012. Association for Computing Machinery.
ISBN 9781450311168. doi: 10.1145/2361354.2361361.
[15] Stephen R. Reimer. Manuscript studies: Medieval and early modern.
https://sites.ualberta.ca/~sreimer/ms-course/course/scbl-err.htm, 1998.
[16] R. P. Rich and A. G. Stone. Method for hyphenating at the end of a printed
line. Communications of the ACM, 8(7):444–445, July 1965. ISSN 00010782.
doi: 10.1145/364995.365002.
[17] H. T. Thành. Micro-typographic extensions to the TeX typesetting system.
TUGboat, 21(4), 2000.
[18] Didier Verna. ETAP: Experimental typesetting algorithms platform. In 15th
European Lisp Symposium, pages 48–52, Porto, Portugal, March 2022. ISBN
9782955747469. doi: 10.5281/zenodo.6334248.
[19] Didier Verna. Interactive and real-time typesetting for demonstration and
experimentation: ETAP. In Barbara Beeton and Karl Berry, editors, TUGboat,
volume 44, pages 242–248. TeX Users Group, September 2023.
doi: 10.47397/tb/44-2/tb137verna-realtime.
[20] Martin Litchfield West. Textual Criticism and Editorial Technique. B. G. Teubner,
Stuttgart, 1973.