TUGboat, Volume 0 (9999), No. 0, page 1001
Star TeX: The Next Generation
Didier Verna
Abstract
While TeX is unanimously praised for its typesetting
capabilities, it is also regularly blamed for its poor
programmatic offerings. A macro-expansion system
is indeed far from the best choice in terms of
general-purpose programming. Several solutions have
been proposed to modernize TeX on the programming
side. All of them currently involve a heterogeneous
approach in which TeX is mixed with a full-blown
programming language. This paper advocates another,
homogeneous approach in which TeX is first rewritten
in a modern language, Common Lisp, which serves both
at the core of the program and at the scripting
level. All programmatic macros of TeX are hence
rendered obsolete, as the underlying language itself
can be used for user-level programming.
Prologue

TeX
The [final] frontier.
These are the voyages,
Of a software enterprise.
Its continuing mission:
To explore new tokens,
To seek out a new life,
New forms of implementation...
1 Introduction
In 2010, I asked Donald Knuth why he chose to
design and implement TeX as a macro-expansion
system rather than as a full-blown, procedure-based
programming language. His answer was twofold:

1. He wanted something relatively simple for his
secretary, who was not a computer scientist.

2. The very limited computing resources at the
time practically mandated the use of something much
lighter than a complete programming language.
The first part of the answer left me slightly
skeptical. It is debatable whether TeX is simple to
use. Although it probably is for simple things, its
programmatic capabilities are notoriously tricky.
The second part of the answer, on the other hand,
was very convincing, but is arguably now obsolete.
Time has passed and the situation today is very
different from what it was 30 years ago. The
available computing power has grown exponentially,
and so have our overall skills in language design
and implementation.
Several ideas on how to modernize TeX already
exist. Some of them have actually been implemented.
In this paper, we present ours. The possible future
that we would like to see happening for TeX is
somewhat different from the current direction(s)
TeX's evolution has been following. In our view,
modernizing TeX can start with grounding it in an
old yet very modern programming language: Common
Lisp. Section 2 clarifies our notion of a "better",
or "more modern", TeX. Section 3 presents several
approaches sharing a similar goal. Section 4
justifies the choice of Common Lisp. Finally,
section 5 outlines the projected final shape of the
project.
2 A better TeX
The TeX community has the particularity of being
a mix of two populations: people coming from the
world of typography, who love it for its beautiful
typesetting, and people coming from the world of
computer science, who appreciate it for its
automation and similarity to a programming language.
It is true that the way TeX works is much closer to
that of a compiled language than a WYSIWYG editor.

In both worlds, TeX is unanimously acclaimed for
the quality of its typesetting. This shouldn't be
surprising, as it has always been TeX's primary
objective. The question of its programmatic
capabilities, however, is much more arguable. People
unfamiliar with programming in general readily
acknowledge that TeX's programmatic interface is not
trivial to use. For people coming from the world of
computer science, it is even more obvious that TeX
is no match for a real programming language, partly
due to its macro-expansion nature.
Let us recall that TeX was not originally meant
to be a programming language. To quote its author,
Donald Knuth [8]:

    I'm not going to design a programming language;
    I want to have just a typesetting language.
    [...] In some sense I put in many of TeX's
    programming features only after kicking and
    screaming.
From this perspective, it seems natural to
consider that a "better" TeX, today, would
essentially deliver the same typesetting quality
from a more modern programmatic interface. More
precisely:

- access to the typesetting subset of TeX's
  primitives should be provided with a consistent
  syntax (in particular, no more need for \relax),
- the programmatic API (\def, \if, etc.) should be
  dropped in favor of support from a real
  programming language,
- the system should remain simple to use, at least
  for simple things, just as TeX is today,
- the system should remain highly extensible and
  customizable, just as TeX is today,
- and finally, preserving backward compatibility,
  although not mandatory, should also be taken into
  account.
3 Alternatives

The idea of grounding TeX in a real programming
language is not new. Let us mention some such
attempts that we are aware of. eval4tex
(http://www.ccs.neu.edu/home/dorai/eval4tex/) and
sTeXme (http://stexme.sourceforge.net/) both use
Scheme (another dialect of Lisp). PerlTeX [13, 15],
as its name suggests, chooses Perl. QaTeX/PyTeX [4]
use Python, and finally, also as its name suggests,
LuaTeX (http://www.luatex.org) employs Lua.
These approaches, although motivated more or
less by the same general idea, work in very
different ways. Some of them wrap TeX in a
programming language by giving the language access
to TeX's internals. Others wrap a programming
language in TeX by allowing TeX to execute code
(LuaTeX belongs to this category). Some do both.
For instance, sTeXme ships with an extended Scheme
engine that can access TeX's internals, as well as
a modified TeX engine that can evaluate Scheme
code.
Some of them allow authors to write TeX macros
in a different language (PerlTeX lets you write TeX
macros in Perl). Others, like QaTeX, aim at
completely getting rid of TeX macros, so that all
programmatic functionality is written in another
language (Python, with PyTeX in that case). This
particular case is closer to what we have in mind.
Finally, some approaches use a synchronous
dual-process scheme in which both TeX and the
programming language of choice run in parallel,
communicating either via standard input/output
redirection, or by file or socket I/O. That is the
case of sTeXme and PerlTeX. Others, like eval4tex,
use a multi-pass scheme instead. In a first pass,
the "foreign" code is extracted and sent to the
programming language. The programming language in
question executes its code and sends the result
back to TeX. In the final pass, TeX is left with
only regular TeX macros to process.
In spite of all these variations, it is worth
stressing that all these approaches have something
in common: they are heterogeneous. They involve
both a programming language engine on one hand, and
the original, though possibly modified, TeX engine
on the other hand. Even LuaTeX, which is somehow
more integrated than the other alternatives, is
built like this: a Lua interpreter is embedded in a
more or less regular TeX, written in WEB and C.
The idea that we suggest in this paper is that
another, fully integrated approach is also
possible. In this approach, another programming
language would be used to completely rewrite TeX
and provide the desired programmatic layer at the
same time. We are aware of at least one previous
attempt at a fully integrated approach: NTS
(http://nts.tug.org), the "New Typesetting System",
was supposed to be a complete re-implementation of
TeX in Java, but the project was never widely
adopted. We don't think that this project's demise
indicates in any way that the approach is doomed in
general. On the contrary, the remaining sections
explain why we think that Common Lisp is a very
good candidate for it.
4 Common Lisp: why?

Lisp is a very old language. It was invented by
John McCarthy in the late 1950s [9]. Contrary to
what many people seem to think, however, being old
doesn't imply being obsolete. In this case, it is a
synonym for being mature and modern. When Lisp was
invented, it was in fact way ahead of its time, to
the point that even its inventor hadn't realized
the extent of his creation. A sign of this is the
recent incorporation of features that Lisp already
provided 50 years ago into so-called "modern"
programming languages, such as C# with the addition
of dynamic types, or C++ and Java with the addition
of lambda (anonymous) functions.
4.1 Standardization

Common Lisp, in particular, was standardized in
1994 [1] (it was in fact the first object-oriented
language to get an ANSI standard). Remember how
important stability was to Donald Knuth for TeX?
Standardization means that even across different
implementations (there are half a dozen or so), the
core of the language will never change, and in fact
hasn't for the last 20 years. Compare this to
modern scripting languages such as Python or Ruby,
for which the "standard" is essentially the current
implementation of the current version of the
language, written by the author of the language...
The Common Lisp standard is fairly comprehensive:
it includes not only the language core but a large
library of functions. Of course, because the
standard is 20 years old, it lacks several things
that are considered important today (such as a
multi-threading API). Every Common Lisp
implementation provides its own version of
non-standard features, but again, there are only
half a dozen implementations out there, and if one
chooses to stick to only one of them, then one gets
the same stability as for standardized features, or
at least backward compatibility.
4.2 General purpose vs. scripting

Common Lisp also has the particularity of being
both a full-blown, general purpose, industrial
scale programming language and a scripting or
extension language at the same time. This is
something that cannot be said of most modern
languages out there, but is nevertheless crucial in
the fully integrated approach that we are
advocating. This was certainly not the case of
Java, in the now dead NTS project.

Common Lisp is indeed a full-blown, general
purpose, industrial scale programming language. It
is multi-paradigm (functional, object-oriented,
imperative, etc.), highly extensible (at both the
syntactic and semantic levels [12]), highly
optimizable (notably with static typing facilities
[19, 20]) and has a plethora of libraries (such as
Perl-compatible regular expressions, database
access, web infrastructure, foreign function
interfaces, etc.). Today, millions of lines of
Common Lisp code are used in industrial
applications all over the world.

But Common Lisp is also a scripting language. It
is highly interactive (it comes with a REPL: a
Read-Eval-Print Loop), highly dynamic (with
features ranging from dynamic type checking to an
embedded JIT compiler and debugger), highly
reflexive (something crucial for extensibility and
customizability) and at the same time easy to
learn (notably thanks to its minimalist syntax).
This last point constitutes the beginning of our
tour of the language. Remember that one of our
objectives is to provide a system just as simple as
TeX, at least for simple things. Below is a
one-minute crash course on Lisp syntax.

Literals: Common Lisp provides literals such as
numbers (1.254) or strings ("foobar"). Literals
evaluate to themselves.

Symbols: Common Lisp provides symbols that can be
used to name functions or variables (possibly at
the same time). pi has the expected mathematical
value. identity is the identity function.
S-Expressions: Compound expressions are written
inside parentheses and denote function calls in
prefix notation. (+ 1 2) represents the sum of 1
and 2.

Quotation: In Common Lisp, everything has a value.
If you want to prevent evaluation, put a quote in
front of the expression. For instance, 'identity
is the symbol identity itself. '(+ 1 2), instead of
being a function call, now represents the list of 3
elements: the symbol + and the numbers 1 and 2.
Definitions: To define a global variable, we can
write something like this:

(defvar devil-number 666)

To define a function:

(defun dbl (x) (* 2 x))
And that is the end of our Common Lisp crash
course. With that knowledge, you know practically
all there is to know about the language in order to
use it for simple things. The rest is a matter of
knowing the names of standard (built-in) variables
and functions. In particular, this is certainly not
more complicated than learning the basic and
inconsistent TeX syntax (do you provide arguments
in braces or inline, with a final \relax to be on
the safe side?), and it would also certainly be
enough to write basic LaTeX documents like this:

(document-class article)
...
(begin document)
(section "Title")
...
(end document)
4.3 Built-in paradigms

In a somewhat surprising way, Common Lisp provides
programming paradigms, idioms or library-based
features that (La)TeX also provides (or would like
to provide). This means that such features are
already there for us and would not need to be
re-implemented in a Lispy TeX. This section
presents only some of them; there are in fact many
more.
4.3.1 key=value pairs

The popularity of key/value arguments in LaTeX
reflects a real need: there are at least a dozen
packages providing this functionality, each and
every one of them with its own pros and cons.
Common Lisp provides a clean and straightforward
implementation of this for free in its function
call protocol (the so-called "lambda lists"). The
following function takes one mandatory (positional)
argument and two optional keyword arguments, which
are in fact named, floating arguments:

(defun include-graphics
    (file &key width height)
  ...)

It can be called with just the mandatory argument:

(include-graphics "image")

or with either or both keywords (prefixed with a
':'):

(include-graphics "image" :height <value>)
Keyword arguments can have default values which
themselves may be dynamically computed.
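As an illustration of such computed defaults, here is a minimal sketch; the function name mirrors the one above, but the default values are ours, purely for illustration:

```lisp
;; Hypothetical defaults: the height is derived from the width,
;; and both defaults are evaluated at call time.
(defun include-graphics (file &key (width 100) (height (* 2 width)))
  (list file width height))

(include-graphics "image")            ; => ("image" 100 200)
(include-graphics "image" :width 50)  ; => ("image" 50 100)
```

Note that a keyword default may refer to parameters declared to its left in the lambda list, which is what makes the dynamic computation possible.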
4.3.2 Packages

LaTeX implements the notion of packages, essentially
a collection of macros stored in a file. The de
facto standard for this in Common Lisp is called
ASDF (Another System Definition Facility;
http://www.common-lisp.net/project/asdf). Common
Lisp systems resemble LaTeX packages, only much
more evolved. For instance, systems are composed of
a hierarchy of files with customizable loading
order, automatic dependency tracking and
recompilation of obsolete object files (compare
this to having only interpreted macros), and much
more.
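To give a flavor of this, here is a minimal, hypothetical system definition; the system, dependency, and file names are invented for illustration:

```lisp
;; Hypothetical ASDF system definition (contents of a .asd file).
;; The :ticl dependency is an assumption, not an existing system.
(asdf:defsystem :my-document
  :description "A document, organized as a Lisp system."
  :depends-on (:ticl)
  :components ((:file "preamble")
               (:file "body" :depends-on ("preamble"))))
```

Loading the system then recompiles only the files whose sources have changed, together with the files that depend on them.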
4.3.3 Namespaces

A frequent complaint about LaTeX is its lack of
namespaces. Common Lisp provides a related concept
called packages (not to be confused with LaTeX
packages). A package is a collection of symbols
naming functions, variables, or both. Packages have
names through which you access their symbols.
Packages may declare some symbols as public while
the others remain private. Suppose for instance
that there is a package named ltx that implements
LaTeX2e and declares its function document-class as
public. From the outside, one would canonically
refer to this function as ltx:document-class. On
the other hand, the need to reference all such
LaTeX symbols explicitly in a document would
probably be cumbersome. In such a situation, one
may use use-package, in which case all public
symbols become directly accessible. Using the ltx
package makes it possible to call the function
document-class implicitly, without the package name
prefix.
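The hypothetical ltx package described above might be sketched as follows (the package and its contents are ours, for illustration only):

```lisp
;; Sketch of a package with one public symbol.
(defpackage :ltx
  (:use :common-lisp)
  (:export #:document-class))   ; public; everything else is private

(in-package :ltx)
(defun document-class (class)
  (list :class class))

;; From the outside:
(in-package :cl-user)
(ltx:document-class 'article)   ; qualified (canonical) access
(use-package :ltx)
(document-class 'article)       ; now directly accessible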
4.3.4 Interactivity

As you likely know, TeX can be used in an
interactive fashion. The following example shows a
sample conversation between TeX and a user who
mistypes a macro name:

didier(s003)% tex
This is TeX, Version 3.1415926 [...]
**\relax
*\hule
! Undefined control sequence.
<*> \hule
? H
The control sequence at the end of the top
line of your error message was never \def'ed.
If you have misspelled it (e.g., ‘\hobx’),
type ‘I’ and the correct spelling
(e.g., ‘I\hbox’). Otherwise just continue,
and I'll forget about whatever was undefined.
? I\hrule
*\bye
This kind of interaction closely resembles a
REPL, which Common Lisp, like every interactive
language, provides out of the box. Where Common
Lisp specifically comes into play is that every
Common Lisp application may embed an interactive
debugger for free, which is precisely useful for
this kind of error/recovery interaction. In Common
Lisp, the programmer has the ability to implement
his own recovery options (known as restarts in
Lisp jargon) without unwinding the stack. This is
different from, and much more powerful than,
regular catch/throw facilities. If you are not
interested in using the full-blown debugger, you
can implement a function for catching errors and
listing the available recovery options in a mere
10 lines of code.
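A minimal sketch of such a TeX-like recovery option, using standard conditions and restarts; the condition and function names are ours, invented for illustration:

```lisp
;; A condition for an unknown control sequence (hypothetical name).
(define-condition undefined-control-sequence (error)
  ((name :initarg :name :reader cs-name)))

;; Signal the error, offering a recovery option (restart) that
;; substitutes another control sequence, TeX's "I\hrule" style.
(defun expand (name)
  (restart-case (error 'undefined-control-sequence :name name)
    (use-value (other)
      :report "Use another control sequence instead."
      other)))

;; A handler can select the restart; crucially, the stack is not
;; unwound before the handler runs, unlike with catch/throw.
(handler-bind ((undefined-control-sequence
                 (lambda (c)
                   (declare (ignore c))
                   (invoke-restart 'use-value "\\hrule"))))
  (expand "\\hule"))   ; => "\\hrule"
```

Without any handler installed, the same error would instead land the user in the interactive debugger, with the "Use another control sequence" restart offered as a menu entry.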
4.3.5 Dumping

Out of a historical concern for performance, TeX
has the ability to dump and reload so-called
"format" files, which saves a lot of parsing and
interpretation (cf. the \dump command). Given the
increase in computing power over the last 30 years,
the question of performance is admittedly much less
critical than it used to be. Even today, though,
performance is not a concern that should be
completely disregarded. For example, compiling a
lengthy Beamer presentation with lots of animations
can still be annoyingly slow.

Common Lisp happens to provide a dumping feature
out of the box. Although not part of the ANSI
standard, all Lisp compilers provide it. In SBCL
(http://www.sbcl.org) for instance, the function
save-lisp-and-die dumps the whole global state
(stack excepted) of the current Lisp environment
into a file that can later be quickly reloaded with
the --core command-line option. Instead of dumping
a core image, it is also possible to dump a fully
functional standalone executable.
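For instance, dumping a standalone executable with SBCL might look like this; the entry-point function and file name are hypothetical:

```lisp
;; SBCL-specific sketch: dump the running image as a standalone
;; executable whose entry point is the (hypothetical) MAIN function.
(defun main ()
  (format t "Hello from a dumped image!~%"))

(sb-ext:save-lisp-and-die "ticl"
                          :executable t
                          :toplevel #'main)
```

Omitting :executable t produces a plain core image instead, to be reloaded later with sbcl --core.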
Because the dumping mechanism is accessible to
the end user, interesting applications could be
envisioned with very little programming, such as
mid-document dumping. By outputting to in-memory
strings instead of files, for example, a document
author could be given the possibility to dump in
the middle of a large document's processing. The
resulting facility would be quite similar to
\includeonly, except that the whole document would
be typeset every time.
4.3.6 Performance

Let us tackle the problem of performance from a
more general point of view. One frequent yet
misinformed argument against dynamic languages is:
"they are slow". From that point of view, it may
seem odd to even begin to envision the
reimplementation of a program such as TeX in a
dynamic language.

One first and frequent misconception about
interactive languages is that as soon as they
provide a REPL, they must be interpreted. This is
in fact not the case. Nowadays, many Common Lisp
implementations such as SBCL don't even provide an
interpreter. Instead, the REPL has a JIT (Just In
Time) compiler which compiles the expressions you
type and only then executes them. To put this in
perspective, compare the processes of interpreting
TeX macros by expansion and executing Lisp
functions compiled to machine code...

Yet, starting with the assumption that
performance should indeed be a concern (which is
not even necessarily the case), the argument of
slowness may still make some sense in specific
cases. For example, it is obvious that performing
type checking at run-time instead of at
compile-time induces a cost in execution speed. In
general, however, this argument, as-is, is
meaningless. For starters, let us not confuse
"dynamic language" with "dynamically typed
language". A dynamic language gives you a lot of
run-time expressive power, but that doesn't
necessarily mean that you have to use all of it, or
that it is impossible to optimize things away.
Let us continue with type checking in order to
illustrate this. Look again at the definition of
our "double" function:

(defun dbl (x) (* 2 x))

This function will no doubt be relatively slow,
because Common Lisp has a lot of things to do at
run-time. For starters, it needs to check that x is
indeed a number. Next, the multiplication needs to
be polymorphic, because you don't double an integer
the same way you double a float or a complex. That
is in fact not the whole story, but we will stop
here.

On the other hand, consider now the following
version:

(defun dbl (x)
  (declare (optimize (speed 3) (safety 0))
           (type fixnum x))
  (the fixnum (* 2 x)))
In this function, we request that the compiler
optimize for speed instead of safety. The result is
that the compiler will "trust" us and bypass all
dynamic checks. Next, we actually provide static
type information: x is declared to be a fixnum
(roughly equivalent to integers in other languages)
and so is the result of the multiplication. This is
important because there is no guarantee that the
double of an integer remains the same-size integer.
Consequently, in general, Lisp would need to
allocate a bignum to store the result.

As it turns out, compiling this new version of
the function leads to only 5 lines of machine code.
The compiler is even clever enough to avoid the
integer multiplication operator, using a left shift
instruction instead. What we get in this context is
in fact the behavior of a statically and weakly
typed language such as C. Consequently, it should
not be surprising that the level of performance we
get out of this is comparable to that of equivalent
C code. Recent experimental studies have
demonstrated that this is indeed the case [19, 20].
This particular example is also a nice
illustration of what we meant earlier by saying
that Common Lisp is both a full-blown, industrial
scale, general purpose programming language, and a
scripting language at the same time. When working
at the scripting level, the first version of dbl is
quick and good enough. When working in the core of
your application, however, you appreciate it when
the language provides a lot of tools (notably for
optimization) to adjust your code to your specific
needs.
4.4 Extensibility and customizability

Another aspect of the language well worth its own
section is its level of extensibility (adding new
behavior) and customizability (modifying existing
behavior). We know how important this is in the
(La)TeX world, which is a complicated ball of
intermixed threads all interacting with each other
[21], and which wouldn't be possible without the
level of intercession that TeX macros offer.
Similarly, at least part of the success of LuaTeX
is due to its ability to provide access to TeX's
internals, so it seems that there is also a lot of
interest in this area.
4.4.1 Homoiconicity and reflection

Once again, Common Lisp is here to help. We
mentioned earlier how the root of extensibility and
customizability in Common Lisp is its highly
reflexive nature. Reflection is usually decoupled
into introspection (the ability to examine
yourself) and intercession (the ability to modify
yourself).

In Lisp, reflection is supported in the most
direct and simple way one could think of. Remember
the expression (+ 1 2), with or without evaluation?
As we said before, this expression can either
represent a call to the function "sum" with the
arguments 1 and 2, or the list of three elements:
the symbol + and the numbers 1 and 2. What this
really means is that every piece of code, if not
evaluated, can be seen as a piece of data, and
hence can be manipulated at will. In fact, every
piece of Lisp code is represented as a list, which
happens to be a user-level data structure. This
property of a programming language is known as
homoiconicity [5, 11].
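A three-line sketch of code as data, using only standard operators:

```lisp
;; Build the expression as a list, inspect it, rewrite it, run it.
(defvar expr (list '+ 1 2))  ; the list (+ 1 2), built at run-time
(first expr)                 ; => the symbol +
(setf (first expr) '*)       ; rewrite the call: expr is now (* 1 2)
(eval expr)                  ; => 2
```

The same list is successively a piece of data to inspect and modify, and a piece of code to execute.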
Another important distinction in this notion is
structural vs. behavioral reflection [10, 17].
While structural reflection deals with providing a
way to reify a program, behavioral reflection deals
with accessing the language itself. Lisp is one of
the very few languages to provide both kinds of
reflection to some extent, as we'll now discuss.
4.4.2 Structural reflexivity

This section gives only a couple of examples of
structural reflexivity, again to demonstrate how
some well-known TeX idioms map to Common Lisp in a
straightforward way.

The functional nature of Lisp implies that
functions are first-class citizens in the language
[3, 18]. In general, this means that functions can
be used like any other object in the language. In
particular, this means that functions can be
modified, stored in variables, etc.

Storing a functional value in a variable is
useful in order to implement a variant of a
function which needs to call the original one at
some point. This is equivalent to the following
common TeX idiom:

\let\oldfoo\foo
\def\foo{... \oldfoo ...}
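The Lisp counterpart of this idiom might look as follows; foo and its behavior are hypothetical, purely for illustration:

```lisp
;; Original definition.
(defun foo () "original")

;; Save the current function object, then redefine foo in terms
;; of the saved one -- the analogue of \let\oldfoo\foo + \def\foo.
(let ((oldfoo (fdefinition 'foo)))
  (setf (fdefinition 'foo)
        (lambda ()
          (concatenate 'string "new " (funcall oldfoo)))))

(foo)   ; => "new original"
```

Note that the saved value is the function object itself, so the new definition keeps working even if foo is redefined again later.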
Defining a function several times is simply
equivalent to overriding the previous
definition(s). This behavior matches that of \def
or (more or less) \renewcommand. It is in fact more
powerful, for at least two reasons:

1. Since Common Lisp has a proper notion of scope
(and in fact provides both dynamic and lexical
scoping, something that very few other languages
can do), function or variable redefinition can be
performed at different scoping levels, not only
local or global.

2. Because of its interactive nature, there is no
real distinction between functions defined in a
core image or executable and those defined in the
REPL, and consequently, they can be redefined in
exactly the same way. Consider what this really
means for a minute: one could redefine any function
in the TeX program just as easily as any TeX
macro...
Finally, reflexivity in Common Lisp goes as far
as allowing both introspection and intercession at
the level of package internals. In most other
languages, it is simply not possible to access the
so-called "private" parts of a namespace, class, or
whatever encapsulation scheme is supported. In
Common Lisp, remember that a public symbol is
accessed by prefixing its name with the name of the
package and a colon separator, for example:
ltx:document-class.

It turns out that it is just as easy to both
introspect and intercede on a package's internals.
One just needs to use a double colon instead of a
single one. This is not unlike the @ character
convention used by LaTeX, along with the macros
\makeatletter and \makeatother. The double colon
really is a warning that you are prying on private
property, but nothing technically prevents you from
doing so.
4.4.3 Behavioral reflexivity

Lisp goes even further by providing some level of
behavioral reflection as well.

Lisp macros (functions executed at compile-time)
provide a form of intercession at the compiler
level, allowing one to program language
modifications in the language itself (what [16]
calls a "homogeneous meta-programming system", as
opposed to C++ templates for instance, which are
heterogeneous: a different language).
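As a tiny illustration of such a compile-time language modification, here is a hypothetical macro of ours giving Lisp a TeX-flavored \def-like shorthand:

```lisp
;; DEF rewrites into a standard DEFUN at compile-time; the name and
;; behavior are invented for illustration.
(defmacro def (name &body body)
  `(defun ,name () ,@body))

(def greeting "Hello, TiCL!")
(greeting)   ; => "Hello, TiCL!"
```

The macro runs during compilation and leaves no run-time trace: the compiled code is exactly what a hand-written defun would produce.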
CLOS [2, 6], the Common Lisp Object System, is
written in itself, on top of a so-called
Meta-Object Protocol, known as the MOP [14, 7].
Using the CLOS MOP permits intercession at the
object system level, allowing the programmer to
modify the semantics of the object-oriented layer.
Finally, it is also possible to extend the Lisp
syntax, which is a form of intercession at the
parser level, allowing one to modify the language's
syntax directly. This is the only concrete example
that we will provide in this section, although a
striking one. We assume that the reader is familiar
with TeX's double superscript syntax, for denoting
characters that are not normally printable.

Suppose that we are given a function called
^^-reader which performs TeX's double-superscript
syntax to character conversion (this function is 10
lines long). The following code effectively
installs the corresponding syntax in the Common
Lisp reader:

(make-dispatch-macro-character #\^)
(set-dispatch-macro-character #\^ #\^
  #'|^^-reader|)
The first line informs the Lisp reader that the ^
character is to be treated in a special way (do you
see a relationship to active characters and
catcodes?). The second line informs the Lisp reader
that if two such characters are encountered in a
row, then the regular parser should stop and pass
control to our user-provided function. We can now
verify that this extended syntax works:

CL-USER> ^^M
#\Return
CL-USER> ^^00
#\Nul

This particular example is a bit simplified, but it
conveys the idea. By modifying the way the Lisp
parser behaves, we are able to modify the language
itself, not only the program we are executing. And
again, we are not very far from TeX's notion of
active characters.
5 Common Lisp: how?

Section 2 listed five objectives for a modern
reimplementation of TeX. Section 4 demonstrated how
Common Lisp can help fulfill three of these goals:
providing real programming capabilities,
extensibility and customizability, all while
maintaining a relative ease of use. This section is
devoted to the last two objectives, namely
providing a more modern and consistent API while at
the same time (although this is not mandatory)
maintaining backward compatibility. In actuality,
this section provides a more concrete view of the
project itself. Although very little has been
implemented yet, the project does have a name: TiCL
(the acronym for "TeX in Common Lisp").
5.1 API

Mapping the typesetting TeX primitives (that is,
the non-programmatic ones) to Common Lisp would be
rather straightforward.

- TeX parameters become Lisp variables. For
  instance, \badness is represented by a Lisp
  (global, dynamically scoped) variable badness.

- TeX quantities become Lisp objects. The term
  "object" is to be taken in a broad sense, that
  is, depending on the exact requirements, either
  an object of some class from the object system,
  of some structure, or anything else. In Lisp, it
  is customary to provide abstract constructor
  functions following a specific naming scheme.
  For instance, creating a TeX glue item could be
  done with a call to a function such as
  make-glue:

  (defun make-glue (b &key plus minus)
    ...)

  Since functions like this are bound to be used
  quite often, a syntactic shortcut may come in
  handy, such as the rather idiomatic one below,
  which also demonstrates the Lisp way to set a
  variable to a specific value (but see section
  5.1.1):

  (setf baselineskip
        #g(b :plus x :minus y))

- Obviously, every TeX primitive command becomes a
  Lisp function. Again, the point here is to both
  simplify the syntax and make it more consistent
  at the same time (no more \relax!). Here are a
  couple of examples:

  (input file)
  (hbox material)
  (hbox material :to dim)
  (hbox material :spread dim)

  The reader familiar with TeX will notice
  immediately that the arguments in the last two
  examples are in reverse order compared to the
  regular TeX versions. If this is really too much
  to get accustomed to, variants are easy to
  implement:

  (hbox-to dim material)
  (hbox-spread dim material)

  Such syntactic variants are usually implemented
  with Lisp macros, evaluated at compile-time, so
  that there is no additional run-time cost.
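One such variant might be sketched as a one-line macro; hbox itself is assumed to exist as described above:

```lisp
;; Compile-time rewrite: (hbox-to dim material) expands into the
;; keyword-style call, at no run-time cost.
(defmacro hbox-to (dim material)
  `(hbox ,material :to ,dim))
```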
5.1.1 Lisp-2
Let us go back to the
baselineskip
assignment
example for a minute:
(setf baselineskip <glue>)
This assignment may seem odd to a T
E
Xnician, who
is more accustomed to direct assignments such as
\baselineskip 10pt
In fact, T
E
X has this way of using the same macro
for both denoting its value and setting it.
We intentionally omitted one point in section 4.3 on page 1003 in order to put it here: the fact that Common Lisp is a “Lisp-2” (as opposed to Scheme, for instance, which is a Lisp-1). This means that Common Lisp has two different namespaces for functions and variables. In other words, the same symbol can refer to both a function and a variable at the same time.
An interesting consequence of this is that it is possible to define a function baselineskip whose purpose is to assign a value to the eponymous variable baselineskip. Assuming this function exists, the above Lisp expression can hence be simplified as follows:
(baselineskip <glue>) ; set it!
Again, this aspect of Common Lisp brings us even closer to one of TeX's idioms: that of quantity assignment.
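This Lisp-2 idiom is short enough to demonstrate in full; the definitions below are a minimal sketch, not actual TiCL code:

```lisp
;; The quantity itself lives in the variable namespace.
(defvar baselineskip nil
  "Current baseline skip; nil until set.")

;; The same symbol, in the function namespace, acts as the setter.
;; No clash occurs: Common Lisp resolves the two uses independently.
(defun baselineskip (glue)
  (setf baselineskip glue))
```

After (baselineskip 12), evaluating the symbol baselineskip as a variable yields 12, which is precisely the TeX-like behavior described above.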
Figure 1: TiCL architecture
5.2 Backward compatibility
The programmatic interface to the typesetting part of TeX described in the previous section is what we call “procedural TeX”. Along with the actual typesetting engine, this mostly corresponds to TeX's stomach and bowels. On top of that, it is in fact possible to design either completely new typesetting systems or programmatic versions of Plain TeX, LaTeX, etc., entirely in Lisp.
In order to maintain backward compatibility, we also need to implement “traditional” TeX (still in Lisp), that is, the surface layer consisting of both the macro versions of Lispified TeX primitives and the rest of TeX's programming API. This is mostly located in TeX's mouth and eyes. Figure 1 depicts this architecture (disclaimer: this is an overly simplified, extremely naive view; see section 5.3 for details). As mentioned earlier, the whole point of implementing this architecture in Lisp, in terms of extensibility and customizability, is that the user of TiCL can basically interfere at every level of the system, whether by adding personal functions, rewriting built-in functions, or even replacing internal typesetting algorithms.
Another advantage of this fully integrated approach is that it is actually quite simple to provide “mixed” functionality, that is, using Common Lisp code directly in an otherwise regular TeX source file. The only requirement is an escape syntax allowing Common Lisp to take over interpretation of Lisp code and insert the result back into the regular TeX character stream. A prototype for this has already been implemented. It simply consists of a Common Lisp implementation of TeX's eyes with one additional bit of syntax: the appearance of two consecutive, equal subscript characters in a regular TeX source file triggers the evaluation of the subsequent Lisp expression. Since this syntax is invalid in TeX, it cannot break any existing document.
Below is an example illustrating this idea (taken from [15]). The Lisp function ast is used to define a TeX macro \asts outputting a specified number of asterisks.
\documentclass{article}
\newcommand\asts{}
__(defun ast (n)
(format nil "\\renewcommand\\asts{~A}"
(make-string n :initial-element #\*)))
\begin{document}
__(ast 10)
\asts
\end{document}
For the curious, our current implementation of TeX's eyes in Common Lisp is roughly 200 lines. The support for the double subscript syntax (including parsing it, reading the Lisp code, evaluating it, and inserting the result back into the regular TeX stream) amounts to only 16 lines, that is, around 8% of the total.
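The core of that escape mechanism can be sketched in a few lines of portable Common Lisp. This is our own simplified reconstruction, not the prototype's actual code: it ignores TeX's real category-code machinery and simply assumes the subscript character is the underscore:

```lisp
;; Hypothetical sketch of the double-subscript escape: copy characters
;; from IN to OUT, and whenever two consecutive underscores appear,
;; read the following Lisp expression, evaluate it, and splice its
;; printed value back into the output stream.
(defun expand-lisp-escapes (in out)
  (loop for c = (read-char in nil) while c
        do (if (and (char= c #\_)
                    (eql (peek-char nil in nil) #\_))
               (progn
                 (read-char in)               ; consume the second _
                 (princ (eval (read in)) out))
               (write-char c out))))
```

Feeding it the string "Value: __(+ 1 2)!" produces "Value: 3!": the Lisp expression disappears and its result takes its place in the character stream.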
5.3 Expected problems
After all those “would” and “could”, let us get to a
more pessimistic (realistic?) view of the project. This
section sheds some darkness on the too-bright picture
we have drawn of TiCL until now. As mentioned
earlier, TiCL is in fact pretty much only an idea
at present. If this project ever comes to fruition, a
number of problems are expected.
5.3.1 A huge task
Completely reimplementing TeX is a huge task. This is probably one of the reasons all alternative projects
(NTS excepted) chose the hybrid approach instead of
the fully integrated one. One thing that may help is
the existence of foreign function interfaces, notably
for the C language. The project could be developed
gradually by linking to the WEB/C implementation
of the not-yet-reimplemented parts.
5.3.2 Compatibility
Another question to consider is whether the TeX-incompatible part would eventually be accepted by the (or a new) community. This question is in fact pertinent for LaTeX3 as well, and we don't have an answer. The existence of a compatibility mode with pluggable Lisp, as described in section 5.2 on the facing page, would probably help get people accustomed to the benefits of using Lisp, while remaining in the reassuring context of traditional TeX.
5.3.3 TeX's organs
Figure 1 on the preceding page was already pointed out to be overly simplified. In reality, we know that TeX's organs do not constitute a pipeline: some levels from “down below” affect the upper stages, which makes things much more complicated. In the long run, this may imply that it is impossible to have new programmatic typesetting systems (accessing “procedural TeX” directly) work in conjunction with traditional TeX. Currently, we don't know for sure, although we are aware that this was one of the reasons the NTS project failed.
5.3.4 Mixed mode
Along those same lines, “mixed” mode, that is, the ability to mix regular TeX macros with Lisp code, is expected to be tricky. If we want to provide more interaction than just the double subscript syntax, for example the ability to define TeX macros in Lisp with Lisp access to the macro's arguments, we are bound to encounter issues related to the time of expansion.
This problem is in fact not related to Lisp at all, but rather to any approach aiming to provide the ability to define TeX macros in another language. Section 4 of [15] describes these kinds of problems in more detail.
5.3.5 Sandboxing
As with any scriptable application, security is an important problem. How much programmatic access to the application itself, or to its surrounding environment, are we willing to give the end user? Some versions of TeX are equipped with means to address this issue (cf. the -shell-escape command-line option and the \write18 command). Using a true, comprehensive programming language with scripting capabilities makes this problem even worse, because we have more than a couple of “macros” to control access to: we have a whole stack of language features to control, such as file I/O, operating system interfaces, etc. Currently, we are not aware of any sandboxing library already available for Lisp.
5.3.6 Intercession
This is in a similar vein to the sandboxing problem. In [21], we were complaining about (rejoicing in?) the huge intercession mess that the LaTeX world is. Well, now you should really be afraid, because we are going to make things worse! Remember that any Lisp executable granting scripting privileges to the end user effectively provides write access to the whole executable. This surely makes it trivial to extend or customize the system. Whether this is a good thing for the system in the long run, there is no way to tell. After all, LaTeX is alive and kicking, in spite of (because of?) its high intercession capabilities. . .
6 Conclusion
In this paper, we advocated a “homogeneous” approach to TeX modernization in which TeX itself is first rewritten in a programming language serving both at the core of the program and at the scripting level. All programmatic macros of TeX are hence rendered obsolete, as the underlying language itself can be used for user-level programming. In a rather puzzling way, our notion of “modernity” consists of reimplementing a 30-year-old language using a 50-year-old one!
We demonstrated why we think that Common Lisp is a very good choice for doing this. Common Lisp is both a scripting language and a full-blown, industrial-scale, general-purpose programming language. It also provides several paradigms and idioms that are quite close to features that TeX either provides as well, or would like to provide.
Will the TiCL project ever come to birth? That is not certain at this point, as it will require a tremendous effort. We would like to see it happen, of course. Amongst all the current alternatives, LuaTeX seems to be the only one vigorously alive and gaining momentum.
Of course, the other project that needs to be mentioned is LaTeX3. We do so only in an anecdotal fashion because it does not make use of a real programming language, but continues in the tradition of building directly on top of TeX. Still, LaTeX3 is not experimental anymore and is light years ahead of what LaTeX2e is. In particular, it is much better
than its ancestor in terms of syntax consistency and programmatic capabilities. Nevertheless, since it is still restricted to TeX's macro-expansion world, every time a new programming paradigm is needed, it will have to be implemented manually.
To sum up, the “big TeX modernization plan” currently seems to follow two different paths: staying strictly in the TeX area, or creating hybrids of TeX and another real programming language. The approach we would like to suggest with TiCL is a third one: a homogeneous approach in which the implementation language of TeX is also the one which serves at the scripting level. Will this project see the light of day? Can these three approaches co-exist in the long term? Only the future will tell. . .
Epilogue
These were the voyages,
Of a software enterprise.
Its continuing mission:
To explore new tokens,
To seek out a new life,
New forms of implementation.
To \textbf{go},
Where no TeX has gone before!
References
[1] Common Lisp. American National Standard: Programming Language. ANSI X3.226:1994 (R1999), 1994.
[2] Daniel G. Bobrow, Linda G. DeMichiel, Richard P. Gabriel, Sonya E. Keene, Gregor Kiczales, and David A. Moon. Common Lisp Object System specification. ACM SIGPLAN Notices, 23(SI):1–142, 1988.
[3] Rod Burstall. Christopher Strachey: Understanding programming languages. Higher-Order and Symbolic Computation, 13(1-2):51–55, 2000.
[4] Jonathan Fine. TeX forever! In Proceedings of EuroTeX, pages 140–149, Pont-à-Mousson, France, 2005. DANTE e.V.
[5] Alan C. Kay. The Reactive Engine. PhD thesis, University of Utah, 1969.
[6] Sonya E. Keene. Object-Oriented Programming in Common Lisp: A Programmer's Guide to CLOS. Addison-Wesley, 1989.
[7] Gregor J. Kiczales, Jim des Rivières, and Daniel G. Bobrow. The Art of the Metaobject Protocol. MIT Press, Cambridge, MA, 1991.
[8] Donald E. Knuth. Digital Typography. CSLI Lecture Notes. CSLI, September 1998.
[9] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 3:184–195, 1960. Online version at http://www-formal.stanford.edu/jmc/recursive.html.
[10] Pattie Maes. Concepts and experiments in computational reflection. In OOPSLA. ACM, December 1987.
[11] M. Douglas McIlroy. Macro instruction extensions of compiler languages. Communications of the ACM, 3:214–220, April 1960.
[12] Marjan Mernik, editor. Formal and Practical Aspects of Domain-Specific Languages: Recent Developments, chapter 1. IGI Global, 2012.
[13] Andrew Mertz and William Slough. Programming with PerlTeX. TUGboat, 28(3):354–362, 2007.
[14] Andreas Paepcke. User-level language crafting: Introducing the CLOS metaobject protocol. In Andreas Paepcke, editor, Object-Oriented Programming: The CLOS Perspective, chapter 3, pages 65–99. MIT Press, 1993. Downloadable version at http://infolab.stanford.edu/~paepcke/shared-documents/mopintro.ps.
[15] Scott Pakin. PerlTeX: Defining LaTeX macros using Perl. TUGboat, 25(2):150–159, 2004.
[16] Tim Sheard. Accomplishments and research challenges in meta-programming. In Walid Taha, editor, Semantics, Applications, and Implementation of Program Generation, volume 2196 of Lecture Notes in Computer Science, pages 2–44. Springer, 2001.
[17] Brian C. Smith. Reflection and semantics in Lisp. In Symposium on Principles of Programming Languages, pages 23–35. ACM, 1984.
[18] J. E. Stoy and Christopher Strachey. OS6: An experimental operating system for a small computer. Part 2: Input/output and filing system. The Computer Journal, 15(3):195–203, 1972.
[19] Didier Verna. Beating C in scientific computing applications. In Third European Lisp Workshop at ECOOP, Nantes, France, July 2006.
[20] Didier Verna. CLOS efficiency: instantiation. In International Lisp Conference, pages 76–90, MIT, Cambridge, Massachusetts, USA, March 2009. Association of Lisp Users.
[21] Didier Verna. Classes, styles, conflicts: The biological realm of LaTeX. TUGboat, 31(2):162–172, 2010. http://tug.org/TUGboat/tb31-2/tb98verna.pdf.
Didier Verna
EPITA / LRDE
14-16 rue Voltaire
94276 Le Kremlin-Bicêtre Cedex
France
didier (at) lrde dot epita dot fr
http://www.lrde.epita.fr/~didier