Parallelizing ickref

Didier Verna

EPITA

Research and Development Laborator y

Le Kremlin-Bicêtre, France

didier@lrde.epita.fr

ABSTRACT

Quickref is a global documentation project for Common Lisp soft-

ware. It builds a website containing reference manuals for Quicklisp

libraries. Each library is rst compile d, loaded, and introspected.

From the collected information, a Texinfo le is generated, which is

then processed into Html. Because of the large number of libraries

in Quicklisp, doing this sequentially may require several hours of

processing. We report on our experiments parallelizing Quickref.

Experimental data on the morphology of Quicklisp libraries has

been collected. Based on this data, we are able to propose a number

of parallelization schemes that reduce the total processing time by

a factor of 3.8 to 4.5, depending on the exact situation.

CCS CONCEPTS

• Computing methodologies → Parallel algorithms

;

• Soft-

ware and its engineering → Software performance

; Software

libraries and repositories;

• Applie d computing →

Hypertext

languages;

KEYWORDS

Parallelization, Multi-Threading, Software Performance, Software

Documentation, Typesetting

ACM Reference Format:

Didier Verna. 2019. Parallelizing Quickref. In Proceedings of the 12th European

Lisp Symposium (ELS’19). ACM, New York, NY, USA, 8 pages. https://doi.

org/10.5281/zenodo.2632534

1 INTRODUCTION

Quickref is a global documentation project for Common Lisp [

]

software. It builds a website containing reference manuals for li-

braries available in Quicklisp

. Quickref is freely available

, so

anyone can create a local version of the documentation website for

personal use, but we also maintain a public website documenting

the whole Quicklisp world

Quickref itself is actually not much more than a layer of integra-

tion “glue” cadencing the inter-operation of four external software

components (see Section 2). Until recently, it essentially consisted

in a big loop, iterating over every Quicklisp library, and sequentially

executing all the steps required in producing the corresponding

https://www.quicklisp.org

https://gitlab.common-lisp.net/quickref

https://quickref.common-lisp.net

ELS’19, April 01–02 2019, Genova, Italy

This is the author’s version of the work. It is posted here for your personal use. Not

for redistribution. The denitive Version of Record was published in Proceedings of the

12th European Lisp Symposium (ELS’19), https://doi.org/10.5281/zenodo.2632534.

reference manual, from downloading the library to actually pro-

ducing an Html le. Because Quicklisp is quite large (it currently

provides more than 1700 libraries), this process could take between

1h30 and 7 hours on our test machine, depending on the exact

conditions. Even if 7 hours, the worst case scenario, still ts nicely

into one night of batch processing, it is worth trying to improve

the performance of the system, notably by means of parallelism.

The purpose of this article is to report on this work.

Section 2 describes the tool-chain involved in the generation of

the reference manuals and some important characteristics of the

involved software components. Section 3 presents the experimental

conditions and the various congurations used to perform timing

measurements. Section 4 provides and analyzes some preliminary

global measurements, giving us a general idea of what to expect.

Section 5 proposes dierent parallel solutions, each one coming

with its pros and cons. Finally, after the conclusion, an extensive

discussion is proposed in Section 7.

2 QUICKREF TOOL-CHAIN

Figure 1 depicts the typical reference manual production pipeline

used by Quickref, for a library named foo.

(1)

Quicklisp is rst used to make sure the library is installed up-

front. This is done by calling

ql-dist:ensure-installed

and results in the presence of a directory for that library (a

release in Quicklisp terms) in the Quicklisp directory tree.

Currently, Quickref only considers one system per library,

called the primar y system. How exactly this system is com-

puted is unimportant for this paper.

(2)

Declt

is then run on the primary system to generate the doc-

umentation. Declt is another library of ours, written 5 years

before Quickref, but with that kind of application in mind

right from the start. In particular, it is for that reason that

the documentation generated by Declt is in an intermediate

format called Texinfo.

(3)

The Texinfo le is nally processed into Html. Texinfo

is the GNU ocial documentation format. There are two

main reasons why this format was chosen when Declt was

originally written. First, it is particularly well suited to tech-

nical documentation. More importantly, it is designed as an

abstract, intermediate format from which human-readable

documentation can in turn be generated in many dierent

forms (Pdf and Html notably).

Quickref essentially runs this pipeline on every available library

(it currently has the ability to limit itself to what is already installed,

https://www.lrde.epita.fr/~didier/software/lisp/misc.php#declt

https://www.gnu.org/software/texinfo/

ELS’19, April 01–02 2019, Genova, Italy Didier Verna

Quicklisp

foo/

Declt

foo.texi

Makeinfo

foo.html

Figure 1: Reference Manual Generation

Main thread, External Process

or process the whole Quicklisp world). Some important remarks

need to be made about this process.

First of all, Declt works by introspection: it uses Asdf

’s

load-

system

function to load the system being processed, which may

involve compiling and loading some dependencies as well as the

system itself. It then introspects the system to collect documen-

tation items, notably by way of the

sb-introspect

facility from

Sbcl

. Given the size of Quicklisp, it would be unreasonable to

load almost two thousand libraries in a single Lisp image. For that

reason, Quickref doesn’t actually run Declt directly, but instead

uses uiop:run-program to fork an Sbcl script to do it.

Similarly,

makeinfo

(

texi2any

in fact), the program used to

convert the Texinfo les to Html, is an external program written

in Perl (with some parts in C), not a Lisp library. Thus, here again,

uiop:run-program

is used to fork a

makeinfo

process out of the

Quickref Lisp instance.

In light of these remarks, the reader must keep in mind the

following points.

•

Declt and Texinfo are treated as monolithic black boxes (in

fact, Asdf as well), that is, we don’t attempt to alter their

operation. Any attempt at parallelization will hence boil

down to scheduling the

declt

and

makeinfo

processes in

a specic way. Thus, when we speak of “threads” in the re-

mainder of this paper, we actually mean small Lisp functions

that essentially fork external processes and wait for them,

in a loop.

•

Because running Declt may require the compilation of Lisp

components, possibly including dependencies, and because

dierent libraries may share the same dependencies, there is

an inherent concurrency problem in writing the compilation

les. Care must hence be taken to protect against those

potentially concurrent accesses when nee ded.

3 EXPERIMENTAL CONDITIONS

3.1 Environment

All b enchmarks reported in this paper were collected on a De-

bian Gnu/Linux

system, version 9.6 “Stretch”. Quicklisp currently

requires Debian 9 for testing, and advertises the list of required

foreign dep endencies. This environment hence guarantees that as

many libraries as possible could b e handled. All foreign dependen-

cies were pre-installed, most of them already packaged by Debian.

We used Sbcl 1.4.0, cloned from its Git repository and manually

compiled with

--fancy

(which, among other things, activates multi-

threading).

The computer used was a Dell Precision T1600, equipped with

16 GB

of RAM, a

120 GB

mechanical hard drive and an Intel Xeon

E3-1245 processor. This processor has 4 hyper-threaded cores[

https://common-lisp.net/project/asdf/

http://www.sbcl.org/

http://www.debian.org

so that 8 threads are actually available. Note that as of version 2.4,

the Linux kernel is aware of hyper-threading. Debian 9 includes

version 4.9. Although the various timings reported in this paper

were collected from single runs (as opposed to averaging several

ones), the machine was freshly rebooted in single-user mode, in

order to avoid non-deterministic operating system or hardware

side-eects as much as possible.

For the Quickref tool-chain, the following software components

were used: Declt 2.4.1, Makeinfo (

texi2any

) 6.5, and an up-to-date

Quicklisp version 2019-01-07. This version of Quicklisp contains

1719 libraries. It is worth noting that Quickref currently fails on

less than 2% of those libraries, for various reasons: dependency

problems (foreign or not), compilation problems, Declt problems

etc. These issues are out of the scope of this paper, so we simply

ltered out the problematic libraries in our reports.

3.2 Conguration

While Quickref is primarily meant to build a complete documenta-

tion website for Quicklisp, a number of options are available, which

need to be taken into account in our experiments.

3.2.1 Libraries and updating policy. By default, Quickref attempts

to globally update Quicklisp before processing, which is the right

thing to do for the public website. Individual users, however, also

have the possibility to create a local website for their personal

working environment only. To this end, Quickref makes it possible

to only consider the libraries already installed on the local machine

(instead of the whole Quicklisp world), and also to avoid updating

those if that is unwanted. As a consequence, and depending on the

exact situation, documenting a library with Quickref may or may

not lead to downloading some code, and may or may not trigger

some Lisp compilation (dependencies included) before actually

loading and introspecting it.

3.2.2 Cache policy. On several occasions, we observed problems

related to the compilation of common dependencies. One typical

problem is when two libraries dep end on a third one, and that

dependency needs to be compiled in two dierent ways. A global

compilation cache, as provided by Asdf by default is bound to

fail. Another problem (which, at least in our opinion, should be

regarded as a bug) is when the compilation or loading of a library

leads to global side-eects on the top-level environment. The latest

example we saw is that of

common-lisp-stat

, globally changing

the reader’s default oat format from

single-float

double-

float

[

]. This kind of behavior is bound to cause problems, espe-

cially when almost two thousand libraries are involved. Because of

that, Quickref now has an option for making Asdf use a dierent,

local, compilation cache for every documented library.

3.2.3 Scenarios. In order to take into account all those dierent

options, we have experimented on 3 dierent situations.

Parallelizing ickref ELS’19, April 01–02 2019, Genova, Italy

Makeinfo

40!%

Declt

41!%

3!%

Compilation

16!%

Figure 2: Time Distribution w/ compilation

(1)

All the libraries are already compiled, so Declt just needs

to load them. This is the best-case scenario. Also, note that

whether the compilation cache is global or local doesn’t

matter here.

(2)

The libraries are not compiled, but a global compilation cache

is used, so that no redundant processing occurs. This should

be regarded as the intermediate, most frequent scenario.

(3)

The libraries are not compiled, and local compilation caches

are used. This is the absolute worst-case scenario.

Note that regardless of the scenario, we always process the en-

tirety of Quicklisp (modulo the failures), and the 1719 libraries in

question are already downloaded. Network connectivity is consid-

ered to o uctuating to be included in benchmarks, and besides,

including it would hinder the idea of experimenting in single-user

mode, in order to be as deterministic as possible. Under those con-

ditions, we measured that in the original, sequential version of

Quickref, scenario 1 takes 1h 27m, scenario 2 takes 1h 51m, and

scenario 3 takes 7h 01m.

4 PRELIMINARY ANALYSIS

In order to get a general idea on the behavior of the dierent soft-

ware components involved, we p erformed a set of preliminary

measurements, which we partially report and analyze in this sec-

tion. We separately collected Asdf load/compile times, and Declt

and Makeinfo processing times for every Quicklisp library. As of

this writing, the code used to collect that experimental data is avail-

able in the

benchmark

branch of the Quickref repository. The data

itself is also publicly available

4.1 Time Distribution

Figures 2 and 3 depict the time distribution for scenarios 2 and

1 respectively, that is, with a global compilation cache, with or

without compilation. The actual values were obtained by summing

the ones collected individually for each library, but they conrm the

global timings reported at the end of section 3, which were obtained

in another run of the scenarios. Namely, compilation takes 16m

49s, loading requires 03m 22s, Declt needs 43m 19s, and Makeinfo

runs for 43m 06s. Note that the only measured redundancy here

is in Asdf load times. Inde ed, compilation, Declt, and Makeinfo

processing occur only once per library. However, the load time

https://github.com/didierverna/quickref-benchmarks

Makeinfo

48!%

Declt

48!%

4!%

Figure 3: Time Distribution w/o compilation

Makeinfo

10!%

Declt

10!%

1!%

Compilation

79!%

Figure 4: Time Distribution w/ separate compilation

measurements include not only the libraries themselves, but also

their dependencies, such that the actual measure for one library

corresponds to the numb er of times it appears as a dependency,

plus one.

The important information we get is that Declt and Makeinfo

require practically the same amount of time to run in total. By

summing Asdf time and Declt time, we see that in scenario 1 (no

compilation required), Texinfo generation takes 52% of the time,

versus 48% for Html generation. In scenario 2, Texinfo generation

takes 60% of the time, versus 40% for Html generation. We did not

collect numbers for scenario 3 (separate compilation directories for

every library), but we can reconstruct them quite easily. Indeed, the

Makeinfo, Declt, and Asdf load times are the same. The remainder

of the 7h 01m, which amounts to 5h 32m, is thus devoted to (re-

dundant) compilation. In this scenario, shown in Figure 4, Texinfo

generation takes 90% of the time while Html generation involves

only the other 10%.

4.2 Time Shapes

Figures 5 and 6 provide two dierent views on the Declt processing

real times. The rst one displays the timings on a logarithmic scale,

library per library (the libraries appear by lexicographic order on the

X axis). The second one provides a histogram of the same data, with

logarithmic scale on both axis. The actual numbers unimportant.

What is important, on the other hand, is the general shape and

characteristics of the data distribution.

The rst remark is the ver y wide range of processing times. They

spread from approximately 1s to 5m 19s. The second remark is that

ELS’19, April 01–02 2019, Genova, Italy Didier Verna

100

1000

0 200 400 600 800 1000 1200 1400 1600

Texinfo generation / Declt real time (seconds)

Libraries

Figure 5: Declt real time per library

0.1

100

1000

10000

1 10 100

Number of libraries per 1/2 seconds intervals

Texinfo generation / Declt real time (seconds)

Figure 6: Declt real time histogram

most of the processing times are short compared to the maximum

value. 75% of the libraries are processed in less than 1.5s, 92% in

less than 2s, and 96% under 2.5s. Only 20 libraries require more

than 5s for processing. Unfortunately, and this is the third remark,

we cannot really take advantage of this knowledge in our parallel

design, because there isn’t any actual probability law underlying

this data distribution. The average time is 1.67s, the median is of

1.23s, but the standard deviation is 7.99, which is meaningless, given

the fact that all our timings are positive. In fact, the reason for this

is that we cannot discard the “aberrant” values as experimental

accidents, because they aren’t: they are reproducible. For example,

the library taking more than 5 minutes of Declt processing is

lisp-

interface-library

. Further investigation shows that no matter

how many times we repeat the Declt run, the timing will remain

within that order of magnitude. Thus, when a thread is busy running

Declt on that library, it will stay busy for around 5 minutes, a time

during which 180 average libraries could be processed. That is 10%

of Quicklisp!

We have conducted the same analysis for Asdf compile & load

times, Asdf load times only (pre-compiled), and Makeinfo process-

ing times. We do not report the results here. Suce to say that in

every case, we note the same kind of morphology: very wide range

of values, high concentration of smaller values with a small, yet

undiscardable number of “aberrations”.

5 PARALLEL SOLUTIONS

As of this writing, the parallel solutions presented below are all

implemented in a Quickref subsystem, automatically loaded when

Sbcl has multi-threading supp ort compiled in. They may be dy-

namically selected and parametrized (e.g. number of threads for

each task) through a set of keywords to the main Quickref entry

point (the build function).

5.1 Solution 1

The rst proposed solution is presented in Figure 7. This solution

uses only two threads, and takes advantage of the natural sequenc-

ing of operations to establish a shared buer of Texinfo les between

Declt and Makeinfo. The main thread builds the Texinfo les se-

quentially (in any order). The second thread waits for them, grabs

them (possibly by batches, emptying the shared buer in one shot),

and converts them to Html.

5.1.1 Advantages. This solution is very simple to implement. There

is only one shared resource: the buer of Texinfo les. Only two

threads are required, so it can work on older CPU’s (e.g. dual-core

without hyper-threading), or be less demanding on an otherwise

busy or shared computer. Because the libraries are processed se-

quentially by Declt, no concurrent compilation occurs, so this solu-

tion may be used in either of our three scenarios.

5.1.2 Drawbacks. This solution’s strengths ow from the same

well as its weaknesses. Because only two threads are used, it will

not take full advantage of the available resources. Besides, we know

from Section 4.1 that depending on the scenario, Texinfo processing

takes 48%, 40%, or only 10% of the time. This means that the Html

thread will in general be waiting more than working.

5.1.3 Exp erimentation. Experimentation with this algorithm con-

rms our analysis. Scenario one (no compilation) now takes 48m

30s instead of 1h 27m (almost twice as fast). Scenario 2 (global cache

policy) takes 1h 05m instead of 1h 51m (we save roughly 40% of the

time). Scenario 3 (local cache) takes 6h 22m instead of 7h 01m (we

save around 10% of the time).

Note that because the Texinfo les are completely independent

of each other and have no dependencies, it is straightforward to

add more threads for Texinfo processing (the exact same function

may be spawned multiple times). This, however, would be useless,

as Texinfo processing is not where most of the time is spent. Again,

experimentation also conrms this. Only in the next solutions will

it become protable to parallelize Html generation.

5.2 Solution 2

Solution 2, presented in Figure 8 is a logical extension to solution 1.

This time, the main process spawns several threads building Texinfo

les in parallel, and several others generating the Html ones. As

Parallelizing ickref ELS’19, April 01–02 2019, Genova, Italy

Libraries Declt Texinfo Files Makeinfo HTML Files

Figure 7: Solution 1

Main thread, Html thread

Libraries Declt

Declt

Texinfo Files

Makeinfo

HTML Files

Figure 8: Solution 2

Declt threads, Html threads

before, a shared buer of Texinfo les is used, but compared to

solution 1, there are some notable dierences or complications.

•

Because multiple Declt threads exist, the original pool of

libraries now becomes a shared resource. The Declt threads

must hence synchronize on it as well as on the shared buer

of Texinfo les.

•

Contrary to solution 1, the Makeinfo threads must not empty

the buer at once, grabbing a whole batch of Texinfo les

to process. Indeed, we have learnt from Section 4.2 that

the time distribution is not homogeneous, and that some

libraries take an extremely long time to process, compared

to the average. Thus, if we were to grab batches of Texinfo

les, we would risk an accumulation eect, whereby multiple

“short” Texinfo les would be blocked behind a “long” one,

essentially re-sequentializing Html generation.

•

For the exact same reason, it would seem ill-advised to simply

split the initial pool of libraries into as many Declt threads as

there are, and let them process their own batch sequentially.

Instead, each thread will just process one library at the time.

5.2.1 Advantages. This solution will let us ne-tune the number

of threads devoted to each task, depending on the machine at hand,

or the scenario involved.

5.2.2 Drawbacks. Because the libraries are processed in parallel

by Declt, in no particular order, concurrent compilation of common

dependencies may occur. This solution can thus be safely used in

scenarios 1 and 3 only.

5.2.3 Experimentation. Fortied by the time distribution reported

in Section 4.1, we were able to ne-tune the number of threads in

this solution to match our expectations. For scenario 1 (no compila-

tion), the best results are achieved with the same number of threads

for Declt and Makeinfo, specically 4 and 4, corresponding to the

hyper-threaded quad-core hardware conguration used in the ex-

periments. It now takes 21m 47s to complete scenario one, which

corresponds roughly to 25% of the original 1h 27m. For scenario 3

(local cache), the best results are achieved with 8 Declt threads and

2 Makeinfo threads. It now takes 1h 51m to complete scenario 3,

which corresponds roughly to 26% of the original 7h 01m.

1 2 3 4 5 6 7 8 9 10 11

Number of libraries

Batches of standalone libraries (rst to last)

460

305

237

268

178

162

Figure 9: Library batches

5.3 Solution 3

Solution 3 is a renement of solution 2 aiming at making it work

with scenario 2 (global compilation cache). Remember that the com-

plication comes from two libraries sharing common dependencies.

Any attempt at loading them in parallel could result in the simulta-

neous compilation of the same dependency, followed by concurrent

writing of the same fasl le. In order to prevent this, we must en-

sure that libraries processed in parallel by Declt do not have any

dependencies in common, or only already compiled ones. We call

those standalone libraries.

This problem is of course closely related to that of topological

sorting[

], with the exception that we don’t need full serialization.

On the contrary, we want to retrieve batches of standalone libraries

for parallel processing. The proposed solution is quite simple. First,

we build a dependency graph of the libraries. The leaves in this

graph do not have any dependencies, so they can be processe d

in parallel. We collect them; they constitute our rst batch. We

remove them from the graph, which leads to a new set of leaves,

constituting the second batch. We repeat this process until the

graph is exhausted. For the curious, Figure 9 shows those batches

ELS’19, April 01–02 2019, Genova, Italy Didier Verna

Library Batch

Batch 1

Batch 2

Declt

Figure 10: Solution 3, stage 1

Main thread, Declt threads

in the current Quicklisp distribution. We got 11 batches of 460 to

only 1 libraries, from rst to last.

In order to adapt solution 2 to this new scheme, a new shared

buer is created (see Figure 10). The Declt threads pick libraries

from it instead of from the original libraries pool. The main thread

sends successive batches of standalone libraries to this buer, and

waits for them to have been exhausted before sending the next

batch in. The rest of solution 2 is unchanged (in particular, the

Html generation code can be re-used without modication).

5.3.1 Advantages. At the expense of a slightly more complicated

synchronization logic, this solution may be used in any of our 3

scenarios. In the current status of Quicklisp, the dependency graph

is relatively small (less than two thousand nodes), which means that

the additional computation time required to handle it is negligible

compared to the 21m 47s of our current most optimistic situation.

5.3.2 Drawbacks. Before sending the next batch in, the main thread

must wait for all libraries in the current batch to have been entirely

processed by a Declt thread; not just have been picked up by one

of them. At a rst glance, this may not appear as a serious issue

because we only have 11 batches and a few threads handling them.

However, remember again from Section 4.2 that some libraries will

take a very long time to process. If, for example, such a “long”

library is part of a small batch, the batch will be quickly emptied,

and all Declt threads will essentially become dormant until the

“long” library is treated. This is yet another form of accumulation

eect that can potentially hinder the parallelization.

5.3.3 Experimentation. Because the time required to maintain the

dependency graph is negligible, this solution is not expected to

make much dierence in scenarios 1 (no compilation) and 3 (local

cache), as it would boil down to handling the libraries in a dierent

order. For scenario 2, the best result was obtained with an equal

number of threads for Declt and Makeinfo, namely, 4 of each (again,

corresponding to the hyper-threaded quad-core hardware cong-

uration used in the experiments). There, the overall computation

time fell down to 29m 21s, that is, 26% of the original sequential

time. Given the time distribution in Figure 2, we also tried matching

that proportion, for example with 5 Declt threads and 3 Makeinfo

ones. We only got similar (inconclusive) result only diering by

less than 5%.

6 CONCLUSION

As mentioned in the introduction, the absolute worst case scenario

for Quickref, which is to build the complete Quicklisp documenta-

tion from scratch, takes around 7 hours on our test machine. Even

if such a duration may appear reasonable for batch processing,

we still believe that parallelization is not a vain endeavor. First of

all, the ability to use Quickref interactively (creating for example

one’s own local documentation website) makes it worth improving

its eciency as much as possible. Secondly, Quicklisp itself is an

ever-growing repositor y (monthly updates usually add at least a

dozen new libraries to the pool), and so is the time to generate the

documentation for it.

In this paper, we have devised a set of parallel algorithms, and

experimented with them in dierent scenarios corresponding to

the typical use-cases of Quickref. On our test machine, we were

able to reduce the required processing time roughly by a factor of

4 compared to the naive sequential version, which is already quite

satisfactory. The absolute worst-case scenario fell under 2 hours,

and the most frequent one under half an hour. For all that, and in

spite of the fact that gracefully handling concurrency is always

a tricky business, our parallel solutions remain quite simple. The

implementation of solution 3, for example, requires only 3 shared

resources (2 buers and a counter), 2 mutexes and 3 condition

variables. It was implemented directly with Sbcl’s multi-threading

layer, without resorting to higher level libraries.

This work also lead us to perform various preliminary measure-

ments and analysis on Common Lisp libraries (compilation and

load time, Declt and Makeinfo run time, dependency graphs, etc.).

As mentioned before, the collected experimental data and their in-

terpretation is publicly available. We think this data could be useful

for other projects, and we already know for a fact that the current

Texinfo maintainers are interested. Only a small part of those re-

sults have been presented in this paper. We are condent the rest

will be extremely useful for future renements. Indeed, there are

still many things that can be done to improve the situation even

more.

7 DISCUSSION & PERSPECTIVES

7.1 Alternative Solution

Yet another, alternative, parallel solution exists, depicted in its en-

tirety in Figure 11. This solution consists in processing the libraries

in parallel, yet, without breaking the Declt / makeinfo chain. Mul-

tiple threads (8 would probably be an appropriate number on our

test machine) pick libraries to process, and sequentially run Declt

followed by Makeinfo on them. As solution 2 (Section 5.2), this

algorithm can be made to work on scenarios 1 and 3 only, or, as

solution 3 (Section 5.3) can be combined with library batches in

order to also work on scenario 2. This is what Figure 11 depicts. In

the future, and mostly out of curiosity, we may experiment with

this solution.

Note however that we don’t expect it to make much dierence

compared to solution 3. In solution 3, we have indeed fewer threads

picking libraries up for Declt processing, but on the other hand,

these threads also return more quickly to the library pool / batch,

since they are not in charge of Makeinfo. In fact, our gut feeling is

Parallelizing ickref ELS’19, April 01–02 2019, Genova, Italy

Library Batch

Batch 1

Batch 2

Declt

Makeinfo

HTML Files

Figure 11: Alternative Solution

Main thread, Declt & Makeinfo threads

that solution 3 may remain slightly better, as it is probably more

gentle on the overall waiting times.

7.2 Dependency Management Issues

Dependency management, required by solution 3, is a relatively

fragile mechanism. Currently, we base our knowledge of depen-

dencies on static information provided by Quicklisp directly, more

specically, the

required-systems

slot from the

ql-dist:system

class. This information is based on Asdf’s

depends-on

defsystem-

depends-on

, and also comes from observing the state of the envi-

ronment before and after loading the library.

The reliability of that information is somehow relative, however.

Any inaccuracy in that information can potentially lead to a cor-

rupted dependency graph, in turn risking unprotected concurrent

compilation. Here is, for example, one such scenario, reported by

the author of Quicklisp. This problem is currently known to aect

a couple of libraries.

Consider systems A and B, where A requires B to build. When

Quicklisp test-builds A, the A prerequisites are built in such a

way that B also successfully builds to satisfy A’s requirements.

But when Quicklisp test-builds B on its own, the environment is

dierent in a way that precludes B from building. In that case,

the metadata in Quicklisp species that A requires B, but B is

not listed at all, because it does not build on its own.

7.3 Library Ordering Renements

In Section 5.3, we introduced the idea of library batches, that is, sets

of libraries the loading of which wouldn’t entail any compilation

conicts, and we mentioned the need to wait for batch exhaustion

before sending in the next one. This requirement, which is a lim-

itation, actually comes from the fact that only static information

(namely, the dependency relations) is used to create the batches in

question.

It is however possible to rene this idea. Indeed, the most perti-

nent information for us is not that library 1 depends on library 2, but

that the compilation of library 2 is over. In other words: concurrent

compilation is problematic; concurrent loading is not.

In order to improve on solution 3, we hence need one additional

piece of (run-time) information: we need to be notied when a Declt

thread has nished processing a library. The renement can then

go as follows. We create the same dependency graph as before, and

also initialize the library queue with the rst batch as before (the

libraries with no dependencies), but this time, without removing

them from the graph. From now on, as soon as a library is done

processing by a Declt thread, we remove it from the graph. Any

new leaf in the graph stemming from that removal can then safely

be pushed immediately to the library queue.

An even better renement would be to not wait for Declt to nish

processing, but only for Asdf to nish compiling (this would require

a communication channel between the main thread and the external

Declt process though). This renement has not been implemented

yet, but it is a high priority, as we expect a somewhat substantial

gain from it. Note also that it can be used in the alternative solution

proposed in Section 7.1.

Currently, our dependency graph is implemented as a hash ta-

ble of adjacency lists[

]: the hash keys are the library names, and

the hash values are the lists of dependencies. Another possible

(and classical) implementation consists in using an adjacency ma-

trix[

], a potentially more compact representation. Whether one

representation would be more benecial than the other is currently

unknown. In particular, more investigation on the dependencies

morphology should be conducted, notably to discover whether the

adjacency matrix would risk being sparse or not (very likely). In

any case, the choice of representation is not expected to have much

impact on the performance, again, because the dependency graph

is relatively small (less than two thousand nodes), in front of at

least 20 minutes of total processing.

7.4 CPU vs. I/O Consumption

While the performance improvements obtained from solution 1 are

to be expected, getting only an improvement factor of 4 in solution

2 or 3 may appear somewhat surprising, even disappointing, espe-

cially since our test machine has 8 virtual cores (4 hyper-threaded

actual cores). Of course, a factor of 8 would be unrealistic. Studies

(from Intel or otherwise) have shown that an improvement of 30%

is not unreasonable to expect from hyper-threading[

]. The

problem we have here is the fact that both Declt and Makeinfo are

treated as monolithic black boxes, so we don’t have any control on

their CPU vs. I/O consumption, an otherwise important aspect of

parallelization.

Declt works in two stages: rst, an abstract in-memory repre-

sentation of the documentation is constructed by introspecting

the library. Next, the Texinfo le is generated from that abstract

representation. The rst stage is CPU-intensive, the second one is

ELS’19, April 01–02 2019, Genova, Italy Didier Verna

−3

−2

−1

0 200 400 600 800 1000 1200 1400 1600

Declt / Makeinfo real time ratios (log)

Libraries

Figure 12: Declt / Makeinfo comparative timings

more pressing on I/O. On top of that, remember that the library

needs to be loaded by Asdf rst, possibly with some compilation.

This will also entail several CPU or I/O intensive phases.

It turns out that Makeinfo works in a similar fashion, at least for

Html production (for Pdf, T

X is used). The rst stage reads the

Texinfo le into an abstract in-memory representation. This stage is

written in C, with a Perl interface. Then, the abstract representation

is altered in various ways, and the Html le is nally created. This

last stage is entirely done in Perl, and (according to the current

Texinfo maintainers) is probably much slower than the previous

ones.

Having no control over these dierent processing phases is un-

fortunate, and is likely to be the cause of the “25% threshold” that

we seem to reach. It may very well happen, for instance, that regard-

less of the number of threads that we have, they all end up in an I/O

phase at the same time, essentially waiting on the same disk, while

subsequent CPU-intensive computation could have been started.

Improving that situation would require cracking the Declt and

Makeinfo “black boxes” open, possibly even the Asdf one, in order

to introduce parallelism at a lower level. Although we already have

some ideas about this, it would be a completely dierent project.

7.5 Scheduling

In addition to the points raised in the previous sections (the idea of

library ordering in particular), the general question of scheduling

could be raise d. For example, we could get inspiration from the

operating system theory, and think of improving things by mini-

mizing the waiting time in the various queues, as the SJF (Shortest

Job First) does in process scheduling[

]. The problem here is to

get a notion of what makes for the complexity (hence the time) of

the various tasks. Very preliminary investigation gives a somewhat

pessimistic impression. For example, the collected experimental

data shows that there is no correlation between Declt and Makeinfo

processing time (see Figure 12). For some libraries, Declt takes more

time than Makeinfo; for some others, it is the other way around,

etc. In the worst case scenario, we would need to run Quickref and

collect the data in question (which we did for this article), and use

it on the next Quicklisp release, hoping that the situation didn’t

change too much.

7.6 SSD Technology

Finally, note that the validity of the present study is highly de-

pendent on the fact that our test machine was equipped with a

traditional, mechanical hard drive. We haven’t had the opportunity

to experiment with SSD (Solid State Drive[

]) technology yet, but

their dramatically quicker access time and lower latency[

] is very

likely to redene the parameters of our study.

ACKNOWLEDGMENTS

The author would like to thank Zach Beane for his feedback on how

Quicklisp handles dependencies across libraries, and Karl Berry,

Gavin Smith, and Patrice Dumas for their insight into the inner

workings of the Texinfo system.

REFERENCES

[1]

Norman Biggs. Algebraic Graph Theory. Cambridge University Press, 1993.

Denition 2.1, p. 7.

[2]

Shawn D. Casey. How to determine the eectiveness of hyper-threading tech-

nology with an application. Intel Technology Journal, 6(1), 2011.

[3]

Thomas Cormen, Charles E. Leiserson, Ronald L. Rivest, and Cliord Stein. In-

troduction to Algorithms. The MIT Press, 2009.

[4]

Neal Ekker, Tom Coughlin, and Jim Handy. An introduction to solid state storage.

SNIA White Paper, January 2009.

[5]

Vamsee Kasavajhala. Solid state drive vs. hard disk drive price and performance

study. Dell Technical White Paper, May 2011.

[6]

D. E. Knuth. The Art of Computer Programming, volume 1. Addison-Wesley, 1968.

Section 2.2.3.

[7]

Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty,

Alan J. Miller, and Michael Upton. Hyper-threading technology architecture and

microarchitecture. Intel Technology Journal, 6(1), February 2002.

[8]

Andrew S. Tannenbaum and Herbert Bos. Modern Operating Systems. Pearson,

4th edition, 2014. Section 2.4.

[9]

Ansi. American National Standard: Programming Language – Common Lisp.

ANSI X3.226:1994 (R1999), 1994.

[10]

Didier Verna. Standard i/o syntax, and the robustness principle. https://www.di-

dierverna.net/blog/index.php?post/2017/10/27/Standard-IO-syntax-and-the-

Robustness-Principle, November 2017. Blog Entry.

[11]

Duc Vianney. Hyper-threading speeds linux. https://www.ibm.com/

developerworks/library/l-htl/index.html, 2003.