The Quickref Cohort
Didier Verna
EPITA, LRE
Le Kremlin-Bicêtre, France
didier@lrde.epita.fr
ABSTRACT
The internal architecture of Declt, our reference manual generator
for Common Lisp libraries, is currently evolving towards a three-stage
pipeline in which the information gathered for documentation
purposes is first reified into a formalized set of object-oriented data
structures. A side effect of this evolution is the ability to dump that
information for purposes other than documentation. We demonstrate
this ability applied to the complete Quicklisp ecosystem. The
resulting “cohort” includes more than half a million programmatic
definitions, and can be used to gain insight into the morphology of
Common Lisp software.
CCS CONCEPTS
• Information systems → Information extraction; Presentation of
retrieval results; • Software and its engineering → Software
libraries and repositories.
KEYWORDS
Information Extraction, Software Analysis, Morphological Statistics
ACM Reference Format:
Didier Verna. 2024. The Quickref Cohort. In Proceedings of the 17th European
Lisp Symposium (ELS’24). ACM, New York, NY, USA, 4 pages.
https://doi.org/10.5281/zenodo.10947962
1 INTRODUCTION
Cohort: a group of individuals having a statistical
factor (such as age or class membership) in common in
a demographic study.
The Merriam-Webster Dictionaryᵃ, definition 2.b.
ᵃ https://www.merriam-webster.com/dictionary/cohort
1.1 Context
Declt is a reference manual generator for Common Lisp libraries.
The project started in 2010, leading to a first stable release in 2013 [2].
Four years later, the Quickref project was born [1, 4–6] (at the time,
Declt was at version 2.3 [3]). Quickref runs Declt over the whole
Quicklisp¹ repository and offers a website², currently aggregating
more than two thousand reference manuals for Common Lisp
libraries.
¹ https://www.quicklisp.org/
² https://quickref.common-lisp.net
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
ELS’24, May 6–7 2024, Vienna, Austria
© 2024 Copyright held by the owner/author(s).
ACM ISBN 978-2-9557474-8-3
https://doi.org/10.5281/zenodo.10947962
Declt runs by loading an ASDF system into memory and introspecting
its contents. Because it is unrealistic to load the complete
set of Quicklisp libraries into a single Lisp environment, Quickref
runs Declt as a separate process for each library. The unfortunate
consequence is that the information gathered by Declt is not
directly available to the Quickref instance. Under those conditions, it
remains easy to build a library index (by sorting the listing of the
generated reference manuals directory), but it is less
straightforward to build, for instance, an author index, as the author
information, extracted from each ASDF system, needs to survive each
and every Declt run.
Originally, Declt was designed to generate reference manuals
in GNU Texinfo³, an intermediate format suitable for software
documentation, which can in turn be converted into a number
of user-readable ones such as HTML, PDF, etc. Hence its name:
Documentation Extractor from Common Lisp to Texinfo.
Over the years, there has been some pressure to extend Declt’s
rendering capabilities to other output formats (including HTML
without the Texinfo intermediary). This led to an architecture
overhaul, which is ongoing.
1.2 The Declt Pipeline
The goal is to implement Declt as a three-stage pipeline, as depicted
in Figure 1. Declt’s historical entry point, the declt function,
triggers the whole pipeline, but for more advanced usage, each stage
of the pipeline is meant to be accessible separately and directly via
its own entry point function.
(1) The first stage of the pipeline is called the assessment stage.
    At this stage, Declt loads the library and introspects the Lisp
    environment in order to extract the pertinent information.
    This information is stored in a so-called report.
(2) The second stage of the pipeline is called the assembly stage.
    At this stage, Declt organizes the information provided by a
    report in a specific way. The result is called a script. A script
    begins to look like a properly organized reference manual,
    but is still independent from the final output format.
(3) Finally, the third stage of the pipeline is called the typesetting
    stage. At this stage, Declt renders a script to a file by
    typesetting its contents in a specific documentation format.
In 2022, we released version 4.0b1 of Declt, marking the completion
of stage 1 of the pipeline [7]. Declt now provides a function
called assess, which takes an ASDF system name as argument,
loads the corresponding library, introspects it, and creates the
report. The rest of the pipeline, which is not yet implemented, is
wrapped in a temporary function called declt-1, going directly
from a report to a Texinfo file.
³ https://www.gnu.org/software/texinfo/
[Figure: System → Assessment → (report) → Assembly → Typesetting → Manual.
(declt system ...) spans the whole pipeline, (assess system ...) the
assessment stage, and (declt-1 report ...) the remaining stages.]
Figure 1: The Declt Pipeline
e rst direct benet of this evolution is the ability for ickref
to build an author index le in a much simpler and robust way.
Instead of calling the global
declt
function, ickref now triggers
Declt in two steps. First, it calls the
assess
function to get a handle
on the generated report, and then continues with
declt-1
. In the
meantime however, the library’s contact information is extracted
from the report and dumped into a specic le. Once ickref has
nished processing the full set of icklisp libraries, it loads back
all the contact information for all the libraries to create the index.
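The index-building step can be illustrated with a minimal, self-contained sketch. The layout of the contact files is not described in this paper, so everything below is hypothetical: the one-form-per-library layout, the sample library names and addresses, and the functions read-contact-dump and build-author-index. Strings stand in for the per-library files.

```lisp
;; Hypothetical per-library contact dumps, one form per library:
;; (library-name contact...). Strings stand in for the actual files.
(defparameter *contact-dumps*
  '("(\"lib-a\" \"Alice <alice@example.org>\")"
    "(\"lib-b\" \"Bob <bob@example.org>\" \"Alice <alice@example.org>\")"))

(defun read-contact-dump (string)
  "Read one dumped form, with standard syntax and #. evaluation disabled."
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (values (read-from-string string)))))

(defun build-author-index (dumps)
  "Return an alist mapping each contact to the sorted list of its libraries."
  (let ((index (make-hash-table :test #'equal)))
    ;; Invert the library -> contacts relation.
    (dolist (dump dumps)
      (destructuring-bind (library &rest contacts) (read-contact-dump dump)
        (dolist (contact contacts)
          (push library (gethash contact index)))))
    ;; Flatten the table into an alist sorted by contact.
    (sort (loop for contact being the hash-keys of index
                  using (hash-value libraries)
                collect (cons contact
                              (sort (copy-list libraries) #'string<)))
          #'string< :key #'car)))
```

Evaluating (build-author-index *contact-dumps*) lists both libraries under the first contact and only "lib-b" under the second. The point of the sketch is that nothing here requires the libraries themselves to be loaded.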
The funny thing is that once this was implemented, it quickly
occurred to us that Declt reports, now in a stable format, could be
fully dumped into files and used for all sorts of purposes other than
documentation. In fact, it is relatively easy to “hijack” the Quickref
infrastructure in order to dump Declt reports for the whole set of
Quicklisp libraries, effectively creating a cohort of programmatic
definitions.
In the following sections, we describe a preliminary cohort
implementation, which currently contains more than half a million
entities, and is already publicly accessible. Additionally, we show
how such a cohort can be used to gain insight into the current
shape of Lisp software.
2 DECLT REPORTS
A Declt report is a data structure containing general information
about a library (authors, license, copyright, etc.), and a flat list of
the discovered ASDF components and programmatic denitions
(packages, variables, functions, classes, etc.).
2.1 Definitions
Definitions are themselves reified in an object-oriented fashion
which is described in the Declt User Manual⁴. An excerpt of the
definitions hierarchy is given in Figure 2.
For documentation purposes, the information provided by each
kind of definition is as exhaustive as introspection permits. Most of
them point back to the original Lisp object, can access the object’s
docstring if any, etc. On top of that, the assessment stage finalizes a
library’s definitions list by constructing an extensive set of
cross-references (definitions pointing to definitions) that will eventually
lead to internal hyperlinks in the generated reference manual.
For example, a generic function definition contains a list of
method definitions (not raw method objects; pointers to the
corresponding method definitions), a reference to its method
combination definition, but also a reference to a setf expander using this
function for access, and a list of (short form) setf expanders using
this function for update, if applicable.
⁴ https://www.lrde.epita.fr/~didier/software/lisp/declt/bibliography/
denition
symbol
ASDF
package
funcoid
varoids
classoids
setfable-funcoid
methods
method combinations
function
compiler macros
types
setf expanders
macro
ordinary-function generic-function
Figure 2: Denitions Hierarchy Excerpt
2.2 Dumping
As mentioned before, Declt reports were originally used by Quickref
only to dump library author information, so as to build an author
index afterwards. When the idea of a full cohort emerged, we decided
to evaluate its potential usefulness by first creating a
quick cohort prototype.
To this end, the current prototype only dumps an incomplete and
simplified version of Declt reports, that is, without performing true
serialization. Pointers to the original Lisp objects can of course not
be preserved in the dump. Cross-references between definitions are
not currently preserved either, and only a few interesting attributes
of each definition kind are retained, with some amount of
pre-processing for subsequent statistical analysis.
Figure 3 provides an excerpt from the dump of Declt’s own
report. The contents should be mostly self-explanatory. Programmatic
definitions start with a keyword denoting the definition kind, followed
by the definition’s name. Docstrings are replaced by their length, and
cross-references by their number.
Such a simple dump already provides enough information to
perform all sorts of interesting morphological studies on the 2000+
libraries available in Quicklisp, as will be exemplified in the next
section.
("net.didierverna.declt"
 (:CONTACTS 1)
 ...
 (:SYSTEM "net.didierverna.declt.assess"
  :DOCSTRING 44 :DEPENDENCIES 2 :CHILDREN 2
  :DEFSYSTEM-DEPENDENCIES 0)
 ...
 (:PACKAGE "NET.DIDIERVERNA.DECLT.ASSESS"
  :DOCSTRING 39
  :EXTERNAL-SYMBOLS 169 :INTERNAL-SYMBOLS 119
  :USE-LIST 2 :USED-BY-LIST 1)
 ...
 (:CLASS "GENERIC-FUNCTION-DEFINITION"
  :DOCSTRING 154
  :DIRECT-SUPERCLASSES 1 :DIRECT-SUBCLASSES 1
  :DIRECT-METHODS 11
  :DIRECT-SLOTS 3)
 ...
 (:GENERIC-FUNCTION "DOCUMENT"
  :DOCSTRING 45 :METHODS 39)
 ...)
Figure 3: Declt Dump Excerpt
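Since the dump is made of plain Lisp forms, a single entry can be read back with the standard reader. The following is a minimal sketch: the entry text is copied from the dump excerpt, while the accessor names (read-entry, entry-kind, entry-name, entry-attribute) are hypothetical helpers introduced for illustration, not part of Declt or Quickref.

```lisp
;; One definition entry, copied from the dump excerpt.
(defparameter *entry-text*
  "(:CLASS \"GENERIC-FUNCTION-DEFINITION\"
     :DOCSTRING 154
     :DIRECT-SUPERCLASSES 1 :DIRECT-SUBCLASSES 1
     :DIRECT-METHODS 11
     :DIRECT-SLOTS 3)")

(defun read-entry (string)
  "Read one dumped entry, with standard syntax and #. evaluation disabled."
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (values (read-from-string string)))))

;; An entry is a kind keyword, a name, and then a property list.
(defun entry-kind (entry) (first entry))
(defun entry-name (entry) (second entry))
(defun entry-attribute (entry key &optional default)
  "Return the value stored under KEY in ENTRY, or DEFAULT if absent."
  (getf (cddr entry) key default))
```

For instance, (entry-attribute (read-entry *entry-text*) :direct-methods) returns 11, and asking for an attribute that the entry does not carry returns the supplied default.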
3 QUICKREF COHORT ANALYSIS
The current (beta) version of Quickref dumps Declt reports, as
shown in the previous section, for every Quicklisp library. The
resulting cohort (containing more than half a million programmatic
definitions) is available for download from the website⁵. In order
to demonstrate its potential usefulness, Quickref also performs a
number of example statistical computations on the cohort, and
generates the corresponding plots, also visible on the website. Some of
them are reproduced below.
3.1 Symbols Morphology
Figure 4 presents the histogram of symbol name lengths in Quicklisp,
showing a peak at 11 characters, but also going as far as 135
characters for a single symbol name. Two other plots, not included
in this article but visible on the website, show that most composed
symbols have a cardinality (the number of com-po-nents) of 1, 2,
or 3. The symbol with the highest cardinality appears to have 13
components. Most symbol components are 4 characters long, although
one symbol (with a cardinality of 2) has a 126-character component. In
fact, it is the very same symbol that is 135 characters long in total.
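The measures used in this section can be sketched as follows. This is only an illustration under stated assumptions: components are taken to be the hyphen-separated parts of a symbol name, empty parts are ignored, and the function names (split-components, cardinality, histogram) as well as the sample names are ours, not those of the actual Quickref code.

```lisp
(defun split-components (name &optional (separator #\-))
  "Split NAME at each SEPARATOR character."
  (loop with start = 0
        for end = (position separator name :start start)
        collect (subseq name start end)
        while end
        do (setf start (1+ end))))

(defun cardinality (name)
  "Number of non-empty components in NAME (assumed definition)."
  (count-if-not (lambda (component) (zerop (length component)))
                (split-components name)))

(defun histogram (numbers)
  "Return an alist of (number . occurrences), sorted by number."
  (let ((table (make-hash-table)))
    (dolist (number numbers)
      (incf (gethash number table 0)))
    (sort (loop for number being the hash-keys of table
                  using (hash-value occurrences)
                collect (cons number occurrences))
          #'< :key #'car)))

;; Illustrative sample, not taken from the cohort.
(defparameter *sample-names*
  '("DECLT" "ASSESS" "GENERIC-FUNCTION-DEFINITION" "WITH-OPEN-FILE"))
```

On this sample, (histogram (mapcar #'cardinality *sample-names*)) yields ((1 . 2) (3 . 2)), and (histogram (mapcar #'length *sample-names*)) gives the name length distribution in the same form.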
3.2 Documentation Shape
Another interesting area of investigation is the current state of
Lisp documentation. Figure 5 shows the percentage of documented
definitions per kind. For most types of programmatic entities, only
20 to 40% get a docstring. Slightly above this range are method
combinations: half of them seem documented. On the other hand,
Lisp programmers seem to disregard the documentation capabilities
of methods, slots, setf expanders and compiler macros.
⁵ https://quickref.common-lisp.net/cohort/
[Figure: histogram. x-axis: symbol name length (0 to 135); y-axis:
number of symbols (0 to 25000).]
Figure 4: Symbol Names Lengths Histogram
[Figure: bar chart. x-axis: programmatic definitions (Constant, Special,
Ordinary Function, Macro, Generic Function, Method, Method Combination,
Class, Slot, Condition, Setf Expander, Compiler Macro, Type); y-axis:
percentage of documented definitions (0 to 100).]
Figure 5: Documentation Percentages
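A per-kind documentation percentage can be computed directly from dumped entries. The sketch below rests on an assumption: an undocumented definition is taken to carry a :DOCSTRING value of NIL (or no :DOCSTRING attribute at all), which the dump excerpt does not show. The sample entries and the function name documentation-percentages are purely illustrative.

```lisp
;; Illustrative entries in the dump's (kind name . attributes) shape.
;; Assumption: :DOCSTRING NIL marks an undocumented definition.
(defparameter *sample-entries*
  '((:function "FOO" :docstring 12)
    (:function "BAR" :docstring nil)
    (:macro "WITH-BAZ" :docstring 40)
    (:method "FOO" :docstring nil)
    (:function "QUUX" :docstring nil)
    (:function "FROB" :docstring 7)))

(defun documentation-percentages (entries)
  "Return an alist of (kind . percentage of documented entries)."
  (let ((totals (make-hash-table))
        (documented (make-hash-table)))
    (dolist (entry entries)
      (destructuring-bind (kind name &rest attributes) entry
        (declare (ignore name))
        (incf (gethash kind totals 0))
        (when (getf attributes :docstring)
          (incf (gethash kind documented 0)))))
    ;; Exact rational arithmetic; sorted by kind name for stable output.
    (sort (loop for kind being the hash-keys of totals
                  using (hash-value total)
                collect (cons kind
                              (* 100 (/ (gethash kind documented 0) total))))
          #'string< :key #'car)))
```

With the sample above, the result is ((:FUNCTION . 50) (:MACRO . 100) (:METHOD . 0)).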
3.3 Classoid Profiles
As a final example of cohort analysis, Figure 6 presents the average
number of direct slots, methods, parents, and children for structures,
classes, and conditions. The most striking element in this plot is the
average number of direct methods on classes, a little more than 6,
which is much higher than on structures or conditions. It also seems
that the multiple inheritance capability of classes and conditions
is not used extensively, as the average number of parents remains
only slightly above 1 (of course, it is exactly 1 for structures). Finally,
we can see that the average number of direct slots is significantly
higher in structures than in classes (and even more so in conditions),
probably because of slot inheritance. Indeed, we can also see, by
looking at the average number of children, that subclassing is more
frequent than “substructuring”.
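Averages of this kind reduce to a simple aggregation over dumped entries. In the sketch below, the attribute keywords for classes are those visible in the dump excerpt; that structures use the :STRUCTURE kind and the same attribute keywords is an assumption, and the sample entries and the function average-attribute are hypothetical.

```lisp
;; Illustrative entries; the :STRUCTURE kind keyword is an assumption.
(defparameter *sample-classoids*
  '((:class "A" :direct-slots 3 :direct-methods 11
     :direct-superclasses 1 :direct-subclasses 1)
    (:class "B" :direct-slots 1 :direct-methods 1
     :direct-superclasses 2 :direct-subclasses 0)
    (:structure "S" :direct-slots 6 :direct-methods 0
     :direct-superclasses 1 :direct-subclasses 0)))

(defun average-attribute (entries kind key)
  "Average value of attribute KEY over the ENTRIES of the given KIND.
Missing attributes count as 0; returns 0 when no entry matches."
  (let ((numbers (loop for entry in entries
                       when (eq (first entry) kind)
                         collect (getf (cddr entry) key 0))))
    (if numbers
        (/ (reduce #'+ numbers) (length numbers))
        0)))
```

Here, (average-attribute *sample-classoids* :class :direct-methods) returns 6 and (average-attribute *sample-classoids* :class :direct-superclasses) returns the exact rational 3/2; wrapping the result in float would produce a plottable value.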
[Figure: grouped bar chart. Groups: Structures, Classes, Conditions;
bars in each group: Slots, Methods, Parents, Children; y-axis:
average number of direct (0 to 8).]
Figure 6: Aggregative Data Structure Averages
4 PERSPECTIVES
e ickref cohort is currently a proof of concept, but we hope
that the existence of a free database of more that half a million
programmatic (and ASDF) entities will trigger some interest. Sec-
tion 3 provided a glimpse at what can be done with it in terms of
statistical analysis, but we’re eager to hear about other potential
use cases.
The cohort is essentially a collection of Declt reports, presented
one way or another. Because of that, it makes sense to equip Declt
itself with some cohort manipulation ability. For example, it could
be interesting for a Lisp programmer to analyze their own (and
only their own) library or libraries in a way similar to what was
described in Section 3. We are definitely interested in doing so, and
plan to extend Declt along these lines in the near future. In such
a case, Declt could even manipulate actual reports (Lisp objects)
rather than their dumped form.
In order to make the whole Quickref cohort truly usable, the
next step is to stabilize the format used for dumping Declt reports.
Contrary to the current format illustrated in Section 2.2, Declt
reports should be preserved as much as possible in order not to
impose any limit on potential applications. In particular, no
pre-computation should be performed prior to dumping, and
cross-references between definitions should be preserved.
On the other hand, some parts of the reports need not (in fact,
should not) be preserved in the dump. We want the ability to
manipulate reports without the corresponding libraries being loaded
in memory. This means that the actual Lisp objects corresponding
to each definition (whether programmatic or ASDF) should be
excluded from the dump.
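To make these two requirements concrete, here is a minimal sketch of one conceivable approach, not the format Declt will adopt: each cross-reference is written as the index of its target in the definitions list, and the original Lisp object is simply left out. The definition structure, its slots, and the functions dump-definitions and load-definitions are all hypothetical and much simpler than actual Declt definitions.

```lisp
(defstruct definition
  name
  object       ; original Lisp object: never dumped
  references)  ; list of other DEFINITION structures

(defun dump-definitions (definitions)
  "Return a printable list in which each cross-reference is an index."
  (loop for definition in definitions
        collect (list :name (definition-name definition)
                      :references
                      (loop for target in (definition-references definition)
                            collect (position target definitions)))))

(defun load-definitions (dump)
  "Rebuild definitions from DUMP, turning indices back into pointers."
  (let ((definitions (loop for entry in dump
                           collect (make-definition
                                    :name (getf entry :name)))))
    (loop for entry in dump
          for definition in definitions
          do (setf (definition-references definition)
                   (loop for index in (getf entry :references)
                         collect (nth index definitions))))
    definitions))

;; Two mutually referencing definitions: the cycle is harmless because
;; only indices end up in the dump.
(defparameter *example-dump*
  (let* ((generic (make-definition :name "DOCUMENT" :object #'identity))
         (method (make-definition :name "DOCUMENT-METHOD"
                                  :references (list generic))))
    (setf (definition-references generic) (list method))
    (dump-definitions (list generic method))))
```

The resulting dump contains only names and integers, so it can be printed and read back without the library being loaded. The linear searches (position, nth) are acceptable for a sketch but would need replacing for half a million entities.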
All in all, it seems that what we are talking about here is some
kind of serialization, a topic on which we currently have no
experience. Consequently, we’re eager to get some advice on that
matter.
REFERENCES
[1] Antoine Hacquard and Didier Verna. A corpus processing and analysis pipeline
    for Quickref. In 14th European Lisp Symposium, pages 27–35, Online, May 2021.
    ISBN 9782955747452. doi: 10.5281/zenodo.4714443.
[2] Didier Verna. Declt 1.0 is out.
    https://www.didierverna.net/blog/index.php?post/2013/08/24/Declt-1.0-is-out,
    August 2013. Blog entry.
[3] Didier Verna. Declt 2.3 “Robert April” is out.
    https://www.didierverna.net/blog/index.php?post/2017/10/16/Declt-2.2-Christopher-Pike-is-out,
    October 2017. Blog entry.
[4] Didier Verna. Announcing Quickref: a global documentation project for Common
    Lisp.
    https://www.didierverna.net/blog/index.php?post/2017/12/13/Announcing-Quickref%3A-a-global-documentation-project-for-Common-Lisp,
    December 2017. Blog entry.
[5] Didier Verna. Parallelizing Quickref. In 12th European Lisp Symposium, pages
    89–96, Genova, Italy, April 2019. ISBN 9782955747438.
    doi: 10.5281/zenodo.2632534.
[6] Didier Verna. Quickref: Common Lisp reference documentation as a stress test for
    Texinfo. In Barbara Beeton and Karl Berry, editors, TUGboat, volume 40, pages
    119–125. TeX Users Group, September 2019.
[7] Didier Verna. Declt 4.0 beta 1 “William Riker” is released.
    https://www.didierverna.net/blog/index.php?post/2022/05/10/Declt-4.0-beta-1-William-Riker-is-released,
    May 2022. Blog entry.