Programmatic Manipulation of Common Lisp Type
Specifiers
Jim E. Newton
jnewton@lrde.epita.fr
Didier Verna
didier@lrde.epita.fr
Maximilien Colange
maximilien.colange@lrde.epita.fr
EPITA/LRDE
14-16 rue Voltaire
F-94270 Le Kremlin-Bicêtre
France
ABSTRACT
In this article we contrast the use of the s-expression with the
BDD (Binary Decision Diagram) as a data structure for pro-
grammatically manipulating Common Lisp type specifiers.
The s-expression is the de facto standard surface syntax and
also programmatic representation of the type specifier, but
the BDD data structure offers advantages: most notably,
type equivalence checks using s-expressions can be computa-
tionally intensive, whereas the type equivalence check using
BDDs is a check for object identity. As an implementation
and performance experiment, we define the notion of maximal
disjoint type decomposition, and discuss implementations
of two algorithms to compute it: a brute force iteration
and a tree reduction. The experimental implementations
represent type specifiers by both aforementioned data struc-
tures, and we compare the performance observed in each
approach.
CCS Concepts
Theory of computation → Data structures design
and analysis; Type theory; Computing methodologies
→ Representation of Boolean functions; Mathematics
of computing → Graph algorithms;
1. INTRODUCTION
Common Lisp programs which manipulate type specifiers
have traditionally used s-expressions as the programmatic
representations of types, as described in the Common Lisp
specification [4, Section 4.2.3]. Such choice of internal data
structure offers advantages such as homoiconicity, making
the internal representation human readable in simple cases,
and making programmatic manipulation intuitive, as well
as enabling the direct use of built-in Common Lisp func-
tions such as typep and subtypep. However, this approach
does present some challenges. Such programs often make
use of ad-hoc logic reducers that attempt to convert types
to canonical form. These reducers can be complicated and
difficult to debug. In addition, run-time decisions about type
equivalence and subtyping can suffer performance problems.
ELS ’10 April 3–4, 2016, Brussels, Belgium
ACM ISBN 978-1-4503-2138-9.
In this article we present an alternative internal represen-
tation for Common Lisp types: the Binary Decision Dia-
gram (BDD) [6, 2]. BDDs have interesting characteristics
such as representational equality; i.e. it can be arranged
that equivalent expressions or equivalent sub-expressions are
represented by the same object (eq). While techniques to
implement BDDs with these properties are well documented,
an attempt to apply the techniques directly to the Common
Lisp type system encounters obstacles which we analyze and
document in this article.
In order to compare performance characteristics of the two
data structure approaches, we have constructed a problem
called Maximal Disjoint Type Decomposition (MDTD): de-
composing a given set of potentially overlapping types into
a set of disjoint types. Although MDTD is interesting in its
own right, we do not attempt, in this paper, to motivate in
detail the applications or implications of the problem. We
consider such development and motivation a matter of fu-
ture research. Our use of the MDTD problem in this article
is primarily a performance comparison vehicle.
We present two algorithms to compute the MDTD, and
implement each algorithm with both data structures,
s-expressions and BDDs (four implementations in total).
Finally, we report performance characteristics of the four
implementations in Common Lisp.
Key contributions of this article are:
• A description of how to extend known BDD related
implementation techniques to represent Common Lisp
types and facilitate type based calculations.
• Performance comparison of algorithms using traditional
s-expression based type specifiers vs. using the BDD
data structure.
• A graph based algorithm for reducing the computational
complexity of MDTD.
2. DISJOINT TYPE DECOMPOSITION
In presenting the problem of decomposing a set of over-
lapping types into non-overlapping subtypes, we start with
an example intended to convey an intuition of the problem.
We continue by defining precisely what we intend to calcu-
late. Then in sections 2.1 and 2.2 we present two different
algorithms for performing that calculation.
[Venn diagram of the sets $A_1$ through $A_8$, with regions numbered 1 through 13]
Figure 1: Example Venn Diagram
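Disjoint Set    Derived Expression
$X_1$           $A_1 \cap \overline{A_2} \cap \overline{A_3} \cap \overline{A_4} \cap \overline{A_6} \cap \overline{A_8}$
$X_2$           $A_2 \cap \overline{A_3} \cap \overline{A_4}$
$X_3$           $A_2 \cap A_3 \cap \overline{A_4}$
$X_4$           $A_3 \cap \overline{A_2} \cap \overline{A_4}$
$X_5$           $A_2 \cap A_3 \cap A_4$
$X_6$           $A_2 \cap A_4 \cap \overline{A_3}$
$X_7$           $A_3 \cap A_4 \cap \overline{A_2}$
$X_8$           $A_4 \cap \overline{A_2} \cap \overline{A_3} \cap \overline{A_8}$
$X_9$           $A_5$
$X_{10}$        $A_6$
$X_{11}$        $A_7$
$X_{12}$        $A_8 \cap \overline{A_4}$
$X_{13}$        $A_4 \cap A_8 \cap \overline{A_5}$
Figure 2: Disjoint Decomposition of Sets from Figure 1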
In the Venn diagram in Figure 1, $V = \{A_1, A_2, ..., A_8\}$. We
wish to construct logical combinations of those sets to form
as many mutually disjoint subsets as possible. The resulting
decomposition should have the same union as the original
set. The maximal disjoint decomposition $D = \{X_1, X_2, ..., X_{13}\}$
of V is shown in Figure 2.
Notation 1. We use the symbol $\perp$ to indicate the disjoint
relation between sets. I.e., we take $A \perp B$ to mean
$A \cap B = \emptyset$. We also say $A \not\perp B$ to mean $A \cap B \neq \emptyset$.
Notation 2. We use the notation $A \subset B$ ($A \supset B$) to
indicate that A is either a strict subset (superset) of B or is
equal to B.
Definition 1. Let U be a set and V be a set of subsets of
U. The Boolean closure of V, denoted $\hat{V}$, is the (smallest)
super-set of V such that
$\alpha, \beta \in \hat{V} \implies \{\alpha \cap \beta, \alpha \cap \overline{\beta}\} \subset \hat{V}$.
Definition 2. Let U be a set, and let V and D be finite
sets of non-empty subsets of U. D is said to be a disjoint
decomposition of V if the elements of D are mutually disjoint,
$D \subset \hat{V}$, and $\bigcup_{X \in D} X = \bigcup_{A \in V} A$.
If no larger set fulfills those properties, D is said to be the
maximal disjoint decomposition of V.
We claim without proof that there exists a unique maxi-
mal disjoint decomposition of a given V . A more complete
discussion and formal proof are available [12].
The MDTD problem: Given a set U and a set of subsets
thereof, $V = \{A_1, A_2, ..., A_M\}$, suppose that for each
pair $(A_i, A_j)$, we know which of the relations hold:
$A_i \subset A_j$, $A_i \supset A_j$, $A_i \perp A_j$. We would like
to compute the maximal disjoint decomposition of V.
In Common Lisp, a type is a set of (potential) values [4,
Section Type], so it makes sense to consider the maximal
disjoint decomposition of a set of types.
2.1 The RTE Algorithm
We first encountered the MDTD problem in our previous
work on regular type expressions (RTE) [13]. The following
algorithm was the one presented in that paper, where we
pointed out that the algorithm suffers from significant performance
issues. Performance issues aside, a notable feature of
the RTE version of the MDTD algorithm is that it easily fits
in 40 lines of Common Lisp code, so it is easy to implement
and easy to understand.
1. Let U be the given set of sets. Let D denote the set of
disjoint sets, initially $D = \emptyset$.
2. Identify all the sets which are disjoint from each other
and from all the other sets (an $O(n^2)$ search). Remove
these sets from U and collect them in D.
3. If possible, choose X and Y for which $X \not\perp Y$.
4. Remove X and Y from U, and add any of $X \cap Y$, $X \setminus Y$,
and $Y \setminus X$ which are non-empty. I.e.,
$U \leftarrow (U \setminus \{X, Y\}) \cup (\{X \cap Y, X \setminus Y, Y \setminus X\} \setminus \{\emptyset\})$
5. Repeat steps 2 through 4 until $U = \emptyset$, at which point
we have collected all the disjoint sets in D.
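The five steps can be sketched directly on s-expression type specifiers. The following is a minimal illustration, not the 40-line implementation referred to above; the function names (provably-disjoint-p, provably-empty-p, decompose-types-sketch) are hypothetical. It assumes that subtypep can decide every question it is asked, as is typically the case for member and numeric range types; when subtypep answers nil,nil the sketch treats the sets as overlapping and may fail to terminate.

```lisp
;; Hypothetical helper names; a sketch under the assumption that
;; SUBTYPEP is decisive for all the type specifiers involved.
(defun provably-disjoint-p (x y)
  "True only when SUBTYPEP can prove that X and Y share no element."
  (values (subtypep `(and ,x ,y) nil)))

(defun provably-empty-p (type)
  "True only when SUBTYPEP can prove that TYPE is the empty type."
  (values (subtypep type nil)))

(defun decompose-types-sketch (types)
  "Return a list of mutually disjoint type specifiers covering TYPES."
  ;; Step 1: U holds the given sets, D the disjoint sets found so far.
  (let ((u (remove-if #'provably-empty-p
                      (remove-duplicates types :test #'equal)))
        (d '()))
    (loop
      ;; Step 2: move every set which is disjoint from all others into D.
      (let ((isolated
              (remove-if-not
               (lambda (x)
                 (every (lambda (y)
                          (or (eq x y) (provably-disjoint-p x y)))
                        u))
               u)))
        (setf d (append isolated d)
              u (set-difference u isolated :test #'eq)))
      ;; Step 5: stop when U is empty.
      (when (null u)
        (return d))
      ;; Step 3: choose X and Y which are not provably disjoint.
      ;; Y exists because X was not isolated in step 2.
      (let* ((x (first u))
             (y (find-if (lambda (y)
                           (and (not (eq x y))
                                (not (provably-disjoint-p x y))))
                         u)))
        ;; Step 4: replace X and Y by the non-empty pieces.
        (setf u (append
                 (remove-if #'provably-empty-p
                            (list `(and ,x ,y)
                                  `(and ,x (not ,y))
                                  `(and ,y (not ,x))))
                 (remove y (remove x u :test #'eq) :test #'eq)))))))
```

For instance, calling decompose-types-sketch on the list ((member 1 2 3) (member 2 3 4) (member 10)) should yield four type specifiers denoting the sets {1}, {2, 3}, {4}, and {10}, in some order and in unsimplified and/not form.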
2.2 The graph based algorithm
One of the sources of inefficiency of the algorithm explained
in Section 2.1 is that at each iteration of the loop, an
$O(n^2)$ search is made to find sets which are disjoint from
all remaining sets. This search can be partially obviated
if we employ a little extra book-keeping. The fact to realize
is that if $X \perp A$ and $X \perp B$, then we know a priori
that $X \perp A \cap B$, $X \perp A \setminus B$, and $X \perp B \setminus A$. This knowledge
eliminates some useless operations.
This algorithm is semantically similar to the algorithm
shown in Section 2.1, but rather than relying on Common
Lisp primitives to make decisions about connectivity of types,
it initializes a graph representing the initial relationships,
and thereafter manipulates the graph maintaining connec-
tivity information. This algorithm is more complicated in
terms of lines of code, 250 lines of Common Lisp code as
opposed to 40 lines for the algorithm in Section 2.1.
Figure 3 shows a graph representing the topology (connectedness)
of the diagram shown in Figure 1. Nodes 1, 2, ..., 8
in Figure 3 correspond respectively to $A_1$, $A_2$, ..., $A_8$ in
Figure 1. Blue arrows correspond to subset relations, pointing
from subset to superset, and green lines correspond to
other non-disjoint relations.
Figure 3: Topology graph
To construct this graph, first eliminate duplicate sets. I.e.,
if $X \subset Y$ and $X \supset Y$, then discard either X or Y. It is
then necessary to consider each pair $(X, Y)$ of sets, an $O(n^2)$ loop.
• If $X \subset Y$, draw a blue arrow $X \rightarrow Y$.
• Else if $X \supset Y$, draw a blue arrow $X \leftarrow Y$.
• Else if $X \not\perp Y$, draw a green line between X and Y.
• If it cannot be determined whether $X \subset Y$, assume
the worst case, that they are non-disjoint, and draw a
green line between X and Y.
The algorithm proceeds by breaking the green and blue
connections in explicit ways until all the nodes become isolated.
There are two cases to consider. Alternate between
the two tests until all the nodes become isolated.
2.2.1 Subset relation
[Before and after graphs over the nodes 0 through 5, X, and Y]
Node    Re-labeled Boolean expression
X       $X$
Y       $Y \cap \overline{X}$
Figure 4: Subset before and after mutation
A blue arrow from X to Y may be eliminated if X has no
blue arrow pointing to it, in which case Y must be relabeled
as $Y \cap \overline{X}$ as indicated in Figure 4.
Figure 4 illustrates this mutation. Node Y may have
other connections, including blue arrows pointing to it or
from it, and green lines connected to it. However, node X
has no blue arrows pointing to it, although it may have other
blue arrows pointing away from it.
If X touches (via a green line) any sibling nodes, i.e. any
other node that shares Y as superset, then the blue arrow
from that sibling to Y is converted to a green line. In the
before image of Figure 4 there is a blue arrow from 3 to Y,
and in the after image this arrow has been converted to a
green line.
[Before and after graphs over the nodes 0 through 4, X, and Y; the after graph adds the node Z]
Node    Re-labeled Boolean expression
X       $X \cap \overline{Y}$
Y       $\overline{X} \cap Y$
Z       $X \cap Y$
Figure 5: Touching connections before and after mutation
2.2.2 Touching connections
A green line connecting X and Y may be eliminated if
neither X nor Y has a blue arrow pointing to it. Consequently,
X and Y must be relabeled and a new node must
be added to the graph as indicated in Figure 5. The figure
illustrates the step of breaking such a connection between
nodes X and Y by introducing the node Z.
Construct blue arrows from this node, Z, to all the nodes
which either X or Y points to (union). Construct green
lines from Z to all nodes which both X and Y connect to
(intersection). If this process results in two nodes connected
both by green and blue, omit the green line.
3. TYPE SPECIFIER MANIPULATION
To correctly implement the MDTD by either strategy described
above, we need operators to test for type-equality,
type disjoint-ness, subtype-ness, and type-emptiness. Given
a subtype predicate, the other predicates can be constructed.
The emptiness check: $A = \emptyset \iff A \subset \emptyset$. The disjoint
check: $A \perp B \iff A \cap B \subset \emptyset$. Type equivalence:
$A = B \iff A \subset B$ and $B \subset A$.
Common Lisp has a flexible type calculus making type
specifiers human readable and also related computation pos-
sible. Even with certain limitations, s-expressions are an
intuitive data structure for programmatic manipulation of
type specifiers in analyzing and reasoning about types.
If T1 and T2 are Common Lisp type specifiers, the type
specifier (and T1 T2) designates the intersection of the types.
Likewise (and T1 (not T2)) is the type difference. The empty
type and the universal type are designated by nil and t re-
spectively. The subtypep function serves as the subtype
predicate. Consequently (subtypep '(and T1 T2) nil) computes
whether T1 and T2 are disjoint.
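The three derived predicates can be written as thin wrappers around subtypep. The sketch below is illustrative; the function names are hypothetical, and each function returns two values in the manner of subtypep, the second being nil when the question could not be decided.

```lisp
;; Hypothetical names; each returns two values like SUBTYPEP.
(defun type-empty-p (type)
  "Is TYPE the empty type?"
  (subtypep type nil))

(defun types-disjoint-p (a b)
  "Is the intersection of A and B empty?"
  (subtypep `(and ,a ,b) nil))

(defun types-equivalent-p (a b)
  "Do A and B designate the same set of values?"
  (multiple-value-bind (a<=b ok1) (subtypep a b)
    (multiple-value-bind (b<=a ok2) (subtypep b a)
      (cond ((and a<=b b<=a)
             (values t t))
            ;; A definite "no" in either direction settles the question.
            ((or (and ok1 (not a<=b))
                 (and ok2 (not b<=a)))
             (values nil t))
            ;; Otherwise SUBTYPEP could not decide.
            (t
             (values nil nil))))))
```

For example, (types-disjoint-p 'string 'number) and (types-equivalent-p '(or fixnum bignum) 'integer) should both return true as their first value.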
There is an important caveat however. The subtypep
function is not always able to determine whether the named
types have a subtype relationship [5]. In such a case, subtypep
returns nil as its second value. This situation occurs most
notably in the cases involving the satisfies type specifier.
For example, to determine whether the (satisfies F) type
is empty, it would be necessary to solve the halting problem,
finding values for which the function F returns true.
As a simple example of how the Common Lisp program-
mer might manipulate s-expression based type specifiers,
consider the following problem. In SBCL 1.3.0, the expression
(subtypep '(member :x :y) 'keyword) returns nil,nil,
rather than t,t. Although this is compliant behavior, the
result is unsatisfying, because clearly both :x and :y are
elements of the keyword type. By manipulating the type
specifier s-expressions, the user can implement a smarter
version of subtypep to better handle this particular case.
Regrettably, the user cannot force the system to use this
smarter version internally.
(defun smarter-subtypep (t1 t2)
  (multiple-value-bind (T1<=T2 OK) (subtypep t1 t2)
    (cond
      ;; subtypep was able to decide: trust its answer
      (OK
       (values T1<=T2 t))
      ;; (eql obj) or (member obj1 ...)
      ((typep t1 '(cons (member eql member)))
       (values (every #'(lambda (obj)
                          (typep obj t2))
                      (cdr t1))
               t))
      ;; otherwise the question remains undecided
      (t
       (values nil nil)))))
As mentioned above, programs manipulating s-expression
based type specifiers can easily compose type intersections,
unions, and relative complements as part of reasoning algo-
rithms. Consequently, the resulting programmatically com-
puted type specifiers may become deeply nested, resulting
in type specifiers which may be confusing in terms of hu-
man readability and debuggability. The following program-
matically generated type specifier is perfectly reasonable for
programmatic use, but confusing if it appears in an error
message, or if the developer encounters it while debugging.
(or
 (or (and (and number (not bignum))
          (not (or fixnum (or bit (eql -1)))))
     (and (and (and number (not bignum))
               (not (or fixnum (or bit (eql -1)))))
          (not (or fixnum (or bit (eql -1))))))
 (and (and (and number (not bignum))
           (not (or fixnum (or bit (eql -1)))))
      (not (or fixnum (or bit (eql -1))))))
This somewhat obfuscated type specifier is semantically
equivalent to the more humanly readable form (and number
(not bignum) (not fixnum)). Moreover, it is possible to write
a Common Lisp function to simplify many complex type
specifiers to simpler form.
There is a second reason apart from human readability
which motivates reduction of type specifiers to canonical
form. The problem arises when we wish to programmati-
cally determine whether two s-expressions specify the same
type, or in particular when a given type specifier specifies
the nil type. Sometimes this question can be answered by
calls to subtypep as in (and (subtypep T1 T2) (subtypep
T2 T1)). However, as mentioned earlier, subtypep is al-
lowed to return nil,nil in some situations, rendering this
approach futile in many cases. If, on the other hand, two
type specifiers can be reduced to the same canonical form,
we can conclude that the specified types are equal.
We have implemented such a function, reduce-lisp-type.
It does a good job of reducing the given type specifier toward
a canonical form, by repeatedly recursively descending the
expression, re-writing sub-expressions, incrementally mov-
ing the expression toward a fixed point. We choose to con-
vert the expression to a disjunctive normal form, e.g., (or
(and (not a) b) (and a b (not c))). The reduction proce-
dure follows the models presented by Sussman and Abel-
son [1, p. 108] and Norvig [14, ch. 8].
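One rewriting pass of this kind can be sketched as follows. This is not reduce-lisp-type; it is a much smaller, hypothetical function (flatten-type) which only flattens nested and/or forms and removes syntactically duplicate operands, without consulting subtypep and without converting to disjunctive normal form.

```lisp
;; A hypothetical, purely syntactic simplifier; not the authors'
;; reduce-lisp-type function.
(defun flatten-type (type)
  "Flatten nested AND/OR forms in TYPE and drop duplicate operands."
  (cond
    ((atom type) type)
    ((eq 'not (first type))
     `(not ,(flatten-type (second type))))
    ((member (first type) '(and or))
     (let* ((op (first type))
            (operands
              (remove-duplicates
               ;; Splice in operands of nested forms with the same operator.
               (loop for arg in (mapcar #'flatten-type (rest type))
                     if (and (consp arg) (eq op (first arg)))
                       append (rest arg)
                     else
                       collect arg)
               :test #'equal :from-end t)))
       (cond ((null operands) (eq op 'and)) ; (and) is t, (or) is nil
             ((null (rest operands)) (first operands))
             (t (cons op operands)))))
    ;; Other compound specifiers such as (member ...) or (eql ...)
    (t type)))
```

Applied to the deeply nested specifier shown earlier in this section, this pass alone returns (and number (not bignum) (not (or fixnum bit (eql -1)))). Recognizing that bit and (eql -1) are subtypes of fixnum requires the additional subtype reasoning which a fuller reducer would perform.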
4. BINARY DECISION DIAGRAMS
A challenge using s-expressions for programmatic representation
of type specifiers is the need to reduce complex type
specifiers to a canonical form after the fact. This reduction
can be computationally intense, and difficult to implement
correctly. We present here a data structure called
the Binary Decision Diagram (BDD) [6, 2], which obviates
much of the need to reduce to canonical form because it
maintains a canonical form by design. Before looking at
how the BDD can be used to represent Common Lisp type
specifiers, we first look at how BDDs are used tradition-
ally to represent Boolean equations. Thereafter, we explain
how this traditional treatment can be enhanced to represent
Common Lisp types.
4.1 Representing Boolean expressions
Andersen [3] summarized many of the algorithms for ef-
ficiently manipulating BDDs. Not least important in An-
dersen’s discussion is how to use a hash table and dedicated
constructor function to eliminate redundancy within a single
BDD and within an interrelated set of BDDs. The result of
Andersen’s approach is that if you attempt to construct two
BDDs to represent two semantically equivalent but syntac-
tically different Boolean expressions, then the two resulting
BDDs are pointers to the same object.
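[BDD with root A1. The left child of A1 is A2, whose left child is 1 and whose right child is A3 (left child 1, right child 0). The right child of A1 is A3 (left child 0, right child 1).]
Figure 6: BDD for $(A_1 \wedge A_2) \vee (A_1 \wedge \neg A_2 \wedge A_3) \vee (\neg A_1 \wedge \neg A_3)$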
Figure 6 shows an example BDD illustrating a function of
three Boolean variables: $A_1$, $A_2$, and $A_3$. To reconstruct the
DNF (disjunctive normal form), collect the paths from the
root node, $A_1$, to a leaf node of 1, ignoring paths terminated
by 0. When the right child is traversed, the Boolean complement
($\neg$) of the label on the node is collected (e.g. $\neg A_3$),
and when the left child is traversed the non-inverted label of
the parent is collected. Interpret each path as a conjunctive
clause, and form a disjunction of the conjunctive clauses. In
the figure the three paths from $A_1$ to 1 identify the three
conjunctive clauses $(A_1 \wedge A_2)$, $(A_1 \wedge \neg A_2 \wedge A_3)$, and
$(\neg A_1 \wedge \neg A_3)$.
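The path collection just described can be sketched in a few lines. For illustration only, the sketch assumes a BDD is represented as a list (label left right) with the leaves t and nil standing for 1 and 0; this list representation and the name bdd-to-dnf are assumptions of the sketch, not the representation used in our implementation.

```lisp
;; Assumed representation: a node is (label left right); leaves are T and NIL.
(defun bdd-to-dnf (bdd)
  "Return a DNF s-expression built from all root-to-T paths of BDD."
  (labels ((walk (node path)
             (cond ((eq node t)            ; path ends in 1: keep it
                    (list (reverse path)))
                   ((null node)            ; path ends in 0: ignore it
                    '())
                   (t
                    (destructuring-bind (label left right) node
                      (append
                       ;; left child: collect the label itself
                       (walk left (cons label path))
                       ;; right child: collect the complement
                       (walk right (cons `(not ,label) path))))))))
    `(or ,@(mapcar (lambda (clause) `(and ,@clause))
                   (walk bdd '())))))
```

For the BDD of Figure 6, written as (a1 (a2 t (a3 t nil)) (a3 nil t)), the function returns (or (and a1 a2) (and a1 (not a2) a3) (and (not a1) (not a3))).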
4.2 Representing types
Castagna [7] explains the connection of BDDs to type
theoretical calculations, and provides straightforward algo-
rithms for implementing set operations (intersection, union,
relative complement) of types using BDDs. The general re-
cursive algorithms for computing the BDDs which represent
the common Boolean algebra operators are straightforward.
Let B, $B_1$, and $B_2$ denote BDDs, $B_1 = (\text{if } a_1\ C_1\ D_1)$ and
$B_2 = (\text{if } a_2\ C_2\ D_2)$.
$C_1$, $C_2$, $D_1$, and $D_2$ represent BDDs. The $a_1$ and $a_2$ are
intended to represent type names, but for the definition to
work it is only necessary that they represent labels which
are order-able. We would eventually like the labels to accommodate
Common Lisp type names, but this is not
immediately possible.
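The formulas for $(B_1 \vee B_2)$, $(B_1 \wedge B_2)$, and $(B_1 \setminus B_2)$ are
similar to each other. If $\circ \in \{\vee, \wedge, \setminus\}$, then
$$
B_1 \circ B_2 =
\begin{cases}
(\text{if } a_1\ (C_1 \circ C_2)\ (D_1 \circ D_2)) & \text{for } a_1 = a_2 \\
(\text{if } a_1\ (C_1 \circ B_2)\ (D_1 \circ B_2)) & \text{for } a_1 < a_2 \\
(\text{if } a_2\ (B_1 \circ C_2)\ (B_1 \circ D_2)) & \text{for } a_1 > a_2
\end{cases}
$$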
There are several special cases, the first three of which
serve as termination conditions for the recursive algorithms.
• $(t \vee B)$ and $(B \vee t)$ reduce to t.
• $(nil \wedge B)$, $(B \wedge nil)$, and $(B \setminus t)$ reduce to nil.
• $(t \wedge B)$, $(B \wedge t)$, $(nil \vee B)$, and $(B \vee nil)$ reduce to B.
• $(t \setminus (\text{if } a\ B_1\ B_2))$ reduces to $(\text{if } a\ (t \setminus B_1)\ (t \setminus B_2))$.
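The recursion and its special cases can be sketched as one function. The sketch assumes, for illustration only, that a BDD is a list (label left right) with leaves t and nil, and that labels are symbols ordered alphabetically by the hypothetical function label<. It also uses two identities which are not listed above but follow from set algebra: nil \ B is nil and B \ nil is B. No canonicalization is attempted, so results may contain redundant nodes.

```lisp
;; Assumed representation: a node is (label left right); leaves are T and NIL.
(defun label< (a b)
  "An arbitrary but consistent total order on symbol labels."
  (string< (symbol-name a) (symbol-name b)))

(defun bdd-op (op b1 b2)
  "Combine B1 and B2; OP is :or, :and, or :and-not (relative complement)."
  (cond
    ;; Both operands are leaves: plain Boolean arithmetic.
    ((and (atom b1) (atom b2))
     (ecase op
       (:or (or b1 b2))
       (:and (and b1 b2))
       (:and-not (and b1 (not b2)))))
    ;; Termination cases.
    ((and (eq op :or) (or (eq b1 t) (eq b2 t))) t)
    ((and (eq op :and) (or (null b1) (null b2))) nil)
    ((and (eq op :and-not) (or (null b1) (eq b2 t))) nil)
    ((and (eq op :and) (eq b1 t)) b2)
    ((and (eq op :and) (eq b2 t)) b1)
    ((and (eq op :or) (null b1)) b2)
    ((and (member op '(:or :and-not)) (null b2)) b1)
    ;; Only remaining leaf case: t \ (if a C D).
    ((atom b1)
     (destructuring-bind (a c d) b2
       (list a (bdd-op op t c) (bdd-op op t d))))
    ;; General case: both operands are internal nodes.
    (t
     (destructuring-bind (a1 c1 d1) b1
       (destructuring-bind (a2 c2 d2) b2
         (cond ((eq a1 a2)
                (list a1 (bdd-op op c1 c2) (bdd-op op d1 d2)))
               ((label< a1 a2)
                (list a1 (bdd-op op c1 b2) (bdd-op op d1 b2)))
               (t
                (list a2 (bdd-op op b1 c2) (bdd-op op b1 d2)))))))))
```

For example, with the single-variable BDDs (a t nil) and (b t nil), the call (bdd-op :and-not '(a t nil) '(b t nil)) returns (a (b nil t) nil), which reads as "a and not b".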
4.3 Representing Common Lisp types
We have implemented the BDD data structure as a set of
CLOS classes. In particular, there is one leaf-level CLOS class
for an internal tree node, and one singleton class/instance
for each of the two possible leaf nodes, true and false.
The label of the BDD contains a Common Lisp type name,
and the logical combinators (and, or, and not) are represented
implicitly in the structure of the BDD.
A disadvantage BDDs present when compared to s-expressions
as presented in Section 3 is the loss of homoiconicity. Whereas
s-expression based type specifiers may appear in-line in the
Common Lisp code, BDDs may not.
A remarkable fact about this representation is that any
two logically equivalent Boolean expressions have exactly
the same BDD structural representation, provided the node
labels are consistently, totally ordered. Andersen [3] provides
a proof for this claim. For example, the expression from
Figure 6, $(A_1 \wedge A_2) \vee (A_1 \wedge \neg A_2 \wedge A_3) \vee (\neg A_1 \wedge \neg A_3)$, is equivalent
to $\neg((\neg A_1 \vee \neg A_2) \wedge (\neg A_1 \vee A_2 \vee \neg A_3) \wedge (A_1 \vee A_3))$. So they
both have the same shape as shown in Figure 6. However,
if we naïvely substitute Common Lisp type names for
Boolean variables in the BDD representation as suggested
by Castagna, we find that this equivalence relation does not
hold in many cases related to subtype relations in the Common
Lisp type system.
An example is that the two Common Lisp types (and
(not arithmetic-error) array (not base-string)) and
(and array (not base-string)) are equivalent, but the
naïvely constructed BDDs, written here in the (if a C D)
notation of Section 4.2, are different:
(if arithmetic-error nil (if array (if base-string nil t) nil))
vs.
(if array (if base-string nil t) nil).
In order to assure the minimum number of BDD alloca-
tions possible, and thus ensure that BDDs which represent
equivalent types are actually represented by the same BDD,
the suggestion by Andersen [3] is to intercept the BDD constructor
function. This constructor should assure that it
never returns two BDDs which are semantically equivalent
but not eq.
4.4 Canonicalization
Several checks are in place to reduce the total number
of BDDs allocated, and to help assure that two equivalent
Common Lisp types result in the same BDD. The following
sections, 4.4.1 through 4.4.5 detail the operations which we
found necessary to handle in the BDD construction function
in order to assure that equivalent Common Lisp type spec-
ifiers result in identical BDDs. The first two come directly
from Andersen’s work. The remaining are our contribution,
and are the cases we found necessary to implement in order
to enhance BDDs to be compatible with the Common Lisp
type system.
4.4.1 Equal right and left children
An optimization noted by Andersen is that if the left and
right children are identical then simply return one of them,
without allocating a new BDD [3].
4.4.2 Caching BDDs
Another optimization noted by Andersen is that whenever
a new BDD is allocated, an entry is made into a hash table so
that the next time a request is made with exactly the same
label, left child, and right child, the already allocated BDD is
returned. We associate each new BDD with a unique integer,
and create a hash key which is a list (a triple) of the type
specifier (the label) followed by two integers corresponding
to the left and right children. We use a Common Lisp equal
hash table for this storage, although we’d like to investigate
whether creating a more specific hash function specific to
our key might be more efficient.
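The two optimizations of Sections 4.4.1 and 4.4.2 can be sketched together in a single constructor. The sketch uses a defstruct rather than CLOS classes for brevity, and the names (*bdd-table*, *bdd-counter*, bdd-id, make-bdd) are hypothetical; the leaves are represented by t and nil with the reserved integers 1 and 0.

```lisp
;; Hypothetical names; a defstruct stands in for the CLOS classes.
(defvar *bdd-table* (make-hash-table :test #'equal)
  "Maps (label left-id right-id) to the already allocated BDD node.")

(defvar *bdd-counter* 1
  "Last integer handed out; 0 and 1 are reserved for the leaves.")

(defstruct (bdd-node (:constructor %make-bdd-node (label left right id)))
  label left right id)

(defun bdd-id (bdd)
  "Unique integer of BDD: 1 for the leaf t, 0 for the leaf nil."
  (cond ((eq bdd t) 1)
        ((null bdd) 0)
        (t (bdd-node-id bdd))))

(defun make-bdd (label left right)
  "Return the unique BDD with the given LABEL, LEFT, and RIGHT."
  (if (eq left right)
      ;; 4.4.1: identical children, so the node is redundant.
      left
      ;; 4.4.2: look up the triple before allocating.
      (let ((key (list label (bdd-id left) (bdd-id right))))
        (or (gethash key *bdd-table*)
            (setf (gethash key *bdd-table*)
                  (%make-bdd-node label left right
                                  (incf *bdd-counter*)))))))
```

With this constructor, two calls such as (make-bdd 'integer t nil) return the same object, so the results can be compared with eq.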
4.4.3 Reduction in the presence of subtypes
Since the nodes of the BDD represent Common Lisp types,
other specific optimizations are made. The cases include situations
where types are related to each other in certain ways:
subtype, supertype, and disjoint types. In particular there
are 12 optimization cases, detailed in Table 1. Each of these
optimizations follows a similar pattern: when constructing
a BDD with label X, search in either the left or right child
to find a BDD node with label Y, left child L, and right
child R. If X and Y have a particular relation, different for
each of the 12 cases, then that node reduces either to L or
to R. Two cases, 5 and 7, are further illustrated below.
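Case   Child to search   Relation                                Reduction
1      X.left            $X \perp Y$                             $Y \rightarrow Y.right$
2      X.left            $X \perp \overline{Y}$                  $Y \rightarrow Y.left$
3      X.right           $\overline{X} \perp Y$                  $Y \rightarrow Y.right$
4      X.right           $\overline{X} \perp \overline{Y}$       $Y \rightarrow Y.left$
5      X.right           $X \supset Y$                           $Y \rightarrow Y.right$
6      X.right           $X \supset \overline{Y}$                $Y \rightarrow Y.left$
7      X.left            $\overline{X} \supset Y$                $Y \rightarrow Y.right$
8      X.left            $\overline{X} \supset \overline{Y}$     $Y \rightarrow Y.left$
9      X.left            $X \subset Y$                           $Y \rightarrow Y.left$
10     X.left            $X \subset \overline{Y}$                $Y \rightarrow Y.right$
11     X.right           $\overline{X} \subset Y$                $Y \rightarrow Y.left$
12     X.right           $\overline{X} \subset \overline{Y}$     $Y \rightarrow Y.right$
Table 1: BDD optimizations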
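Case 5: If $X \supset Y$ and the node (if Y L R) appears in X.right,
then (if Y L R) reduces to R. E.g., $integer \subset number$; if
X = number and Y = integer, then a node (if integer L R)
found in the right child of the BDD labeled number is
replaced by R.
Case 7: If $\overline{X} \supset Y$ and the node (if Y L R) appears in X.left,
then (if Y L R) reduces to R. E.g., $integer \subset \overline{string}$; if
X = string and Y = integer, then a node (if integer L R)
found in the left child of the BDD labeled string is
replaced by R.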
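All twelve cases share one idea: inside X.left every object is known to be of type X, inside X.right every object is known not to be, and a node Y can be dropped whenever that knowledge already decides membership in Y. The following sketch expresses this with two subtypep calls per node. It assumes the list representation (label left right) with leaves t and nil, and the name reduce-child is hypothetical; it is not the table-driven implementation described above.

```lisp
;; Assumed representation: a node is (label left right); leaves are T and NIL.
(defun reduce-child (x child side)
  "Simplify CHILD, found on SIDE (:left or :right) of a node labeled X."
  (if (atom child)
      child
      (destructuring-bind (y l r) child
        ;; What is known about every object reaching this child.
        (let ((known (ecase side
                       (:left x)
                       (:right `(not ,x)))))
          (cond
            ;; Membership in Y is certain: only the left branch matters.
            ((subtypep known y)
             (reduce-child x l side))
            ;; Non-membership in Y is certain: only the right branch matters.
            ((subtypep known `(not ,y))
             (reduce-child x r side))
            ;; Nothing can be concluded: keep the node, reduce below it.
            (t
             (list y
                   (reduce-child x l side)
                   (reduce-child x r side))))))))
```

Case 5 corresponds to (reduce-child 'number '(integer t nil) :right), and case 7 to (reduce-child 'string '(integer t nil) :left); in both, the integer node should be replaced by its right child, nil.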
4.4.4 Reduction to child
The list of reductions described in Section 4.4.3 fails to
apply in cases where the root node itself needs to be eliminated.
For example, since $vector \subset array$ we would like the
following reduction:
(if array (if vector t nil) nil) → (if vector t nil).
The solution which we have implemented is that before
constructing a new BDD, we first ask whether the resulting
BDD is type-equivalent to either the left or right child,
using the subtypep function. If so, we simply return the
appropriate child without allocating the parent BDD. The
expense of this type-equivalence check is mitigated by
memoization. Thereafter, the result is in the hash table, and it
will be discovered as discussed in Section 4.4.2.
4.4.5 More complex type relations
There are a few more cases which are not covered by the
above optimizations. Consider the following BDD:
(if integer nil (if ratio nil (if rational t nil)))
This represents the type (and (not integer) (not ratio)
rational), but in Common Lisp rational is identical to
(or integer ratio), which means (and (not integer) (not
ratio) rational) is the empty type. For this reason, as a
last resort before allocating a new BDD, we check, using the
Common Lisp function subtypep, whether the type specifier
specifies the nil or t type. Again this check is expensive,
but the expense is mitigated in that the result is cached.
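This last-resort check can be sketched by converting a BDD back to a type specifier and asking subtypep whether it is empty or universal. The sketch assumes the list representation (label left right) with leaves t and nil; the names bdd-to-type and collapse-trivial are hypothetical, and the caching mentioned above is omitted.

```lisp
;; Assumed representation: a node is (label left right); leaves are T and NIL.
(defun bdd-to-type (bdd)
  "Return a type specifier designating the same type as BDD."
  (cond ((eq bdd t) t)
        ((null bdd) nil)
        (t
         (destructuring-bind (label left right) bdd
           `(or (and ,label ,(bdd-to-type left))
                (and (not ,label) ,(bdd-to-type right)))))))

(defun collapse-trivial (bdd)
  "Return NIL or T if BDD provably denotes the empty or universal type,
otherwise return BDD unchanged."
  (let ((type (bdd-to-type bdd)))
    (cond ((subtypep type nil) nil)
          ((subtypep t type) t)
          (t bdd))))
```

For example, (collapse-trivial '(number nil (integer t nil))) returns nil, because no object is an integer without being a number. Whether the integer/ratio/rational BDD shown above collapses the same way depends on how much the implementation's subtypep knows about rational.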
5. MDTD IN COMMON LISP
When attempting to implement the algorithms discussed
in Sections 2.1 and 2.2 the developer finds it necessary to
choose a data structure to represent type specifiers. Whichever
data structure is chosen, the program must calculate
type intersections, unions, and relative complements, and
must perform type equivalence checks and checks for the
empty type. As discussed in Section 3, s-expressions (i.e.
lists and symbols) are a valid choice of data structure, and
the aforementioned operations may be implemented as list
constructions and calls to the subtypep predicate.
(if array (if vector nil t) (if number t nil))
Figure 7: BDD representing (or number (and array
(not vector)))
As introduced in Section 4, another choice of data struc-
ture is the BDD. Using the BDD data structure along with
the algorithms described in Section 4 we can efficiently rep-
resent and manipulate Common Lisp type specifiers. We
may programmatically represent Common Lisp types largely
independent of the actual type specifier representation. For
example the following two type specifiers denote the same set
of values: (or number (and array (not vector))) and (not
(and (not number) (or (not array) vector))), and are both
represented by the BDD shown in Figure 7. Moreover,
unions, intersections, and relative complements of Common
Lisp type specifiers can be calculated using the BDD
manipulation rules also explained in Section 4.
We have made comparisons of the two algorithms de-
scribed in Sections 2.1, 2.2. One implementation of each
uses s-expressions, one implementation of each uses BDDs.
Some results of the analysis can be seen in Section 6.
Using BDDs in these algorithms allows certain checks to
be made more easily than with the s-expression approach.
For example, two types are equal if they are the same object
(pointer comparison, eq). A type is empty if it is identi-
cally the empty type (pointer comparison). Finally, given
two types (represented by BDDs), the subtype check can be
made using the following function:
(defun bdd-subtypep (bdd-sub bdd-super)
  (eq *bdd-false*
      (bdd-and-not bdd-sub bdd-super)))
This implementation of bdd-subtypep should not be interpreted
to mean that we have obviated the need for the
Common Lisp subtypep function. In fact, subtypep is still
useful in constructing the BDD itself. However, once the
BDDs have been constructed, and cached, subtype checks
may at that point avoid calls to subtypep, which in some
cases might otherwise be more compute intensive.
6. PERFORMANCE OF MDTD
Sections 2.1 and 2.2 explained two different algorithms
for calculating type decomposition. We look here at some
performance characteristics of the two algorithms. The al-
gorithms from Section 2.1 and Section 2.2 were tested us-
ing both the Common Lisp type specifier s-expression as
data structure and also using the BDD data structure as
described in Section 5. Figures 9 and 8 contrast the four
effective algorithms in terms of execution time vs sample
size.
We attempted to plot the results many different ways:
time as a function of input size, number of disjoint sets in the
input, number of new types generated in the output. Some
of these plots are available in the technical report [12]. The
plot which we found heuristically to show the strongest vi-
sual correlation was calculation time vs the integer product
of the number of given input types multiplied by the num-
ber of calculated output types. E.g., if the algorithm takes a
list of 5 type specifiers and computes 3 disjoint types in 0.1
seconds, the graph contains a point at (15,0.1). Although
we don’t claim to completely understand why this particular
plotting strategy shows better correlation than the others we
tried, it does seem that all the algorithms begin an $O(n^2)$ loop
by iterating over the given set of types, which is incrementally
converted to the output types, so the algorithms in
some sense finish by iterating over the output types. More
research is needed to better understand the correlation.
6.1 Performance Test Setup
The type specifiers used in Figure 9 are those designating
all the subtypes of fixnum such as (member 2 6 7 9) and
(member 1 2 8 10). The type specifiers used in Figure 8
are those designating a randomly selected set of subtypes of
cl:number and cl:condition together with programmati-
cally generated logical combinations thereof such as (and
number (not bit)) and (or real type-error).
[Log-log plot of Time vs. Size for DECOMPOSE-TYPES, DECOMPOSE-TYPES-GRAPH, BDD-DECOMPOSE-TYPES, and DECOMPOSE-TYPES-BDD-GRAPH]
Figure 8: Combinations of number and condition
The performance tests comprise starting with a list of randomly
selected type specifiers from a pool, calling each of
the four functions to calculate the disjoint decomposition,
and recording the time of each calculation. We have plotted
in Figures 9 and 8 the results of the runs which took
less than 30 seconds to complete. This omission does not
in any way affect the presentation of which algorithms were
the fastest on each test.
[Log-log plot of Time vs. Size]
Figure 9: Subtypes of fixnum
The tests were performed on a MacBook 2 GHz Intel Core
i7 processor with 16GB 1600 MHz DDR3 memory, and using
SBCL 1.3.0 ANSI Common Lisp.
6.2 Analysis of Performance Tests
There is no clear winner for small sample sizes. But it
seems the graph based algorithms do very well on large sample
sizes. This is not surprising, as the graph based algorithm
was designed with the intent to reduce the number of passes,
and take advantage of subtype and disjointness information.
Often the better performing of the graph based algorithms
is the BDD based one, as shown in Figure 8. However, there
is a notable exception shown in Figure 9, where the graph
algorithm using s-expressions performs best.
7. RELATED WORK
Computing a disjoint decomposition when permitted to
look into the sets has been referred to as union find [15, 9].
MDTD differs in that we decompose the set without knowl-
edge of the specific elements; i.e. we are not permitted to
iterate over or visit the individual elements. The correspon-
dence of types to sets and subtypes to subsets thereof is also
treated extensively in the theory of semantic subtyping [8].
BDDs have been used in electronic circuit generation [?],
verification, symbolic model checking [?], and type system
models such as in XDuce [10]. None of these sources
discusses how to extend the BDD representation to support
subtypes.
Decision tree techniques are useful in the efficient
compilation of pattern matching constructs in functional
languages [?]. An important concern in pattern matching
compilation is finding the best ordering of the variables,
which is known to be NP-hard. However, when using BDDs to
represent Common Lisp type specifiers, we obtain
representation (pointer) equality simply by using a
consistent ordering; finding the best ordering is not
necessary for our application.
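The following sketch shows how a consistent ordering combined
with a table of unique nodes gives pointer equality. It is an
illustration under stated assumptions, not the implementation
described in this paper: the names bdd-intern, bdd-var, and
bdd-op are hypothetical, labels are assumed to be non-nil
symbols ordered by string<, and bdd-op is not memoized, so it
can take exponential time.

```lisp
(defstruct (bdd-node (:constructor %make-bdd-node (label positive negative)))
  label positive negative)

(defvar *bdd-table* (make-hash-table :test #'equal)
  "Maps (label positive negative) to the unique node with those fields.")

(defun bdd-intern (label positive negative)
  "Return the unique node for LABEL with the given children, which
must be T, NIL, or nodes previously returned by this function."
  (if (eq positive negative)
      positive                      ; both branches agree: node is redundant
      (let ((key (list label positive negative)))
        (or (gethash key *bdd-table*)
            (setf (gethash key *bdd-table*)
                  (%make-bdd-node label positive negative))))))

(defun bdd-var (label)
  "The BDD of the single variable LABEL."
  (bdd-intern label t nil))

(defun bdd-op (op a b)
  "Combine BDDs A and B with OP, a function of two booleans, by
expanding on the smaller of their top labels."
  (if (and (typep a 'boolean) (typep b 'boolean))
      (funcall op a b)
      (let* ((la (and (bdd-node-p a) (bdd-node-label a)))
             (lb (and (bdd-node-p b) (bdd-node-label b)))
             (top (cond ((null la) lb)
                        ((null lb) la)
                        ((string< la lb) la)
                        (t lb))))
        (flet ((branch (x side)
                 ;; Descend only if X actually tests the label TOP.
                 (if (and (bdd-node-p x) (eq (bdd-node-label x) top))
                     (funcall side x)
                     x)))
          (bdd-intern top
                      (bdd-op op
                              (branch a #'bdd-node-positive)
                              (branch b #'bdd-node-positive))
                      (bdd-op op
                              (branch a #'bdd-node-negative)
                              (branch b #'bdd-node-negative)))))))
```

With these definitions, the conjunction of the variables a and
b built in either argument order is the same object under eq,
because both constructions expand on a first and look up the
same keys in the table. Any fixed ordering would do; string<
is used here only because it is readily available.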
8. CONCLUSION AND FUTURE WORK
The results of the performance testing in Section 6 lead us
to believe that the BDD as a data structure for representing
Common Lisp type specifiers is promising, but there is still
work to do, especially in identifying heuristics to predict its
performance relative to more traditional approaches.
It is known that algorithms using the BDD data structure
tend to trade space for speed. Castagna [7] suggests a lazy
version of the BDD data structure which may reduce the
memory footprint, which would have a positive effect on the
BDD-based algorithms. We have spent only a few weeks
optimizing our BDD implementation, based on Andersen's
description [3], whereas the CUDD [16] developers have
spent many years of research optimizing their algorithms.
Certainly our BDD algorithm can be made more efficient
using techniques from CUDD or others.
Although we do not attempt in this paper to motivate in
detail the applications or implications of MDTD, we suspect
there may be a connection between the problem and the
efficient compilation of type-case, and its use in improving
the pattern matching capabilities of Common Lisp. We
consider such development and motivation a matter of future
research.
An immediate priority in our research is to formally prove
the correctness of our algorithms, most notably the graph
decomposition algorithm from Section 2.2. Experimentation
leads us to believe that the graph algorithm always
terminates with the correct answer; nevertheless, we admit
there may be exotic cases which cause deadlock or other errors.
It has also been observed that, in the algorithm explained
in Section 2.2, the convergence rate varies depending on
the order in which the reduction operations are performed.
We do not yet have enough data to characterize this
dependence. Furthermore, many different strategies are
possible for the order in which connections are broken in
the algorithm in Section 2.2: (1) break busiest connections
first, (2) break connections with the fewest dependencies,
(3) random order, (4) closest to top of tree, etc. These are
all areas of ongoing research.
We plan to investigate whether there are other
applications of MDTD outside the Common Lisp type system. We
hope that users of Castagna's techniques [7] on type systems
with semantic subtyping may benefit from the optimizations
we have discussed.
A potential application within Common Lisp is improving
the subtypep implementation itself, which is known to be
slow in some cases. Section 5 gave a BDD-specific
implementation, bdd-subtypep. We intend to investigate whether
existing Common Lisp implementations could use our
technique to represent type specifiers in their inferencing
engines, and thereby make some subtype checks more efficient.
9. REFERENCES
[1] H. Abelson and G. J. Sussman. Structure and
Interpretation of Computer Programs. MIT Press,
Cambridge, MA, USA, 2nd edition, 1996.
[2] S. B. Akers. Binary decision diagrams. IEEE Trans.
Comput., 27(6):509–516, June 1978.
[3] H. R. Andersen. An introduction to binary decision
diagrams. Technical report, Course Notes on the
WWW, 1999.
[4] ANSI. American National Standard: Programming
Language Common Lisp. ANSI X3.226:1994
(R1999), 1994.
[5] H. G. Baker. A decision procedure for Common Lisp’s
SUBTYPEP predicate. Lisp and Symbolic
Computation, 5(3):157–190, 1992.
[6] R. E. Bryant. Graph-based algorithms for boolean
function manipulation. IEEE Transactions on
Computers, 35:677–691, August 1986.
[7] G. Castagna. Covariance and contravariance: a fresh
look at an old issue. Technical report, CNRS, 2016.
[8] G. Castagna and A. Frisch. A gentle introduction to
semantic subtyping. In Proceedings of the 7th ACM
SIGPLAN International Conference on Principles and
Practice of Declarative Programming, PPDP ’05,
pages 198–199, New York, NY, USA, 2005. ACM.
[9] B. A. Galler and M. J. Fischer. An improved
equivalence algorithm. Communications of the ACM,
7(5):301–303, May 1964.
[10] H. Hosoya, J. Vouillon, and B. C. Pierce. Regular
expression types for XML. ACM Trans. Program.
Lang. Syst., 27(1):46–90, Jan. 2005.
[11] J. Newton. Report: Efficient dynamic type checking of
heterogeneous sequences. Technical report,
EPITA/LRDE, 2016.
[12] J. Newton. Analysis of algorithms calculating the
maximal disjoint decomposition of a set. Technical
report, EPITA/LRDE, 2017.
[13] J. Newton, A. Demaille, and D. Verna. Type-Checking
of Heterogeneous Sequences in Common Lisp. In
European Lisp Symposium, Kraków, Poland, May
2016.
[14] P. Norvig. Paradigms of Artificial Intelligence
Programming: Case Studies in Common Lisp. Morgan
Kaufmann, 1992.
[15] M. M. A. Patwary, J. R. S. Blair, and F. Manne.
Experiments on union-find algorithms for the
disjoint-set data structure. In P. Festa, editor,
Proceedings of 9th International Symposium on
Experimental Algorithms (SEA’10), volume 6049 of
Lecture Notes in Computer Science, pages 411–423.
Springer, 2010.
[16] F. Somenzi. CUDD: BDD package, University of
Colorado, Boulder.
http://vlsi.colorado.edu/~fabio/CUDD/.