Programmatic Manipulation of Common Lisp Type Specifiers

Jim E. Newton

jnewton@lrde.epita.fr

Didier Verna

didier@lrde.epita.fr

Maximilien Colange

maximilien.colange@lrde.epita.fr

EPITA/LRDE

14-16 rue Voltaire

F-94270 Le Kremlin-Bicêtre

France

ABSTRACT

In this article we contrast the use of the s-expression with the

BDD (Binary Decision Diagram) as a data structure for pro-

grammatically manipulating Common Lisp type speciﬁers.

The s-expression is the de facto standard surface syntax and

also programmatic representation of the type speciﬁer, but

the BDD data structure oﬀers advantages: most notably,

type equivalence checks using s-expressions can be computa-

tionally intensive, whereas the type equivalence check using

BDDs is a check for object identity. As an implementation

and performance experiment, we deﬁne the notion of max-

imal disjoint type decomposition, and discuss implementa-

tions of algorithms to compute it: a brute force iteration,

and as a tree reduction. The experimental implementations

represent type speciﬁers by both aforementioned data struc-

tures, and we compare the performance observed in each

approach.

CCS Concepts

•Theory of computation → Data structures design and analysis; Type theory; •Computing methodologies → Representation of Boolean functions; •Mathematics of computing → Graph algorithms;

1. INTRODUCTION

Common Lisp programs which manipulate type speciﬁers

have traditionally used s-expressions as the programmatic

representations of types, as described in the Common Lisp

specification [4, Section 4.2.3]. Such a choice of internal data

structure oﬀers advantages such as homoiconicity, making

the internal representation human readable in simple cases,

and making programmatic manipulation intuitive, as well

as enabling the direct use of built-in Common Lisp func-

tions such as typep and subtypep. However, this approach

does present some challenges. Such programs often make

use of ad-hoc logic reducers—attempting to convert types

ELS ’10 April 3–4, 2016, Brussels, Belgium

ACM ISBN 978-1-4503-2138-9.

DOI:

to canonical form. These reducers can be complicated and

diﬃcult to debug. In addition run-time decisions about type

equivalence and subtyping can suﬀer performance problems.

In this article we present an alternative internal represen-

tation for Common Lisp types: the Binary Decision Dia-

gram (BDD) [6, 2]. BDDs have interesting characteristics

such as representational equality; i.e. it can be arranged

that equivalent expressions or equivalent sub-expressions are

represented by the same object (eq). While techniques to

implement BDDs with these properties are well documented,

an attempt to apply the techniques directly to the Common

Lisp type system encounters obstacles which we analyze and

document in this article.

In order to compare performance characteristics of the two

data structure approaches, we have constructed a problem

called Maximal Disjoint Type Decomposition (MDTD): de-

composing a given set of potentially overlapping types into

a set of disjoint types. Although MDTD is interesting in its

own right, we do not attempt, in this paper, to motivate in

detail the applications or implications of the problem. We

consider such development and motivation a matter of fu-

ture research. Our use of the MDTD problem in this article

is primarily a performance comparison vehicle.

We present two algorithms to compute the MDTD, and

separately implement the algorithms with both data structures, s-expressions and BDDs (four implementations in total).

Finally, we report performance characteristics of the four

algorithms implemented in Common Lisp.

Key contributions of this article are:

• A description of how to extend known BDD related implementation techniques to represent Common Lisp types and facilitate type based calculations.

• Performance comparison of algorithms using traditional s-expression based type specifiers vs. using the BDD data structure.

• A graph based algorithm for reducing the computational complexity of MDTD.

2. DISJOINT TYPE DECOMPOSITION

In presenting the problem of decomposing a set of over-

lapping types into non-overlapping subtypes, we start with

an example intended to convey an intuition of the problem.

We continue by deﬁning precisely what we intend to calcu-

late. Then in sections 2.1 and 2.2 we present two diﬀerent

algorithms for performing that calculation.

[Figure 1: Example Venn Diagram]

[Figure 2: Disjoint Decomposition of Sets from Figure 1. Table with columns: Disjoint Set, Derived Expression.]

In the Venn diagram in Figure 1, $V = \{A_1, A_2, \ldots, A_n\}$. We wish to construct logical combinations of those sets to form as many mutually disjoint subsets as possible. The resulting decomposition should have the same union as the original set. The maximal disjoint decomposition $D = \{X_1, X_2, \ldots, X_m\}$ of $V$ is shown in Figure 2.

Notation 1. We use the symbol $\perp$ to indicate the disjoint relation between sets. I.e., we take $A \perp B$ to mean $A \cap B = \emptyset$. We also say $A \not\perp B$ to mean $A \cap B \neq \emptyset$.

Notation 2. We use the notation $A \subset B$ ($A \supset B$) to indicate that $A$ is either a strict subset (superset) of $B$ or is equal to $B$.

Definition 1. Let $U$ be a set and $V$ be a set of subsets of $U$. The Boolean closure of $V$, denoted $\widehat{V}$, is the (smallest) super-set of $V$ such that $\alpha, \beta \in \widehat{V} \implies \{\alpha \cap \beta, \alpha \cap \overline{\beta}\} \subset \widehat{V}$.

Definition 2. Let $U$ be a set, and let $V$ and $D$ be finite sets of non-empty subsets of $U$. $D$ is said to be a disjoint decomposition of $V$ if the elements of $D$ are mutually disjoint, $D \subset \widehat{V}$, and $\bigcup_{X \in D} X = \bigcup_{A \in V} A$. If no larger set fulfills those properties, $D$ is said to be the maximal disjoint decomposition of $V$.

We claim without proof that there exists a unique maxi-

mal disjoint decomposition of a given V . A more complete

discussion and formal proof are available [12].

The MDTD problem: Given a set $U$ and a set of subsets thereof, $V = \{A_1, A_2, \ldots, A_n\}$, suppose that for each pair $(A_i, A_j)$, we know which of the relations hold: $A_i \subset A_j$, $A_i \supset A_j$, $A_i \perp A_j$. We would like to compute the maximal disjoint decomposition of $V$.

In Common Lisp, a type is a set of (potential) values [4,

Section Type], so it makes sense to consider the maximal

disjoint decomposition of a set of types.

2.1 The RTE Algorithm

We ﬁrst encountered the MDTD problem in our previous

work on regular type expressions (RTE) [13]. The following

algorithm was the one presented in that paper, where we

pointed out that the algorithm suffers from significant performance issues. Performance issues aside, a notable feature of

the RTE version of the MDTD algorithm is that it easily ﬁts

in 40 lines of Common Lisp code, so it is easy to implement

and easy to understand.

1. Let $U$ be the set of sets. Let $D$ denote the set of disjoint sets, initially $D = \emptyset$.

2. Identify all the sets which are disjoint from each other and from all the other sets (an $O(n^2)$ search). Remove these sets from $U$ and collect them in $D$.

3. If possible, choose $X$ and $Y$ for which $X \not\perp Y$.

4. Remove $X$ and $Y$ from $U$, and add any of $X \cap Y$, $X \setminus Y$, and $Y \setminus X$ which are non-empty. I.e.,

$U \leftarrow (U \setminus \{X, Y\}) \cup (\{X \cap Y, X \setminus Y, Y \setminus X\} \setminus \{\emptyset\})$

5. Repeat steps 2 through 4 until $U = \emptyset$, at which point we have collected all the disjoint sets in $D$.
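The five steps above can be sketched directly on s-expression type specifiers. The following is a minimal illustration and not the RTE implementation itself: the names decompose-types-sketch, disjoint-p, and empty-p are hypothetical, and the sketch assumes that subtypep can decide every question it is asked. When subtypep answers nil,nil, the types are conservatively treated as overlapping and non-empty, so termination is not guaranteed for specifiers such as satisfies.

```lisp
;; Sketch only; all function names here are illustrative assumptions.

(defun disjoint-p (a b)
  "True only if SUBTYPEP can prove that A and B do not intersect."
  (values (subtypep `(and ,a ,b) nil)))

(defun empty-p (a)
  "True only if SUBTYPEP can prove that A is the empty type."
  (values (subtypep a nil)))

(defun decompose-types-sketch (types)
  "Return a list of mutually disjoint type specifiers covering TYPES."
  ;; Step 1: U holds the remaining sets, D the disjoint sets found so far.
  (let ((u (remove-if #'empty-p (remove-duplicates types :test #'equal)))
        (d '()))
    (loop
      ;; Step 2: move every set which is disjoint from all others into D.
      (let ((isolated (remove-if-not
                       (lambda (x)
                         (every (lambda (y) (or (eq x y) (disjoint-p x y)))
                                u))
                       u)))
        (setf d (append isolated d)
              u (set-difference u isolated :test #'eq)))
      ;; Step 5: stop when U is empty.
      (when (null u)
        (return d))
      ;; Step 3: choose X and Y which are not known to be disjoint.
      (let* ((x (first u))
             (y (find-if (lambda (y) (not (disjoint-p x y))) (rest u))))
        ;; Step 4: replace X and Y by their non-empty pieces.
        (setf u (append (remove-if #'empty-p
                                   (list `(and ,x ,y)
                                         `(and ,x (not ,y))
                                         `(and ,y (not ,x))))
                        (remove y (remove x u :test #'eq) :test #'eq)))))))
```

For instance, (decompose-types-sketch '((member 1 2 3) (member 2 3 4) (member 10))) is expected to return four specifiers on an implementation whose subtypep reasons exactly about member types: one each for the elements shared by the first two sets, the element only in the first, the element only in the second, and the isolated third set. The results are left unsimplified, which is exactly the kind of nested specifier that motivates the reduction discussed later.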

2.2 The graph based algorithm

One of the sources of inefficiency of the algorithm explained in Section 2.1 is that at each iteration of the loop, an $O(n^2)$ search is made to find sets which are disjoint from all remaining sets. This search can be partially obviated if we employ a little extra book-keeping. The fact to realize is that if $X \perp A$ and $X \perp B$, then we know a priori that $X \perp A \cap B$, $X \perp A \setminus B$, and $X \perp B \setminus A$. This knowledge eliminates some useless operations.

This algorithm is semantically similar to the algorithm

shown in Section 2.1, but rather than relying on Common

Lisp primitives to make decisions about connectivity of types,

it initializes a graph representing the initial relationships,

and thereafter manipulates the graph maintaining connec-

tivity information. This algorithm is more complicated in

terms of lines of code, 250 lines of Common Lisp code as

opposed to 40 lines for the algorithm in Section 2.1.

Figure 3 shows a graph representing the topology (connectedness) of the diagram shown in Figure 1. The nodes in Figure 3 correspond respectively to the sets $A_1, A_2, \ldots, A_n$ in Figure 1. Blue arrows correspond to subset relations, pointing from subset to superset, and green lines correspond to other non-disjoint relations.

To construct this graph, first eliminate duplicate sets. I.e., if $X \subset Y$ and $X \supset Y$, then discard either $X$ or $Y$. It is necessary to consider each pair $(X, Y)$ of sets, an $O(n^2)$ loop.

[Figure 3: Topology graph]

• If $X \subset Y$, draw a blue arrow $X \rightarrow Y$.

• Else if $X \supset Y$, draw a blue arrow $X \leftarrow Y$.

• Else if $X \not\perp Y$, draw a green line between $X$ and $Y$.

• If it cannot be determined whether $X \subset Y$, assume the worst case, that they are non-disjoint, and draw a green line between $X$ and $Y$.
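The pairwise classification in the list above might be sketched as follows for s-expression type specifiers. The names classify-pair and topology-edges and the keyword labels are assumptions made for illustration. The sketch does not perform the duplicate elimination step, so two equivalent types are simply reported as :subset.

```lisp
;; Sketch only; names and keyword labels are illustrative assumptions.

(defun classify-pair (x y)
  "Classify the relation between type specifiers X and Y."
  (cond ((subtypep x y) :subset)                  ; blue arrow X -> Y
        ((subtypep y x) :superset)                ; blue arrow X <- Y
        ((subtypep `(and ,x ,y) nil) :disjoint)   ; no edge at all
        ;; Known to overlap, or SUBTYPEP could not decide: green line.
        (t :touching)))

(defun topology-edges (types)
  "Return a list of (RELATION X Y), one per pair which is not disjoint."
  (loop for (x . rest) on types
        append (loop for y in rest
                     for relation = (classify-pair x y)
                     unless (eq relation :disjoint)
                       collect (list relation x y))))
```

Because only the primary value of subtypep is tested, an undecidable pair falls through to the last clause and receives a green line, which matches the worst-case rule in the final bullet.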

The algorithm proceeds by breaking the green and blue connections in explicit ways until all the nodes become isolated. There are two cases to consider; apply the two tests alternately until all the nodes become isolated.

2.2.1 Subset relation

Node    Re-labeled Boolean expression
X       $X$
Y       $Y \cap \overline{X}$

[Figure 4: Subset before and after mutation. Before and After graphs of nodes X and Y and their neighbors.]

A blue arrow from $X$ to $Y$ may be eliminated if $X$ has no blue arrow pointing to it, in which case $Y$ must be relabeled as $Y \cap \overline{X}$ as indicated in Figure 4.

Figure 4 illustrates this mutation. Node Y may have other connections, including blue arrows pointing to it or from it, and green lines connected to it. However, node X has no blue arrows pointing to it, although it may have other blue arrows pointing away from it.

If $X$ touches (via a green line) any sibling nodes, i.e. any other node that shares $Y$ as super-class, then the blue arrow from that sibling to $Y$ is converted to a green line. In the before image of Figure 4 there is such a blue arrow, and in the after image this arrow has been converted to a green line.

Node    Re-labeled Boolean expression
X       $X \cap \overline{Y}$
Y       $\overline{X} \cap Y$
Z       $X \cap Y$

[Figure 5: Touching connections before and after mutation. Before and After graphs of nodes X and Y, with node Z added in the After graph.]

2.2.2 Touching connections

A green line connecting X and Y may be eliminated if

neither X nor Y has a blue arrow pointing to it. Conse-

quently, X and Y must be relabeled and a new node must

be added to the graph as indicated in Figure 5. The figure illustrates the step of breaking such a connection between nodes X and Y by introducing the node Z.

Construct blue arrows from this node, Z, to all the nodes

which either X or Y points to (union). Construct green

lines from Z to all nodes which both X and Y connect to

(intersection). If this process results in two nodes connected

both by green and blue, omit the green line.

3. TYPE SPECIFIER MANIPULATION

To correctly implement the MDTD by either strategy de-

scribed above, we need operators to test for type-equality,

type disjoint-ness, subtype-ness, and type-emptiness. Given

a subtype predicate, the other predicates can be constructed.

The emptiness check: $A = \emptyset \iff A \subset \emptyset$. The disjoint check: $A \perp B \iff A \cap B \subset \emptyset$. Type equivalence: $A = B \iff A \subset B$ and $B \subset A$.

Common Lisp has a ﬂexible type calculus making type

speciﬁers human readable and also related computation pos-

sible. Even with certain limitations, s-expressions are an

intuitive data structure for programmatic manipulation of

type speciﬁers in analyzing and reasoning about types.

If T1 and T2 are Common Lisp type speciﬁers, the type

speciﬁer (and T1 T2) designates the intersection of the types.

Likewise (and T1 (not T2)) is the type diﬀerence. The empty

type and the universal type are designated by nil and t re-

spectively. The subtypep function serves as the subtype

predicate. Consequently (subtypep '(and T1 T2) nil) computes whether T1 and T2 are disjoint.
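As a small sketch of how the other predicates can be derived from the subtype predicate, the functions below wrap subtypep. The names type-empty-p, types-disjoint-p, and types-equivalent-p are hypothetical. Each returns two values in the style of subtypep: the answer, and whether that answer is certain.

```lisp
;; Sketch only; function names are illustrative assumptions.

(defun type-empty-p (type)
  "A is empty iff A is a subtype of NIL."
  (subtypep type nil))

(defun types-disjoint-p (t1 t2)
  "A and B are disjoint iff (and A B) is a subtype of NIL."
  (subtypep `(and ,t1 ,t2) nil))

(defun types-equivalent-p (t1 t2)
  "A = B iff A is a subtype of B and B is a subtype of A."
  (multiple-value-bind (sub sub-sure) (subtypep t1 t2)
    (multiple-value-bind (sup sup-sure) (subtypep t2 t1)
      (cond ((and sub sup)
             (values t t))
            ;; One direction is known to fail: certainly not equivalent.
            ((or (and sub-sure (not sub))
                 (and sup-sure (not sup)))
             (values nil t))
            ;; Otherwise SUBTYPEP could not decide.
            (t
             (values nil nil))))))
```

Propagating the second value matters: a caller can then distinguish "known to be different" from "could not be determined" instead of silently treating both as false.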

There is an important caveat however. The subtypep

function is not always able to determine whether the named

types have a subtype relationship [5]. In such a case, subtypep

returns nil as its second value. This situation occurs most

notably in the cases involving the satisfies type speciﬁer.

For example, to determine whether the (satisfies F) type

is empty, it would be necessary to solve the halting problem,

ﬁnding values for which the function F returns true.

As a simple example of how the Common Lisp program-

mer might manipulate s-expression based type speciﬁers,

consider the following problem. In SBCL 1.3.0, the expres-

sion (subtypep '(member :x :y) 'keyword) returns nil,nil,

rather than t,t. Although this is compliant behavior, the

result is unsatisfying, because clearly both :x and :y are

elements of the keyword type. By manipulating the type

speciﬁer s-expressions, the user can implement a smarter

version of subtypep to better handle this particular case.

Regrettably, the user cannot force the system to use this

smarter version internally.

    (defun smarter-subtypep (t1 t2)
      (multiple-value-bind (T1<=T2 OK) (subtypep t1 t2)
        (cond
          (OK
           (values T1<=T2 t))
          ;; (eql obj) or (member obj1 ...)
          ((typep t1 '(cons (member eql member)))
           (values (every #'(lambda (obj)
                              (typep obj t2))
                          (cdr t1))
                   t))
          (t
           (values nil nil)))))

As mentioned above, programs manipulating s-expression

based type speciﬁers can easily compose type intersections,

unions, and relative complements as part of reasoning algo-

rithms. Consequently, the resulting programmatically com-

puted type speciﬁers may become deeply nested, resulting

in type speciﬁers which may be confusing in terms of hu-

man readability and debuggability. The following program-

matically generated type speciﬁer is perfectly reasonable for

programmatic use, but confusing if it appears in an error

message, or if the developer encounters it while debugging.

    (or
     (or (and (and number (not bignum))
              (not (or fixnum (or bit (eql -1)))))
         (and (and (and number (not bignum))
                   (not (or fixnum (or bit (eql -1)))))
              (not (or fixnum (or bit (eql -1))))))
     (and (and (and number (not bignum))
               (not (or fixnum (or bit (eql -1)))))
          (not (or fixnum (or bit (eql -1))))))

This somewhat obfuscated type speciﬁer is semantically

equivalent to the more humanly readable form (and number

(not bignum) (not fixnum)). Moreover, it is possible to write

a Common Lisp function to simplify many complex type

speciﬁers to simpler form.

There is a second reason apart from human readability

which motivates reduction of type speciﬁers to canonical

form. The problem arises when we wish to programmati-

cally determine whether two s-expressions specify the same

type, or in particular when a given type speciﬁer speciﬁes

the nil type. Sometimes this question can be answered by

calls to subtypep as in (and (subtypep T1 T2) (subtypep

T2 T1)). However, as mentioned earlier, subtypep is al-

lowed to return nil,nil in some situations, rendering this

approach futile in many cases. If, on the other hand, two

type speciﬁers can be reduced to the same canonical form,

we can conclude that the speciﬁed types are equal.

We have implemented such a function, reduce-lisp-type.

It does a good job of reducing the given type speciﬁer toward

a canonical form, by repeatedly recursively descending the

expression, re-writing sub-expressions, incrementally mov-

ing the expression toward a ﬁxed point. We choose to con-

vert the expression to a disjunctive normal form, e.g., (or

(and (not a) b) (and a b (not c))). The reduction proce-

dure follows the models presented by Sussman and Abel-

son [1, p. 108] and Norvig [14, ch. 8].

4. BINARY DECISION DIAGRAMS

A challenge using s-expressions for programmatic repre-

sentation of type speciﬁers is the need to after-the-fact re-

duce complex type speciﬁers to a canonical form. This re-

duction can be computationally intense, and diﬃcult to im-

plement correctly. We present here a data structure called

the Binary Decision Diagram (BDD) [6, 2], which obviates

much of the need to reduce to canonical form because it

maintains a canonical form by design. Before looking at

how the BDD can be used to represent Common Lisp type

speciﬁers, we ﬁrst look at how BDDs are used tradition-

ally to represent Boolean equations. Thereafter, we explain

how this traditional treatment can be enhanced to represent

Common Lisp types.

4.1 Representing Boolean expressions

Andersen [3] summarized many of the algorithms for ef-

ﬁciently manipulating BDDs. Not least important in An-

dersen’s discussion is how to use a hash table and dedicated

constructor function to eliminate redundancy within a single

BDD and within an interrelated set of BDDs. The result of

Andersen’s approach is that if you attempt to construct two

BDDs to represent two semantically equivalent but syntac-

tically diﬀerent Boolean expressions, then the two resulting

BDDs are pointers to the same object.

[Figure 6: BDD for $(A_1 \wedge A_2) \vee (A_1 \wedge \neg A_2 \wedge A_3) \vee (\neg A_1 \wedge \neg A_3)$]

Figure 6 shows an example BDD illustrating a function of three Boolean variables: $A_1$, $A_2$, and $A_3$. To reconstruct the DNF (disjunctive normal form), collect the paths from the root node, $A_1$, to a leaf node of 1, ignoring paths terminated by 0. When the right child is traversed, the Boolean complement ($\neg$) of the label on the node is collected (e.g. $\neg A_i$ for a node labeled $A_i$), and when the left child is traversed the non-inverted parent is collected. Interpret each path as a conjunctive clause, and form a disjunction of the conjunctive clauses. In the figure the three paths from $A_1$ to 1 identify the three conjunctive clauses $(A_1 \wedge A_2)$, $(A_1 \wedge \neg A_2 \wedge A_3)$, and $(\neg A_1 \wedge \neg A_3)$.
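The path-collection procedure just described can be sketched in a few lines. This illustration assumes a throwaway list encoding which is not the representation used later in the article: a BDD is either a leaf (t for 1, nil for 0) or a list (label left right). The function name bdd-to-dnf is hypothetical.

```lisp
;; Sketch only; the list encoding (LABEL LEFT RIGHT) is an assumption.

(defun bdd-to-dnf (bdd)
  "Return a DNF s-expression for BDD by collecting the paths which end in T."
  (labels ((paths (node clause)
             (cond ((eq node t)               ; path terminated by 1: keep it
                    (list (reverse clause)))
                   ((null node)               ; path terminated by 0: ignore it
                    '())
                   (t
                    (destructuring-bind (label left right) node
                      (append
                       ;; Left child: collect the label as is.
                       (paths left (cons label clause))
                       ;; Right child: collect the complemented label.
                       (paths right (cons `(not ,label) clause))))))))
    `(or ,@(mapcar (lambda (clause) `(and ,@clause))
                   (paths bdd '())))))

;; One encoding of a three-variable BDD with three paths ending in T:
;; (bdd-to-dnf '(a1 (a2 t (a3 t nil)) (a3 nil t)))
;; => (OR (AND A1 A2) (AND A1 (NOT A2) A3) (AND (NOT A1) (NOT A3)))
```

The degenerate cases behave sensibly under Common Lisp conventions: the leaf nil yields (or), the empty type, and the leaf t yields (or (and)), the universal type.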

4.2 Representing types

Castagna [7] explains the connection of BDDs to type

theoretical calculations, and provides straightforward algo-

rithms for implementing set operations (intersection, union,

relative complement) of types using BDDs. The general re-

cursive algorithms for computing the BDDs which represent

the common Boolean algebra operators are straightforward.

Let $B$, $B_1$, and $B_2$ denote BDDs, $B_1 = (\mathrm{if}\ a_1\ C_1\ D_1)$ and $B_2 = (\mathrm{if}\ a_2\ C_2\ D_2)$. $C_1$, $C_2$, $D_1$, and $D_2$ represent BDDs. The $a_1$ and $a_2$ are intended to represent type names, but for the definition to work it is only necessary that they represent labels which are order-able. We would eventually like the labels to accommodate Common Lisp type names, but this is not immediately possible.

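The formulas for $(B_1 \vee B_2)$, $(B_1 \wedge B_2)$, and $(B_1 \setminus B_2)$ are similar to each other. If $\circ \in \{\vee, \wedge, \setminus\}$, then

$$B_1 \circ B_2 = \begin{cases} (\mathrm{if}\ a_1\ (C_1 \circ C_2)\ (D_1 \circ D_2)) & \text{for } a_1 = a_2 \\ (\mathrm{if}\ a_1\ (C_1 \circ B_2)\ (D_1 \circ B_2)) & \text{for } a_1 < a_2 \\ (\mathrm{if}\ a_2\ (B_1 \circ C_2)\ (B_1 \circ D_2)) & \text{for } a_1 > a_2 \end{cases}$$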

There are several special cases, the first three of which serve as termination conditions for the recursive algorithms.

• $(t \vee B)$ and $(B \vee t)$ reduce to $t$.

• $(nil \wedge B)$, $(B \wedge nil)$, and $(B \setminus t)$ reduce to $nil$.

• $(t \wedge B)$, $(B \wedge t)$, $(nil \vee B)$, and $(B \vee nil)$ reduce to $B$.

• $(t \setminus (\mathrm{if}\ a\ B_1\ B_2))$ reduces to $(\mathrm{if}\ a\ (t \setminus B_1)\ (t \setminus B_2))$.
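The recursive formula and its special cases can be sketched as below. This is an illustration under several assumptions, not the article's CLOS implementation: a BDD is a leaf (t or nil) or a list (label left right), labels are ordered by their printed names through the hypothetical label<, and the names bdd-node and bdd-op are invented here. Two terminal cases beyond those listed above, namely that $(nil \setminus B)$ reduces to $nil$ and $(B \setminus nil)$ reduces to $B$, are added so that the relative complement terminates.

```lisp
;; Sketch only; encoding, ordering, and names are illustrative assumptions.

(defun label< (a b)
  "Hypothetical total order on labels: compare their printed names."
  (string< (princ-to-string a) (princ-to-string b)))

(defun bdd-node (label left right)
  "Build (LABEL LEFT RIGHT), collapsing a node whose children are equal."
  (if (equal left right)
      left
      (list label left right)))

(defun bdd-op (op b1 b2)
  "Combine B1 and B2 with OP, one of :OR, :AND, :AND-NOT."
  (cond
    ;; Termination conditions.
    ((and (eq op :or) (or (eq b1 t) (eq b2 t))) t)
    ((and (eq op :or) (null b1)) b2)
    ((and (eq op :or) (null b2)) b1)
    ((and (eq op :and) (or (null b1) (null b2))) nil)
    ((and (eq op :and) (eq b1 t)) b2)
    ((and (eq op :and) (eq b2 t)) b1)
    ((and (eq op :and-not) (or (null b1) (eq b2 t))) nil)  ; nil\B added here
    ((and (eq op :and-not) (null b2)) b1)                  ; B\nil added here
    ;; Only :AND-NOT reaches this clause: t \ (if a C D).
    ((eq b1 t)
     (destructuring-bind (a c d) b2
       (bdd-node a (bdd-op op t c) (bdd-op op t d))))
    ;; General case: both arguments are internal nodes.
    (t
     (destructuring-bind (a1 c1 d1) b1
       (destructuring-bind (a2 c2 d2) b2
         (cond ((equal a1 a2)
                (bdd-node a1 (bdd-op op c1 c2) (bdd-op op d1 d2)))
               ((label< a1 a2)
                (bdd-node a1 (bdd-op op c1 b2) (bdd-op op d1 b2)))
               (t
                (bdd-node a2 (bdd-op op b1 c2) (bdd-op op b1 d2)))))))))
```

With this encoding, (bdd-op :and-not '(a t nil) '(b t nil)) builds the BDD for $a \wedge \neg b$, and structurally different routes to the same function, such as the two sides of De Morgan's law, produce equal lists because the label order is fixed.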

4.3 Representing Common Lisp types

We have implemented the BDD data structure as a set of

CLOS classes. In particular, there is one leaf-level CLOS class

for an internal tree node, and one singleton class/instance

for each of the two possible leaf nodes, true and false.

The label of the BDD contains a Common Lisp type name,

and the logical combinators (and, or, and not) are repre-

sented implicitly in the structure of the BDD.

A disadvantage BDDs present when compared to s-expressions

as presented in Section 3 is the loss of homoiconicity. Whereas,

s-expression based type-speciﬁers may appear in-line in the

Common Lisp code, BDDs may not.

A remarkable fact about this representation is that any

two logically equivalent Boolean expressions have exactly

the same BDD structural representation, provided the node

labels are consistently, totally ordered. Andersen [3] provides a proof for this claim. For example, the expression from Figure 6, $(A_1 \wedge A_2) \vee (A_1 \wedge \neg A_2 \wedge A_3) \vee (\neg A_1 \wedge \neg A_3)$, is equivalent to $\neg((\neg A_1 \vee \neg A_2) \wedge (\neg A_1 \vee A_2 \vee \neg A_3) \wedge (A_1 \vee A_3))$. So they both have the same shape as shown in Figure 6. However, if we naïvely substitute Common Lisp type names for

Boolean variables in the BDD representation as suggested

by Castagna, we ﬁnd that this equivalence relation does not

hold in many cases related to subtype relations in the Com-

mon Lisp type system.

For example, the two Common Lisp types (and (not arithmetic-error) array (not base-string)) and (and array (not base-string)) are equivalent, but the naïvely constructed BDDs are different:

(if arithmetic-error nil (if array (if base-string nil t) nil))

vs.

(if array (if base-string nil t) nil)

In order to assure the minimum number of BDD alloca-

tions possible, and thus ensure that BDDs which represent

equivalent types are actually represented by the same BDD,

the suggestion by Andersen [3] is to intercept the BDD constructor function. This constructor should assure that it never returns two BDDs which are semantically equivalent but not eq.

4.4 Canonicalization

Several checks are in place to reduce the total number

of BDDs allocated, and to help assure that two equivalent

Common Lisp types result in the same BDD. The following

sections, 4.4.1 through 4.4.5 detail the operations which we

found necessary to handle in the BDD construction function

in order to assure that equivalent Common Lisp type spec-

iﬁers result in identical BDDs. The ﬁrst two come directly

from Andersen’s work. The remaining are our contribution,

and are the cases we found necessary to implement in order

to enhance BDDs to be compatible with the Common Lisp

type system.

4.4.1 Equal right and left children

An optimization noted by Andersen is that if the left and

right children are identical then simply return one of them,

without allocating a new BDD [3].

4.4.2 Caching BDDs

Another optimization noted by Andersen is that whenever

a new BDD is allocated, an entry is made into a hash table so

that the next time a request is made with exactly the same

label, left child, and right child, the already allocated BDD is

returned. We associate each new BDD with a unique integer,

and create a hash key which is a list (a triple) of the type

speciﬁer (the label) followed by two integers corresponding

to the left and right children. We use a Common Lisp equal

hash table for this storage, although we’d like to investigate

whether creating a more speciﬁc hash function speciﬁc to

our key might be more eﬃcient.
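A minimal sketch of the two Andersen-style checks from Sections 4.4.1 and 4.4.2 follows. It uses a plain defstruct rather than the CLOS classes described in Section 4.3, represents the two leaves by t and nil, and all names (make-bdd, %make-bdd, node-id, *bdd-table*, *bdd-count*) are assumptions made for illustration.

```lisp
;; Sketch only; struct layout and names are illustrative assumptions.

(defstruct (bdd (:constructor %make-bdd (id label left right)))
  id label left right)

(defvar *bdd-table* (make-hash-table :test #'equal)
  "Maps (label left-id right-id) to the unique BDD with those parts.")

(defvar *bdd-count* 1
  "Last id handed out; 0 and 1 are reserved for the leaves NIL and T.")

(defun node-id (b)
  "Return the unique integer associated with the leaf or node B."
  (cond ((null b) 0)
        ((eq b t) 1)
        (t (bdd-id b))))

(defun make-bdd (label left right)
  "Return the unique BDD for LABEL, LEFT, and RIGHT, allocating if needed."
  (if (eq left right)
      ;; Section 4.4.1: identical children, so return one of them.
      left
      (let ((key (list label (node-id left) (node-id right))))
        ;; Section 4.4.2: reuse an already allocated BDD when there is one.
        (or (gethash key *bdd-table*)
            (setf (gethash key *bdd-table*)
                  (%make-bdd (incf *bdd-count*) label left right))))))
```

Because every child is itself obtained from make-bdd, comparing children with eq is enough, and two requests with the same label and children always return the same object, which is what later allows type equality to be tested by pointer comparison.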

4.4.3 Reduction in the presence of subtypes

Since the nodes of the BDD represent Common Lisp types,

other speciﬁc optimizations are made. The cases include sit-

uations where types are related to each other in certain ways:

subtype, supertype, and disjoint types. In particular there

are 12 optimization cases, detailed in Table 1. Each of these

optimizations follows a similar pattern: when constructing

a BDD with label X, search in either the left or right child to find a BDD with label Y, written (if Y L R). If X and Y have a particular relation, different for each of the 12 cases, then the (if Y L R) BDD reduces either to L or R. Two cases, 5 and 7, are further illustrated below.

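Case  Child to search  Relation                                  Reduction
1     X.left           $X \perp Y$                               Y → Y.right
2     X.left           $X \perp \overline{Y}$                    Y → Y.left
3     X.right          $\overline{X} \perp Y$                    Y → Y.right
4     X.right          $\overline{X} \perp \overline{Y}$         Y → Y.left
5     X.right          $X \supset Y$                             Y → Y.right
6     X.right          $X \supset \overline{Y}$                  Y → Y.left
7     X.left           $\overline{X} \supset Y$                  Y → Y.right
8     X.left           $\overline{X} \supset \overline{Y}$       Y → Y.left
9     X.left           $X \subset Y$                             Y → Y.left
10    X.left           $X \subset \overline{Y}$                  Y → Y.right
11    X.right          $\overline{X} \subset Y$                  Y → Y.left
12    X.right          $\overline{X} \subset \overline{Y}$       Y → Y.right

Table 1: BDD optimizations

Case 5: If $X \supset Y$ and (if Y L R) appears in X.right, then (if Y L R) reduces to R. E.g., $integer \subset number$; if $X = number$ and $Y = integer$, then an (if integer L R) node found in the right child of a number node is replaced by R.

Case 7: If $\overline{X} \supset Y$ and (if Y L R) appears in X.left, then (if Y L R) reduces to R. E.g., $integer \subset \overline{string}$; if $X = string$ and $Y = integer$, then an (if integer L R) node found in the left child of a string node is replaced by R.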

4.4.4 Reduction to child

The list of reductions described in Section 4.4.3 fails to

apply in cases where the root node itself needs to be eliminated. For example, since $vector \subset array$ we would like the following reduction: (if array (if vector t nil) nil) reduces to (if vector t nil).

The solution which we have implemented is that before

constructing a new BDD, we ﬁrst ask whether the resulting

BDD is type-equivalent to either the left or right child, using the subtypep function. If so, we simply return the appropriate child without allocating the parent BDD. The expense of this type-equivalence check is mitigated by the memoization. Thereafter, the result is in the hash table, and it will be discovered as discussed in Section 4.4.2.

4.4.5 More complex type relations

There are a few more cases which are not covered by the

above optimizations. Consider the following BDD:

(if integer nil (if ratio nil (if rational t nil)))

This represents the type (and (not integer) (not ratio)

rational), but in Common Lisp rational is identical to

(or integer ratio), which means (and (not integer) (not

ratio) rational) is the empty type. For this reason, as a

last resort before allocating a new BDD, we check, using the

Common Lisp function subtypep, whether the type speciﬁer

speciﬁes the nil or t type. Again this check is expensive,

but the expense is mitigated in that the result is cached.

5. MDTD IN COMMON LISP

When attempting to implement the algorithms discussed

in Sections 2.1 and 2.2 the developer ﬁnds it necessary to

choose a data structure to represent type specifiers. Whichever data structure is chosen, the program must calculate type intersections, unions, and relative complements, and perform type equivalence checks and checks for the empty type. As discussed in Section 3, s-expressions (i.e. lists and symbols) are a valid choice of data structure and the aforementioned

operations may be implemented as list constructions and

calls to the subtypep predicate.

[Figure 7: BDD representing (or number (and array (not vector))), i.e. (if array (if vector nil t) (if number t nil))]

As introduced in Section 4, another choice of data struc-

ture is the BDD. Using the BDD data structure along with

the algorithms described in Section 4 we can eﬃciently rep-

resent and manipulate Common Lisp type speciﬁers. We

may programmatically represent Common Lisp types largely

independent of the actual type speciﬁer representation. For

example the following two type speciﬁers denote the same set

of values: (or number (and array (not vector))) and (not

(and (not number) (or (not array) vector))), and are both

represented by the BDD shown in Figure 7. Moreover, unions, intersections, and relative complements of Common Lisp type specifiers can be calculated using the BDD manipulation rules also explained in Section 4.

We have made comparisons of the two algorithms described in Sections 2.1 and 2.2. One implementation of each

uses s-expressions, one implementation of each uses BDDs.

Some results of the analysis can be seen in Section 6.

Using BDDs in these algorithms allows certain checks to

be made more easily than with the s-expression approach.

For example, two types are equal if they are the same object

(pointer comparison, eq). A type is empty if it is identi-

cally the empty type (pointer comparison). Finally, given

two types (represented by BDDs), the subtype check can be

made using the following function:

    (defun bdd-subtypep (bdd-sub bdd-super)
      (eq *bdd-false*
          (bdd-and-not bdd-sub bdd-super)))

This implementation of bdd-subtypep should not be interpreted to mean that we have obviated the need for the Common Lisp subtypep function. In fact, subtypep is still useful in constructing the BDD itself. However, once the

BDDs have been constructed, and cached, subtype checks

may at that point avoid calls to subtypep, which in some

cases might otherwise be more compute intensive.

6. PERFORMANCE OF MDTD

Sections 2.1 and 2.2 explained two diﬀerent algorithms

for calculating type decomposition. We look here at some

performance characteristics of the two algorithms. The al-

gorithms from Section 2.1 and Section 2.2 were tested us-

ing both the Common Lisp type speciﬁer s-expression as

data structure and also using the BDD data structure as

described in Section 5. Figures 9 and 8 contrast the four

eﬀective algorithms in terms of execution time vs sample

size.

We attempted to plot the results many diﬀerent ways:

time as a function of input size, number of disjoint sets in the

input, number of new types generated in the output. Some

of these plots are available in the technical report [12]. The

plot which we found heuristically to show the strongest vi-

sual correlation was calculation time vs the integer product

of the number of given input types multiplied by the num-

ber of calculated output types. E.g., if the algorithm takes a

list of 5 type speciﬁers and computes 3 disjoint types in 0.1

seconds, the graph contains a point at (15,0.1). Although

we don’t claim to completely understand why this particular

plotting strategy shows better correlation than the others we

tried, it does seem that all the algorithms begin an $O(n^2)$ loop

by iterating over the given set of types which is incremen-

tally converted to the output types, so the algorithms in

some sense ﬁnish by iterating over the output types. More

research is needed to better understand the correlation.

6.1 Performance Test Setup

The type speciﬁers used in Figure 9 are those designating

all the subtypes of fixnum, such as (member 2 6 7 9) and

(member 1 2 8 10). The type speciﬁers used in Figure 8

are those designating a randomly selected set of subtypes of

cl:number and cl:condition together with programmati-

cally generated logical combinations thereof such as (and

number (not bit)) and (or real type-error).

[Figure 8: Combinations of number and condition. Axes: Size, Time. Legend: DECOMPOSE-TYPES, DECOMPOSE-TYPES-GRAPH, BDD-DECOMPOSE-TYPES, DECOMPOSE-TYPES-BDD-GRAPH.]

The performance tests comprise starting with a list of ran-

domly selected type speciﬁers from a pool, calling each of

the four functions to calculate the disjoint decomposition,

and recording the time of each calculation. We have plotted in Figures 8 and 9 the results of the runs which took less than 30 seconds to complete. This omission does not in any way affect the presentation of which algorithms were the fastest on each test.

[Figure 9: Subtypes of fixnum. Axes: Size, Time.]

The tests were performed on a MacBook 2 GHz Intel Core

i7 processor with 16GB 1600 MHz DDR3 memory, and using

SBCL 1.3.0 ANSI Common Lisp.

6.2 Analysis of Performance Tests

There is no clear winner for small sample sizes, but the graph based algorithms appear to do very well on large sample sizes. This is not surprising, as the graph based algorithm was designed with the intent to reduce the number of passes, and take advantage of subtype and disjointness information.

Often the better performing of the graph based algorithms is the BDD based one, as shown in Figure 8. However, there is a notable exception shown in Figure 9, where the graph algorithm using s-expressions performs best.

7. RELATED WORK

Computing a disjoint decomposition when permitted to

look into the sets has been referred to as union ﬁnd [15, 9].

MDTD diﬀers in that we decompose the set without knowl-

edge of the speciﬁc elements; i.e. we are not permitted to

iterate over or visit the individual elements. The correspon-

dence of types to sets and subtypes to subsets thereof is also

treated extensively in the theory of semantic subtyping [8].

BDDs have been used in electronic circuit generation [?], verification, symbolic model checking [?], and type system models such as in XDuce [10]. None of these sources discusses how to extend the BDD representation to support subtypes.

Decision tree techniques are useful in the efficient compilation of pattern matching constructs in functional languages [?]. An important concern in pattern matching compilation is finding the best ordering of the variables, a problem known to be NP-hard. However, when using BDDs to represent Common Lisp type specifiers, we obtain representation (pointer) equality simply by using a consistent ordering; finding the best ordering is not necessary for our application.
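The role of a consistent ordering can be illustrated with a small hash-consed BDD. The sketch below is not the implementation evaluated in this paper: all names (bdd-node, bdd-var, label<, bdd-and, *bdd-table*) are hypothetical, labels are treated as independent Boolean variables, and the subtype-aware reductions discussed earlier are deliberately omitted. Ordering labels by symbol name is an arbitrary choice; any fixed total order would serve.

```lisp
;; Minimal hash-consed BDD sketch; terminals are T and NIL.

(defstruct (bdd-node (:constructor %make-bdd-node (label positive negative)))
  label positive negative)

(defvar *bdd-table* (make-hash-table :test #'equal)
  "Maps (label positive negative) to the unique node with those fields.")

(defun bdd-node (label positive negative)
  "Return the canonical node; children must be T, NIL, or canonical nodes."
  (if (eq positive negative)
      positive                        ; both branches agree: drop the test
      (let ((key (list label positive negative)))
        (or (gethash key *bdd-table*)
            (setf (gethash key *bdd-table*)
                  (%make-bdd-node label positive negative))))))

(defun bdd-var (label)
  (bdd-node label t nil))

(defun label< (a b)
  "The fixed ordering on labels (assumption: compare symbol names)."
  (string< (symbol-name a) (symbol-name b)))

(defun bdd-and (a b)
  "Conjunction of two canonical BDDs, splitting on the smaller label."
  (cond ((or (null a) (null b)) nil)
        ((eq a t) b)
        ((eq b t) a)
        ((eq a b) a)
        ((eq (bdd-node-label a) (bdd-node-label b))
         (bdd-node (bdd-node-label a)
                   (bdd-and (bdd-node-positive a) (bdd-node-positive b))
                   (bdd-and (bdd-node-negative a) (bdd-node-negative b))))
        ((label< (bdd-node-label a) (bdd-node-label b))
         (bdd-node (bdd-node-label a)
                   (bdd-and (bdd-node-positive a) b)
                   (bdd-and (bdd-node-negative a) b)))
        (t
         (bdd-node (bdd-node-label b)
                   (bdd-and a (bdd-node-positive b))
                   (bdd-and a (bdd-node-negative b))))))
```

Because every node is built through bdd-node and every operation splits on the smaller label first, (bdd-and (bdd-var 'a) (bdd-var 'b)) and (bdd-and (bdd-var 'b) (bdd-var 'a)) return the same object, so equivalence can be tested with EQ. The ordering affects the size of the diagram, but not this identity property.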

8. CONCLUSION AND FUTURE WORK

The results of the performance testing in Section 6 lead us to believe that the BDD as a data structure for representing Common Lisp type specifiers is promising, but there is still work to do, especially in identifying heuristics to predict its performance relative to more traditional approaches.

It is known that algorithms using the BDD data structure tend to trade space for speed. Castagna [7] suggests a lazy version of the BDD data structure which may reduce the memory footprint, which would have a positive effect on the BDD-based algorithms. We have spent only a few weeks optimizing our BDD implementation, which is based on Andersen's description [3], whereas the CUDD [16] developers have spent many years of research optimizing their algorithms. Certainly our BDD algorithm can be made more efficient using techniques of CUDD or others.

Although we do not attempt in this paper to motivate in detail the applications or implications of MDTD, we suspect there may be a connection between the problem and the efficient compilation of type-case, and its use in improving pattern matching capabilities of Common Lisp. We consider such development and motivation a matter of future research.

An immediate priority in our research is to formally prove the correctness of our algorithms, most notably the graph decomposition algorithm from Section 2.2. Experimentation leads us to believe that the graph algorithm always terminates with the correct answer; nevertheless, we admit there may be exotic cases which cause deadlock or other errors.

It has also been observed that, in the algorithm explained in Section 2.2, the convergence rate varies depending on the order in which the reduction operations are performed. We do not yet have enough data to characterize this dependence. Furthermore, for the order in which to break connections in the algorithm in Section 2.2, many different strategies are possible: (1) break busiest connections first, (2) break connections with the fewest dependencies, (3) random order, (4) closest to top of tree, etc. These are all areas of ongoing research.

We plan to investigate whether there are other applications of MDTD outside the Common Lisp type system. We hope that users of Castagna's techniques [7] on type systems with semantic subtyping may benefit from the optimizations we have discussed.

A potential application within Common Lisp is improving the subtypep implementation itself, which is known to be slow in some cases. Section 5 gave a BDD-specific implementation, bdd-subtypep. We intend to investigate whether existing Common Lisp implementations could use our technique to represent type specifiers in their inferencing engines, and thereby make some subtype checks more efficient.
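The set identity that makes a subtype check expressible as an emptiness check is that A is a subtype of B exactly when the intersection of A with the complement of B is empty. The sketch below states that identity in terms of the standard subtypep; the name subtypep-via-emptiness is hypothetical, and we only assume, without reproducing Section 5 here, that a BDD-based check can follow the same pattern by comparing the BDD of (and A (not B)) against the empty type.

```lisp
;; Illustration of the identity  A <: B  iff  (and A (not B)) is empty.
;; Uses only the standard SUBTYPEP; it is not the BDD-SUBTYPEP of Section 5.

(defun subtypep-via-emptiness (a b)
  "Ask whether type specifier A denotes a subtype of B by asking whether
(and A (not B)) is a subtype of the empty type NIL.
Returns the same two values as SUBTYPEP: the answer and a certainty flag."
  (subtypep `(and ,a (not ,b)) nil))
```

As with subtypep itself, a second return value of NIL means the implementation could not decide the question, so callers should inspect both values. With a canonical BDD representation, the emptiness test reduces to an identity comparison against the empty type.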

9. REFERENCES

[1] H. Abelson and G. J. Sussman. Structure and

Interpretation of Computer Programs. MIT Press,

Cambridge, MA, USA, 2nd edition, 1996.

[2] S. B. Akers. Binary decision diagrams. IEEE Trans.

Comput., 27(6):509–516, June 1978.

[3] H. R. Andersen. An introduction to binary decision

diagrams. Technical report, Course Notes on the

WWW, 1999.

[4] ANSI. American National Standard: Programming

Language – Common Lisp. ANSI X3.226:1994

(R1999), 1994.

[5] H. G. Baker. A decision procedure for Common Lisp’s

SUBTYPEP predicate. Lisp and Symbolic

Computation, 5(3):157–190, 1992.

[6] R. E. Bryant. Graph-based algorithms for boolean

function manipulation. IEEE Transactions on

Computers, 35:677–691, August 1986.

[7] G. Castagna. Covariance and contravariance: a fresh

look at an old issue. Technical report, CNRS, 2016.

[8] G. Castagna and A. Frisch. A gentle introduction to

semantic subtyping. In Proceedings of the 7th ACM

SIGPLAN International Conference on Principles and

Practice of Declarative Programming, PPDP ’05,

pages 198–199, New York, NY, USA, 2005. ACM.

[9] B. A. Galler and M. J. Fischer. An improved

equivalence algorithm. Communications of the ACM,

7(5):301–303, May 1964.

[10] H. Hosoya, J. Vouillon, and B. C. Pierce. Regular

expression types for XML. ACM Trans. Program.

Lang. Syst., 27(1):46–90, Jan. 2005.

[11] J. Newton. Report: Eﬃcient dynamic type checking of

heterogeneous sequences. Technical report,

EPITA/LRDE, 2016.

[12] J. Newton. Analysis of algorithms calculating the

maximal disjoint decomposition of a set. Technical

report, EPITA/LRDE, 2017.

[13] J. Newton, A. Demaille, and D. Verna. Type-Checking

of Heterogeneous Sequences in Common Lisp. In

European Lisp Symposium, Kraków, Poland, May

2016.

[14] P. Norvig. Paradigms of Artiﬁcial Intelligence

Programming: Case Studies in Common Lisp. Morgan

Kaufmann, 1992.

[15] M. M. A. Patwary, J. R. S. Blair, and F. Manne.

Experiments on union-ﬁnd algorithms for the

disjoint-set data structure. In P. Festa, editor,

Proceedings of 9th International Symposium on

Experimental Algorithms (SEA’10), volume 6049 of

Lecture Notes in Computer Science, pages 411–423.

Springer, 2010.

[16] F. Somenzi. CUDD: BDD package, University of

Colorado, Boulder.

http://vlsi.colorado.edu/~fabio/CUDD/.