Type-Checking of Heterogeneous Sequences in Common Lisp

Jim E. Newton

jnewton@lrde.epita.fr

Akim Demaille

akim@lrde.epita.fr

Didier Verna

didier@lrde.epita.fr

EPITA/LRDE

14-16 rue Voltaire

F-94270 Le Kremlin-Bicêtre

France

ABSTRACT

We introduce the abstract concept of rational type expres-

sion and show its relationship to rational language theory.

We further present a concrete syntax, regular type expres-

sion, and a Common Lisp implementation thereof which al-

lows the programmer to declaratively express the types of

heterogeneous sequences in a way which is natural in the

Common Lisp language. The implementation uses tech-

niques well known and well founded in rational language

theory, in particular the use of the Brzozowski derivative

and deterministic automata to reach a solution which can

match a sequence in linear time. We illustrate the concept

with several motivating examples, and ﬁnally explain many

details of its implementation.

CCS Concepts

•Theory of computation → Regular languages; Au-

tomata extensions; •Software and its engineering →

Data types and structures; Source code generation;

Keywords

Rational languages, Type checking, Finite automata

1. INTRODUCTION

In Common Lisp [15] a type is identically a set of (poten-

tial) values at a particular point in time during the execution

of a program [15, Section 4.1]. Information about types pro-

vides clues for the compiler to make optimizations such as

for performance, space (image size), safety or debuggability

[10, Section 4.3] [15, Section 3.3]. Application programmers

may also make explicit use of types within their programs,

such as with typecase, typep, the, etc.

While the programmer can specify a homogeneous type for

all the elements of a vector [15, Section 15.1.2.2], or the type

for a particular element of a list, [15, System Class CONS],

two notable limitations, which we address in this article, are

1) that there is no standard way to specify heterogeneous

types for diﬀerent elements of a vector, 2) neither is there

a standard way to declare types (whether heterogeneous or

homogeneous) for all the elements of a list.

We introduce the concept of rational type expression for

abstractly describing patterns of types within sequences.

The concept is envisioned to be intuitive to the program-

mer in that it is analogous to patterns described by regular

expressions, which we assume the reader is already familiar

with.

Just as the characters of a string may be described by a rational expression such as (a · b* · c), which intends to match strings such as "ac", "abc", and "abbbbc", the rational type expression (string · number* · symbol) is intended to match vectors like #("hello" 1 2 3 world) and lists like ("hello" world). Rational expressions match character constituents of strings according to character equality. Rational type expressions match elements of sequences by element type.
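To make the intent concrete, here is a minimal sketch that hand-codes a recognizer for the single rational type expression (string · number* · symbol) using only standard Common Lisp. The function name string-numbers-symbol-p is hypothetical and is not part of the rte implementation; it only illustrates what "matching by element type" means.

```lisp
;; Hypothetical helper, not part of the rte library: a hand-written
;; recognizer for the rational type expression (string · number* · symbol).
(defun string-numbers-symbol-p (seq)
  "Return true if SEQ is a proper sequence whose first element is a string,
whose last element is a symbol, and whose middle elements are all numbers."
  (and (typep seq 'sequence)
       (>= (length seq) 2)
       (stringp (elt seq 0))
       (symbolp (elt seq (1- (length seq))))
       ;; Elements strictly between the first and the last must be numbers.
       (every #'numberp (subseq seq 1 (1- (length seq))))))

;; Works on vectors and on lists alike.
(string-numbers-symbol-p #("hello" 1 2 3 world))
(string-numbers-symbol-p '("hello" world))
```

Writing such a predicate by hand for every pattern quickly becomes tedious, which is the motivation for a declarative notation.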

We further introduce an s-expression based syntax, called regular type expression, to encode a rational type expression. This syntax replaces the infix and postfix operators in the rational type expression with prefix notation based s-expressions. The regular type expression (:1 string (:* number) symbol) corresponds to the rational type expression (string · number* · symbol). In addition, we provide a parameterized type named rte, whose argument is a regular type expression. The members of such a type are all sequences matching the given regular type expression. Section 2 describes the syntax.

While we avoid making claims about the potential utility

of declarations of such a type from the compiler’s perspec-

tive [1], we do suggest that a declarative system to describe

patterns of types within sequences has great utility for pro-

gram logic, code readability, and type safety.

A discussion of the theory of rational languages, in which

our research is grounded, may be found in [9, Chapters 3,4].

This article avoids many details of the theory, and instead

emphasizes examples of problems this approach helps to

solve and explains a high level view of its implementation.

A more in-depth report of the research including the source

code is provided in [11].

2. THE REGULAR TYPE EXPRESSION

We have implemented a parameterized type named rte (regular type expression), via deftype. The argument of rte is a regular type expression.

Keyword   Meaning
:*        match zero or more times.
:+        match one or more times.
:?        match zero or one time.
:1        match exactly once.
:or       match any of the regular type expressions.
:and      match all of the regular type expressions.

Table 1: Regular type expression keywords

A regular type expression is deﬁned as either a Common

Lisp type speciﬁer, such as number, (cons number), (eql

12), or (and integer (satisfies oddp)), or a list whose

ﬁrst element is one of a limited set of keywords shown in

Table 1, and whose trailing elements are other regular type

expressions.

As a counter example, (rte (:1 (number number))) is

invalid because (number number) is neither a valid Common

Lisp type speciﬁer, nor does it begin with a keyword from

Table 1. Here are some valid examples.

(rte (:1 number number number))

corresponds to the rational type expression (number ·

number · number) and matches a sequence of exactly

three numbers.

(rte (:or number (:1 number number number)))

corresponds to (number+(number·number·number))

and matches a sequence of either one or three numbers.

(rte (:1 number (:? number number)))

corresponds to (number · (number · number)?) and matches a sequence of one mandatory number followed by exactly zero or two numbers. This happens to be equivalent to the previous example.

(rte (:* cons number))

corresponds to (cons · number)* and matches a sequence of a cons followed by a number, repeated zero or more times, i.e., a sequence of even length.

The rte type can be used anywhere Common Lisp expects a type specifier, as the following examples illustrate. The point slot of the class C expects a list of two numbers, e.g., (1 2.0). The vector #(1 2.0) matches (rte (:1 number number)) but is excluded by the list conjunct.

(defclass C ()

((point :type (and list (rte (:1 number number))))

...))

The Common Lisp type speciﬁed by (cons number) is

the set of non-empty lists headed by a number, as speciﬁed

in [15, System Class CONS]. The Y argument of the func-

tion F must be a possibly empty sequence of such objects,

because it is declared as type (rte (:* (cons number))).

E.g., #((1.0) (2 :x) (0 :y "zero")).

(defun F (X Y)

(declare

(type (rte (:* (cons number)))

Y))

...)

3. APPLICATION USE CASES

The following subsections illustrate some motivating ex-

amples of regular type expressions.

lambda-list :=

(var*

[&optional

{var | (var [init-form [supplied-p-parameter]])}*]

[&rest var]

[&key {var | ({var | (keyword-name var)}

[init-form [supplied-p-parameter]]) }*

[&allow-other-keys]]

[&aux {var | (var [init-form])}*]

)

Figure 1: CL speciﬁcation syntax for the ordinary

lambda list

3.1 RTE based string regular expressions

Since a string in Common Lisp is a sequence, the rte

type may be used to perform simple string regular expres-

sion matching. Our tests have shown that the rte based

string regular expression matching is about 35% faster than

CL-PPCRE [16] when restricted to features strictly sup-

ported by the theory of rational languages, thus ignoring

CL-PPCRE features such as character encoding, capture

buﬀers, recursive patterns, etc.

The call to the function remove-if-not, below, filters a given list of strings, retaining only the ones that match the regular expression "(ab)*Z*(ab)*". The function regexp-to-rte converts a string regular expression to a regular type expression.

(regexp-to-rte "(ab)*Z*(ab)*")

==>

(:1 (:* (member #\a #\b))

(:* (eql #\Z))

(:* (member #\a #\b)))

(remove-if-not

(lambda (str)

(typep str

‘(rte ,(regexp-to-rte "(ab)*Z*(ab)*"))))

’("baZab"

"ZaZabZbb"

"aaZbbbb"

"aaZZZZbbbb"))

==>

("baZab"

"aaZbbbb"

"aaZZZZbbbb")

The regexp-to-rte function does not attempt the daunt-

ing task of fully implementing Perl compatible regular ex-

pressions as provided in CL-PPCRE. Instead regexp-to-

rte implements a small but powerful subset of CL-PPCRE

whose grammar is provided by [4]. Starting with this context

free grammar, we use CL-Yacc [5] to parse a string regular

expression and convert it to a regular type expression.

3.2 DWIM lambda lists

As a complex yet realistic example, we use a regular type expression to test the validity of a Common Lisp lambda list, a sequence which is indeed described by a pattern.

Common Lisp speciﬁes several diﬀerent kinds of lambda

lists, used for different purposes in the language. E.g., the

ordinary lambda list is used to deﬁne lambda functions, the

macro lambda list is for deﬁning macros, and the destructur-

ing lambda list is for use with destructuring-bind. Each

of these lambda lists has its own syntax, the simplest of

which is the ordinary lambda list (Figure 1). The following

code shows examples of ordinary lambda lists which obey

the speciﬁcation but may not mean what you think.

(defun F1 (a b &key x &rest other-args)

...)

(defun F2 (a b &key ((Z U) nil u-used-p))

...)

The function F1, according to careful reading of the Com-

mon Lisp speciﬁcation, is a function with three keyword ar-

guments, x, &rest, and other-args, which can be refer-

enced at the call site with a bizarre function calling syntax

such as (F1 1 2 :x 3 :&rest 4 :other-args 5). What

the programmer probably meant was one keyword argument

named x and an &rest argument named other-args. Ac-

cording to the Common Lisp speciﬁcation [15, Section 3.4.1],

in order for &rest to have its normal rest-args semantics in

conjunction with &key, it must appear before, not after, the &key lambda list keyword. The specification makes no provision for &rest following &key other than that one names a function parameter and the other has special semantics. This issue is subtle. In fact, SBCL considers this such a bizarre situation that it diverges from the specification and flags a SB-INT:SIMPLE-PROGRAM-ERROR during compilation: misplaced &REST in lambda list: (A B &KEY X &REST OTHER-ARGS).

The function F2 is deﬁned with an unconventional &key

parameter which is not a symbol in the keyword package but

rather in the current package. Thus the parameter U must

be referenced at the call-site as (F2 1 2 ’Z 3) rather than

(F2 1 2 :Z 3).
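The behavior of such a non-keyword &key name can be checked with standard Common Lisp alone. The following sketch gives F2 a body (which the original elides) purely for illustration; the body shown here is an assumption, not the authors' code.

```lisp
;; The keyword name Z lives in the current package, not in KEYWORD,
;; so callers must pass the quoted symbol 'Z to bind the variable U.
;; The body is a hypothetical one that simply exposes the bindings.
(defun f2 (a b &key ((z u) nil u-used-p))
  (list a b u u-used-p))

(f2 1 2 'z 3)   ; U is bound to 3 and U-USED-P is true
(f2 1 2)        ; U takes its default NIL and U-USED-P is false
```

Some compilers emit a style warning for a &key name outside the KEYWORD package, which is a hint that the construct is rarely intended.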

These situations are potentially confusing, so we deﬁne

what we call the dwim ordinary lambda list. Figure 2 shows

an implementation of the type dwim-ordinary-lambda-list.

A Common Lisp programmer might want to use this type

as part of a code-walker based checker. Elements of this

type are lists which are indeed valid lambda lists for defun,

even though the Common Lisp speciﬁcation allows a more

relaxed syntax.

The dwim ordinary lambda list differs from the ordinary lambda list in the aspects described above, and it also ignores semantics the particular Lisp implementation may have given to additional lambda list keywords. It only supports semantics for: &optional, &rest, &key, &allow-other-keys, and &aux.

3.3 destructuring-case

(defun F3 (obj)

(typecase obj

((rte (:1 symbol (:+ (eql :count) integer)))

(destructuring-bind (name &key (count 0)) obj

...))

((rte (:1 symbol list (:* string)))

(destructuring-bind (name data

&rest strings) obj

...))))

Notice in the code above that each rte clause of the type-

case includes a call to destructuring-bind which is related

and hand coded for consistency. The function F3 is imple-

mented such that the object being destructured is certain to

be of the format expected by the corresponding destructur-

ing lambda list.

We provide a macro destructuring-case which combines

the capability of destructuring-bind and typecase. More-

over, destructuring-case constructs the rte type speciﬁers

in an intelligent way, taking into account not only the struc-

ture of the destructuring lambda list but also any given type

declarations.

(deftype var ()

‘(and symbol

(not (or keyword

(member t nil)

(member ,@lambda-list-keywords)))))

(deftype dwim-ordinary-lambda-list ()

(let* ((optional-var

’(:or var

(:and list

(rte

(:1 var

(:? t

(:? var)))))))

(optional

‘(:1 (eql &optional)

(:* ,optional-var)))

(rest ’(:1 (eql &rest) var))

(key-var

’(:or var

(:and list

(rte (:1

(:or var

(cons keyword

(cons var

null)))

(:? t

(:? var)))))))

(key

‘(:1 (eql &key)

(:* ,key-var)

(:?

(eql &allow-other-keys))))

(aux-var

’(:or var

(:and list

(rte (:1 var (:? t))))))

(aux ‘(:1 (eql &aux)

(:* ,aux-var))))

‘(rte

(:1 (:* var)

(:? ,optional)

(:? ,rest)

(:? ,key)

(:? ,aux)))))

Figure 2: The dwim-ordinary-lambda-list type

(defun F4 (obj)

(destructuring-case obj

((name &key count)

(declare (type symbol name)

(type integer count))

...)

((name data &rest strings)

(declare (type symbol name)

(type list data)

(type (rte (:* string)) strings))

...)))

This macro is able to parse any valid destructuring lambda

list and convert it to a regular type expression. Supported

syntax includes &whole, &optional, &key, &allow-other-

keys, &aux, and recursive lambda lists such as:

(&whole llist a (b c)

&key x ((:y (c d)) ’(1 2))

&allow-other-keys)

A feature of destructuring-case is that it can construct

regular type expressions much more complicated than would

be practical by hand. Consider the following example which

includes two destructuring lambda lists, whose computed

regular type expressions pretty-print to about 20 lines each.

An example of the regular type expression matching Case-1

is shown in Figure 3.

(:1 (:1 fixnum (:and list (rte (:1 fixnum fixnum))))

(:and

(:* keyword t)

(:or

(:1 (:? (eql :x) symbol (:* (not (member :y :z)) t))

(:? (eql :y) string (:* (not (eql :z)) t))

(:? (eql :z) list (:* t t)))

(:1 (:? (eql :y) string (:* (not (member :x :z)) t))

(:? (eql :x) symbol (:* (not (eql :z)) t))

(:? (eql :z) list (:* t t)))

(:1 (:? (eql :x) symbol (:* (not (member :y :z)) t))

(:? (eql :z) list (:* (not (eql :y)) t))

(:? (eql :y) string (:* t t)))

(:1 (:? (eql :z) list (:* (not (member :x :y)) t))

(:? (eql :x) symbol (:* (not (eql :y)) t))

(:? (eql :y) string (:* t t)))

(:1 (:? (eql :y) string (:* (not (member :x :z)) t))

(:? (eql :z) list (:* (not (eql :x)) t))

(:? (eql :x) symbol (:* t t)))

(:1 (:? (eql :z) list (:* (not (member :x :y)) t))

(:? (eql :y) string (:* (not (eql :x)) t))

(:? (eql :x) symbol (:* t t))))))

Figure 3: Regular type expression matching de-

structuring lambda list Case-1

(destructuring-case data

;; Case-1

((&whole llist

a (b c)

&rest keys

&key x y z

&allow-other-keys)

(declare (type fixnum a b c)

(type symbol x)

(type string y)

(type list z))

...)

;; Case-2

((a (b c)

&rest keys

&key x y z)

(declare (type fixnum a b c)

(type symbol x)

(type string y)

(type list z))

...))

4. IMPLEMENTATION OVERVIEW

Using an rte involves several steps. The following subsec-

tions describe these steps.

1. Convert a parameterized rte type into code that will

perform run-time type checking.

2. Convert the regular type expression to DFA (determin-

istic ﬁnite automaton, sometimes called a ﬁnite state

machine).

3. Decompose a list of type speciﬁers into disjoint types.

4. Convert the DFA into code which will perform run-

time execution of the DFA.

4.1 Type deﬁnition

The rte type is deﬁned by deftype.

(deftype rte (pattern)

‘(and sequence

(satisfies ,(compute-match-function

pattern))))

As in this deﬁnition, when the satisfies type speci-

ﬁer is used, it must be given a symbol naming a glob-

ally callable unary function. In our case compute-match-

function accepts a regular type expression, such as (:1

number (:* string)), and computes a named unary predi-

cate. The predicate can thereafter be called with a sequence

and will return true or false indicating whether the se-

quence matches the pattern. Notice that the pattern is usu-

ally provided at compile-time, while the sequence is pro-

vided at run-time. Furthermore, compute-match-function

ensures that given two patterns which are EQUAL, the same

function name will be returned, but will only be created and

compiled once. An example will make it clearer.

(deftype 3-d-point ()

‘(rte (:1 number number number)))

The type 3-d-point invokes the rte parameterized type

deﬁnition with argument (:1 number number number). The

deftype of rte assures that a function is deﬁned as follows.

The function name, |(:1 number number number)| even if

somewhat unusual, is so chosen to improve the error message

and back-trace that occurs in some situations.

(defun rte::|(:1 number number number)|

(input-sequence)

(match-sequence input-sequence

’(:1 number number number)))

The following back-trace occurs when attempting to eval-

uate a failing assertion.

CL-USER> (the 3-d-point (list 1 2))

The value (1 2)

is not of type

(OR (AND #1=(SATISFIES |(:1 NUMBER NUMBER NUMBER)|)

CONS)

(AND #1# NULL) (AND #1# VECTOR)

(AND #1# SB-KERNEL:EXTENDED-SEQUENCE)).

[Condition of type TYPE-ERROR]

It is also assured that the DFA corresponding to (:1 num-

ber number number) is built and cached, to avoid unneces-

sary re-creation at run-time. Finally, the type speciﬁer (rte

(:1 number number number)) expands to the following.

(and sequence

(satisfies |(:1 number number number)|))
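The naming and caching scheme described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: every name prefixed with sketch- is hypothetical, and sketch-match-sequence is a stand-in that only understands patterns of the form (:1 type ...) rather than building a DFA. The sketch assumes the forms are evaluated in order in one image (REPL or LOAD); file compilation would additionally need the predicate to exist at load time.

```lisp
;; Hypothetical stand-in for the real matcher: it only handles patterns
;; of the form (:1 type1 type2 ...), i.e. fixed-length sequences.
(defun sketch-match-sequence (seq pattern)
  (let ((types (rest pattern)))
    (and (= (length seq) (length types))
         (every #'typep seq types))))

(defvar *sketch-predicates* (make-hash-table :test #'equal)
  "Maps a pattern to the symbol naming its predicate.")

(defun sketch-compute-match-function (pattern)
  "Return a symbol naming a unary predicate for PATTERN.
EQUAL patterns share one symbol, and the predicate is defined only once."
  (or (gethash pattern *sketch-predicates*)
      (let ((name (intern (prin1-to-string pattern))))
        ;; SATISFIES needs a globally callable function named by a symbol.
        (setf (symbol-function name)
              (lambda (seq) (sketch-match-sequence seq pattern)))
        (setf (gethash pattern *sketch-predicates*) name))))

(deftype sketch-rte (pattern)
  `(and sequence (satisfies ,(sketch-compute-match-function pattern))))

(typep '(1 2 3) '(sketch-rte (:1 number number number)))
(typep '(1 2) '(sketch-rte (:1 number number number)))
```

Keying the table with EQUAL is what lets two textually identical patterns, met at different places in a program, resolve to the same predicate symbol.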

A caveat of using rte is that the usage must obey a restriction imposed by the Common Lisp specification [15, Section DEFTYPE]: a self-referential type definition is not valid. The Common Lisp specification states: Recursive expansion of the type specifier returned as the expansion must terminate, including the expansion of type specifiers which are nested within the expansion.

As an example of this limitation, here is a failed attempt

to implement a type which matches a unary tree, i.e. a type

whose elements are 1, (1), ((1)), (((1))), etc.

CL-USER> (deftype unary-tree ()

‘(or (eql 1)

(rte unary-tree)))

UNARY-TREE

RTE> (typep ’(1) ’unary-tree)

Control stack exhausted (no more space for function call

frames). This is probably due to heavily nested or

infinitely recursive function calls, or a tail call that

SBCL cannot or has not optimized away.

PROCEED WITH CAUTION.

[Condition of type SB-KERNEL::CONTROL-STACK-EXHAUSTED]

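\[
\begin{aligned}
\partial_a \emptyset &= \emptyset \\
\partial_a \varepsilon &= \emptyset \\
\partial_a a &= \varepsilon \\
\partial_a b &= \emptyset \quad \text{for } b \neq a \\
\partial_a (r \cup s) &= \partial_a r \cup \partial_a s \\
\partial_a (r \cdot s) &=
  \begin{cases}
    (\partial_a r) \cdot s, & \text{if } r \text{ is not nullable} \\
    (\partial_a r) \cdot s \cup \partial_a s, & \text{if } r \text{ is nullable}
  \end{cases} \\
\partial_a (r \cap s) &= \partial_a r \cap \partial_a s \\
\partial_a (r^{*}) &= (\partial_a r) \cdot r^{*} \\
\partial_a (r^{+}) &= (\partial_a r) \cdot r^{*}
\end{aligned}
\]

Figure 4: Rules for the Brzozowski derivative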

4.2 Constructing a DFA

In order to determine whether a given sequence matches a

particular regular type expression, we conceptually execute

a DFA with the sequence as input. Thus we must convert

the regular type expression to a DFA. This need only be

done once and can often be done at compile time.

4.2.1 Rational derivative

In 1964, Janusz Brzozowski [3] introduced the concept of

the Rational Language Derivative, and provided a theory for

converting a regular expression to a DFA. Additional work

was done by Scott Owens et al. [12] which presented the

algorithm in easy to follow steps.

To understand what the rational expression derivative is and how to calculate it, first think of a rational expression in terms of its language, i.e. the set of sequences the expression generates. For example, the language of ((a|b) · c* · d) is the set of words (finite sequences of letters) which begin with exactly one letter a or exactly one letter b, end with exactly one letter d and between contain zero or more occurrences of the letter c.

The derivative of the language with respect to a given letter is the set of suffixes of words which have the given letter as prefix. Analogously the derivative of the rational expression is the rational expression which generates that language. E.g., ∂_a((a|b) · c* · d) = (c* · d).

Owens et al. [12] explain a systematic algorithm for symbolically calculating such derivatives. The formulas listed in Figure 4 detail the calculations which must be recursively applied to calculate the derivative.
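The rules of Figure 4 translate almost line for line into code. The following is a minimal sketch over plain letters (compared with EQL), not the authors' implementation. The representation is an assumption made for this example: :empty-set and :epsilon stand for ∅ and ε, any other atom is a letter, and (:or r s), (:and r s), (:cat r s), and (:* r) stand for union, intersection, concatenation, and Kleene star. No simplification is performed, so the resulting expressions grow quickly.

```lisp
(defun nullable-p (r)
  "True if the expression R matches the empty word."
  (cond ((eq r :epsilon) t)
        ((atom r) nil)                    ; :empty-set or a single letter
        (t (destructuring-bind (op a &optional b) r
             (ecase op
               (:* t)
               (:or  (or  (nullable-p a) (nullable-p b)))
               (:and (and (nullable-p a) (nullable-p b)))
               (:cat (and (nullable-p a) (nullable-p b))))))))

(defun derivative (letter r)
  "Brzozowski derivative of R with respect to LETTER, following Figure 4."
  (cond ((member r '(:empty-set :epsilon)) :empty-set)
        ((atom r) (if (eql r letter) :epsilon :empty-set))
        (t (destructuring-bind (op a &optional b) r
             (ecase op
               (:or  (list :or  (derivative letter a) (derivative letter b)))
               (:and (list :and (derivative letter a) (derivative letter b)))
               (:*   (list :cat (derivative letter a) r))
               (:cat (let ((left (list :cat (derivative letter a) b)))
                       (if (nullable-p a)
                           (list :or left (derivative letter b))
                           left))))))))

(defun matches-p (word r)
  "True if WORD (a sequence of letters) belongs to the language of R."
  (nullable-p (reduce (lambda (expr letter) (derivative letter expr))
                      word :initial-value r)))

;; The expression ((a|b) · c* · d) from the text.
(defparameter *abcd* '(:cat (:or #\a #\b) (:cat (:* #\c) #\d)))

(matches-p "acccd" *abcd*)
(matches-p "abd" *abcd*)
```

A word is accepted exactly when the expression left after differentiating by each of its letters in turn is nullable, which is the property the DFA construction relies on.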

4.2.2 DFA for regular expressions

Another commonly used algorithm for constructing a DFA

was inspired by Ken Thompson [18, 17] and involves decom-

posing a rational expression into a small number of cases

such as base variable, concatenation, disjunction, and Kleene

star, then following a graphical template substitution for

each case. While this algorithm is easy to implement, it

has a serious limitation. It is not able to easily express

automata resulting from the intersection or complementation

of rational expressions. We rejected this approach as we

would like to support regular type expressions containing

the keywords :and and :not, such as in (:and (:* t inte-

ger) (:not (:* float t))).

We chose the algorithm based on Brzozowski derivatives.

Initial state Create the single initial state, and label it

with the original rational expression. Seed a to-do list

with this initial state. Seed a visited list to ∅.

States While the to-do list is non empty, operate on the

ﬁrst element as follows:

1. Move the state from the to-do list to the visited

list.

2. Get the expression associated with the state.

3. Calculate the derivative of this expression with

respect to each letter of the necessarily ﬁnite al-

phabet.

4. Reduce each derivative to a canonical form.

5. For each canonical form that does not correspond

to a state in the to-do nor visited list, create a

new state corresponding to this expression, and

add it to the to-do list.

Transitions Construct transitions between states as follows: If S_i is the expression associated with state P_i and S_j is the expression associated with state P_j and ∂_a S_i = S_j, then construct a transition from state P_i to state P_j labeled a.

Final states If the rational expression labeling a state is

nullable, i.e. if it matches the empty word, label the

state a ﬁnal state.

Brzozowski argued that this procedure terminates because

there is only a ﬁnite number of derivatives possible, modulo

multiple equivalent algebraic forms. Eventually all the ex-

pressions encountered will be algebraically equivalent to the

derivative of some other expression in the set.
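The construction can be sketched as a work-list loop. This is an illustrative sketch under stated assumptions, not the authors' code: all function names are hypothetical, expressions use the ad hoc representation :empty-set, :epsilon, letters as atoms, (:or r s), (:cat r s), and (:* r), and the "canonical form" is only a handful of algebraic simplifications compared with EQUAL. Because that is weaker than a true canonical form, a max-states guard is included.

```lisp
;; Smart constructors applying a few simplifications (a weak canonical form).
(defun mk-or (r s)
  (cond ((eq r :empty-set) s)
        ((eq s :empty-set) r)
        ((equal r s) r)
        (t (list :or r s))))

(defun mk-cat (r s)
  (cond ((or (eq r :empty-set) (eq s :empty-set)) :empty-set)
        ((eq r :epsilon) s)
        ((eq s :epsilon) r)
        (t (list :cat r s))))

(defun dfa-nullable-p (r)
  (cond ((eq r :epsilon) t)
        ((atom r) nil)
        ((eq (first r) :*) t)
        ((eq (first r) :or)
         (or (dfa-nullable-p (second r)) (dfa-nullable-p (third r))))
        (t (and (dfa-nullable-p (second r)) (dfa-nullable-p (third r)))))) ; :cat

(defun dfa-derivative (letter r)
  (cond ((member r '(:empty-set :epsilon)) :empty-set)
        ((atom r) (if (eql r letter) :epsilon :empty-set))
        ((eq (first r) :or)
         (mk-or (dfa-derivative letter (second r))
                (dfa-derivative letter (third r))))
        ((eq (first r) :*)
         (mk-cat (dfa-derivative letter (second r)) r))
        (t                                                      ; :cat
         (let ((left (mk-cat (dfa-derivative letter (second r)) (third r))))
           (if (dfa-nullable-p (second r))
               (mk-or left (dfa-derivative letter (third r)))
               left)))))

(defun build-dfa (expr alphabet &key (max-states 100))
  "Return three values: the state expressions (initial state first),
the transitions as (FROM LETTER TO) triples, and the final states."
  (let ((to-do (list expr)) (visited '()) (transitions '()))
    (loop while to-do
          do (let ((state (pop to-do)))
               (push state visited)
               (assert (<= (length visited) max-states))
               (dolist (letter alphabet)
                 (let ((next (dfa-derivative letter state)))
                   (unless (eq next :empty-set)     ; ignore the sink state
                     (push (list state letter next) transitions)
                     (unless (or (member next visited :test #'equal)
                                 (member next to-do :test #'equal))
                       (setf to-do (append to-do (list next)))))))))
    (setf visited (reverse visited))
    (values visited
            (reverse transitions)
            (remove-if-not #'dfa-nullable-p visited))))

;; Letters SYM, NUM, STR stand in for three disjoint types; r+ is written r·r*.
(defparameter *example-expr*
  (let* ((x '(:or (:cat num (:* num)) (:cat str (:* str))))
         (r (list :cat 'sym x)))
    (list :cat r (list :* r))))

(build-dfa *example-expr* '(sym num str))
```

For this example expression the loop stops after four states; expressions that differ only by associativity or commutativity of union would need a stronger normalization to be recognized as the same state.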

4.2.3 DFA for regular type expressions

The set of sequences of Common Lisp objects is not a rational language because, for one reason, the prospective alphabet (the set of all possible Common Lisp objects) is not a finite set. Even though the set of sequences of objects is infinite, the set of sequences of type specifiers is a rational language, if we only consider as the alphabet the set of type specifiers explicitly referenced in a regular type expression. With this choice of alphabet, sequences of Common Lisp type specifiers conform to the definition of words in a rational language.

There is a delicate matter when mapping a sequence of objects to a sequence of type specifiers: the mapping is not unique. This issue is ignored for the moment, but is discussed in Section 4.4.

Consider the extended rational type expression P_0 = (symbol · (number+ ∪ string+))+. We wish to construct a DFA which recognizes sequences matching this pattern. Such a DFA is shown in Figure 5.

First, we create a state P_0 corresponding to the given rational type expression.

Next, we proceed to calculate the derivative with respect to each type specifier mentioned in P_0 and construct states P_1, P_2, and P_3, as those are the unique derivative forms which are obtained by the calculation. We discard the ∅ value.

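\[
\begin{aligned}
\partial_{\mathit{number}} P_0 &= \emptyset \\
\partial_{\mathit{string}} P_0 &= \emptyset \\
\partial_{\mathit{symbol}} P_0 &= (\mathit{number}^{+} \cup \mathit{string}^{+}) \cdot (\mathit{symbol} \cdot (\mathit{number}^{+} \cup \mathit{string}^{+}))^{*} = P_1 \\
\partial_{\mathit{number}} P_1 &= \mathit{number}^{*} \cdot (\mathit{symbol} \cdot (\mathit{number}^{+} \cup \mathit{string}^{+}))^{*} = P_2 \\
\partial_{\mathit{string}} P_1 &= \mathit{string}^{*} \cdot (\mathit{symbol} \cdot (\mathit{number}^{+} \cup \mathit{string}^{+}))^{*} = P_3 \\
\partial_{\mathit{symbol}} P_1 &= \emptyset \\
\partial_{\mathit{number}} P_2 &= P_2 \\
\partial_{\mathit{string}} P_2 &= \emptyset \\
\partial_{\mathit{symbol}} P_2 &= P_1 \\
\partial_{\mathit{number}} P_3 &= \emptyset \\
\partial_{\mathit{string}} P_3 &= P_3 \\
\partial_{\mathit{symbol}} P_3 &= P_1
\end{aligned}
\]

Figure 5: Example DFA (transition labels: symbol, number, string)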

Next, we label the transitions between states with the

type speciﬁer which was used in the derivative calculation

between those states. We ignore transitions from any state

to the ∅ state.

Finally, we label the final states. They are P_2 and P_3 because only those two states are nullable. I.e. (number* · (symbol · (number+ ∪ string+))*) can match the empty sequence, and so can (string* · (symbol · (number+ ∪ string+))*).

4.3 Optimized code generation

The mechanism we chose for implementing the execution

(as opposed to the generation) of the DFA was to generate

specialized code based on typecase, block, and go. As an

example, consider the DFA shown in Figure 5. The code in

Figure 6 was generated given this DFA as input.

The code is organized according to a regular pattern. The

typecase, commented as OUTER-TYPECASE switches on the

type of the sequence itself. Whether the sequence, seq,

matches one of the carefully ordered types list, simple-

vector, vector, or sequence, determines which functions

are used to access the successive elements of the sequence:

svref, incf, pop, etc.

The final case, sequence, is especially useful for applications which wish to exploit the SBCL feature of Extensible sequences [10, Section 7.6] [13]. One of our rte based applications uses extensible sequences to view vertical and horizontal slices of 2D arrays as sequences in order to match certain patterns within row vectors and column vectors.

(lambda (seq)
  (declare
   (optimize (speed 3) (debug 0) (safety 0)))
  (block check
    (typecase seq ; OUTER-TYPECASE
      (list
       (tagbody
        0
          (when (null seq)
            (return-from check nil)) ; rejecting
          (typecase (pop seq) ; INNER-TYPECASE
            (symbol (go 1))
            (t (return-from check nil)))
        1
          (when (null seq)
            (return-from check nil)) ; rejecting
          (typecase (pop seq) ; INNER-TYPECASE
            (number (go 2))
            (string (go 3))
            (t (return-from check nil)))
        2
          (when (null seq)
            (return-from check t)) ; accepting
          (typecase (pop seq) ; INNER-TYPECASE
            (number (go 2))
            (symbol (go 1))
            (t (return-from check nil)))
        3
          (when (null seq)
            (return-from check t)) ; accepting
          (typecase (pop seq) ; INNER-TYPECASE
            (string (go 3))
            (symbol (go 1))
            (t (return-from check nil)))))
      (simple-vector
       ...)
      (vector
       ...)
      (sequence
       ...)
      (t nil))))

Figure 6: Generated code recognizing an RTE

While the code is iterating through the sequence, if it en-

counters an unexpected end of sequence, or an unexpected

type, the function returns nil. These cases are commented

as rejecting. Otherwise, the function will eventually en-

counter the end of the sequence and return t. These cases

are commented accepting in the ﬁgure.

Within the inner section of the code, there is one label per state in the state machine. In the example, the states P_1, P_2, and P_3 of the DFA in Figure 5 appear as the go targets 1, 2, and 3. At each step of the iteration, a check is made for end-of-sequence. If the sequence is exhausted, either t or nil is returned, depending on whether the current state is a final state of the DFA.

The next element of the sequence is examined by the INNER-TYPECASE, and depending on the type of the object encountered, control is transferred (via go) to a label corresponding to the next state.

Figure 7: Example DFA with disjoint types (transition labels: integer, number ∩ ¬integer, number)

One thing to note about the complexity of this function is that the number of states encountered when the function is applied to a sequence is less than or equal to the number of elements in the sequence. Thus the time complexity is linear in the number of elements of the sequence and is independent of the number of states in the DFA.
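For comparison, the same automaton can be executed by a small table-driven interpreter instead of generated code. The sketch below is an assumption-laden illustration, not part of the rte implementation: the table *figure-5-dfa* is transcribed by hand from the derivatives of Section 4.2.3, and run-dfa is a hypothetical name.

```lisp
;; Transition table for the DFA of Figure 5, written by hand for this sketch.
;; Each entry is (STATE FINAL-P (TYPE . NEXT-STATE) ...).
(defparameter *figure-5-dfa*
  '((0 nil (symbol . 1))
    (1 nil (number . 2) (string . 3))
    (2 t   (number . 2) (symbol . 1))
    (3 t   (string . 3) (symbol . 1))))

(defun run-dfa (dfa seq)
  "Return true if SEQ is accepted by DFA. Each element of SEQ is visited once."
  (let ((state 0))
    (map nil
         (lambda (element)
           (let ((transition (find-if (lambda (type) (typep element type))
                                      (cddr (assoc state dfa))
                                      :key #'car)))
             (if transition
                 (setf state (cdr transition))
                 (return-from run-dfa nil))))   ; no transition: reject
         seq)
    (second (assoc state dfa))))                ; accept iff the state is final

(run-dfa *figure-5-dfa* '(a 1 2 b "x" "y"))
(run-dfa *figure-5-dfa* #(a 1))
```

The interpreter still makes a single pass over the sequence, but it pays for a table lookup at every element; the generated tagbody code avoids that lookup by encoding each state as a jump target.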

In some cases the same type may be speciﬁed with ei-

ther the rte syntax or with the Common Lisp native cons

type speciﬁer. For example, a list of three numbers can

be expressed either as (cons number (cons number (cons

number null))) or as (rte (:1 number number number)).

Should the rte system internally exploit the cons speci-

ﬁer when possible, thus avoiding the creation of ﬁnite state

machines? We began investigating this possibility, but abandoned the investigation on discovering that it led to significant performance degradation for long lists. We measured

roughly a 5% penalty for lists of length 5. The penalty grew

for longer lists: 25% with a list length of 10, 40% with a list

length of 20.
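The native alternative mentioned above uses only standard type specifiers. As a small sketch, the hypothetical helper types-to-cons-specifier (not part of the rte implementation) builds the nested cons specifier for a fixed list of element types.

```lisp
;; Hypothetical helper: build the nested CONS type specifier describing
;; a proper list whose elements have exactly the given types, in order.
(defun types-to-cons-specifier (types)
  (if (null types)
      'null
      `(cons ,(first types) ,(types-to-cons-specifier (rest types)))))

(types-to-cons-specifier '(number number number))
;; => (CONS NUMBER (CONS NUMBER (CONS NUMBER NULL)))

(typep '(1 2 3) (types-to-cons-specifier '(number number number)))
(typep '(1 2) (types-to-cons-specifier '(number number number)))
```

Unlike the rte form, such a specifier only describes lists, and its nesting depth grows with the length of the list being described.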

4.4 The overlapping types problem

In the example in Section 4.2.3, all the types considered

(symbol, string, and number) were disjoint. If the same

method is naively used with types which are intersecting,

the resulting DFA will not be a valid representation of the

rational expression. Consider the rational expression involving the intersecting types integer and number: P_0 = ((number · integer) ∪ (integer · number)). The sequences which match this expression are sequences of two numbers, at least one of which is an integer. Unfortunately, when we calculate ∂_number P_0 and ∂_integer P_0 we arrive at a different result.

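\[
\begin{aligned}
\partial_{\mathit{number}} P_0
  &= \partial_{\mathit{number}} ((\mathit{number} \cdot \mathit{integer}) \cup (\mathit{integer} \cdot \mathit{number})) \\
  &= \partial_{\mathit{number}} (\mathit{number} \cdot \mathit{integer}) \cup \partial_{\mathit{number}} (\mathit{integer} \cdot \mathit{number}) \\
  &= (\partial_{\mathit{number}} \mathit{number}) \cdot \mathit{integer} \cup (\partial_{\mathit{number}} \mathit{integer}) \cdot \mathit{number} \\
  &= \varepsilon \cdot \mathit{integer} \cup \emptyset \cdot \mathit{number} \\
  &= \mathit{integer} \cup \emptyset \\
  &= \mathit{integer}
\end{aligned}
\]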

Without continuing to calculate all the derivatives, it is

already clear that this result is wrong. If you start with the

set of sequences of two numbers one of which is an integer,

and out of that ﬁnd the subset of sequences starting with a

number, we get back the entire set. The set of suﬃxes of

this set is not the set of singleton sequences of integer.

To address this problem, we augment the algorithm of Br-

zozowski with an additional step. Rather than calculating

the derivative at each state with respect to each type men-

tioned in the regular type expression, some of which might

be overlapping, instead we calculate a disjoint set of types.

More specifically, given a set A of overlapping types, we calculate a set B which has the properties: each element of B is a subtype of some element of A, any two elements of B are disjoint from each other, and ∪A = ∪B.

Figure 7 illustrates such a disjoint union. The set of overlapping types A = {number, integer} has been replaced with the set of disjoint types B = {number ∩ ¬integer, integer}.

This extra step has two positive effects on the algorithm: 1) it assures that the constructed automaton is deterministic, i.e., we assure that all the transitions leaving a state specify disjoint types, and 2) it forces our treatment of the problem to comply with the assumptions required by the Brzozowski/Owens algorithm.

The algorithm for decomposing a set of types into a set

of disjoint types is an interesting research topic in its own

right. While this topic is still under investigation, we have

several algorithms which work very well for a small number

of types (i.e. lists of up to 15 types). At the inescapable

core of each algorithm is the Common Lisp function subtypep

[2]. This function is crucial not only in type speciﬁer simpliﬁ-

cation, needed to test equivalence of symbolically calculated

Brzozowski derivatives, but also in deciding whether two

given types are disjoint. For example, we know that string

and number are disjoint because (and string number) is a

subtype of nil.
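The disjointness test described here can be written in a few lines of standard Common Lisp. The names disjoint-types-p, *disjoint-types*, and covering-type are hypothetical and chosen for this sketch; the decomposition of {number, integer} is written by hand rather than computed, since the decomposition algorithm itself is outside the scope of this article.

```lisp
(defun disjoint-types-p (type1 type2)
  "Return true if SUBTYPEP can prove that TYPE1 and TYPE2 share no element.
The second value is false when SUBTYPEP could not decide the question."
  (subtypep `(and ,type1 ,type2) nil))

(disjoint-types-p 'string 'number)    ; no object is both a string and a number
(disjoint-types-p 'integer 'number)   ; false: every integer is a number

;; The overlapping set {number, integer}, decomposed by hand into the
;; disjoint set {(and number (not integer)), integer}.
(defparameter *disjoint-types* '((and number (not integer)) integer))

(defun covering-type (object types)
  "Return the first type specifier in TYPES of which OBJECT is an element."
  (find-if (lambda (type) (typep object type)) types))

(covering-type 1 *disjoint-types*)
(covering-type 1.5 *disjoint-types*)
```

Because subtypep is allowed to answer "don't know" for specifiers involving and, or, and not, a robust decomposition has to inspect the second return value rather than rely on the first alone.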

We explicitly omit further discussion of that algorithm in

this article. We will consider it for future publication. For a

complete exposition of our ongoing research into this topic,

see the project report on the LRDE website [11].

5. RELATED WORK

Attempts to implement destructuring-case are numerous. We mention three here. R7RS Scheme provides case-lambda [14, Section 4.2.9], which appears to be a syntactically similar construct, allowing argument lists of various fixed lengths. However, according to the specification, nothing similar to Common Lisp style destructuring is allowed.

The implementation of destructuring-case provided in [6] does not
select the clause to be executed according to the format of the
list being destructured. Rather, it uses the first element of the
given list as a case-like key. This key determines which pattern
to use to destructure the remainder of the list.

The implementation provided in [7], named destructure-

case, provides similar behavior to that which we have devel-

oped. It destructures the given list according to which of the

given patterns matches the list. However, it does not handle

destructuring within the optional and keyword arguments.

(destructuring-case '(3 :x (4 5))
  ((a &key ((:x (b c))))
   (list 0 a b c))   ;; this clause should be taken
  ((a &key x)
   (list 2 a x)))    ;; not this clause
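For comparison, the lambda list of the first clause is already accepted by the standard destructuring-bind, which supports destructuring inside keyword arguments. The following sketch, with a hypothetical function name, shows the result the first clause would be expected to produce for the list above.

```lisp
;; Sketch using only standard Common Lisp; the function name is hypothetical.
(defun destructure-nested-key (data)
  "Destructure DATA with the lambda list (a &key ((:x (b c)))):
the value of the :x keyword argument is itself destructured into B and C."
  (destructuring-bind (a &key ((:x (b c)))) data
    (list 0 a b c)))
```

Calling (destructure-nested-key '(3 :x (4 5))) returns (0 3 4 5).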

In none of the above cases does the clause selection consider the
types of the objects within the list being destructured. Selecting
the clause based also on the types of the objects is a
distinguishing feature of the rte based implementation of
destructuring-case.

The rte type, along with destructuring-bind and typecase as
mentioned in Section 3.3, enables something similar to pattern
matching in the XDuce language [8]. The XDuce language allows the
programmer to define a set of functions with various lambda lists,
each of which serves as a pattern available to match a particular
target structure within an XML document. Which function gets
executed depends on which lambda list matches the data found in
the XML data structure.
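The general shape of such type-directed dispatch can be sketched without rte, using only the standard cons compound type specifier inside typecase and then destructuring-bind in the selected clause. This handles only lists of fixed length, which is precisely the restriction that rte is meant to lift. The function name and the two record shapes are hypothetical.

```lisp
;; Sketch with standard type specifiers only (no rte); names are hypothetical.
(defun describe-record (data)
  "Select a clause by the types of the elements of DATA, then destructure it."
  (typecase data
    ;; A two-element list: an integer followed by a string.
    ((cons integer (cons string null))
     (destructuring-bind (id name) data
       (list :id id :name name)))
    ;; A three-element list: a symbol followed by two numbers.
    ((cons symbol (cons number (cons number null)))
     (destructuring-bind (tag x y) data
       (list :tag tag :sum (+ x y))))
    (t :no-match)))
```

For example, (describe-record '(1 "a")) selects the first clause and (describe-record '(1 2)) falls through to :no-match, since no clause describes an integer followed by an integer.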

XDuce introduces a concept called regular expression types

which indeed seems very similar to regular type expressions.

In [8] Hosoya et al. introduce a semantic type approach to

describe a system which enables their compiler to guarantee

that an XML document conforms to the intended type. The
paper deals heavily with ensuring that the regular expression
types are well defined when defined recursively, and that

decisions about subtype relationships can be calculated and

exploited.

A notable distinction of the rte implementation, as opposed to the
XDuce language, is that our proposal illustrates adding such type
checking ability to an existing type system, and suggests that
such extensions might be feasible in other existing dynamic or
reflective languages.

The concept of regular trees is more general than what rte

supports, posing interesting questions regarding apparent

shortcomings of our approach. The semantic typing concept

described in [8] indeed seems to have many parallels with

the Common Lisp type system in that types are defined by
a set of objects, and subtypes correspond to subsets thereof.

These parallels would suggest further research opportunities

related to rte and Common Lisp. However, the inability of rte to
express trees of arbitrary depth, as discussed in Section 4.1,
seems to be a significant limitation of the Common Lisp type
system. Furthermore, the use of satisfies in the rte type
definition seriously limits the subtypep function’s ability to
reason about the type. Consequently, programs cannot always use
subtypep to decide whether two rte types are disjoint or
equivalent, or even whether a particular rte type is empty.
Neither can the compiler dependably use subtypep to make similar
decisions to avoid redundant assertions in function declarations.
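The effect of satisfies on subtypep can be observed with a small type that has nothing to do with rte. In the sketch below, the predicate, the type and the helper function are all hypothetical names. Every element of even-integer is an integer, yet subtypep has no way to look inside the predicate, so an implementation is permitted to answer that it does not know.

```lisp
;; Sketch: a SATISFIES type is opaque to SUBTYPEP.  All names are hypothetical.
(defun even-integer-p (object)
  "Predicate naming the elements of the EVEN-INTEGER type."
  (and (integerp object) (evenp object)))

(deftype even-integer ()
  '(satisfies even-integer-p))

(defun subtypep-verdict (type-1 type-2)
  "Return the two values of SUBTYPEP as a list (SUBTYPE-P CERTAIN-P)."
  (multiple-value-list (subtypep type-1 type-2)))
```

typep still works as expected, e.g. (typep 4 'even-integer) is true, but (subtypep-verdict 'even-integer 'integer) will typically report that the relationship could not be determined, even though it holds.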

It is not clear whether Common Lisp could provide a way for a type
definition in an application program to extend the behavior of
subtypep. Having such a capability would allow such an extension
for rte. Rational language theory does provide a well-defined
algorithm for deciding such questions given the relevant rational
expressions [9, Sections 4.1.1, 4.2.1]. It seems from the
specification that a Common Lisp implementation is forbidden from
allowing self-referential types, even in cases where it would be
possible to do so.

6. CONCLUSIONS

In this paper we presented a Common Lisp type definition, rte,
which implements a declarative, pattern-based approach for
declaring types of heterogeneous sequences, and illustrated it
with several motivating examples. We further discussed the
implementation of this type definition and its basis in rational
language theory. While the total computation needed for such type
checking may be large, our approach allows most of the computation
to be done at compile time, leaving only a calculation of O(n)
complexity, where n is the length of the sequence, for run time.
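The linear run-time cost can be illustrated by a small interpreter for a deterministic automaton whose transitions are labelled by disjoint types. This is only a sketch under stated assumptions: the alist representation of the automaton and all names are hypothetical, and the automaton is written by hand here rather than generated at compile time as in our implementation. Each element of the sequence is examined once, and the work per element is bounded by the number of transitions leaving a state.

```lisp
;; Sketch of the run-time part only; representation and names are hypothetical.
(defun match-typed-sequence-p (sequence transitions initial-state final-states)
  "Walk SEQUENCE through a deterministic automaton with type-labelled edges.
TRANSITIONS is an alist mapping each state to an alist of
(type-specifier . next-state); the types leaving one state are assumed disjoint."
  (let ((state initial-state))
    (map nil
         (lambda (element)
           ;; Find the unique edge whose type contains ELEMENT.
           (let ((edge (assoc element
                              (cdr (assoc state transitions))
                              :test #'typep)))
             (unless edge
               (return-from match-typed-sequence-p nil)) ; no edge: reject
             (setf state (cdr edge))))
         sequence)
    ;; Accept only if the walk ends in a final state.
    (and (member state final-states) t)))

(defparameter *symbol-number-pairs*
  '((0 (symbol . 1))
    (1 (number . 0)))
  "Hand-written automaton accepting zero or more repetitions of a symbol
followed by a number.  State 0 is both initial and final.")
```

With this automaton, (match-typed-sequence-p '(a 1 b 2) *symbol-number-pairs* 0 '(0)) accepts, while the list (a 1 b) is rejected because the walk ends in the non-final state 1. Because the function is written in terms of map, it applies to vectors as well as lists.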

Our contributions are

1. recognizing the possibility of using principles from rational
language theory to address the problem of dynamic type checking
of sequences in Common Lisp,

2. adapting the Brzozowski derivative algorithm to sequences of
Lisp types by providing an algorithm to symbolically decompose a
set of Lisp types into an equivalent set of disjoint types,

3. implementing an efficient O(n) algorithm to pattern match an
arbitrary Lisp sequence, and

4. implementing concrete rte based algorithms for recognizing
certain commonly occurring sequence patterns.

For future extensions to this research, we would like to
experiment with extending the subtypep implementation to allow
application-level extensions, and, with that in place, examine
run-time performance when using rte based declarations within
function definitions.

Another topic we would like to research is whether the

core of this algorithm can be implemented in other dynamic

languages, and to understand more precisely which features

such a language would need in order to support such an
implementation.

7. REFERENCES

[1] Declaring the elements of a list, discussion on

comp.lang.lisp, 2015.

[2] H. G. Baker. A decision procedure for Common Lisp’s

SUBTYPEP predicate. Lisp and Symbolic

Computation, 5(3):157–190, 1992.

[3] J. A. Brzozowski. Derivatives of regular expressions. J.

ACM, 11(4):481–494, 1964.

[4] R. D. Cameron. Perl style regular expressions in

Prolog, CMPT 384 lecture notes, 1999.

[5] J. Chroboczek. CL-Yacc, a LALR(1) parser generator

for Common Lisp, 2009.

[6] Public Domain. Alexandria implementation of
destructuring-case.

[7] N. Funato. Public domain implementation of

destructuring-bind, 2013.

[8] H. Hosoya, J. Vouillon, and B. C. Pierce. Regular

expression types for XML. ACM Trans. Program.

Lang. Syst., 27(1):46–90, Jan. 2005.

[9] J. E. Hopcroft, R. Motwani, and J. D. Ullman.
Introduction to Automata Theory, Languages, and
Computation. Addison Wesley, 2001.

[10] W. H. Newman. Steel Bank Common Lisp user

manual, 2015.

[11] J. Newton. Report: Efficient dynamic type checking of
heterogeneous sequences. Technical report,
EPITA/LRDE, 2016.

[12] S. Owens, J. Reppy, and A. Turon. Regular-expression

derivatives re-examined. J. Funct. Program.,

19(2):173–190, Mar. 2009.

[13] C. Rhodes. User-extensible sequences in Common

Lisp. In Proceedings of the 2007 International Lisp

Conference, ILC ’07, pages 13:1–13:14, New York, NY,

USA, 2009. ACM.

[14] A. Shinn, J. Cowan, and A. A. Gleckler. Revised^7
report on the algorithmic language Scheme. Technical
report, 2013.

[15] ANSI. American National Standard: Programming
Language – Common Lisp. ANSI X3.226:1994
(R1999), 1994.

[16] E. Weitz. Common Lisp Recipes: A Problem-solution

Approach. Apress, 2015.

[17] G. Xing. Minimized Thompson NFA. Int. J. Comput.

Math., 81(9):1097–1106, 2004.

[18] F. Yvon and A. Demaille. Théorie des Langages
Rationnels. 2014.