Finite Automata Theory Based Optimization of Conditional Variable Binding

Jim E. Newton

Didier Verna

jnewton@lrde.epita.fr

didier@lrde.epita.fr

EPITA/LRDE

Le Kremlin-Bicêtre, France

ABSTRACT

We present an efficient and highly optimized implementation of destructuring-case in Common Lisp. This macro selects the most appropriate destructuring lambda list among several given, based on the structure and types of data at run-time, and thereafter dispatches to the corresponding code branch. We examine an optimization technique, based on finite automata theory, applied to conditional variable binding and execution, and to type-based pattern matching on Common Lisp sequences. A risk of inefficiency associated with a naive implementation of destructuring-case is that the candidate expression being examined may be traversed multiple times, once for each clause whose format fails to match, and finally once for the successful match. We have implemented destructuring-case in such a way as to avoid multiple traversals of the candidate expression. This article explains how this optimization has been implemented.

CCS CONCEPTS

• Theory of computation → Data structures design and analysis; Type theory;

ACM Reference Format:

Jim E. Newton and Didier Verna. 2019. Finite Automata Theory Based

Optimization of Conditional Variable Binding. In Proceedings of The 12th

European Lisp Symposium (ELS’19). ACM, New York, NY, USA, 8 pages.

1 INTRODUCTION

The Common Lisp macro destructuring-bind [Ans94] binds the variables specified in a given lambda list to the corresponding values in the tree structure resulting from the evaluation of a given expression. However, in the case that the tree structure of the expression does not coincide with the given lambda list, a run-time error is signaled. This error may pose a challenge to the programmer. The problem, simply stated, is that the destructuring lambda list [Ans94, Section 3.4.5] is specified at compile time, and the expression is evaluated at run-time. Thus, it may not be possible to


ELS’19, April 1–2, 2019, Genova, Italy

ACM ISBN 978-2-9557474-2-1.. . $15.00

(destructuring-case expression
  ((X Y)
   (declare (type fixnum X Y))
   :clause-1)
  ((X Y)
   (declare (type fixnum X)
            (type integer Y))
   :clause-2)
  ((X Y)
   (declare (type (or string fixnum) X)
            (type number Y))
   :clause-3))

Figure 1: Example of destructuring-case usage.

know until run-time that the input data is problematic. In certain cases the programmer would like to specify the run-time behavior to adopt if the match fails, rather than having an error signaled. This behavior cannot be specified portably using the condition system [Ans94, Chapter 9], because the condition signaled is simply of type error, with no additional information about exactly what failed. Furthermore, the programmer may not wish to signal an error at all, but rather detect the actual run-time pattern of the input data and proceed differently depending on which format of data is discovered.
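The limitation described above can be illustrated with a minimal sketch in standard Common Lisp. The helper name try-two-elements is hypothetical and is not part of the library discussed in this article; it merely traps whatever error destructuring-bind signals on a mismatched list and returns a fallback value. The handler learns only that some error occurred, not which part of the lambda list failed to match.

```lisp
;; Hypothetical helper, for illustration only (not part of the paper's library).
(defun try-two-elements (data)
  "Return (values X Y) if DATA destructures as (X Y), otherwise :NO-MATCH."
  (handler-case
      (destructuring-bind (x y) data
        (values x y))
    ;; Only the generic condition type ERROR can be relied upon here,
    ;; so the handler cannot tell which part of the pattern failed.
    (error () :no-match)))
```

With a single lambda list this is workable, but trying several lambda lists in turn this way would re-traverse the data once per attempt, which is the inefficiency this article sets out to avoid.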

We presented destructuring-case in [NDV16] as a mechanism to test, at run-time, whether the value of a candidate expression adheres to a destructuring lambda list. An example usage of this macro can be seen in Figure 1. This example shows three clauses, each with the same lambda list, (X Y), but with different type declarations. In general, a usage of destructuring-case may use radically different lambda lists, which differ in number of variables, have different &optional and &key sections, and use different hierarchical structures of the variables.

The semantics of destructuring-case are that the value of the given expression is tested in turn against each of the given destructuring lambda lists, until a match is found, i.e. a match in both hierarchical structure and type of values. Only at such time are the indicated consequent expressions or any default values evaluated. This restriction is especially important if there are side-effects in the default values of optional arguments in the lambda lists, such as (... &optional (x (incf *global-var*))).


(rte-case expression
  ((:cat fixnum fixnum)
   (destructuring-bind (X Y) expression
     :clause-1))
  ((:cat fixnum integer)
   (destructuring-bind (X Y) expression
     :clause-2))
  ((:cat (or string fixnum) number)
   (destructuring-bind (X Y) expression
     :clause-3)))

Figure 2: Expansion of destructuring-case from Figure 1 into rte-case.

(rte-case expression
  ((:cat fixnum fixnum)
   :clause-1)
  ((:cat fixnum integer)
   :clause-2)
  ((:cat (or string fixnum) number)
   :clause-3))

Figure 3: Simple example of rte-case from Figure 2.

The implementations of the macros discussed in this article, including destructuring-case, rte-case, rte-ecase, and bdd-typecase, are available in Quicklisp via the package :regular-type-expression.

2 FROM DESTRUCTURING-CASE TO RTE-CASE

Our implementation of

destructuring-case

converts its input of

destructuring lambda lists to rte (regular type expression) and then

outputs an invocation of

rte-case

. The essential part of such an

expansion is shown in Figure 2. An rte, introduced in [

NDV16

], is

Common Lisp syntax to specify a set of sequences, i.e. a subtype of

the

sequence

type. We explain in Section 2.2 how a destructuring

lambda list is converted to an rte.

As can be seen in Figure 2, each destructuring lambda list has been converted to an rte, such as (:cat fixnum fixnum) in the first clause, followed by a call to destructuring-bind. As is implied by the syntax, the destructuring-bind will only be executed at run-time if the value of the candidate expression matches the pattern designated by the rte.

We further notice in the simplistic example shown in Figure 2,

that no

destructuring-bind

in the

rte-case

expansion plays

any role. The variables bound by the

destructuring-bind

are

not used in the expressions which follow. Therefore, in our further

discussion we will refer to the even simpler, semantically equivalent

code in Figure 3.

A straightforward expansion of

rte-case

might include succes-

sive type checks of

expression

such as suggested in Figure 4. Such

Quicklisp,

https://www.quicklisp.org/

, is a public repository, maintained by Zach

Beane, consisting of user contributed Common Lisp libraries.

(typecase expression
  ((rte (:cat fixnum fixnum))
   :clause-1)
  ((rte (:cat fixnum integer))
   :clause-2)
  ((rte (:cat (or string fixnum) number))
   :clause-3))

Figure 4: Naive expansion of rte-case from Figure 2

an expansion would be semantically correct, but inefficient, because the sequence expression would be traversed three times in the worst case to determine which consequent clause to evaluate. As will be seen, our technique eliminates these redundant traversals, allowing a single traversal of the sequence to determine which consequent expressions to evaluate.

2.1 Examples of rte Syntax

The grammar of an rte is explicitly detailed in [New18], but the basic grammar can be understood intuitively, assuming the reader has a basic understanding of string-based regular expression syntax. The concatenation operator, :cat, specifies a sequence of successive elements: e.g., (:cat fixnum string) denotes a sequence of exactly two elements, the first of type fixnum and the second of type string. To make the string optional, use the syntax (:cat fixnum (:? string)). To specify the occurrence, zero or more times, of fixnum followed by an optional string, use (:cat (:* fixnum) (:? string)). Substitute :+ for :* to express an occurrence one or more times. Finally, expressions may be combined logically using :and, :or, and :not, e.g., (:or (:cat fixnum string) (:+ (:not number))).

2.2 From Destructuring Lambda List to rte

In this section we summarize how a destructuring lambda list and

associated type declarations may be converted into an rte. The

conversion procedure is explained in more detail in [NDV16].

The set of lists which are valid argument lists for a given invocation of destructuring-bind with an optional set of type declarations can be characterized by an rte. A destructuring lambda list, such as used in destructuring-bind, specifies a required portion, denoted by a leading sequence of variables; an optional portion, delimited by &optional; and a repeating portion of keyword-value pairs, delimited by &key. To construct the rte corresponding to a given destructuring lambda list, we construct the required-rte, the optional-rte, and the repeating-rte, and concatenate them using the :cat operator.

(:cat required-rte optional-rte repeating-rte)

As an example, consider the lambda list shown in Figure 5. The required and optional portions are easy.

The reader may well notice that a fourth traversal is also necessary to evaluate the

destructuring-bind

which is present in each of the consequent clauses. In this paper

we do not address the elimination of this fourth traversal.


(destructuring-bind (A B &optional Q &key X Y)
    expression
  (declare (type string A B)
           (type list Q)
           (type real X)
           (type integer Y))
  ...)

Figure 5: Example destructuring-bind with declarations

required-rte = (:cat string string)

optional-rte = (:? list)

The repeating portion deserves careful attention; we consider two restrictions.

(1) If &allow-other-keys is not given, as is the case in Figure 5, then the only allowed keywords are those explicitly specified. In our case the only allowed keywords are :X and :Y, meaning the repeating portion is of the form (:* (:cat (member :X :Y) t)).

(2) Type declarations such as (declare (type real X)) only restrict the value associated with the first occurrence of each keyword in an argument list, because only the first such occurrence is bound to the associated variable [Ans94, Section 3.3.4]. A keyword portion of the argument list such as (:X 1.2 :X 'not-real) is perfectly valid, whereas (:X 'not-real :X 1.2) is not. Thus, we iterate over all specified keywords, generating one pattern for each. The pattern handling &key X requires that either no :X is given, or that the first :X is followed by a real. See the comment "restriction 2" in Figure 6.

Putting all these restrictions together, we have the rte in Figure 6

representing the

destructuring-bind

with type declarations in

Figure 5.
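The leftmost-occurrence rule behind restriction 2 can be checked directly on a property list. The following is a small sketch, not taken from the implementation; the function name first-x-real-p is hypothetical. It relies on the standard function getf, which returns the value of the leftmost matching key, mirroring how keyword arguments are bound.

```lisp
;; Hypothetical predicate mirroring "restriction 2 for :X real" on a plist.
(defun first-x-real-p (plist)
  "True if PLIST contains no :X key, or if the value of its first :X is a REAL."
  (let* ((missing (list nil))                  ; fresh, unique default marker
         (value (getf plist :x missing)))      ; GETF finds the leftmost :X
    (or (eq value missing)
        (realp value))))
```

For example, (first-x-real-p '(:x 1.2 :x not-real)) is true, whereas (first-x-real-p '(:x not-real :x 1.2)) is false, in agreement with the two argument lists discussed in restriction 2.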

There are several other features of

destructuring-bind

which

are supported by

destructuring-case

, but whose details we omit

in this discussion, including tree structure variables/data, default

values, supplied-p-parameter, &allow-other-keys, and others.

3 FROM RTE-CASE TO INDIVIDUAL DFAS

Each rte shown in Figure 3 can be converted to efficient type-checking Common Lisp code, as explained in [NDV16]. Such conversion involves first converting each rte to a deterministic finite automaton (DFA), where the transition labels represent type checks for successive elements of the candidate expression. Figure 7 shows the three DFAs corresponding to the rte-case in Figure 3.

We now summarize how a deterministic finite automaton (DFA) is constructed, given an rte. Some approaches to such generation, such as [YD14, HMU06], involve constructing a non-deterministic finite automaton and thereafter determinizing it. We use the technique presented by Brzozowski [Brz64] and clarified by Owens [ORT09]. The Brzozowski algorithm uses a technique called the rational derivative to construct a DFA directly, thereby obviating the necessity

(:cat
  ;; required-rte
  (:cat string string)
  ;; optional-rte
  (:? list)
  ;; repeating-rte
  (:and
    ;; restriction 1
    (:* (:cat (member :X :Y) t))
    ;; restriction 2 for :X real
    (:or (:* (:cat (:not (eql :X)) t))
         (:cat (:* (:cat (:not (eql :X)) t))
               (eql :X) real
               (:* t)))
    ;; restriction 2 for :Y integer
    (:or (:* (:cat (:not (eql :Y)) t))
         (:cat (:* (:cat (:not (eql :Y)) t))
               (eql :Y) integer
               (:* t)))))

Figure 6: The rte representing the destructuring-bind and type declarations from Figure 5.

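[Figure 7 diagrams: one DFA per clause. (:cat fixnum fixnum): states 1.0, 1.1, 1.2, returning clause-1. (:cat fixnum integer): states 2.0, 2.1, 2.2, returning clause-2. (:cat (or string fixnum) number): states 3.0, 3.1, 3.2, returning clause-3. Transition labels denote the type specifiers fixnum, integer, number, and (or fixnum string).]

Figure 7: Automata for clauses of rte-case in Figure 2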

to determinize the result. In [NDV16, New18], we explain how the rational derivative can be extended to accommodate Common Lisp types. In particular, rather than calculating the rational derivative with respect to each letter of the alphabet (as Owens suggests), we calculate the derivative with respect to each type in the maximal disjoint type decomposition, as explained in [NVC17].

3.1 Constructing States and Transitions

The algorithm can be summarized as follows. Each state in the DFA represents all the possible futures which are accepting. Moreover, there is a (not necessarily unique) rte which expresses that set of futures. For example, let

$P_0$ = (:or (:cat number string) (:cat fixnum float))

be the rte representing all the sequences of either a number followed by a string or a fixnum followed by a float. Suppose there is a state in the DFA associated with this rte. Now we consider all the possible types of the first element of such a sequence, and for each such type we calculate what the remaining future would be, given that the first element is of that type. If the first element is a fixnum, then the future is a sequence containing either a string or a float. Such a sequence is denoted by the rte (:or string float). In terms of the rational derivative we say:

$P_1 = \partial_{\texttt{fixnum}} P_0$ = (:or string float).

If, on the other hand, the first element is not a fixnum but is a number, then the remainder is a sequence whose only element is a string. That is to say:

$P_2 = \partial_{\texttt{(and number (not fixnum))}} P_0$ = string.

Since there is no other possible first element of $P_0$, we construct two additional states, $P_1$ and $P_2$, and construct two transitions: $P_0 \to P_1$ labeled fixnum, and $P_0 \to P_2$ labeled (and number (not fixnum)).

We continue this process until all the futures of each state have been calculated, generating all the possible states and all the possible transitions between the states.
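The idea of a derivative can be made concrete with a simplified sketch. The code below is not the implementation described in this article: it differentiates with respect to a concrete object rather than a type, it supports only :cat, :or, :* and type-specifier leaves, and it performs no simplification of the resulting expressions. The names nullable-p, derivative, rte-match-p, and the markers :empty-word and :empty-set are all assumptions made for the illustration.

```lisp
;; Simplified Brzozowski-style matcher; all names are hypothetical.
(defun nullable-p (rte)
  "True if RTE matches the empty sequence."
  (cond ((eq rte :empty-word) t)
        ((eq rte :empty-set) nil)
        ((atom rte) nil)                        ; a type name such as FIXNUM
        (t (case (first rte)
             (:* t)
             (:cat (every #'nullable-p (rest rte)))
             (:or (some #'nullable-p (rest rte)))
             (otherwise nil)))))                ; compound type specifier

(defun derivative (rte object)
  "Return an rte describing what may follow OBJECT in a sequence matching RTE."
  (cond ((member rte '(:empty-word :empty-set)) :empty-set)
        ((atom rte) (if (typep object rte) :empty-word :empty-set))
        (t (case (first rte)
             (:or `(:or ,@(mapcar (lambda (r) (derivative r object))
                                  (rest rte))))
             (:* `(:cat ,(derivative (second rte) object) ,rte))
             (:cat (if (null (rest rte))
                       :empty-set
                       (let* ((head (second rte))
                              (tail `(:cat ,@(cddr rte)))
                              (d `(:cat ,(derivative head object) ,tail)))
                         ;; If HEAD may be empty, OBJECT may also start TAIL.
                         (if (nullable-p head)
                             `(:or ,d ,(derivative tail object))
                             d))))
             ;; Anything else is treated as a compound type specifier.
             (otherwise (if (typep object rte) :empty-word :empty-set))))))

(defun rte-match-p (rte sequence)
  "True if SEQUENCE matches RTE, found by repeated differentiation."
  (nullable-p (reduce #'derivative sequence :initial-value rte)))
```

With the example rte above, (rte-match-p '(:or (:cat number string) (:cat fixnum float)) '(1 2.0)) is true, since after the fixnum 1 the remaining future admits a float. The construction described in this section differs in that it computes each derivative once per type at macro-expansion time, producing DFA states, rather than once per element at run-time.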

3.2 Associating Code with Accepting States

DFAs used for matching pattern languages such as regular expressions normally represent Boolean functions, returning TRUE if the sequence matches the expression, and FALSE otherwise. In our case, each accepting state of the DFAs in Figure 7 indicates which code path to take in the originating rte-case, Figure 3. This problem is easily addressed. We have simply extended our state object (CLOS class [GWB91, Kee89]) to contain a slot indicating a piece of continuation code to be serialized in the final macro expansion.

3.3 Overlapping Clauses

The synchronized cross-product (SXP) of two or more given DFAs is

a single DFA whose behavior simultaneously emulates the behavior

of the given DFAs. Typically such a cross-product implements the

intersection or union languages of the input DFAs; however the

semantics of such a cross-product can be taken to be any Boolean

combination of the input.

For example, to implement the symmetric difference language we apply the Boolean XOR function; a state, X, in the SXP, corresponding to states A and B from two given DFAs, is marked as an accepting state if exactly one of A and B is accepting. In our case we would like to select the code for evaluation corresponding to the code appearing first in the original

(rte-case expression
  ((:cat fixnum fixnum)
   :clause-1)
  ((:and (:cat fixnum integer)
         (:not (:cat fixnum fixnum)))
   :clause-2)
  ((:and (:cat (or string fixnum) number)
         (:not (:cat fixnum integer))
         (:not (:cat fixnum fixnum)))
   :clause-3))

Figure 8: Example of rte-case with pairwise disjoint patterns

destructuring-case; so we need priority-based selection, rather than simply a Boolean function.

An important property of the behavior of rte-case is that if more than one pattern matches the expression in question, then the clause appearing first has priority over the others. For example, in the code in Figure 3, if the value of expression is the list (1 2), then all three rtes match; nevertheless :clause-1 must be the return value.

One approach to addressing this ambiguity is to extend or augment the patterns so that they are mutually exclusive; i.e. to ensure that no two patterns simultaneously match any candidate expression. The code shown in Figure 8 is equivalent to that in Figure 3, but any input expression, (1 2) for example, matches at most one pattern. This pattern augmentation can be accomplished as a code transformation. The pattern corresponding to :clause-1 is unchanged, but the subsequent clauses have been augmented to emphasize that those clauses are never reached if any prior pattern matches.
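The code transformation just described can be sketched in a few lines. The function below is an illustration under our own naming (disjoin-patterns is hypothetical and is not claimed to be the function used inside rte-case); it wraps each pattern in an :and that excludes every earlier pattern, most recent first.

```lisp
;; Hypothetical sketch of the pattern-disjoining transformation.
(defun disjoin-patterns (patterns)
  "Return rtes, in the order of PATTERNS, each excluding all earlier patterns."
  (let ((seen '()))                             ; earlier patterns, newest first
    (loop for pattern in patterns
          collect (if seen
                      `(:and ,pattern
                             ,@(mapcar (lambda (p) `(:not ,p)) seen))
                      pattern)
          do (push pattern seen))))
```

Applied to the three patterns (:cat fixnum fixnum), (:cat fixnum integer), and (:cat (or string fixnum) number), this sketch yields the first pattern unchanged, the second conjoined with the negation of the first, and the third conjoined with the negations of the second and the first.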

These rtes correspond to the DFAs shown in Figure 9. The first DFA is exactly the same as before, but we notice in the second DFA that the state labeled 2.2 is non-coäccessible; i.e., there is no path from state 2.2 to any accepting state. This non-useful state corresponds to (:not (:cat fixnum fixnum)) in the input pattern, and it enforces that a sequence consisting of two objects of type fixnum is a rejected sequence rather than a matching sequence. The third DFA in the figure contains a similar state, 3.4, but in addition contains two states, 3.2 and 3.5, which are equivalent to each other.

The disjoining process described here produces DFAs which have redundant or non-coäccessible states. Despite this fact, these slightly more complex DFAs play an important role in the SXP construction, because the process guarantees that the SXP construction will never encounter a situation where it must choose between two different pieces of code to execute on reaching an acceptance condition. If attempting to calculate the union of the three DFAs shown in Figure 7, the algorithm would have to deal with the fact that a sequence of (1 2) at run time should return :clause-1 rather than :clause-2. However, if calculating the union of the DFAs from Figure 9, such ambiguity is averted. The union can be performed purely algebraically, with no consideration of order or priority.


[Figure 9 diagrams: one DFA per disjoined clause. (:cat fixnum fixnum): states 1.0, 1.1, 1.2, returning clause-1. (:and (:cat fixnum integer) (:not (:cat fixnum fixnum))): states 2.0 through 2.3, returning clause-2. (:and (:cat (or string fixnum) number) (:not (:cat fixnum integer)) (:not (:cat fixnum fixnum))): states 3.0 through 3.5, with two states returning clause-3. Transition labels denote the type specifiers fixnum, integer, number, string, (and (not fixnum) integer), and (and (not integer) number).]

Figure 9: DFAs for disjoined clause-1, clause-2, and clause-3

4 MERGING DFAS INTO SYNCHRONIZED CROSS-PRODUCT DFA

We explain in detail in [New18] how the type check associated with an rte is compiled to efficient Common Lisp code by first converting it to a deterministic finite automaton. It is further pointed out in the perspectives of [New18] that it is desirable to merge these automata into a single automaton, in order to share states which serve the same function between the various automata, and also to eliminate redundant traversals of the candidate expression. Having a single automaton which implements the union of the mutually exclusive patterns enables the candidate list to be traversed once, thereby matching any one of the expressions specified in the various clauses of the rte-case.

One advantage of the conversion from destructuring lambda list to rte is that rtes support an algebra sufficient for expressing sets of non-overlapping types, resulting in mutually exclusive patterns in the expansion to rte-case. As an additional feature of the implementation of rte-case, we have arranged that it treats the code in Figure 3 and Figure 8 exactly the same, internally disjoining patterns which are not already disjoint.

The following is an explanation of how several automata are

merged into such a single automaton.

We would like to merge the three DFAs shown in Figure 9 into a single DFA. There are well-known techniques for merging multiple DFAs [HMU06, YD14] into the SXP DFA. These techniques are not general enough for several reasons which we address in our approach.

It is not necessary to explicitly consider the SXP of more than two DFAs, because the operation is associative. Therefore, given the Common Lisp function synchronized-product, we may compute the SXP of one or more DFAs as a call to cl:reduce.

    (reduce #'synchronized-product dfas)

4.1 Calculating States and Transitions

We consider constructing the SXP of two DFAs, dfa-1 (with $m$ states) and dfa-2 (with $n$ states). We construct a DFA, dfa-3, having $m \times n$ states in the worst case: one state for each pair $(x, y)$ with $x \in \mathit{dfa}_1$ and $y \in \mathit{dfa}_2$. Fortunately, this worst case does not often occur in practice, as many of the states are not accessible. For example, if computing the SXP of the first two DFAs of Figure 9, there is no possible input sequence which would put $\mathit{dfa}_1$ into state 1.1 while putting $\mathit{dfa}_2$ into state 2.2. Thus there will be no state in the product DFA corresponding to (1.1, 2.2).

An efficient algorithm is described in [YD14]. We seed a work list with the one initial state. Next, we traverse the work list, growing it by adding new states as we construct them. All possible input types are considered for each state, and all possible transitions are generated.

An example will make this clearer. First start with $\mathit{dfa}_2$ and $\mathit{dfa}_3$, the second and third DFAs illustrated in Figure 9. The states list is initialized to $S = \{(2.0, 3.0)\}$.

We examine the behavior of states 2.0 and 3.0. We must characterize the behavior for every possible input. This infinite set of potential input values is partitioned into several disjoint types: those annotated on transitions exiting states 2.0 and 3.0, and the complement of their union. This complement type represents the set of all values for which an implicit transition leads to the virtual so-called sync state, denoted $\bot$. The sync state is a state which has exactly one exiting, all-encompassing transition: $\bot \xrightarrow{\top} \bot$.

State 2.0 has one explicit transition, namely $2.0 \xrightarrow{\texttt{fixnum}} 2.1$. Thus, there is an implicit complement transition $2.0 \xrightarrow{\top \setminus \texttt{fixnum}} \bot$, where $\top$ represents the universal type. State 3.0 has two explicit transitions, namely $3.0 \xrightarrow{\texttt{fixnum}} 3.3$ and $3.0 \xrightarrow{\texttt{string}} 3.1$. Thus, there is an implicit complement transition $3.0 \xrightarrow{\top \setminus (\texttt{fixnum} \cup \texttt{string})} \bot$.

To compute the transitions from $(2.0, 3.0)$, we must consider all six pairwise intersections between the transition types of the two states (2.0 and 3.0). These intersections are shown in Figure 10, which also indicates the target states in the three non-empty cases. Given an input of type fixnum, $\mathit{dfa}_2$ transitions from state 2.0 to state 2.1; and given the same input, $\mathit{dfa}_3$ transitions from state 3.0 to state 3.3. So we add $(2.1, 3.3)$ to $S$, giving $S = \{(2.0, 3.0), (2.1, 3.3)\}$, and add the transition $(2.0, 3.0) \xrightarrow{\texttt{fixnum}} (2.1, 3.3)$. Likewise, given an input of type string, $\mathit{dfa}_2$ transitions from state 2.0 to state $\bot$; and given


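dfa_2 transition   dfa_3 transition          intersection              target state
fixnum             fixnum                    fixnum                    (2.1, 3.3)
fixnum             string                    ∅
fixnum             ⊤ \ (fixnum ∪ string)     ∅
⊤ \ fixnum         fixnum                    ∅
⊤ \ fixnum         string                    string                    (⊥, 3.1)
⊤ \ fixnum         ⊤ \ (fixnum ∪ string)     ⊤ \ (fixnum ∪ string)     (⊥, ⊥)

Figure 10: Transition computation for dfa_2 × dfa_3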

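[Figure 11 diagram: a single DFA with states 0 through 7, whose accepting states return clause-1, clause-2, or clause-3 (two states return clause-3). Transition labels denote the type specifiers fixnum, number, string, (and (not fixnum) integer), (and (not integer) number), and (and integer (or (not integer) fixnum) (not fixnum)).]

Figure 11: DFA for rte-case not yet reduced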

the same input, $\mathit{dfa}_3$ transitions from state 3.0 to state 3.1. So we add $(\bot, 3.1)$ to $S$, giving $S = \{(2.0, 3.0), (2.1, 3.3), (\bot, 3.1)\}$, and add the transition $(2.0, 3.0) \xrightarrow{\texttt{string}} (\bot, 3.1)$. Finally, given an input of type (and (not fixnum) (not string)), $\mathit{dfa}_2$ transitions from state 2.0 to state $\bot$, and $\mathit{dfa}_3$ transitions from state 3.0 to state $\bot$. The state $(\bot, \bot)$ is the sync state of the cross-product DFA, so we need generate no additional transition from (2.0, 3.0).

Next, we apply the same procedure to calculate any new states and transitions of any newly added elements of $S$. We continue the procedure until all elements of $S$ have been visited and no new states are generated.

After $\mathit{dfa}_2 \times \mathit{dfa}_3$ has been computed, we can repeat the process via the reduce operation mentioned above to compute $\mathit{dfa}_1 \times \mathit{dfa}_2 \times \mathit{dfa}_3$. This procedure constructs a DFA isomorphic to that shown in Figure 11. We say isomorphic because the choice of state names is arbitrary. Figure 11 has states named 0 through 7 rather than names such as (1.0, 2.0, 3.0), (1.1, 2.1, 3.3) as suggested in the procedure description in Section 4.1.

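[Figure 12 tables. Top left: $\delta(s, \upsilon)$ for $s \in S$ and $\upsilon \in \Upsilon$. Top right: $\psi_1(s, \upsilon) \in \Pi_0$, which maps both transitions leaving state 0 to $\{0, 1, 2\}$, the three transitions leaving state 1 to $\{5\}$, $\{4\}$, and $\{6, 7\}$, and the transition leaving state 2 to $\{6, 7\}$. Bottom: $\Phi_1(s)$, the set of these (type, class) pairs for states 0, 1, and 2, and $\emptyset$ for the accepting states.]

Figure 12: All values of the $\delta$, $\psi_1$, and $\Phi_1$ functions.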

The DFA shown in Figure 11 is not in minimal form. It has a non-coäccessible state, 3, from which there is no path to an accepting state. It also has indistinguishable states; e.g., states 6 and 7 have the exact same future, albeit a trivial one of just returning the symbol clause-3. Since each of the states in the computed DFA and each of the transitions contributes to the number of lines of Common Lisp code which will be generated when the DFA is serialized in Section 5, we should simplify this DFA to reduce the lines of redundant code in the final macro expansion.

We eliminate non-coäccessible states by a simple trimming procedure based on graph traversal, finding states which lack a path to an accepting state. However, the procedure to coalesce indistinguishable states is more subtle, and we discuss it in Section 4.2.
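The trimming step can be sketched as a backward graph traversal from the accepting states. The representation below (a list of (source label target) triples) and all names are assumptions made for this illustration, not the data structures of the actual implementation; the example DFA is hypothetical and only loosely shaped like Figure 11, with state 3 playing the role of the non-coäccessible state.

```lisp
;; Hypothetical representation: transitions are (SOURCE LABEL TARGET) triples.
(defun coaccessible-states (transitions accepting)
  "Return the states from which some state in ACCEPTING is reachable."
  (let ((found (copy-list accepting))
        (work (copy-list accepting)))
    (loop while work
          do (let ((current (pop work)))
               (dolist (tr transitions)
                 (destructuring-bind (source label target) tr
                   (declare (ignore label))
                   ;; Walk each transition backwards, from target to source.
                   (when (and (eql target current)
                              (not (member source found)))
                     (push source found)
                     (push source work))))))
    found))

(defun trim-transitions (transitions accepting)
  "Keep only the transitions whose two end states are co-accessible."
  (let ((keep (coaccessible-states transitions accepting)))
    (remove-if-not (lambda (tr)
                     (and (member (first tr) keep)
                          (member (third tr) keep)))
                   transitions)))

;; Hypothetical example data: state 3 has no path to an accepting state.
(defparameter *trim-example-transitions*
  '((0 fixnum 1) (0 string 2)
    (1 fixnum 5) (1 (and integer (not fixnum)) 4) (1 string 3)
    (3 t 3)
    (2 number 6)))

(defparameter *trim-example-accepting* '(4 5 6))
```

Calling trim-transitions on the example data drops the two transitions involving state 3 and leaves the others in their original order.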

4.2 DFA Simplification

The goal of simplification is to coalesce indistinguishable states, such as states 6 and 7 in Figure 11, resulting in the DFA in Figure 13.

In order to explain the simplification algorithm we need some notation. Let $S$ denote the set of states of the DFA, $S = \{0, 1, 2, 4, 5, 6, 7\}$. Let $\Upsilon$ denote the set of all Common Lisp types annotated in the DFA. Denote by $\delta$ the state transfer function which, given a state $s \in S$ and a type $\upsilon \in \Upsilon$, returns the target state $s' \in S$ of the transition $s \xrightarrow{\upsilon} s'$. The values of $\delta$ are given in Figure 12 (top left).

We will construct a sequence $\{\Pi_0, \Pi_1, \ldots, \Pi_k, \ldots\}$ of partitions of $S$. A partition of $S$ is a set of mutually disjoint subsets of $S$ for which the union of the subsets is $S$ itself. Each element $\kappa \in \Pi_k$ is called a k-equivalence class. If $s_1, s_2 \in \kappa$, then $s_1$ and $s_2$ are said to be k-equivalent to each other.

To construct the initial partition, $\Pi_0$, we group the set of all non-accepting states into one 0-equivalence class: $\{0, 1, 2\}$; thereafter, there is one 0-equivalence class per unique return value, :clause-1, :clause-2, and :clause-3: $\{5\}$, $\{4\}$, and $\{6, 7\}$ respectively.

$\Pi_0 = \{\{0, 1, 2\}, \{4\}, \{5\}, \{6, 7\}\}$

Next, we wish to construct $\Pi_1, \Pi_2, \ldots, \Pi_{n+1}$ in turn, continuing the iteration until $\Pi_n = \Pi_{n+1}$. Each $\Pi_k$ is derived from $\Pi_{k-1}$ as we will explain.


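[Figure 13 diagram: the DFA of Figure 11 after simplification, with accepting states returning clause-1, clause-2, and clause-3. Transition labels denote the type specifiers fixnum, number, string, (and (not fixnum) integer), and (and (not integer) number).]

Figure 13: DFA for rte-case simplified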

For each integer $k > 0$, to determine the k-equivalence classes we define two functions, $\psi_k$ and $\Phi_k$. In each case, we will construct $\psi_{k+1}$ and $\Phi_{k+1}$ by examining $\Pi_k$. These two functions may be difficult to understand intuitively from their mathematical definitions. Nevertheless, the mathematical definitions help when coding the simplification function in Common Lisp.

$\psi_{k+1}$ is a function which takes two arguments, $s \in S$ and $\upsilon \in \Upsilon$, and returns a k-equivalence class $\kappa \in \Pi_k$ (i.e., $\psi_{k+1} : S \times \Upsilon \to \Pi_k$). To compute the value of $\psi_{k+1}(s, \upsilon)$, we select and return the unique $\kappa \in \Pi_k$ for which $\delta(s, \upsilon) \in \kappa$. Figure 12 (top right) shows all the values of $\psi_1$.

$\Phi_{k+1}$ takes an element $s \in S$ and returns a set of ordered pairs, each of the form $(\upsilon, \kappa)$ where $\upsilon \in \Upsilon$ and $\kappa \in \Pi_k$. $\Phi_{k+1}(s)$ is defined as the set of all pairs $(\upsilon, \psi_{k+1}(s, \upsilon))$ such that $\upsilon \in \Upsilon$ and such that $\psi_{k+1}(s, \upsilon)$ exists. Figure 12 (bottom) shows all the values of $\Phi_1$.

Now we construct the (k+1)-equivalence classes by splitting the k-equivalence classes; i.e. we refine $\Pi_k$ to construct $\Pi_{k+1}$, so that every $\kappa \in \Pi_{k+1}$ contains those elements which have the same value of $\Phi_{k+1}$. This rule implies that if $\kappa \in \Pi_k$ is a singleton set (e.g. $\{4\} \in \Pi_0$ and $\{5\} \in \Pi_0$), then $\kappa \in \Pi_{k+1}$ (i.e. $\{4\} \in \Pi_1$ and $\{5\} \in \Pi_1$).

Consider the 0-equivalence class $\{0, 1, 2\} \in \Pi_0$. Since $\Phi_1(0)$, $\Phi_1(1)$, and $\Phi_1(2)$ have three different values, we must further partition $\{0, 1, 2\}$ into three distinct 1-equivalence classes: $\{0\}$, $\{1\}$, and $\{2\}$.

Consider the 0-equivalence class $\{6, 7\}$. Since $\Phi_1(6) = \Phi_1(7)$, $\{6, 7\}$ is a 1-equivalence class, and $\{6, 7\} \in \Pi_1$.

$\Pi_1 = \{\{0\}, \{1\}, \{2\}, \{4\}, \{5\}, \{6, 7\}\}$

If we repeat this process, generating the functions $\psi_2$ and $\Phi_2$ and using $\Phi_2$ to construct $\Pi_2$, we would find that $\Pi_1 = \Pi_2$, which means $\Pi_1$ is a fixed point of the procedure.

$\Pi_2 = \{\{0\}, \{1\}, \{2\}, \{4\}, \{5\}, \{6, 7\}\}$

$\psi_k$ is referred to as the partition transformation function. $\Phi_k$ is referred to as the partition image function.

We can use $\Pi_1$ directly to construct the minimum DFA shown in Figure 13. We simply merge the states which are 1-equivalent. We have determined that states 6 and 7 are 1-equivalent, and no others. We can thus construct the DFA in Figure 13 by merging states 6 and 7 from Figure 11.
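The refinement loop can be sketched as follows. This is an illustration under stated assumptions, not the code of the implementation: the names class-of, phi, same-set-p, refine-partition, and simplify-partition are hypothetical, and the example transition list is our reading of the trimmed DFA of Figure 11 as implied by the equivalence classes above (in particular, which of states 6 and 7 receives which transition is chosen arbitrarily). The function phi plays the role of $\Phi_{k+1}$, computed from the partition $\Pi_k$.

```lisp
;; Hypothetical sketch; transitions are (SOURCE LABEL TARGET) triples.
(defun class-of (state partition)
  "Return the equivalence class of PARTITION which contains STATE."
  (find-if (lambda (class) (member state class)) partition))

(defun phi (state transitions partition)
  "Set of (LABEL CLASS) pairs, one per transition leaving STATE."
  (loop for (source label target) in transitions
        when (eql source state)
          collect (list label (class-of target partition))))

(defun same-set-p (a b)
  (and (subsetp a b :test #'equal) (subsetp b a :test #'equal)))

(defun refine-partition (partition transitions)
  "Split each class of PARTITION into groups of states having equal PHI."
  (let ((result '()))
    (dolist (class partition (nreverse result))
      (let ((groups '()))               ; alist: phi value -> states (reversed)
        (dolist (state class)
          (let* ((image (phi state transitions partition))
                 (entry (assoc image groups :test #'same-set-p)))
            (if entry
                (push state (cdr entry))
                (push (list image state) groups))))
        (dolist (entry (reverse groups))
          (push (reverse (cdr entry)) result))))))

(defun simplify-partition (partition transitions)
  "Refine PARTITION repeatedly until a fixed point is reached."
  (let ((next (refine-partition partition transitions)))
    (if (equal next partition)
        partition
        (simplify-partition next transitions))))

;; Assumed reading of the trimmed DFA of Figure 11.
(defparameter *simplify-example-transitions*
  '((0 fixnum 1) (0 string 2)
    (1 fixnum 5)
    (1 (and (not fixnum) integer) 4)
    (1 (and (not integer) number) 6)
    (2 number 7)))
```

Starting from the initial partition ((0 1 2) (4) (5) (6 7)), one refinement step on this data separates states 0, 1, and 2 and leaves 6 and 7 together, and a second step changes nothing, in agreement with $\Pi_1 = \Pi_2$ above.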

5 OPTIMIZED CODE GENERATION

Figure 14 shows the essential part of the final macro expansion of the rte-case shown in Figure 2. Each state in the DFA corresponds to a label within a tagbody, a conditional unless checking for end of sequence, and a typecase with one branch per transition in the DFA, including the implicit transition to ⊥. We have used typecase in this example output, but the reader may well notice that there are several occurrences of redundant type checks in the output. For example, the typecase at label s.2 in Figure 14 contains multiple checks for fixnum and integer. We showed in [NV18] how these redundant type checks might be eliminated simply by replacing typecase with bdd-typecase.
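To illustrate the kind of redundancy mentioned above, the following sketch compares the typecase found at label s.2 with a hand-reordered variant. It relies only on the fact that typecase tests its clauses in order; it is not the output of bdd-typecase, and the function names and keyword return values are hypothetical, chosen only to make the comparison runnable.

```lisp
;; The clauses as they appear at label s.2: fixnum and integer are each
;; tested more than once.
(defun next-state-verbose (x)
  (typecase x
    (fixnum :s.4)
    ((and (not integer) number) :s.3)
    ((and (not fixnum) integer) :s.5)
    (t nil)))

;; Hand-reordered variant: because clauses are tried in order, a value
;; reaching the integer clause is already known not to be a fixnum, and a
;; value reaching the number clause is already known not to be an integer.
(defun next-state-reordered (x)
  (typecase x
    (fixnum :s.4)
    (integer :s.5)
    (number :s.3)
    (t nil)))
```

Both functions select the same branch for every argument, for instance a fixnum, a bignum such as (expt 2 100), a float, a ratio, or a string; the second simply performs fewer type tests per call.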

6 PREVIOUS WORK

Attempts to implement destructuring-case are numerous. We mention three here. R7RS Scheme provides case-lambda [SCG13, Section 4.2.9], allowing fixed length argument lists, but lacking any sort of destructuring; the implementation of destructuring-case provided in [Dom] is missing tree-structure-based clause selection; the implementation provided in [Fun13] offers tree-structure-based clause selection, but not within the &optional nor &key portion. In none of these cases does the clause selection consider the types of the objects within the list being destructured.

Manuel and Ramanujam [MR12] introduce automata over infinite alphabets, which seems to be an interesting theoretical approach to viewing DFAs whose transitions are Common Lisp types. Manuel and Ramanujam do not investigate questions of construction and simplification as we have investigated in our approach.

6.1 Conclusion and Perspectives

The simplication algorithm described in Section 4.2 may not guar-

antee a minimum result. For example, reconsider

in Figure 12

(bottom). Suppose

= T

′

∪ T

′′

, and suppose there exists

s ∈ S

such that

Φ(s) =



′

, {

}), T

′′

, {

}



. In such a case, states 2

and

would be indistinguishable, but not mergeable with the simplification algorithm we have described. More research is needed to determine whether such a case can occur, and what the most general form is. Such analysis is necessary to accomplish our goal of generalizing finite automata theory on finite alphabets to handle infinite alphabets representable as disjoinable types.

In the procedure described in Section 4, we constructed the SXP starting with DFAs which were sub-optimal. The DFAs shown in Figure 9 have states which are not coäccessible: states 2.2 and 3.4. Furthermore, one of the DFAs has states 3.2 and 3.5 which are indistinguishable. If we choose to trim and simplify the input DFAs before constructing the SXP, there seem to be cases where we reduce the number of state pairs which need to be visited.
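As a reminder of what trimming involves, a state is coäccessible if some final state can be reached from it. The sketch below computes the coäccessible states of a DFA by propagating backwards from the final states. It is a minimal illustration under assumed conventions: the function name coaccessible-states and the representation of transitions as (from label to) triples with numeric states are hypothetical, and are not taken from the implementation discussed in this paper.

```lisp
(defun coaccessible-states (transitions finals)
  "Return the sorted list of states from which some state in FINALS is
reachable.  TRANSITIONS is a list of (from label to) triples."
  (let ((found (copy-list finals))
        (changed t))
    ;; Repeat until a full pass over the transitions adds no new state.
    (loop while changed
          do (setf changed nil)
             (loop for (from nil to) in transitions
                   when (and (member to found)
                             (not (member from found)))
                     do (push from found)
                        (setf changed t)))
    (sort found #'<)))
```

For example, with the transitions ((0 fixnum 1) (1 string 2) (0 string 3) (3 t 3)) and the single final state 2, state 3 can only loop on itself, so it is not coäccessible and could be removed before any product construction.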

A natural question is whether it is better to simplify the input

DFAs before computing the SXP, simplify after, or both. One might

be tempted to claim that we should always simplify DFAs before

ELS’19, April 1–2 2018, Genova, Italy Jim E. Newton and Didier Verna

    (let* ((g1 expression)
           (g2 g1))
      (block check
        (tagbody
         s.0
           (unless g1 (return-from check nil))
           (typecase (pop g1)
             (fixnum (go s.2))
             (string (go s.1))
             (t (return-from check nil)))
         s.1
           (unless g1 (return-from check nil))
           (typecase (pop g1)
             (number (go s.3))
             (t (return-from check nil)))
         s.2
           (unless g1 (return-from check nil))
           (typecase (pop g1)
             (fixnum
              (go s.4))
             ((and (not integer) number)
              (go s.3))
             ((and (not fixnum) integer)
              (go s.5))
             (t (return-from check nil)))
         s.3
           (unless g1 (return-from check
                        (destructuring-bind (X Y) g2
                          (declare (type (or string fixnum) X)
                                   (type number Y))
                          :clause-3)))
           (case (pop g1)
             (t (return-from check nil)))
         s.4
           (unless g1 (return-from check
                        (destructuring-bind (X Y) g2
                          (declare (type fixnum X Y))
                          :clause-1)))
           (case (pop g1)
             (t (return-from check nil)))
         s.5
           (unless g1 (return-from check
                        (destructuring-bind (X Y) g2
                          (declare (type fixnum X)
                                   (type integer Y))
                          :clause-2)))
           (case (pop g1)
             (t (return-from check nil))))))

Figure 14: Macro expansion of rte-case from Figure 2 and consequently of destructuring-case from Figure 1.

computing the SXP. However, we do not currently have enough data to confidently support this claim.

We also discussed in Section 3.3 a technique for making the DFAs match non-overlapping languages before attempting to calculate the SXP. This technique avoids having to make priority-based decisions when the languages overlap. We thereafter saw that this technique produces DFAs with non-coäccessible states. It may well be worth investigating whether robustly implementing the priority-based SXP procedure is more efficient, as the input DFAs would themselves be smaller in many cases, and would lack the non-coäccessible states.

The rte-case macro we discuss in this paper does not attempt to answer questions about exhaustiveness. It is possible, however, to enhance the rte-case macro with rte-ecase (exhaustive rte-case), which would append a final otherwise clause, (:* t). This clause would serve at compile time to detect whether the leading clauses are exhaustive; for if no state in the DFA corresponds to this :otherwise-clause, then the given rte patterns are exhaustive. However, if there is a path in the DFA from an initial state to the :otherwise-clause, then the type labels on such a path form a type signature for such a counter-example. The types of the elements of such a counter-example sequence could easily be generated by finding any transit through the DFA, and clipping away any loops it contains. The macro might also produce a compiler warning, as well as insert a call to error in the code in case the code path is taken at run-time.
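One way such a counter-example signature might be obtained is by a breadth-first search from the initial state to the state standing for the otherwise clause; a shortest path found this way contains no loops, so no clipping is needed. The sketch below is only an illustration: the function name counter-example-signature, the (from label to) transition triples, and the use of the keyword :otherwise as the target state are all assumptions made for the example, not part of rte-case.

```lisp
(defun counter-example-signature (transitions initial target)
  "Return the list of type labels along a shortest path from INITIAL to
TARGET, or :none if TARGET is unreachable.
TRANSITIONS is a list of (from label to) triples."
  (let ((queue (list (list initial)))   ; entries are (state . reversed-labels)
        (visited (list initial)))
    (loop while queue
          do (destructuring-bind (state . labels) (pop queue)
               (when (eql state target)
                 (return (reverse labels)))
               (loop for (from label to) in transitions
                     when (and (eql from state)
                               (not (member to visited)))
                       do (push to visited)
                          (setf queue
                                (append queue
                                        (list (list* to label labels))))))
          finally (return :none))))
```

With the hypothetical transitions ((0 fixnum 1) (0 string 2) (1 number 3) (2 number 3) (1 (not number) :otherwise)), a call with initial state 0 and target :otherwise yields the signature (fixnum (not number)), i.e. any two-element sequence whose first element is a fixnum and whose second is not a number falls through every clause. A result of :none would indicate that the clauses are exhaustive.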

REFERENCES

[Ans94] Ansi. American National Standard: Programming Language – Common Lisp. ANSI X3.226:1994 (R1999), 1994.

[Brz64] Janusz A. Brzozowski. Derivatives of Regular Expressions. J. ACM, 11(4):481–494, October 1964.

[Dom] Public Domain. Alexandria implementation of destructuring-case.

[Fun13] Nobuhiko Funato. Public domain implementation of destructuring-bind, 2013. Accessed 14 October 2018, 12h36 +0200.

[GWB91] Richard P. Gabriel, Jon L. White, and Daniel G. Bobrow. CLOS: integrating object-oriented and functional programming. Communications of the ACM, 34(9):29–38, 1991.

[HMU06] John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation (3rd Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.

[Kee89] Sonja E. Keene. Object-Oriented Programming in Common Lisp: a Programmer's Guide to CLOS. Addison-Wesley, 1989.

[MR12] Amaldev Manuel and Ramaswamy Ramanujam. Automata over infinite alphabets. In Modern Applications of Automata Theory, pages 529–554. 2012.

[NDV16] Jim Newton, Akim Demaille, and Didier Verna. Type-Checking of Heterogeneous Sequences in Common Lisp. In European Lisp Symposium, Kraków, Poland, May 2016.

[New18] Jim Newton. Representing and Computing with Types in Dynamically Typed Languages. PhD thesis, Sorbonne University, November 2018.

[NV18] Jim Newton and Didier Verna. Strategies for typecase optimization. In European Lisp Symposium, Marbella, Spain, April 2018.

[NVC17] Jim Newton, Didier Verna, and Maximilien Colange. Programmatic Manipulation of Common Lisp Type Specifiers. In European Lisp Symposium, Brussels, Belgium, April 2017.

[ORT09] Scott Owens, John Reppy, and Aaron Turon. Regular-expression Derivatives Re-examined. J. Funct. Program., 19(2):173–190, March 2009.

[SCG13] Alex Shinn, John Cowan, and Arthur A. Gleckler. Revised^7 report on the algorithmic language Scheme. Technical report, 2013.

[YD14] Francois Yvon and Akim Demaille. Théorie des Langages Rationnels. EPITA LRDE, 2014. Lecture notes.