Finite Automata Theory Based Optimization of Conditional
Variable Binding
Jim E. Newton
Didier Verna
jnewton@lrde.epita.fr
didier@lrde.epita.fr
EPITA/LRDE
Le Kremlin-Bicêtre, France
ABSTRACT
We present an efficient and highly optimized implementation of destructuring-case in Common Lisp. This macro selects the most appropriate of several given destructuring lambda lists, based on the structure and types of data at run-time, and then dispatches to the corresponding code branch. We examine an optimization technique, based on finite automata theory, applied to conditional variable binding and execution, and to type-based pattern matching on Common Lisp sequences. A risk of inefficiency associated with a naive implementation of destructuring-case is that the candidate expression being examined may be traversed multiple times, once for each clause whose format fails to match, and finally once for the successful match. We have implemented destructuring-case in such a way as to avoid multiple traversals of the candidate expression. This article explains how this optimization has been implemented.
CCS CONCEPTS
Theory of computation → Data structures design and analysis; Type theory;
ACM Reference Format:
Jim E. Newton and Didier Verna. 2019. Finite Automata Theory Based
Optimization of Conditional Variable Binding. In Proceedings of The 12th
European Lisp Symposium (ELS’19). ACM, New York, NY, USA, 8 pages.
1 INTRODUCTION
The Common Lisp macro destructuring-bind [Ans94] binds the variables specified in a given lambda list to the corresponding values in the tree structure resulting from the evaluation of a given expression. However, if the tree structure of the expression does not coincide with the given lambda list, a run-time error is signaled. This error may pose a challenge to the programmer. The problem, simply stated, is that the destructuring lambda list [Ans94, Section 3.4.5] is specified at compile time, and the expression is evaluated at run-time. Thus, it may not be possible to
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ELS’19, April 1–2 2018, Genova, Italy
© 2019 Association for Computing Machinery.
ACM ISBN 978-2-9557474-2-1.. . $15.00
(destructuring-case expression
  ((X Y)
   (declare (type fixnum X Y))
   :clause-1)
  ((X Y)
   (declare (type fixnum X)
            (type integer Y))
   :clause-2)
  ((X Y)
   (declare (type (or string fixnum) X)
            (type number Y))
   :clause-3))
Figure 1: Example of destructuring-case usage.
know until run-time that the input data is problematic. In certain cases the programmer would like to specify the run-time behavior to take if the match fails, rather than having an error signaled. This behavior cannot be specified portably using the condition system [Ans94, Chapter 9], because the condition signaled is simply of type error with no additional information about exactly what failed. Furthermore, the programmer may not wish to signal an error at all, but rather detect the actual run-time pattern of the input data and proceed differently depending on which format of data is discovered.
We presented destructuring-case in [NDV16] as a mechanism to test run-time adherence of the destructuring lambda list to the value of a candidate expression. An example usage of this macro can be seen in Figure 1. This example shows three clauses, each with the same lambda list, (X Y), but with different type declarations. In general, a usage of destructuring-case may use radically different lambda lists, which differ in number of variables, have different &optional and &key sections, and also use different hierarchical structures of the variables.
The semantics of destructuring-case are that the value of the given expression is tested in turn against each of the given destructuring lambda lists, until a match is found, i.e. a match in both hierarchical structure and type of values. Only at such time are the indicated consequent expressions or any default values evaluated. This restriction is especially important if there are side-effects in the default values of optional arguments in the lambda lists such as (... &optional (x (incf *global-var*))).
(rte-case expression
  ((:cat fixnum fixnum)
   (destructuring-bind (X Y) expression
     :clause-1))
  ((:cat fixnum integer)
   (destructuring-bind (X Y) expression
     :clause-2))
  ((:cat (or string fixnum) number)
   (destructuring-bind (X Y) expression
     :clause-3)))
Figure 2: Expansion of destructuring-case from Figure 1
into rte-case.
(rte-case expression
  ((:cat fixnum fixnum)
   :clause-1)
  ((:cat fixnum integer)
   :clause-2)
  ((:cat (or string fixnum) number)
   :clause-3))
Figure 3: Simple example of rte-case from Figure 2.
The implementations of the macros discussed in this article, including destructuring-case, rte-case, rte-ecase, and bdd-typecase, are available in Quicklisp¹ via the package :regular-type-expression.
2 FROM DESTRUCTURING-CASE TO RTE-CASE
Our implementation of destructuring-case converts its input of destructuring lambda lists to rtes (regular type expressions) and then outputs an invocation of rte-case. The essential part of such an expansion is shown in Figure 2. An rte, introduced in [NDV16], is Common Lisp syntax to specify a set of sequences, i.e. a subtype of the sequence type. We explain in Section 2.2 how a destructuring lambda list is converted to an rte.
As can be seen in Figure 2, each destructuring lambda list has been converted to an rte such as (:cat fixnum fixnum) in the first clause, followed by a call to destructuring-bind. As is implied by the syntax, the destructuring-bind will only be executed at run-time if the value of the candidate expression matches the pattern designated by the rte.
We further notice in the simplistic example shown in Figure 2,
that no
destructuring-bind
in the
rte-case
expansion plays
any role. The variables bound by the
destructuring-bind
are
not used in the expressions which follow. Therefore, in our further
discussion we will refer to the even simpler, semantically equivalent
code in Figure 3.
A straightforward expansion of
rte-case
might include succes-
sive type checks of
expression
such as suggested in Figure 4. Such
¹ Quicklisp, https://www.quicklisp.org/, is a public repository, maintained by Zach Beane, consisting of user-contributed Common Lisp libraries.
(typecase expression
  ((rte (:cat fixnum fixnum))
   :clause-1)
  ((rte (:cat fixnum integer))
   :clause-2)
  ((rte (:cat (or string fixnum) number))
   :clause-3))
Figure 4: Naive expansion of rte-case from Figure 2
an expansion would be semantically correct, but inefficient because the sequence expression would be traversed three times in the worst case, to determine which consequent clause to evaluate. As will be seen, our technique eliminates these redundant traversals, allowing one single traversal of the sequence to be executed and thereby determining which consequent expressions to evaluate.²
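The cost of the naive strategy can be made concrete with a minimal sketch. It assumes each pattern is a fixed-length list of element types (a simplification of the rtes in Figure 3), and all names below (*clauses*, matches-types-p, naive-rte-case) are hypothetical helpers, not part of the :regular-type-expression package. The second return value counts how many clauses were tried, each of which inspects the candidate list again.

```lisp
;; Hypothetical sketch of the naive strategy; not the authors' implementation.
(defparameter *clauses*
  '(((fixnum fixnum) . :clause-1)
    ((fixnum integer) . :clause-2)
    (((or string fixnum) number) . :clause-3))
  "Each entry pairs a list of element types with the value to return.")

(defun matches-types-p (seq types)
  "True if the list SEQ has exactly one element per entry of TYPES,
and each element is of the corresponding type."
  (and (= (length seq) (length types))
       (every #'typep seq types)))

(defun naive-rte-case (seq clauses)
  "Return the value of the first matching clause and, as a second value,
the number of clauses tried; every try re-examines SEQ from the start."
  (let ((tried 0))
    (dolist (clause clauses (values nil tried))
      (incf tried)
      (when (matches-types-p seq (car clause))
        (return (values (cdr clause) tried))))))
```

For the list ("a" 2.5) all three clauses are tried before :clause-3 is selected, which is the worst case described above.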
2.1 Examples of rte Syntax
The grammar of an rte is explicitly detailed in [New18], but the basic grammar can be understood intuitively, assuming the reader has a basic understanding of string-based regular expression syntax. The concatenation operator, :cat, specifies a sequence of successive elements: e.g., (:cat fixnum string) denotes a sequence of exactly two elements, the first of type fixnum and the second of type string. To make the string optional use the syntax (:cat fixnum (:? string)). To specify the occurrence, zero or more times, of fixnum followed by an optional string, use (:cat (:* fixnum) (:? string)). Substitute :+ for :* to express an occurrence one or more times. Finally, expressions may be combined logically using :and, :or, and :not, e.g., (:or (:cat fixnum string) (:+ (:not number))).
2.2 From Destructuring Lambda List to rte
In this section we summarize how a destructuring lambda list and
associated type declarations may be converted into an rte. The
conversion procedure is explained in more detail in [NDV16].
The set of lists which are valid argument lists for a given invocation of destructuring-bind with an optional set of type declarations can be characterized by an rte. A destructuring lambda list, such as used in destructuring-bind, specifies a required portion, denoted by a leading sequence of variables; an optional portion, delimited by &optional; and a repeating portion of keyword value pairs, delimited by &key. To construct the rte corresponding to a given destructuring lambda list, we construct the required-rte, the optional-rte, and the repeating-rte, and concatenate them using the :cat operator.
(:cat required-rte optional-rte repeating-rte )
As an example, consider the lambda list shown in Figure 5. The
required portion and optional portions are easy.
² The reader may well notice that a fourth traversal is also necessary to evaluate the destructuring-bind which is present in each of the consequent clauses. In this paper we do not address the elimination of this fourth traversal.
(destructuring-bind (A B &optional Q &key X Y)
    expression
  (declare (type string A B)
           (type list Q)
           (type real X)
           (type integer Y))
  ...)
Figure 5: Example destructuring-bind with declarations
required-rte = (:cat string string)
optional-rte = (:? list)
The repeating portion deserves careful attention; we consider two restrictions.
(1) If &allow-other-keys is not given, such as is the case in Figure 5, then the only allowed keywords are those explicitly specified. In our case the only allowed keywords are :X and :Y, meaning the repeating portion is also of the form (:* (:cat (member :X :Y) t)).
(2) Type declarations such as (declare (type real X)) only restrict the value associated with the first occurrence of each keyword in an argument list, because only the first such occurrence is bound to the associated variable [Ans94, Section 3.3.4]. A keyword portion of the argument list such as (:X 1.2 :X 'not-real) is perfectly valid, whereas (:X 'not-real :X 1.2) is not. Thus, we iterate over all specified keywords, generating one pattern for each. The pattern handling &key X requires that either no :X is given, or that the first :X is followed by a real. See the comment marked restriction 2 in Figure 6.
Putting all these restrictions together, we have the rte in Figure 6
representing the
destructuring-bind
with type declarations in
Figure 5.
There are several other features of
destructuring-bind
which
are supported by
destructuring-case
, but whose details we omit
in this discussion, including tree structure variables/data, default
values, supplied-p-parameter, &allow-other-keys, and others.
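The required and optional portions of this construction can be sketched as follows. This is a minimal illustration under stated assumptions: the function name simple-lambda-list-to-rte and the alist representation of type declarations are hypothetical, the &key portion and nested (tree-structured) variables are deliberately not handled, and the nesting used for several optional variables is one plausible encoding rather than the one the package necessarily emits.

```lisp
;; Hypothetical sketch; handles only required and &optional variables.
(defun simple-lambda-list-to-rte (lambda-list types)
  "LAMBDA-LIST is a flat list of symbols, possibly containing &optional.
TYPES is an alist mapping variable names to declared types; variables
without an entry default to type T."
  (let* ((pos (position '&optional lambda-list))
         (required (subseq lambda-list 0 pos))
         (optional (if pos (subseq lambda-list (1+ pos)) '())))
    (flet ((var-type (var)
             (let ((entry (assoc var types)))
               (if entry (cdr entry) t))))
      (let ((required-rte `(:cat ,@(mapcar #'var-type required)))
            ;; Build the optional part from the right, so that a later
            ;; optional value may only appear if the earlier ones do.
            (optional-rte
              (reduce (lambda (var tail-rte)
                        (if tail-rte
                            `(:? (:cat ,(var-type var) ,tail-rte))
                            `(:? ,(var-type var))))
                      optional :from-end t :initial-value nil)))
        (if optional-rte
            `(:cat ,required-rte ,optional-rte)
            required-rte)))))
```

Applied to the required and optional variables of Figure 5, the sketch yields (:cat (:cat string string) (:? list)), which agrees with the required-rte and optional-rte given above.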
3 FROM RTE-CASE TO INDIVIDUAL DFAS
Each rte shown in Figure 3 can be converted to efficient type checking Common Lisp code, as explained in [NDV16]. Such conversion involves first converting each rte to a deterministic finite automaton (DFA), where the transition labels represent type checks for successive elements of the candidate expression. Figure 7 shows the three DFAs corresponding to the rte-case in Figure 3.
We now summarize how a deterministic finite automaton (DFA) is constructed, given an rte. Some approaches to such generation, such as [YD14, HMU06], involve constructing a non-deterministic finite automaton and thereafter determinizing it. We use the technique presented by Brzozowski [Brz64] and clarified by Owens [ORT09]. The Brzozowski algorithm uses a technique called the rational derivative to construct a DFA directly, thereby obviating the necessity
(:cat
  ;; required-rte
  (:cat string string)
  ;; optional-rte
  (:? list)
  ;; repeating-rte
  (:and
    ;; restriction 1
    (:* (:cat (member :X :Y) t))
    ;; restriction 2 for :X real
    (:or (:* (:cat (:not (eql :X)) t))
         (:cat (:* (:cat (:not (eql :X)) t))
               (eql :X) real
               (:* t)))
    ;; restriction 2 for :Y integer
    (:or (:* (:cat (:not (eql :Y)) t))
         (:cat (:* (:cat (:not (eql :Y)) t))
               (eql :Y) integer
               (:* t)))))
Figure 6: The rte representing the destructuring-bind and
type declarations from Figure 5.
(:cat fixnum fixnum)
  1.0 --T1--> 1.1 --T1--> 1.2 [accepting: clause-1]
(:cat fixnum integer)
  2.0 --T1--> 2.1 --T2--> 2.2 [accepting: clause-2]
(:cat (or string fixnum) number)
  3.0 --T5--> 3.1 --T3--> 3.2 [accepting: clause-3]

Label  Type specifier
T1     fixnum
T2     integer
T3     number
T5     (or fixnum string)
Figure 7: Automata for clauses of rte-case in Figure 2
to determinize the result. In [NDV16, New18], we explain how the rational derivative can be extended to accommodate Common Lisp types. In particular, rather than calculating the rational derivative (as Owens suggests) with respect to each letter of the alphabet, we instead calculate the derivative with respect to each type calculated in the maximal disjoint type decomposition as explained in [NVC17].
3.1 Constructing States and Transitions
The algorithm can be summarized as follows. Each state in the DFA represents all the possible futures which are accepting. Moreover, there is a (not necessarily unique) rte which expresses that set of futures. For example, let:
$P_1$ = (:or (:cat number string) (:cat fixnum float))
be the rte representing all the sequences of either a number followed by a string or a fixnum followed by a float. Suppose there is a state in the DFA associated with this rte. Now we consider all the possible types of the first element of such a sequence. For each such first element type, we calculate what the remaining future would be, given that the first element is of that type. If the first element is a fixnum, then the future is a sequence containing a single element, either a string or a float. Such a sequence is denoted by the rte (:or string float). In terms of the rational derivative we say:
$P_2 = \partial_{\texttt{fixnum}} P_1$ = (:or string float).
If, on the other hand, the first element is not a fixnum but is a number, then the remainder is a sequence whose only element is a string. That is to say:
$P_3 = \partial_{\texttt{(and number (not fixnum))}} P_1$ = string.
Since there is no other possible first element of $P_1$, we construct two additional states, $P_2$ and $P_3$, and construct two transitions: $P_1 \to P_2$ labeled fixnum, and $P_1 \to P_3$ labeled (and number (not fixnum)).
We continue this process until all the futures of each state have been calculated, generating all the possible states, and all the possible transitions between the states.
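The idea of a derivative can be illustrated with a small matcher. The sketch below is a simplification under stated assumptions: it differentiates with respect to a concrete element of the sequence rather than with respect to each type of a disjoint type decomposition, it performs no simplification of the resulting expressions, it supports only :cat, :or, :* and :?, and the markers :empty-word and :empty-set as well as the function names are hypothetical. It therefore recognizes sequences but does not build the DFA described above.

```lisp
;; Hypothetical sketch of derivative-based matching on a subset of rte syntax.
(defun nullable-p (rte)
  "True if RTE matches the empty sequence."
  (cond ((eq rte :empty-word) t)
        ((atom rte) nil)                      ; :empty-set or a type name
        ((eq (car rte) :cat) (every #'nullable-p (cdr rte)))
        ((eq (car rte) :or) (some #'nullable-p (cdr rte)))
        ((member (car rte) '(:* :?)) t)
        (t nil)))                             ; compound type specifier

(defun derivative (rte x)
  "Return an rte for what may follow the element X in a sequence matching RTE."
  (cond ((member rte '(:empty-word :empty-set)) :empty-set)
        ;; A Common Lisp type specifier matches exactly one element.
        ((or (atom rte) (not (keywordp (car rte))))
         (if (typep x rte) :empty-word :empty-set))
        (t
         (destructuring-bind (op &rest args) rte
           (ecase op
             (:or `(:or ,@(mapcar (lambda (r) (derivative r x)) args)))
             (:? (derivative (car args) x))
             (:* `(:cat ,(derivative (car args) x) ,rte))
             (:cat
              (if (null args)
                  :empty-set
                  (let ((head `(:cat ,(derivative (car args) x) ,@(cdr args))))
                    ;; If the first operand may be empty, X may instead
                    ;; belong to the remaining operands.
                    (if (nullable-p (car args))
                        `(:or ,head ,(derivative `(:cat ,@(cdr args)) x))
                        head)))))))))

(defun rte-match-p (rte seq)
  "True if the list SEQ matches RTE: differentiate once per element."
  (nullable-p (reduce #'derivative seq :initial-value rte)))
```

With $P_1$ as above, (rte-match-p '(:or (:cat number string) (:cat fixnum float)) '(1 1.0)) is true, because differentiating by 1 leaves an expression that accepts either a string or a float.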
3.2 Associating Code with Accepting States
DFAs used for matching pattern languages such as regular expressions normally represent Boolean functions, returning TRUE if the sequence matches the expression, and FALSE otherwise. In our case each accepting state of the DFAs in Figure 7 indicates which code path to take in the originating rte-case, Figure 3. This problem is easily addressed. We have simply extended our state object (CLOS class [GWB91, Kee89]) to contain a slot indicating a piece of continuation code to be serialized in the final macro expansion.
3.3 Overlapping Clauses
The synchronized cross-product (SXP) of two or more given DFAs is a single DFA whose behavior simultaneously emulates the behavior of the given DFAs. Typically such a cross-product implements the intersection or union languages of the input DFAs; however the semantics of such a cross-product can be taken to be any Boolean combination of the input.
For example, to implement the symmetric difference language we apply the Boolean XOR function; a state, X, in the SXP, corresponding to states A and B from two given DFAs, is marked as an accepting state if A XOR B are accepting (if either but not both are accepting). In our case we would like to select the code for evaluation corresponding to the code appearing first in the original
(rte-case expression
  ((:cat fixnum fixnum)
   :clause-1)
  ((:and (:cat fixnum integer)
         (:not (:cat fixnum fixnum)))
   :clause-2)
  ((:and (:cat (or string fixnum) number)
         (:not (:cat fixnum integer))
         (:not (:cat fixnum fixnum)))
   :clause-3))
Figure 8: Example of rte-case with pairwise disjoint patterns
destructuring-case; so we need priority-based selection, rather than simply a Boolean function.
An important property of the behavior of
rte-case
is that if
more than one pattern matches the expression in question, then
the clause appearing rst has priority over the others. For example,
in the code in Figure 3, if the value of
expression
is the list
(1
2)
, then all three rtes match; nevertheless
:clause-1
must be the
return value.
One approach to addressing this ambiguity is to extend or augment the patterns so that they are mutually exclusive; i.e. to ensure that no two patterns simultaneously match any candidate expression. The code shown in Figure 8 is equivalent to that in Figure 3 but any input expression, (1 2), for example, matches at most one pattern. This pattern augmentation can be accomplished as a code transformation. The pattern corresponding to :clause-1 is unchanged, but the subsequent clauses have been augmented to emphasize that those clauses are never reached if any prior pattern matches.
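The augmentation can be written as a short list transformation. The following is a sketch under the assumption that patterns are given as a list in priority order; the function name disjoin-patterns is hypothetical and the code is not taken from the rte-case implementation.

```lisp
;; Hypothetical sketch of the disjoining code transformation.
(defun disjoin-patterns (patterns)
  "Return a list of rtes in which each pattern is intersected with the
complement of every pattern that precedes it in PATTERNS."
  (loop for pattern in patterns
        for index from 0
        ;; Prior patterns, most recent first.
        for priors = (reverse (subseq patterns 0 index))
        collect (if priors
                    `(:and ,pattern
                           ,@(mapcar (lambda (prior) (list :not prior))
                                     priors))
                    pattern)))
```

Applied to the three patterns of Figure 3, the sketch leaves the first pattern unchanged and wraps the second and third in :and forms with one :not operand per earlier pattern.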
These rtes correspond to the DFAs shown in Figure 9. The first DFA is exactly the same as before, but we notice in the second DFA that the state labeled 2.2 is non-coäccessible; i.e., there is no path from state 2.2 to any accepting state. This non-useful state corresponds to (:not (:cat fixnum fixnum)) in the input pattern, and it enforces that a sequence consisting of two objects of type fixnum is a rejected sequence rather than a matching sequence. The third DFA in the figure contains a similar state, 3.4, but in addition contains two states, 3.2 and 3.5, which are equivalent to each other.
The disjoining process described here produces DFAs which have redundant or non-coäccessible states. Despite this fact, these slightly more complex DFAs play an important role in the SXP construction, because the process guarantees that the SXP construction will never encounter a situation where it must choose between two different pieces of code to execute on reaching an acceptance condition. If attempting to calculate the union of the three DFAs shown in Figure 7, the algorithm would have to deal with the fact that a sequence of (1 2) at run time should return :clause-1 rather than :clause-2. However, if calculating the union of the DFAs from Figure 9, such ambiguity is averted. The union can be performed purely algebraically, with no consideration of order or priority.
(:cat fixnum fixnum)
  1.0 --T1--> 1.1 --T1--> 1.2 [accepting: clause-1]
(:and (:cat fixnum integer)
      (:not (:cat fixnum fixnum)))
  2.0 --T1--> 2.1
  2.1 --T1--> 2.2
  2.1 --T6--> 2.3 [accepting: clause-2]
(:and (:cat (or string fixnum) number)
      (:not (:cat fixnum integer))
      (:not (:cat fixnum fixnum)))
  3.0 --T4--> 3.1 --T3--> 3.2 [accepting: clause-3]
  3.0 --T1--> 3.3
  3.3 --T2--> 3.4
  3.3 --T7--> 3.5 [accepting: clause-3]

Label  Type specifier
T1     fixnum
T2     integer
T3     number
T4     string
T6     (and (not fixnum) integer)
T7     (and (not integer) number)
Figure 9: DFAs for disjoined clause-1, clause-2, and clause-3
4 MERGING DFAS INTO SYNCHRONIZED
CROSS-PRODUCT DFA
We explain in detail in [New18] how the type check associated with an rte is compiled to efficient Common Lisp code by first converting it to a deterministic finite automaton. It is further pointed out in the perspectives of [New18] that it is desirable to merge these automata into a single automaton in order to share states between the various automata which serve the same function, and also to eliminate redundant traversals of the candidate expression. Having a single automaton which implements the union of the mutually exclusive patterns enables the candidate list to be traversed once, thereby matching any one of the expressions specified in the various clauses of the rte-case.
One advantage of the conversion from destructuring lambda list to rte is that rtes support an algebra sufficient for expressing sets of non-overlapping types, resulting in mutually exclusive patterns in the expansion to rte-case. As an additional feature of the implementation of rte-case, we have arranged that it treats the code in Figure 3 and Figure 8 exactly the same, internally disjoining patterns which are not already disjoint.
The following is an explanation of how several automata are
merged into such a single automaton.
We would like to merge the three DFAs shown in Figure 9 into a
single DFA. There are well known techniques for merging multiple
DFAs [
HMU06
,
YD14
] into the SXP DFA. These techniques are
not general enough for several reasons which we address in our
approach.
It is not necessary to explicitly consider the SXP of more than two DFAs, because the operation is associative. Therefore, given the Common Lisp function synchronized-product, we may compute the SXP of one or more DFAs as a call to cl:reduce.
  (reduce #'synchronized-product dfas)
4.1 Calculating States and Transitions
We consider constructing the SXP of two DFAs, dfa-1 (with $n$ states) and dfa-2 (with $m$ states). We construct a DFA, dfa-3, having $m \times n$ states in the worst case; one state for each pair $(x, y)$ with $x \in \mathit{dfa}_1$ and $y \in \mathit{dfa}_2$. Fortunately, this worst case does not often occur in practice as many of the states are not accessible. For example, if computing the SXP of the first two DFAs of Figure 9, there is no possible input sequence which would put $\mathit{dfa}_1$ into state 1.1 while putting $\mathit{dfa}_2$ into state 2.2. Thus there will be no state in the product DFA corresponding to (1.1, 2.2).
An efficient algorithm is described in [YD14]. We seed a work list with the one initial state. Next, we traverse the work list, growing it by adding new states as we construct them. All possible input types are considered for each state, and all possible transitions are generated.
An example will make this clearer. First start with $\mathit{dfa}_2$ and $\mathit{dfa}_3$, the second and third DFAs illustrated in Figure 9. The states list is initialized to $S = \{(2.0, 3.0)\}$.
We examine the behavior of states 2.0 and 3.0. We must characterize the behavior for every possible input. This infinite set of potential input values is partitioned into several disjoint types: those annotated on transitions exiting state 2.0 and 3.0, and the complement of their union. This complement type represents the set of all values for which an implicit transition leads to the virtual so-called sync state, denoted $\bot$. The sync state is a state which has exactly one exiting, all-encompassing, transition: $\bot \xrightarrow{\top} \bot$.
State 2.0 has one explicit transition, namely $2.0 \xrightarrow{T_1} 2.1$. Thus, there is an implicit complement transition $2.0 \xrightarrow{\top \setminus T_1} \bot$, where $\top$ represents the universal type. State 3.0 has two explicit transitions: namely $3.0 \xrightarrow{T_1} 3.3$ and $3.0 \xrightarrow{T_4} 3.1$. Thus, there is an implicit complement transition $3.0 \xrightarrow{\top \setminus (T_1 \cup T_4)} \bot$.
To compute the transitions from $(2.0, 3.0)$, we must consider all six pairwise intersections between the transition types of the two states (2.0 and 3.0). These intersections are shown in Figure 10, which also indicates the target states in the three non-empty cases. Given an input of type fixnum, $\mathit{dfa}_2$ transitions from state 2.0 to state 2.1; and given the same input $\mathit{dfa}_3$ transitions from state 3.0 to state 3.3. So we add $(2.1, 3.3)$ to $S$, giving $S = \{(2.0, 3.0), (2.1, 3.3)\}$, and add the transition $(2.0, 3.0) \xrightarrow{T_1} (2.1, 3.3)$. Likewise, given an input of type string, $\mathit{dfa}_2$ transitions from state 2.0 to state $\bot$; and given
dfa_2     dfa_3            Intersection     Target state
T1        T1               T1               (2.1, 3.3)
T1        T4               ∅
T1        ⊤ \ (T1 ∪ T4)    ∅
⊤ \ T1    T1               ∅
⊤ \ T1    T4               T4               (⊥, 3.1)
⊤ \ T1    ⊤ \ (T1 ∪ T4)    ⊤ \ (T1 ∪ T4)    (⊥, ⊥)
Figure 10: Transition Computation for dfa_2 × dfa_3
0 --T1--> 1
0 --T4--> 2
1 --T1--> 5 [accepting: clause-1]
1 --T6--> 4 [accepting: clause-2]
1 --T7--> 6 [accepting: clause-3]
1 --T9--> 3 [non-accepting]
2 --T3--> 7 [accepting: clause-3]

Label  Type specifier
T1     fixnum
T3     number
T4     string
T6     (and (not fixnum) integer)
T7     (and (not integer) number)
T9     (and integer
            (or (not integer) fixnum)
            (not fixnum))
Figure 11: DFA for rte-case not yet reduced
the same input $\mathit{dfa}_3$ transitions from state 3.0 to state 3.1. So we add $(\bot, 3.1)$ to $S$, giving $S = \{(2.0, 3.0), (2.1, 3.3), (\bot, 3.1)\}$, and add the transition $(2.0, 3.0) \xrightarrow{T_4} (\bot, 3.1)$. Finally, given an input of type (and (not fixnum) (not string)), $\mathit{dfa}_2$ transitions from state 2.0 to state $\bot$, and $\mathit{dfa}_3$ transitions from state 3.0 to state $\bot$. The state $(\bot, \bot)$ is the sync state of the cross product DFA so we need generate no additional transition from (2.0, 3.0).
Next, we apply the same procedure to calculate any new states and transitions of any newly added elements of $S$. We continue the procedure until all elements of $S$ have been visited, and no new states have been generated.
After $\mathit{dfa}_2 \times \mathit{dfa}_3$ has been computed, we can repeat the process via the reduce operation mentioned above to compute $\mathit{dfa}_1 \times \mathit{dfa}_2 \times \mathit{dfa}_3$. This procedure constructs a DFA isomorphic to that shown in Figure 11. We say isomorphic because the choice of state names is arbitrary. Figure 11 has states named 0 through 7 rather than names such as (1.0, 2.0, 3.0), (1.1, 2.1, 3.3) as suggested in the procedure description in Section 4.1.
s ∈ S   υ ∈ ϒ   δ(s, υ)
0       T1      1
0       T4      2
1       T1      5
1       T6      4
1       T7      6
2       T3      7

s ∈ S   υ ∈ ϒ   ψ_1(s, υ) ∈ Π_0
0       T1      {0, 1, 2}
0       T4      {0, 1, 2}
1       T1      {5}
1       T6      {4}
1       T7      {6, 7}
2       T3      {6, 7}

s ∈ S   Φ_1(s)
0       (T1, {0, 1, 2}), (T4, {0, 1, 2})
1       (T1, {5}), (T6, {4}), (T7, {6, 7})
2       (T3, {6, 7})
4       ∅
6       ∅
7       ∅
Figure 12: All values of the δ, ψ_1, and Φ_1 functions.
The DFA shown in Figure 11 is not in minimal form. It has a non-coäccessible state, 3, from which there is no path to an accepting state. It also has indistinguishable states; e.g., states 6 and 7 have the exact same future, albeit a trivial one of just returning the symbol clause-3. Since each of the states in the computed DFA and each of the transitions contributes to the number of lines of Common Lisp code which will be generated when the DFA is serialized in Section 5, we should simplify this DFA to reduce the lines of redundant code in the final macro expansion.
We eliminate non-coäccessible states by a simple trimming procedure based on graph traversal, finding states which lack a path to an accepting state. However, the procedure to coalesce indistinguishable states is more subtle, and we discuss it in Section 4.2.
4.2 DFA Simplification
The goal of simplification is to coalesce indistinguishable states such as states 6 and 7 in Figure 11, to result in the DFA in Figure 13.
In order to give a good explanation of the simplification algorithm we need some notation. Let $S$ denote the set of states of the DFA, $S = \{0, 1, 2, 4, 5, 6, 7\}$. Let $\Upsilon$ denote the set of all Common Lisp types annotated in the DFA: $\Upsilon = \{T_1, T_3, T_4, T_6, T_7\}$. Denote by $\delta$ the state transfer function which, given a state $s_i \in S$ and a type $\upsilon \in \Upsilon$, returns the target state $s_j \in S$ of the transition $s_i \xrightarrow{\upsilon} s_j$. The values of $\delta$ are given in Figure 12 (top left).
We will construct a sequence $\{\Pi_1, \Pi_2, \ldots, \Pi_n, \ldots\}$ of partitions of $S$. A partition of $S$ is a set of mutually disjoint subsets of $S$ for which the union of the subsets is $S$ itself. Each element $\kappa \in \Pi_k$ is called a k-equivalence class. If $s_i, s_j \in \kappa$, then $s_i$ and $s_j$ are said to be k-equivalent to each other.
To construct the initial partition, $\Pi_0$, we group the set of all non-accepting states into one 0-equivalence class: $\{0, 1, 2\}$; thereafter, there is one 0-equivalence class per unique return value: :clause-1, :clause-2, and :clause-3: $\{5\}$, $\{4\}$, and $\{6, 7\}$ respectively.
$\Pi_0 = \{\{0, 1, 2\}, \{4\}, \{5\}, \{6, 7\}\}$
Next, we wish to construct $\Pi_1, \Pi_2, \ldots, \Pi_n, \Pi_{n+1}$ in turn, continuing the iteration until $\Pi_n = \Pi_{n+1}$. Each $\Pi_k$ is derived from $\Pi_{k-1}$ as we will explain.
0 --T1--> 1
0 --T4--> 2
1 --T1--> 5 [accepting: clause-1]
1 --T6--> 4 [accepting: clause-2]
1 --T7--> 6 [accepting: clause-3]
2 --T3--> 6 [accepting: clause-3]

Label  Type specifier
T1     fixnum
T3     number
T4     string
T6     (and (not fixnum) integer)
T7     (and (not integer) number)
Figure 13: DFA for rte-case simplified
For each integer $k > 0$, to determine the k-equivalence classes we define two functions $\psi_k$ and $\Phi_k$.³ In each case, we will construct $\psi_{k+1}$ and $\Phi_{k+1}$ by examining $\Pi_k$. These two functions may be difficult to understand intuitively from their mathematical definitions. Nevertheless, the mathematical definitions help when coding the simplification function in Common Lisp.
$\psi_{k+1}$ is a function which takes two arguments, $s \in S$ and $\upsilon \in \Upsilon$, and returns a k-equivalence class $\kappa \in \Pi_k$ (i.e., $\psi_{k+1} : S \times \Upsilon \to \Pi_k$). To compute the value of $\psi_{k+1}(s, \upsilon)$, we select and return the unique $\kappa \in \Pi_k$ for which $\delta(s, \upsilon) \in \kappa$. Figure 12 (top right) shows all the values of $\psi_1$.
$\Phi_{k+1}$ takes an element $s \in S$ and returns a set of ordered pairs, each of the form $(\upsilon, \kappa)$ where $\upsilon \in \Upsilon$ and $\kappa \in \Pi_k$. $\Phi_{k+1}(s)$ is defined as the set of all pairs $(\upsilon, \psi_{k+1}(s, \upsilon))$ such that $\upsilon \in \Upsilon$ and such that $\psi_{k+1}(s, \upsilon)$ exists. Figure 12 (bottom) shows all the values of $\Phi_1$.
Now we construct the (k+1)-equivalence classes by splitting the k-equivalence classes; i.e. we refine $\Pi_k$ to construct $\Pi_{k+1}$, so that every $\kappa \in \Pi_{k+1}$ contains exactly those elements of a single k-equivalence class which have the same value of $\Phi_{k+1}$. This rule implies that if $\kappa \in \Pi_k$ is a singleton set (e.g. $\{4\} \in \Pi_0$ and $\{5\} \in \Pi_0$), then $\kappa \in \Pi_{k+1}$ (i.e. $\{4\} \in \Pi_1$ and $\{5\} \in \Pi_1$).
Consider the 0-equivalence class $\{0, 1, 2\} \in \Pi_0$. Since $\Phi_1(0)$, $\Phi_1(1)$, and $\Phi_1(2)$ have three different values, we must further partition $\{0, 1, 2\}$ into three distinct 1-equivalence classes $\{0\}$, $\{1\}$, and $\{2\}$.
Consider the 0-equivalence class $\{6, 7\}$. Since $\Phi_1(6) = \Phi_1(7)$, $\{6, 7\}$ is a 1-equivalence class, and $\{6, 7\} \in \Pi_1$.
$\Pi_1 = \{\{0\}, \{1\}, \{2\}, \{4\}, \{5\}, \{6, 7\}\}$
If we repeat this process, generating the functions $\psi_2$ and $\Phi_2$, and use $\Phi_2$ to construct $\Pi_2$, we would find that $\Pi_2 = \Pi_1$, which means $\Pi_1$ is a fixed point of the procedure.
$\Pi_2 = \{\{0\}, \{1\}, \{2\}, \{4\}, \{5\}, \{6, 7\}\}$
³ $\psi$ is referred to as the partition transformation function. $\Phi$ is referred to as the partition image function.
We can use
Π
1
, directly, to construct the minimum DFA shown
in Figure 13. We simply merge the states which are 1-equivalent.
We have determined that states 6 and 7 are 1-equivalent, and no
others. We can thus construct the DFA in Figure 13 by merging
states 6 and 7 from Figure 11.
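The refinement iteration can be sketched in a few lines of Common Lisp. The representation below is an assumption made for illustration: the transition function is an alist from state to (label . target) pairs taken from Figure 12, a partition is a list of lists of states, and the names *delta*, block-containing, phi, same-set-p, refine and simplify-partition are hypothetical rather than those of the authors' implementation.

```lisp
;; Hypothetical sketch of the partition refinement of Section 4.2.
(defparameter *delta*
  '((0 (t1 . 1) (t4 . 2))
    (1 (t1 . 5) (t6 . 4) (t7 . 6))
    (2 (t3 . 7)))
  "Transitions of Figure 12 (top left): (state (label . target) ...).")

(defun block-containing (state partition)
  "Return the equivalence class of PARTITION which contains STATE."
  (find state partition :test #'member))

(defun phi (state partition delta)
  "Partition image of STATE: a list of (label . class) pairs."
  (loop for (label . target) in (cdr (assoc state delta))
        collect (cons label (block-containing target partition))))

(defun same-set-p (a b)
  "True if the lists A and B contain the same elements under EQUAL."
  (and (subsetp a b :test #'equal)
       (subsetp b a :test #'equal)))

(defun refine (partition delta)
  "Split each class of PARTITION into groups of states with equal PHI."
  (loop for cls in partition
        append (let ((groups '()))
                 (dolist (s cls (reverse groups))
                   (let ((group (find-if
                                 (lambda (g)
                                   (same-set-p (phi (first g) partition delta)
                                               (phi s partition delta)))
                                 groups)))
                     (if group
                         (nconc group (list s)) ; group is freshly consed
                         (push (list s) groups)))))))

(defun simplify-partition (partition delta)
  "Iterate REFINE until the partition no longer changes."
  (let ((next (refine partition delta)))
    (if (equal next partition)
        partition
        (simplify-partition next delta))))
```

Starting from $\Pi_0$ written as ((0 1 2) (4) (5) (6 7)), one call to refine yields ((0) (1) (2) (4) (5) (6 7)), and a second call leaves it unchanged, matching $\Pi_1 = \Pi_2$ above.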
5 OPTIMIZED CODE GENERATION
Figure 14 shows the essential part of the final macro expansion of the rte-case shown in Figure 2. Each state in the DFA corresponds to a label within a tagbody, a conditional unless checking for end of sequence, and a typecase with one branch per transition in the DFA, including the implicit transition to $\bot$. We have used typecase in this example output, but the reader may well notice that there are several occurrences of redundant type checks in the output. For example, the typecase at label s.2 in Figure 14 contains multiple checks for fixnum and integer. We showed in [NV18] how these redundant type checks might be eliminated simply by replacing typecase with bdd-typecase.
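The shape of such code can be illustrated by hand for the simplified DFA of Figure 13. The following is a sketch, not the actual macro expansion: the function name match-figure-13 and the label names are hypothetical, the candidate is assumed to be a proper list, and the disjoint transition labels T6 and T7 are replaced by the ordered typecase branches integer and number, which select the same elements because the fixnum branch is tested first.

```lisp
;; Hypothetical hand-written sketch of tagbody-style code for Figure 13.
(defun match-figure-13 (seq)
  "Traverse the list SEQ once and return :clause-1, :clause-2,
:clause-3, or NIL if no pattern matches."
  (block match
    (tagbody
     s.0
       (unless seq (return-from match nil))
       (typecase (pop seq)
         (fixnum (go s.1))
         (string (go s.2))
         (t (return-from match nil)))
     s.1
       (unless seq (return-from match nil))
       (typecase (pop seq)
         (fixnum (go s.5))
         (integer (go s.4))            ; a non-fixnum integer
         (number (go s.6))             ; a non-integer number
         (t (return-from match nil)))
     s.2
       (unless seq (return-from match nil))
       (typecase (pop seq)
         (number (go s.6))
         (t (return-from match nil)))
     ;; Accepting states succeed only if the sequence is exhausted.
     s.4
       (return-from match (if seq nil :clause-2))
     s.5
       (return-from match (if seq nil :clause-1))
     s.6
       (return-from match (if seq nil :clause-3)))))
```

Each element of the candidate list is inspected once, and the accepting label reached determines which clause value is returned.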
6 PREVIOUS WORK
Attempts to implement destructuring-case are numerous. We mention three here. R7RS Scheme provides case-lambda [SCG13, Section 4.2.9], allowing fixed length argument lists, but lacking any sort of destructuring; the implementation of destructuring-case provided in [Dom] is missing tree-structure-based clause selection; the implementation provided in [Fun13] provides tree-structure-based clause selection, but not within the &optional nor &key portion. In none of these cases does the clause selection consider the types of the objects within the list being destructured.
Manuel and Ramanujam [MR12] introduce automata over infinite alphabets, which seems to be an interesting theoretical approach to viewing DFAs whose transitions are Common Lisp types. Manuel and Ramanujam do not investigate questions of construction and simplification as we have investigated in our approach.
6.1 Conclusion and Perspectives
The simplication algorithm described in Section 4.2 may not guar-
antee a minimum result. For example, reconsider
Φ
1
in Figure 12
(bottom). Suppose
T
3
= T
T
′′
, and suppose there exists
s S
such that
Φ(s) =
(T
, {
6
,
7
}), T
′′
, {
6
,
7
}
. In such a case, states 2
and
s
would be indistinguishable, but not mergable with the sim-
plcation algorithm we have described. More research is needed
to determine whether such a case can occur, and what the most
general form is. Such analysis is necessary to accomplish our goal
of generalizing nite automata theory on nite alphabets to handle
innite alphabets representable as disjoinable types.
In the procedure described in Section 4, we constructed the SXP
starting with DFAs which were sub-optimal. The DFAs shown in
Figure 9 have states which are not coäccessible: states 2.2 and 3.4.
Furthermore, one of the DFAs has states 3.2 and 3.5 which are
indistinguishable. If we choose to trim and simplify the input DFAs before constructing the SXP, there seem to be cases where we reduce the number of state pairs which need to be visited.
A natural question is whether it is better to simplify the input DFAs before computing the SXP, simplify after, or both. One might be tempted to claim that we should always simplify DFAs before computing the SXP. However, we do not currently have enough data to confidently support this claim.

```lisp
(let* ((g1 expression)   ; g1 is consumed element by element via pop
       (g2 g1))          ; g2 keeps the whole sequence for destructuring
  (block check
    (tagbody
     s.0
       (unless g1 (return-from check nil))
       (typecase (pop g1)
         (fixnum (go s.2))
         (string (go s.1))
         (t (return-from check nil)))
     s.1
       (unless g1 (return-from check nil))
       (typecase (pop g1)
         (number (go s.3))
         (t (return-from check nil)))
     s.2
       (unless g1 (return-from check nil))
       (typecase (pop g1)
         (fixnum
          (go s.4))
         ((and (not integer) number)
          (go s.3))
         ((and (not fixnum) integer)
          (go s.5))
         (t (return-from check nil)))
     s.3
       (unless g1 (return-from check
                    (destructuring-bind (X Y) g2
                      (declare (type (or string fixnum) X)
                               (type number Y))
                      :clause-3)))
       (case (pop g1)
         (t (return-from check nil)))
     s.4
       (unless g1 (return-from check
                    (destructuring-bind (X Y) g2
                      (declare (type fixnum X Y))
                      :clause-1)))
       (case (pop g1)
         (t (return-from check nil)))
     s.5
       (unless g1 (return-from check
                    (destructuring-bind (X Y) g2
                      (declare (type fixnum X)
                               (type integer Y))
                      :clause-2)))
       (case (pop g1)
         (t (return-from check nil))))))
```

Figure 14: Macro expansion of rte-case from Figure 2 and consequently of destructuring-case from Figure 1.
We also discussed in Section 3.3 a technique for making the DFAs match non-overlapping languages before attempting to calculate the SXP. This technique avoids having to make priority-based decisions when the languages overlap. We thereafter saw that this technique produces DFAs with non-coäccessible states. It may well be worth investigating whether robustly implementing the priority-based SXP procedure is more efficient, as the input DFAs would themselves be smaller in many cases and would lack the non-coäccessible states.
The rte-case macro we discuss in this paper does not attempt to answer questions about exhaustiveness. It is possible, however, to complement rte-case with rte-ecase (exhaustive rte-case), which would append a final otherwise clause, (:* t). This clause would serve at compile time to detect whether the leading clauses are exhaustive: if no state in the DFA corresponds to this :otherwise-clause, then the given rte patterns are exhaustive. However, if there is a path in the DFA from an initial state to the :otherwise-clause, then the type labels on such a path form a type signature for a counter-example. The types of the elements of such a counter-example sequence could easily be generated by finding any path through the DFA to that clause and clipping away any loops it contains. The macro might also produce a compiler warning, as well as insert a call to error in the code in case the code path is taken at run-time.
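The search for a counter-example type signature described above can be sketched as a breadth-first search, which yields a loop-free path without a separate clipping step. The DFA representation, the :otherwise exit marker, the automaton *dfa-with-otherwise*, and the function counter-example-signature are assumptions introduced for this illustration; they are not part of rte-case.

```lisp
;; Hypothetical DFA: each entry is (state exit-value (type . next-state)...).
;; State 3 stands for the appended otherwise clause.
(defparameter *dfa-with-otherwise*
  '((0 nil (fixnum . 1) ((not fixnum) . 3))
    (1 nil (string . 2) ((not string) . 3))
    (2 :clause-1 (t . 3))
    (3 :otherwise (t . 3))))

(defun counter-example-signature (dfa &optional (start 0))
  "Breadth-first search from START for a state whose exit value is
:OTHERWISE.  Return the list of type labels along a loop-free path to
it, or :EXHAUSTIVE when no such state is reachable."
  (let ((visited (list start))
        (queue (list (list start))))    ; items are (state . reversed-labels)
    (loop while queue
          do (destructuring-bind (state . labels) (pop queue)
               (destructuring-bind (exit-value &rest transitions)
                   (rest (assoc state dfa))
                 (when (eq exit-value :otherwise)
                   (return (reverse labels)))
                 (loop for (type . next) in transitions
                       unless (member next visited)
                         do (push next visited)
                            (setf queue
                                  (append queue
                                          (list (list* next type labels)))))))
          finally (return :exhaustive))))
```

With the toy automaton, the shortest path to the otherwise state is the single transition labeled (not fixnum), so any one-element sequence whose element is not a fixnum is a counter-example. An automaton in which no state carries the :otherwise marker yields :exhaustive.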
REFERENCES
[Ans94] ANSI. American National Standard: Programming Language – Common Lisp. ANSI X3.226:1994 (R1999), 1994.
[Brz64] Janusz A. Brzozowski. Derivatives of Regular Expressions. J. ACM, 11(4):481–494, October 1964.
[Dom] Public Domain. Alexandria implementation of destructuring-case.
[Fun13] Nobuhiko Funato. Public domain implementation of destructuring-bind, 2013. Accessed 14 October 2018, 12h36 +0200.
[GWB91] Richard P. Gabriel, Jon L. White, and Daniel G. Bobrow. CLOS: integrating object-oriented and functional programming. Communications of the ACM, 34(9):29–38, 1991.
[HMU06] John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman. Introduction to Automata Theory, Languages, and Computation (3rd Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2006.
[Kee89] Sonja E. Keene. Object-Oriented Programming in Common Lisp: a Programmer’s Guide to CLOS. Addison-Wesley, 1989.
[MR12] Amaldev Manuel and Ramaswamy Ramanujam. Automata over infinite alphabets. In Modern Applications of Automata Theory, pages 529–554. 2012.
[NDV16] Jim Newton, Akim Demaille, and Didier Verna. Type-Checking of Heterogeneous Sequences in Common Lisp. In European Lisp Symposium, Kraków, Poland, May 2016.
[New18] Jim Newton. Representing and Computing with Types in Dynamically Typed Languages. PhD thesis, Sorbonne University, November 2018.
[NV18] Jim Newton and Didier Verna. Strategies for typecase optimization. In European Lisp Symposium, Marbella, Spain, April 2018.
[NVC17] Jim Newton, Didier Verna, and Maximilien Colange. Programmatic Manipulation of Common Lisp Type Specifiers. In European Lisp Symposium, Brussels, Belgium, April 2017.
[ORT09] Scott Owens, John Reppy, and Aaron Turon. Regular-expression Derivatives Re-examined. J. Funct. Program., 19(2):173–190, March 2009.
[SCG13] Alex Shinn, John Cowan, and Arthur A. Gleckler. Revised$^7$ report on the algorithmic language Scheme. Technical report, 2013.
[YD14] Francois Yvon and Akim Demaille. Théorie des Langages Rationnels. EPITA LRDE, 2014. Lecture notes.