Implementing Baker’s SUBTYPEP decision procedure
Léo Valais
Jim E. Newton
Didier Verna
lvalais@lrde.epita.fr
jnewton@lrde.epita.fr
didier@lrde.epita.fr
EPITA/LRDE
Le Kremlin-Bicêtre, France
ABSTRACT
We present here our partial implementation of Baker’s decision procedure for subtypep. In his article “A Decision Procedure for Common Lisp’s SUBTYPEP Predicate”, he claims to provide implementation guidelines to obtain a subtypep more accurate and as efficient as the average implementation. However, he did not provide any serious implementation and his description is sometimes obscure. In this paper we present our implementation of part of his procedure, only supporting primitive types, Clos classes, member, range and logical type specifiers. We explain in our words our understanding of his procedure, with much more detail and examples than in Baker’s article. We therefore clarify many parts of his description and fill in some of its gaps or omissions. We also argue in favor and against some of his choices and present our alternative solutions. We further provide some proofs that might be missing in his article and some early efficiency results. We have not released any code yet but we plan to open source it as soon as it is presentable.
CCS CONCEPTS
• Theory of computation → Type theory; Divide and conquer; Pattern matching.
ACM Reference Format:
Léo Valais, Jim E. Newton, and Didier Verna. 2019. Implementing Baker’s SUBTYPEP decision procedure. In Proceedings of the 12th European Lisp Symposium (ELS’19). ACM, New York, NY, USA, 8 pages. https://doi.org/10.5281/zenodo.2646982
1 INTRODUCTION
The Common Lisp standard [1] provides the predicate function subtypep for introspecting the sub-typing relationship. Every invocation (subtypep A B) either returns the values (t t) when A is a subtype of B, (nil t) when not, or (nil nil) meaning the predicate could not (or failed to) answer the question. The latter can happen when the type specifier (satisfies P) (representing the type {x | P(x)} for some predicate and total function¹ P) is involved. For example, given two arbitrary predicates F and G, there is no way subtypep can answer the question (subtypep '(satisfies F) '(satisfies G)).

(sb!xc:deftype keyword ()
  '(and symbol (satisfies keywordp)))
Listing 1: The keyword type definition in Sbcl

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
ELS’19, April 01–02 2019, Genova, Italy
© 2019 Copyright held by the owner/author(s).
ACM ISBN 978-2-9557474-3-8.
https://doi.org/10.5281/zenodo.2646982
However, some implementations abuse the permission to return (nil nil). For example, in Sbcl 1.4.10 (the implementation we are currently focusing our efforts on), (subtypep 'boolean 'keyword) returns (nil nil), thus violating the standard². The definition of the keyword type is responsible for this failure: as shown in Listing 1, it involves a satisfies type specifier³.
Another kind of problem for which subtypep’s accuracy matters is the optimization of the typecase construct as shown in [7] and [8]. The aim is to remove redundant checks in the construct and the approach is to use binary decision diagrams. However, to build such a structure, subtypep is repeatedly used. The unreliability of the predicate leads here to many lost BDD reductions and therefore to the generation of sub-optimal code.
Our implementation is still in active development, currently targets Sbcl and focuses almost entirely on result accuracy. It supports primitive types, user-defined types (deftype, classes and structures), member (and eql) type specifiers and ranges (e.g., (integer * 12)). We present our strategy for implementing each one of these while discussing how and why we decided whether or not to diverge from Baker’s [3] approach, or to fill in some gaps or unclear bits. No optimization work has been done yet and the implementation still has bugs and diverse issues, but we have found some encouraging results about accuracy and even about efficiency.
2 THE COMMON LISP TYPE SYSTEM
2.1 Type specifiers
Common Lisp types are not manipulated directly. Instead, the type to be manipulated is described using a type specifier. The type specifier Domain-Specific Language (DSL) allows programmers to describe types by writing S-expressions which obey some rules described in the Common Lisp standard [1].
¹ A function defined over its entire definition domain.
² The Common Lisp standard requires that no invocation of subtypep involving only primitive types return (nil nil).
³ Cf. bug #1533685 in the Sbcl bug tracker.
(deftype except (x)
`(not (eql ,x)))
Listing 2: The deftype construct
A subtlety about type specifiers is that different ones can represent the same type (e.g., integer, (integer * *) and (or fixnum bignum) all describe the same type). This means that symbolic computation does not suffice to answer the sub-typing question. Note that one could write a predicate, say type=, to determine whether two type specifiers in fact describe the same type using two calls of subtypep.
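A possible shape for such a type= predicate is sketched below. The definition is an illustration only, not code from our implementation; it assumes that the result should follow the same two-value protocol as subtypep, so that an uncertain answer from either call is propagated rather than hidden.

```lisp
;; Hypothetical helper: decides type equality with two SUBTYPEP calls.
(defun type= (a b)
  "Return (values equal-p sure-p), mirroring SUBTYPEP's two-value protocol."
  (multiple-value-bind (a-in-b sure-ab) (subtypep a b)
    (multiple-value-bind (b-in-a sure-ba) (subtypep b a)
      (cond ((and a-in-b b-in-a) (values t t))
            ;; One direction is known to fail: the types differ for sure.
            ((or (and sure-ab (not a-in-b))
                 (and sure-ba (not b-in-a)))
             (values nil t))
            ;; Otherwise at least one call answered (nil nil).
            (t (values nil nil))))))
```

The accuracy of type= is bounded by the accuracy of the underlying subtypep: whenever one of the two calls gives up, so does type=.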
It is possible to define parametric aliases using the deftype construct. It is then possible to refer to a whole type specifier using its alias. Listing 2 shows an example of parametric deftype.
2.2 Vocabulary
type: A set of elements. For any type u: u ≡ {x | x : u}.
canonical t.s.: A type specifier without aliases.
primitive type: A standardized type ([2]) that is not necessarily implemented as a class.
symbolic form: A type specifier whose type is symbol.
compound form: A type specifier whose type is list.
logical form: A compound form whose car is or, and or not.
kingdom: In Baker’s terminology, a “type kingdom” designates the types that can be described using only one kind of type specifier. nil (the empty type) belongs to every type kingdom.
In this article we focus on two particular type kingdoms:
• the literal type kingdom, represented using only symbolic, member and logical type specifiers, and,
• the range type kingdom, represented only using range and logical type specifiers.
For example, (or string symbol) belongs to the literal type kingdom. (and number (not real)) belongs to the range type kingdom. However, (or symbol integer) belongs to the literal type kingdom while (or symbol (integer * *)) belongs to both. This situation is handled in Section 4.
There are other type kingdoms that Baker mentions in his article, such as the array type kingdom, represented using only array and logical type specifiers. Note that a type can belong to several kingdoms, as multiple type specifiers can describe it. For example, integer belongs to the literal and range kingdoms as the type specifiers integer (symbolic) and (integer * *) (range) both describe it. In Section 4, we describe how to guarantee that a given type is only described by one kind of type specifier, hence restricting it to one kingdom.
3 PROCEDURE’S MECHANISMS OVERVIEW
Figure 1 shows the internals of our implementation. Every step will
be detailed in the following sections. There are three major stages:
(1) The pre-processing: Both type specifiers are processed in order to simplify further calculations: the aliases are expanded, and each occurrence of a numeric type is converted to its equivalent range type specifier. Finally, as explained hereafter, the procedure splits into several sub-procedures, one for each type kingdom, because their internal type representations differ. In order to achieve that, the type specifiers must also be split into equivalent subtype specifiers restricted to each concerned kingdom. This stage is detailed in Section 4.
(2) Expert sub-procedures: Once split, each subtype specifier is redirected to the appropriate expert sub-procedure. The job of such a procedure is to prove, in its own kingdom, the assertion “A is a subtype of B” to be wrong. Our procedures currently only support literal and range type specifiers: an expert sub-procedure has been implemented only for these two kingdoms. This stage is detailed in Section 5.
(3) Result conjunction: Eventually, all expert sub-procedures return (a Boolean) and the results are accumulated using conjunction. (In practice, as soon as one expert procedure returns false, subtypep returns.)
4 PRE-PROCESSING
4.1 Alias expansion
The very first step is to ensure that the type specifier is in its canonical form, that is, having all its aliases expanded. This is done by the expand function. For example, considering the type created in Listing 2, (expand '(except 12)) should return (not (eql 12)).
Unlike macro expansion, deftype expansion is not standardized in Common Lisp. Thus a solution must be found for each Common Lisp implementation independently. As our efforts are currently focused on Sbcl, we discuss how we implement the expand function for that compiler.
Sbcl’s subtypep heavily relies on the function sb-kernel:specifier-type, which does type expansion. It also does type simplification (turning (and integer string) into nil), which could have saved us some work. We hoped we could simplify that function to make it compatible with Baker’s algorithm while keeping the deftype expansion and the range canonicalization work. However we found, thanks to the tools of [7], that the function is responsible for most of the work of subtypep, as shown in Figure 2. Considering the lack of efficiency of that function and the fact that it would not be trivial to simplify it to only keep the interesting bits, we decided on another, more cost-effective solution.
The function sb-ext:typexpand takes a type specifier and tries to expand it (not recursively). It either returns the expansion result, or the input type specifier if it is not expandable. (sb-ext:typexpand 'integer) returns integer since it is not a deftype alias whereas (sb-ext:typexpand '(except 12)) returns (not (eql 12)). To expand a whole type specifier, one just needs to walk through it, applying sb-ext:typexpand on each list or atom manually. One subtlety though is that the result of an expansion may itself be an alias to expand⁴. For example, let’s say that we have (deftype my-type () '(except 0.0)), then the result of (sb-ext:typexpand 'my-type) is (except 0.0), which is of course an alias to expand again.
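The walk described above could be sketched as follows. This is an illustration under assumptions, not our actual expand: the helper names expand-1 and expand-top are hypothetical, the sketch only descends into and, or and not forms (so that the data inside a member or eql form is never mistaken for a type), and on implementations other than Sbcl the expander is stubbed out so that the code still loads.

```lisp
(deftype except (x) `(not (eql ,x)))      ; the alias of Listing 2
(deftype my-type () '(except 0.0))

(defun expand-1 (spec)
  "Expand SPEC at top level; the second value tells whether it changed."
  #+sbcl (sb-ext:typexpand spec)
  #-sbcl (values spec nil))               ; stub: no portable equivalent

(defun expand-top (spec)
  "Re-expand SPEC until the expander reports that nothing happened."
  (loop
    (multiple-value-bind (expansion expanded-p) (expand-1 spec)
      (unless expanded-p (return spec))
      (setf spec expansion))))

(defun expand (spec)
  "Expand every alias of SPEC, descending only into logical forms."
  (let ((spec (expand-top spec)))
    (if (and (consp spec) (member (first spec) '(and or not)))
        (cons (first spec) (mapcar #'expand (rest spec)))
        spec)))
```

On Sbcl, (expand '(or my-type (except 12))) is expected to go through both aliases, my-type being expanded twice before the walk reaches the inner eql form.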
⁴ Fortunately, sb-ext:typexpand also returns a Boolean indicating whether or not an expansion happened.
[Figure 1: Internal flowchart of (subtypep A B). Type A and type B each go through alias expansion, numeric types to ranges conversion, and split, yielding literal-type-1 and range-type-1 for A, literal-type-2 and range-type-2 for B. Literal branch: (and l-t-1 (not l-t-2)), missing types registration, bit-vector computation, test "= [0, 0, ..., 0]?". Range branch: (and r-t-1 (not r-t-2)), type diversity reduction, canonicalization, test "= ∅?". The two test results are combined by conjunction into the subtypep result.]
[Figure 2: specifier-type weight in cl:subtypep execution (a)]
(a) cached-subtypep-caching-call is just a memoizing wrapper around Sbcl’s subtypep which is a bit more efficient than the raw implementation.
4.2 Numeric type specifiers conversion
As explained in Section 3, after pre-processing both type specifiers, the procedure splits in two expert sub-procedures: one for literal type specifiers and one for range type specifiers. Numeric types (types containing numbers, mathematically speaking) can have different representations: a symbol (e.g., fixnum), a member expression (e.g., (member 1 2 3)) or a range (e.g., (integer 1 6)). However, the first two belong to the literal type kingdom whereas the latter belongs to the range kingdom. Thus, the numerical type information would be distributed over the different expert sub-procedures. For consistency and accuracy, a single internal representation has to be chosen. The symbolic and member numeric types must each be converted into an equivalent type specifier, in which numerical data are only represented using ranges.
• Symbolic numeric type specifier: say U, replace it by (U * *)⁵. Note the new “type specifier” is likely not to be valid (e.g., (fixnum * *) is invalid). Because it is never exposed to the user (it is an intermediate, internal representation), nothing bad can happen. However, it cannot be used with other functions requiring a type specifier, such as typep.
• member type specifiers: e.g., (member a 1 2 :b) is converted to (or (member a :b) (bit 1 1) (integer 2 2)). To do that,
  (1) extract the numbers out of the expression,
  (2) map each number, say n, to construct the type specifier ((type-of n) n n)⁶,
  (3) and combine the remaining member expression and the ranges with the or logical type specifier.
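The three steps of the member conversion can be sketched as below. The names member->ranges and numeric-type-name are hypothetical. Because the result of type-of is implementation-dependent, the sketch replaces it with an explicit dispatch over the disjoint numeric types, so it produces (integer 1 1) where the example above shows (bit 1 1); complex numbers, which are not supported, are left inside the member form.

```lisp
(defun numeric-type-name (n)
  "Hypothetical stand-in for TYPE-OF: name a disjoint numeric type of N."
  (etypecase n
    (integer 'integer)
    (ratio 'ratio)
    (single-float 'single-float)
    (double-float 'double-float)
    (short-float 'short-float)
    (long-float 'long-float)))

(defun member->ranges (spec)
  "Convert (member ...) into (or (member non-numbers...) ranges...)."
  (let* ((elements (rest spec))
         ;; Step 1: extract the (real) numbers out of the expression.
         (numbers (remove-if-not #'realp elements))
         (others (remove-if #'realp elements)))
    ;; Steps 2 and 3: one degenerate range per number, combined with OR.
    `(or (member ,@others)
         ,@(mapcar (lambda (n) (list (numeric-type-name n) n n))
                   numbers))))
```

When the member form only holds numbers, the remaining (member) describes the empty type, which keeps the result well-formed without a special case.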
A subtlety to consider is that super-types of number also contain numerical data that must be extracted. Indeed, the type atom contains both numerical data, (number * *), and non-numerical data, (and atom (not (number * *))). Thus, its replacement in the numeric type kingdom is straightforward: (number * *). In the literal type kingdom however, its replacement is (or stream array character function standard-object symbol structure-object structure-class). The type t, which is (or atom sequence), must be converted similarly.
Yet another subtlety is that the type specifiers (and) and (or) respectively describe the types t and nil. Hence every occurrence of (and) must be replaced by the replacement of t described in the previous paragraph. In order to remove that annoying corner case completely, (or) is also replaced, by nil.
4.3 Splitting
Having reached this step, the input now only contains canonical literal and range type specifiers, numeric types being only expressed as ranges. The next stage (expert sub-procedures) requires literal and numeric types to be separated.
Thus the top type t is divided into two⁷ disjoint subtypes (“kingdoms” as Baker says). The previous step, described in Section 4.2, ensures that the representation (in terms of type specifiers) of the types in each kingdom is different. All numeric types are represented as ranges, and literal types as symbolic and member (without numbers) type specifiers.
⁵ Implementations supporting the IEEE floating point raise many concerns with -0.0, NaN, +∞ and −∞. Baker explains in detail how to handle these cases.
⁶ The results of type-of are implementation-dependent. We suppose here that type-of only returns the name (as a symbol) of the type of n (n being a number).
⁷ One per kingdom actually, but since our implementation only supports two (literal and range types) we only focus our attention on these.
This step roughly consists of an in-depth traversal of the type specifier, using pattern-matching to recognize which type specifier represents which type. We use the implementation of [9] because of its simplicity and versatility.
Our implementation uses a function type-keep-if which takes a predicate P and a type specifier U and returns:
• U as it is when P(U) is true,
• nil when P(U) is false,
• $(\mathit{op}\ U'_1 \cdots U'_n)$ where $U'_i = \texttt{(type-keep-if } P\ U_i\texttt{)}$ when $U = (\mathit{op}\ U_1 \cdots U_n)$ and $\mathit{op} \in \{\texttt{and}, \texttt{or}, \texttt{not}\}$.
Given the predicate literal-type-p and a type U, type-keep-if returns U with every inner type specifier that describes a non-literal type replaced by nil (interpreted as the empty type). The result is then a subtype of (and (not number) (not (array * *))). Likewise, given the predicate range-type-p, this function returns U with every non-range inner type specifier replaced by nil (interpreted this time as the empty range). Thus, the result is a subtype of number. Therefore, split can easily be implemented in terms of type-keep-if.
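A minimal sketch of type-keep-if and split follows. It assumes that logical forms are tested first and always traversed, and it uses deliberately naive versions of literal-type-p and range-type-p (any symbol or member form is literal, any list headed by a numeric type name is a range); the real predicates rely on pattern matching and are more thorough.

```lisp
(defparameter *range-type-names*
  '(integer ratio rational float short-float single-float
    double-float long-float real number)
  "Heads of the range forms recognized by this sketch.")

(defun range-type-p (spec)
  (and (consp spec) (member (first spec) *range-type-names*) t))

(defun literal-type-p (spec)
  (or (symbolp spec)
      (and (consp spec) (eq (first spec) 'member))))

(defun type-keep-if (predicate spec)
  "Replace by NIL every inner specifier of SPEC that fails PREDICATE."
  (cond ((and (consp spec) (member (first spec) '(and or not)))
         ;; Logical form: keep the operator, filter each argument.
         (cons (first spec)
               (mapcar (lambda (s) (type-keep-if predicate s))
                       (rest spec))))
        ((funcall predicate spec) spec)
        (t nil)))

(defun split (spec)
  "Return the literal part and the range part of SPEC, in that order."
  (list (type-keep-if #'literal-type-p spec)
        (type-keep-if #'range-type-p spec)))
```

For instance, splitting (or symbol (integer 0 5)) gives (or symbol nil) for the literal kingdom and (or nil (integer 0 5)) for the range kingdom.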
4.4 Type reformulation
For any types U and V, $U \subseteq V \iff U \cap \overline{V} = \emptyset$. Therefore, for any type specifiers U and V, when (subtypep U V) returns T T, then (subtypep `(and ,U (not ,V)) nil) also returns T T.
The results of the split function are zipped together using (lambda (x y) `(and ,x (not ,y))) before being passed to the expert sub-procedures. This way, they will not have to prove that an arbitrary type is a subtype of another arbitrary type, but rather whether one arbitrary type specifier describes the empty type (which is substantially easier to reason about, and implement).
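The zipping step is small enough to be shown in full. The function name reformulate is hypothetical, and its two arguments are assumed to be the per-kingdom lists produced by the split step for A and for B, in the same kingdom order.

```lisp
(defun reformulate (split-a split-b)
  "Pair the per-kingdom parts of A and B into emptiness questions."
  (mapcar (lambda (x y) `(and ,x (not ,y)))
          split-a
          split-b))

;; One question per kingdom: literal first, range second.
(defparameter *questions*
  (reformulate '(symbol (integer 0 5))
               '((member a) (integer * *))))
```

Each element of the result is then handed to the expert sub-procedure of the corresponding kingdom, which only has to decide whether it denotes the empty type.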
5 EXPERT SUB-PROCEDURES
Listing 3 shows how subtypep could be defined from a top-down point of view. It shows that, in accordance with Figure 1, both type specifiers are processed independently, split into two kingdoms (literal and numeric types) and unified in an (and U (not V)) fashion. The expert sub-procedures, null-literal-type-p and null-numeric-type-p, each accept one argument (a type specifier, say U) and return a Boolean indicating whether U describes the empty type (nil).
Each sub-procedure answers restricted to its kingdom, as no type can (at this point of the procedure) belong to two different kingdoms, as shown in Section 4. With that piece of information, we can (now) safely assert that:
• the literal type kingdom is the type described by (and (not number) (not (array * *)))⁸, and,
• the numeric type kingdom is the type described by number⁹.
⁸ Actually this is not completely accurate since the type string can be described using array type specifiers. However, since the latter are not supported by our implementation yet, we consider the types string and bit-vector as being literal types since their symbolic representation is kept through the entire process. This is very likely to change in the future.
⁹ Our implementation does not support complex numbers yet, and considers the complex type as being empty. Some wrong results arise from that supposition, such as (subtypep 'number 'real) returning true. This will change as soon as complex numbers are supported.

(defun subtypep (a b)
  ;; Each expert returns true when `(and ,t1 (not ,t2)) is empty in its
  ;; kingdom; A is a subtype of B only if every expert agrees.
  (reduce (lambda (x y) (and x y))
          (mapcar (lambda (expert t1 t2)
                    (funcall expert `(and ,t1 (not ,t2))))
                  (list #'null-literal-type-p
                        #'null-numeric-type-p)
                  ;; SPLIT yields one sub-specifier per kingdom.
                  (split (num-types->ranges (expand a)))
                  (split (num-types->ranges (expand b))))))
Listing 3: A top-down approach of subtypep

There are several properties that are derived from the preceding pre-processing steps. First of all, both kingdoms’ procedures are guaranteed to only ever receive canonical type specifiers as arguments. These are also guaranteed to never contain atom or t type specifiers. The occurrences of (and) and (or) have been replaced respectively by the replacement of t (see Section 4.2) and by nil. eql type specifiers have been replaced by equivalent member expressions. member type specifiers only occur in the literal type kingdom and contain no numerical data. Numerical data are only expressed as intervals, which are likely not to be valid type specifiers. Both kingdoms accept the type specifier nil but with a different meaning: for literal types, nil means the empty type whose complement is t whereas for numeric types it represents the empty range whose complement is (number * *).
In the following sections we describe in detail the implementation of the expert sub-procedures for the literal (Section 5.1) and numeric (Section 5.2) type kingdoms. We also briefly discuss in Section 5.3 the array type kingdom and the cons type specifier family, which Baker ignores in his article.
5.1 Procedure for literal types
5.1.1 Theory. To represent types in the literal types kingdom, we suppose at first that there is a way to enumerate every element in t, say $e_1, e_2, \ldots, e_\omega$. Then, let $u_1, u_2, \ldots, u_\omega$ be all the (non-strict) subtypes of the top-level type t. We associate to each pair $(u_i, e_j)$ the bit $b_{ij}$ with the value 1 when $e_j \in u_i$ and 0 when $e_j \notin u_i$. Let $bv_i$ be the representative bit-vector associated to the type $u_i$, defined by $[b_{i0}, b_{i1}, \ldots, b_{i\omega}]$. These bit-vectors are the rows of the infinite matrix of Eq. $B_{\omega\omega}$, which illustrates the system.

$$
\begin{array}{c|cccccc}
 & e_1 & e_2 & e_3 & e_4 & \cdots & e_\omega \\ \hline
u_1 & 1 & 0 & 0 & 0 & \cdots & 1 \\
u_2 & 0 & 1 & 1 & 0 & \cdots & 0 \\
u_3 & 0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
u_\omega & 1 & 0 & 1 & 0 & \cdots & 0
\end{array}
\tag{$B_{\omega\omega}$}
$$

Proof. Each type has a unique bit-vector representation.
Let $u_i$ and $u_j$ be two distinct types. Thus, $(u_i \cup u_j) \setminus (u_i \cap u_j) \neq \emptyset$. Let $e_k \in (u_i \cup u_j) \setminus (u_i \cap u_j)$. By definition, we have $b_{ik} \neq b_{jk}$. Hence $bv_i \neq bv_j$. Two distinct types are represented by two different bit-vectors.
Similarly, let $bv_i$ and $bv_j$ be two different bit-vectors. Then there necessarily exists a $k$ such that $b_{ik} \neq b_{jk}$. Therefore $\exists e_k,\ e_k \notin u_i \lor e_k \notin u_j \Rightarrow e_k \notin u_i \cap u_j$, while $e_k$ belongs to the type whose bit is 1. Hence $u_i \neq u_j$.
Proof. Type intersection, union and complement are equivalent to bitwise Boolean operations “and”, “or” and “not” on representative bit-vectors.
Let $u_i$ and $u_j$ be two types:
(1) Let $u_k = u_i \cup u_j$. By definition, $\forall l \in \mathbb{N} \cup \{\omega\}$, $b_{kl} = 1$ iff $b_{il} = 1$ or $b_{jl} = 1$, that is $b_{kl} = b_{il} \lor b_{jl}$. Thus, also by definition:
$$
bv_k = [b_{k0}, b_{k1}, \ldots, b_{k\omega}]
     = [b_{i0} \lor b_{j0},\ b_{i1} \lor b_{j1},\ \ldots,\ b_{i\omega} \lor b_{j\omega}]
     = bv_i \lor bv_j
$$
(2) We proceed similarly for the intersection and the Boolean logical operator “and” ($\land$).
(3) Let $u_k = \overline{u_i}$. We have by definition $\forall l \in \mathbb{N} \cup \{\omega\}$, $b_{kl} = \neg b_{il}$. Then:
$$
bv_k = [\neg b_{i0},\ \neg b_{i1},\ \ldots,\ \neg b_{i\omega}] = \neg bv_i
$$
5.1.2 Implementation. Common Lisp cannot enumerate all the possible subtypes of t nor all of its elements. Fortunately, we do not need them all. We only need to consider the types mentioned in the input type specifier to determine its emptiness.
We also do not need to enumerate all the elements of these types. It is that aspect of the procedure of Baker that makes it both powerful and difficult to understand at first. We only need sufficiently many elements from a type to distinguish it from the other types. Because we are now considering only a finite number of types, say $u_1, \ldots, u_n$, to register a new type $u_{n+1}$ to our (now finite) matrix, we only need to find an element $e \in u_{n+1}$ such that $e \notin u_1 \cup \cdots \cup u_n$.
Now let’s suppose that the type specifier of $u_{n+1}$ is in fact (member e), that e is itself chosen as a representative element for another type, say $u_k$, and that $u_k$ is only distinguished from the other registered types by that element e. $u_{n+1}$ and $u_k$ would then have the same bit-vector representation whereas these types are likely to be distinct. The general solution for that kind of problem is to register all the elements found inside the member type specifier. When there is a conflicting element e already registered as a representative for other types, we generate additional representatives for these types. That precaution ensures that this kind of conflict never happens and greatly simplifies the implementation of member type specifiers.
To implement that registration matrix system, we use two functions: $B : \text{type name} \mapsto \text{bit-vector}$, with $B(u_i) = bv_i$, and $I : \text{representative} \mapsto \text{bit index}$, with $I(e_i) = i - 1$. Baker suggests in his small example [3] using the operator set which is deprecated in modern Common Lisp programming. Instead, we use hash tables to represent these functions. Type names are symbols, bit-vectors are bit-vectors and element indexes are non-negative integers.
To register a new type $u_{n+1}$, it is added to the B hash table and its bit-vector content $b_{(n+1)i}$ is evaluated for all the existing representatives ($i \in [\![1; m]\!]$). To register a new representative $e_{m+1}$, it is added to the I hash table with the index m. Then we add one bit (the m-th bit) to each bit-vector $bv_i$ and evaluate it with respect to the type $u_i$. Thus, to retrieve the bit-vector of a registered primitive or user-defined type t, we just look up its value B(t). To compute the bit-vector of a member expression $\texttt{(member } e_1 \cdots e_n\texttt{)}$, we use the value $B(\texttt{(member } e_1 \cdots e_n\texttt{)}) = \bigvee_{i=1}^{n} \beta(I(e_i))$, where $\beta(x)$ returns the null bit-vector with the x-th bit activated.
The bit-vectors of logical type specifiers are given in Eq. 1, Eq. 2 and Eq. 3 hereafter.
$$B(\texttt{(and } U_1 \cdots U_n\texttt{)}) = \bigwedge_{i=1}^{n} B(U_i) \tag{1}$$
$$B(\texttt{(or } U_1 \cdots U_n\texttt{)}) = \bigvee_{i=1}^{n} B(U_i) \tag{2}$$
$$B(\texttt{(not } U\texttt{)}) = \neg B(U) \tag{3}$$
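The registration system and Eq. 1 to Eq. 3 can be sketched with two hash tables and the standard bit-array operators bit-and, bit-ior and bit-not. All names below are hypothetical. As a simplification, registering a representative recomputes every row with typep instead of appending a single bit, and the sketch assumes that every symbol and every member element met by literal-bits has been registered beforehand.

```lisp
(defvar *elements* (make-array 0 :adjustable t :fill-pointer 0)
  "Representatives; the position of an element is its bit index.")
(defvar *index* (make-hash-table :test #'eql) "I: representative -> index.")
(defvar *bits* (make-hash-table :test #'eq) "B: type name -> bit-vector.")

(defun row-of (type-name)
  (map 'bit-vector (lambda (e) (if (typep e type-name) 1 0)) *elements*))

(defun register-type (type-name)
  (setf (gethash type-name *bits*) (row-of type-name)))

(defun register-representative (element)
  (unless (nth-value 1 (gethash element *index*))
    (setf (gethash element *index*) (length *elements*))
    (vector-push-extend element *elements*)
    ;; Simplification: recompute each row rather than adding one bit.
    (maphash (lambda (name row)
               (declare (ignore row))
               (setf (gethash name *bits*) (row-of name)))
             *bits*)))

(defun constant-row (bit)
  (make-array (length *elements*) :element-type 'bit :initial-element bit))

(defun unit-row (index)                   ; beta(x) in the text
  (let ((row (constant-row 0)))
    (setf (bit row index) 1)
    row))

(defun literal-bits (spec)
  "B(spec) for symbols, MEMBER forms and logical forms."
  (cond ((null spec) (constant-row 0))
        ((symbolp spec) (gethash spec *bits*))
        (t (let ((rows (if (eq (first spec) 'member)
                           (mapcar (lambda (e)
                                     (unit-row (gethash e *index*)))
                                   (rest spec))
                           (mapcar #'literal-bits (rest spec)))))
             (ecase (first spec)
               ((member or) (reduce #'bit-ior rows
                                    :initial-value (constant-row 0)))
               (and (reduce #'bit-and rows
                            :initial-value (constant-row 1)))
               (not (bit-not (first rows))))))))

(defun null-literal-type-p (spec)
  (every #'zerop (literal-bits spec)))

;; Usage: four types and one distinguishing representative for each.
(dolist (type '(string list symbol null)) (register-type type))
(dolist (e (list "hello" (list 1 2 3) 'foo nil))
  (register-representative e))
```

With these registrations, string, list, symbol and null get the rows 1000, 0101, 0011 and 0001, so (and null (not (or symbol list))) is found empty whereas (and symbol (not list)) is not.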
5.1.3 Issues. The method for choosing the representative elements for a type depends on its nature: it can be a primitive type, a user-defined type (class, structure or condition) or a member expression. Since primitive types are known (cf. table 4.2 of [2]), their representative elements are chosen at compile-time. The $u_{n+1}$ subtlety above should still be kept in mind. For instance, the type null is a subtype of both symbol and list; so three representative elements are needed: nil, a non-empty list and a symbol other than nil. Note that some primitive types are an exhaustive partition of other types (e.g., character ≡ (or base-char extended-char)). Obviously, in that case, such a precaution does not apply.
For user-defined types, Baker suggests extending the type creation mechanism (thus modifying the implementation’s internal functions) to register a dummy element as a representative. We decided not to follow his approach because of the poor portability of his solution. Indeed, this work, often non-trivial, would have to be repeated for each targeted Common Lisp implementation. (We would like to avoid modifying the Sbcl internal mechanisms.) Moreover, it would register a representative for every class created, thus increasing bit-vectors’ size uselessly since only a few of these classes are likely to appear in a subtypep type specifier. But more importantly, the main drawback of his solution is that creating that dummy element might have unexpected side-effects, as it may need to use slots’ default values and/or initialize-instance. We decided instead to use the Meta Object Protocol (Mop) [6], more specifically class prototypes. Class prototypes are pseudo-instances of a class, created without executing initialize-instance and which typep and eql view as traditional instances. However, to create a class prototype, the class needs to be finalized, which cannot be guaranteed until it is instantiated. Since that class may be involved in a subtypep call before that happens, when a new class is encountered, we force its finalization using the function ensure-finalized from the (portable) closer-mop package¹⁰. Then, we create the prototype of the class using sb-mop:class-prototype and register it. This method is much more portable than Baker’s and does not require hooking into the implementation.
Since (in Sbcl¹¹) conditions are classes, they are supported automatically. The Common Lisp standard [1] states that “defstruct without a :type option defines a class with the structure name as its name”, hence in that case no additional work is required. The standard also states that “Specifying this option [...] prevents the structure name from becoming a valid type specifier recognizable by typep”. Thus, subtypep is not concerned by these types of structures.
¹⁰ http://common-lisp.net/project/closer/
¹¹ Every major Lisp implementation implements conditions as Clos classes, the most obvious way to do it. We ignore exotic condition implementations.
To address the misrepresentation problem when member type specifiers are involved, as discussed in Section 5.1.2, we must ensure that a new representative element is generated and registered. The Common Lisp standard ([1]) states that the member type specifier is defined in terms of eql. That is, $\texttt{(typep e '(member } e_1 \cdots e_n\texttt{))}$ uses eql to compare e to the successive $e_k$ to check the membership. That precise property reduces the misrepresentation problem to only two types: symbol and character (and their subtypes).
To better understand why it is the case, first consider a reduced version of the top-level type t: t = (or string list symbol). Then, let R = ("hello" (1 2 3) foo) be our list of representatives.
(1) Let’s ask the question (subtypep 'symbol '(member foo)).
(2) As discussed in Section 5.1.2, we add the elements of the member expression to R. To conform with the specification, we first check whether or not foo is already in R eql-wise: $\texttt{foo} \in_{\texttt{eql}} R$, so R does not change.
(3) As shown in Eq. 4, the emptiness check passes, meaning that symbol is indeed a subtype of (member foo), which is obviously wrong.
$$
\begin{aligned}
B(\texttt{symbol}) \land \neg B(\texttt{(member foo)}) &= 001 \land \neg 001 \\
&= 001 \land 110 \\
&= 000 \\
&= B(\texttt{nil})
\end{aligned}
\tag{4}
$$
However, for lists, that problem does not appear, thanks to the eql-wise comparison.
(1) (subtypep 'list '(member (1 2 3)))
(2) $\texttt{(1 2 3)} \notin_{\texttt{eql}} R \Rightarrow R = \texttt{("hello" (1 2 3) foo (1 2 3))}$
(3) As shown in Eq. 5, the emptiness check fails and the answer is correct.
$$
\begin{aligned}
B(\texttt{list}) \land \neg B(\texttt{(member (1 2 3))}) &= 0101 \land \neg 0001 \\
&= 0101 \land 1110 \\
&= 0100 \\
&\neq B(\texttt{nil})
\end{aligned}
\tag{5}
$$
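The two computations of Eq. 4 and Eq. 5 can be replayed with a few lines of code. The helpers below are hypothetical and recompute the rows directly from a list of representatives instead of using the hash tables; the fresh symbol is obtained with gensym, as a stand-in for whatever symbol generator is actually used.

```lisp
(defun type-row (type representatives)
  (map 'bit-vector (lambda (e) (if (typep e type) 1 0)) representatives))

(defun member-row (elements representatives)
  ;; MEMBER compares with EQL by default, like the member type specifier.
  (map 'bit-vector (lambda (r) (if (member r elements) 1 0))
       representatives))

(defun empty-difference-p (type elements representatives)
  "True when B(type) AND NOT B((member . elements)) is all zeros."
  (every #'zerop
         (bit-and (type-row type representatives)
                  (bit-not (member-row elements representatives)))))

(defparameter *r* (list "hello" (list 1 2 3) 'foo))

;; Eq. 4: FOO is the only symbol representative, the check wrongly passes.
(defparameter *wrong* (empty-difference-p 'symbol '(foo) *r*))

;; Adding one more symbol representative repairs the answer.
(defparameter *fixed*
  (empty-difference-p 'symbol '(foo) (append *r* (list (gensym)))))

;; Eq. 5: the list inside MEMBER is a new object, so it is appended to R.
(defparameter *list-case*
  (let ((fresh (list 1 2 3)))
    (empty-difference-p 'list (list fresh) (append *r* (list fresh)))))
```

Here *wrong* is true (the misrepresentation), while *fixed* and *list-case* are both false, which is the correct answer in each case.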
Within the literal types kingdom, the only types for which this problem occurs (since the representatives are not supposed to be accessible to the user of subtypep) are then symbol and character. Therefore, only the representatives of these types need to be actually checked when registering member’s elements.
To generate a new symbol, we use alexandria:symbolicate¹². The keyword subtype of symbol is also subject to the problem. (Actually, solving the problem for keywords also solves the problem for symbols.) To generate a new character, we first need to know whether it is a base-char or an extended-char. Then we pick a character of that type not registered yet. When all the characters of that type are registered there is nothing to do (since the type is fully represented in the matrix, no misinterpretation can occur).
We have not yet addressed the problem of a type specifier involving a user class C and a member expression containing the class prototype of C.
¹² https://common-lisp.net/project/alexandria/
5.2 Procedure for numeric types
Unlike the literal type kingdom, the range type kingdom does not need an internal state to represent numeric types. Indeed, the expert sub-procedure takes as input an already precise enough representation of the type described. Range type specifiers describe which kind of number is specified (its type, e.g., integer, ratio, etc.) and its bounds (inclusive and exclusive, e.g., (integer (0) 6)), and can represent unbounded intervals through the symbol * meaning infinity (e.g., (float * 0.0), that is [−∞; 0.0]). The range type specifier is as precise as the mathematical range notation. Additionally, the mathematical union, intersection and complement of these ranges can be expressed equally using the corresponding logical type specifier.
Therefore, to decide the emptiness of the input type specifier, checking whether the canonicalized version of this interval expression describes the empty range (i.e., nil) is sufficient. The calculation is performed in three successive steps, which we describe in the following sections.
This algorithm suffers from an exponential time and space complexity. However, Baker claims that in practice, that theoretical complexity is not an issue (it only appears for “highly artificial cases”). We have not tried to prove (or invalidate) his statement but Section 6 shows some early results that tend to support his claim.
We use a custom abstraction, the interval class, closer to the mathematical object (with type, bounds and limits slots). Thus we avoid the annoying manipulation of lists (with the many standardized range syntaxes). The first step is to write a function range->interval that converts (using pattern matching) a range type specifier to its corresponding interval instance. This function also takes care of the exotic compound forms, such as (unsigned-byte s) which describes the integer range $[0; 2^s - 1]$. We also use a similar structure for interval operations to fully discard the list representation.
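The decoding done by range->interval might look like the sketch below. It is an assumption-laden illustration: the function name range->plist is hypothetical, it returns a property list instead of an interval instance, it uses plain destructuring rather than a pattern-matching library, and unsigned-byte is the only exotic form it handles.

```lisp
(defun parse-bound (bound)
  "Return (values number open-p); a NIL number means unbounded."
  (cond ((eq bound '*) (values nil nil))
        ((consp bound) (values (first bound) t))   ; (n) is exclusive
        (t (values bound nil))))

(defun range->plist (spec)
  "Decode a range type specifier such as (integer (0) 6) or float."
  (destructuring-bind (name &optional (low '*) (high '*))
      (if (consp spec) spec (list spec))
    (if (eq name 'unsigned-byte)
        ;; For (unsigned-byte s), LOW holds the size s: range [0; 2^s - 1].
        (list :type 'integer :low 0 :low-open nil
              :high (if (eq low '*) nil (1- (expt 2 low)))
              :high-open nil)
        (multiple-value-bind (lo lo-open) (parse-bound low)
          (multiple-value-bind (hi hi-open) (parse-bound high)
            (list :type name :low lo :low-open lo-open
                  :high hi :high-open hi-open))))))
```

Keeping the open or closed nature of each bound next to the bound itself is what later makes intersections and emptiness tests straightforward.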
We also need the following interval functions:
(interval-and $I_1$ $I_2$): returns $I_1 \cap I_2$ if their types are eql, or $\emptyset$ otherwise.
(interval-or $I_1$ $I_2$): returns $I_1 \cup I_2$ if their types are eql and $I_1 \cap I_2 \neq \emptyset$, or $\emptyset$ otherwise.
(interval-minus $I_1$ $I_2$): returns $I_1 \setminus I_2$ (may return two values when $I_2 \subset I_1$) if their types are eql, or $I_1$ otherwise.
(interval-empty-p $I$): returns whether $I = \emptyset$.
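To make the behaviour of these functions concrete, here is a simplified sketch restricted to intervals with finite, inclusive bounds. The ival structure is hypothetical, the empty interval is represented by NIL, and returning NIL from interval-or when the operands cannot be merged is an assumption of this sketch.

```lisp
;; Illustrative sketch: finite inclusive bounds only, NIL is the empty set.
(defstruct (ival (:constructor ival (type low high)))
  type low high)

(defun interval-empty-p (i)
  "True when I denotes the empty interval."
  (null i))

(defun interval-and (i1 i2)
  "Intersection of I1 and I2, or NIL when types differ or they are disjoint."
  (when (eql (ival-type i1) (ival-type i2))
    (let ((low (max (ival-low i1) (ival-low i2)))
          (high (min (ival-high i1) (ival-high i2))))
      (when (<= low high)
        (ival (ival-type i1) low high)))))

(defun interval-or (i1 i2)
  "Union of two overlapping intervals of the same type, or NIL otherwise."
  (when (interval-and i1 i2)
    (ival (ival-type i1)
          (min (ival-low i1) (ival-low i2))
          (max (ival-high i1) (ival-high i2)))))
```

Because the intersection already checks that both types are eql, interval-or can reuse it as its merge condition.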
5.2.1 Type diversity reduction. Functions working with intervals must be aware of the relationship between the types of these intervals. For example, the intersection of two integer intervals might be non-empty, whereas the intersection of one integer and one single-float interval is always empty, as these two types are disjoint. However, integer and fixnum are different types but the intersection of intervals of such types might be non-empty. The subtype relationship between the types of intervals therefore needs to be inspected to accurately apply some operations (such as intersection or union).
The type number (complex numbers being ignored) is exhaustively partitioned into six mutually disjoint types: integer, ratio, single-float, short-float, double-float, and long-float. Baker advises defining what he calls “simple intervals”, that is, intervals guaranteed to have their type equal to one of these six types. This way, as these types are mutually disjoint, the implementation of operations on intervals of such types is greatly simplified.
ELS’19, April 01–02 2019, Genova, Italy Léo Valais, Jim E. Newton, and Didier Verna
Supertype   Conversion
number      (or rational float)
real        (or rational float)
rational    (or integer ratio)
float       (or short-float single-float double-float long-float)
bignum      (and integer (not fixnum))
Table 1: Conversion table for supertypes
To convert each numeric type into its equivalent using only the six types above, a two-step conversion is required.
(1) For intervals whose type is a supertype of one of these types, the conversion Table 1 is used. E.g., the conversion of (rational a b) gives (or (integer a b) (ratio a b)).
(2) For intervals whose type is a bounded subtype (i.e., having defined bounds, not infinity) of these six types, their actual bounds have to be constrained to fit within the bounds of their type, before being converted to their corresponding supertype. For example, (fixnum 12 $2^{100}$) has to be converted to (integer 12 most-positive-fixnum), where most-positive-fixnum < $2^{100}$, as $2^{100}$ is a bignum, thus discarding the numbers in between. A similar procedure is applied to the types bit, short-float, single-float, double-float and long-float.
In the end, the type of every interval is constrained to one of the six types above, with the bounds (if any) of its original type preserved.
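The two conversion steps can be sketched as follows. This is an illustration under stated assumptions, not the implementation itself: ranges are kept as lists (type low high), bounds are either inclusive numbers or *, only the fixnum case of step 2 is shown, and the names *supertype-conversions*, simple-ranges and reduce-type-diversity are hypothetical.

```lisp
(defparameter *supertype-conversions*
  '((number   rational float)
    (real     rational float)
    (rational integer ratio)
    (float    short-float single-float double-float long-float))
  "Table 1 without the bignum row, whose conversion involves NOT.")

(defun simple-ranges (type low high)
  "Return the list of simple range specifiers equivalent to (TYPE LOW HIGH)."
  (let ((subtypes (rest (assoc type *supertype-conversions*))))
    (cond (subtypes                    ; step 1: split a supertype
           (loop for sub in subtypes
                 append (simple-ranges sub low high)))
          ((eq type 'fixnum)           ; step 2: clamp a bounded subtype
           (list (list 'integer
                       (if (eq low '*)
                           most-negative-fixnum
                           (max low most-negative-fixnum))
                       (if (eq high '*)
                           most-positive-fixnum
                           (min high most-positive-fixnum)))))
          (t (list (list type low high))))))

(defun reduce-type-diversity (spec)
  "Rewrite SPEC so that only simple types remain, wrapping unions in OR."
  (destructuring-bind (type &optional (low '*) (high '*))
      (if (consp spec) spec (list spec))
    (let ((ranges (simple-ranges type low high)))
      (if (rest ranges)
          (cons 'or ranges)
          (first ranges)))))
```

Since real expands to rational and float, which are themselves expanded, the recursion directly produces a flat union over the six simple types.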
5.2.2 Canonicalization. To check the emptiness of the interval expression, it is canonicalized. Let $\Gamma$ be the canonicalization function. Its parameter is either an interval $I$ or an operation on intervals $\chi$ (intersection, union or complement). $\Gamma$ returns either $\emptyset$, an interval or a union of disjoint intervals, which are the three possible outcomes of a mathematical interval canonicalization.
First and foremost, anytime $\Gamma$ encounters or returns a union, it must ensure that it is flattened (no nested unions). It must also ensure that the intervals inside the union are disjoint. As shown in Section 5.2.1, intervals with different types are necessarily disjoint. Touching intervals [3] are merged using interval-or.
$\Gamma(\emptyset)$ and $\Gamma(I)$ are straightforward, as shown in Eq. end-∅ and Eq. end-I. These are the terminal cases of the recursion of $\Gamma$.
\[ \Gamma(\emptyset) = \emptyset \tag{end-$\emptyset$} \]
\[ \Gamma(I) = I \tag{end-I} \]
Intersections (and logical type specifiers) are reduced as soon as they are encountered. Their operands need to be processed by $\Gamma$ first (hence the implicit mapping “$k \le n$”). Eq. and-apply shows how to reduce intersections. The $\Phi_f$ operator denotes a fold [5] operation using the function $f$. $\Gamma \circ \cap$ denotes the composition of the $\Gamma$ function and the intersection operator. To break it down in a bottom-up fashion:
(1) Eq. and-final: the application of the intersection function.
(2) Eq. and-distribution′: the distribution of the intersection over the union. The next step is Eq. and-final.
(3) Eq. and-distribution: also the distribution of the intersection over the union. However, $\Gamma(\chi)$ may return a union, leading the execution either to Eq. and-distribution′ or directly to Eq. and-final.
(4) Eq. and-apply: the canonicalization of the $\chi_n$ forms using mapping. The results are then folded using $\Gamma \circ \cap$, thus initiating the recursive intersection distribution.
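\[ \Gamma\Bigl(\bigcap_n \chi_n\Bigr) = \Phi_{\Gamma \circ \cap}\bigl(\Gamma(\chi_k)\bigr)_{k \le n} \tag{and-apply} \]
\[ \Gamma\Bigl(\chi \cap \bigcup_n I_n\Bigr) = \bigcup_n \Gamma\bigl(\Gamma(\chi) \cap I_n\bigr) \tag{and-distribution} \]
\[ \Gamma\Bigl(\bigcup_n I_n \cap I\Bigr) = \bigcup_n \Gamma(I_n \cap I) \tag{and-distribution$'$} \]
\[ \Gamma(I_1 \cap I_2) = \texttt{(interval-and } I_1\ I_2\texttt{)} \tag{and-final} \]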
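The reduction of unions and intersections can be sketched on a deliberately restricted model. The code below is an illustration, not the implementation described here: it assumes that every interval is an integer interval written (low . high) with inclusive finite bounds, that expressions are written (:or ...) and (:and ...) with at least one operand, and that a canonical form is a list of disjoint intervals (the empty list standing for the empty set). All function names are hypothetical.

```lisp
(defun merge-intervals (intervals)
  "Flatten a union: sort INTERVALS and merge overlapping or touching ones."
  (let ((result '()))
    (dolist (i (sort (copy-list intervals) #'< :key #'car) (nreverse result))
      (if (and result (<= (car i) (1+ (cdr (first result)))))
          ;; I overlaps or touches the last interval kept: extend it.
          (setf (first result)
                (cons (car (first result))
                      (max (cdr i) (cdr (first result)))))
          (push i result)))))

(defun intersect-pair (a b)
  "Intersection of two intervals, as a list of zero or one interval."
  (let ((low (max (car a) (car b)))
        (high (min (cdr a) (cdr b))))
    (when (<= low high)
      (list (cons low high)))))

(defun gamma-and (u1 u2)
  "Distribute the intersection over the canonical unions U1 and U2."
  (merge-intervals
   (loop for a in u1
         append (loop for b in u2
                      append (intersect-pair a b)))))

(defun gamma (expr)
  "Canonicalize EXPR into a list of disjoint intervals."
  (cond ((null expr) '())
        ((eq (car expr) :or)
         (merge-intervals (loop for e in (cdr expr) append (gamma e))))
        ((eq (car expr) :and)
         ;; Canonicalize every operand first, then fold the intersection.
         (reduce #'gamma-and (mapcar #'gamma (cdr expr))))
        ((<= (car expr) (cdr expr)) (list expr))
        (t '())))
```

For example, intersecting the union of (0 . 5) and (10 . 20) with (3 . 12) distributes the intersection over both members of the union and yields two disjoint intervals.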
Complements (not logical type specifiers) are also reduced as soon as they are encountered. Their only operand is first canonicalized. Complementing this operand in number (the top-level type of the range type kingdom) is equivalent to subtracting it from number, as shown in Eq. not-apply. The difference canonicalization goes through a recursive distribution path similar to that of the intersection, that is Eq. minus-distribution and then Eq. minus-apply. Note that this path is taken every time, since the interval difference is an internal operation and its left-hand operand is always $U$.
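\[ \Gamma(\lnot\chi) = \Gamma\bigl(U \setminus \Gamma(\chi)\bigr) \tag{not-apply} \]
\[ U = \text{type diversity reduction of } \texttt{(number * *)} \]
\[ \Gamma\Bigl(\chi \setminus \bigcup_n I_n\Bigr) = \bigcup_n \Gamma(\chi \setminus I_n) \tag{minus-distribution} \]
\[ \Gamma\Bigl(\bigcup_n I_n \setminus I\Bigr) = \bigcup_n \texttt{(interval-minus } I_n\ I\texttt{)} \tag{minus-apply} \]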
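One way to compute such a complement on canonical unions is to subtract each interval of the operand from the universe in turn, applying the interval difference to every member of the running union. The sketch below is illustrative only: it assumes integer intervals written (low . high) with inclusive bounds, takes the universe as an explicit argument (a finite stand-in for the unbounded U), and uses hypothetical function names.

```lisp
(defun minus-pair (a b)
  "A minus B for integer intervals: a list of zero, one or two intervals."
  (append
   ;; Part of A lying strictly below B.
   (when (< (car a) (car b))
     (list (cons (car a) (min (cdr a) (1- (car b))))))
   ;; Part of A lying strictly above B.
   (when (> (cdr a) (cdr b))
     (list (cons (max (car a) (1+ (cdr b))) (cdr a))))))

(defun union-minus (union interval)
  "Subtract INTERVAL from every member of the canonical UNION."
  (loop for i in union
        append (minus-pair i interval)))

(defun complement-in (universe union)
  "Subtract each interval of UNION from UNIVERSE, one after the other."
  (reduce #'union-minus union :initial-value universe))
```

Subtracting an interval that lies strictly inside another one is the case that produces two pieces, which is why minus-pair returns a list.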
5.2.3 Range emptiness check. Once an interval expression has been canonicalized, checking its emptiness is trivial. The predicate interval-empty-p, given the result of the first $\Gamma$ call, just returns the Boolean that null-numeric-type-p has to return.
5.3 Array types and cons type specifiers
This section presents some preliminary work and research results on array and cons type specifiers. Since the implementation of the expert sub-procedures for these kingdoms is still a work in progress, neither results nor implementation guidelines are provided here. It does, however, give some insights into how Baker’s procedure applies to modern Common Lisp implementations such as Sbcl.
[Figure 3: four log-log panels (Subtypes of NUMBER, MEMBER types, Subtypes of T, Subtypes of CONDITION), each plotting execution time against type specifier size for four curves: Algorithm 1 with cl:subtypep, Algorithm 1 with baker:subtypep, Algorithm 2 with cl:subtypep, Algorithm 2 with baker:subtypep.]
Figure 3: Comparative efficiency measures of our subtypep implementation
Array type specifiers are complex to handle because they are two-dimensional: they have an element type and dimension bounds (e.g., (array integer (* 2 *))). Internally, Common Lisp implementations do not store the exact element type that was specified, but only the result of calling the function upgraded-array-element-type on that type. E.g., for (make-array 2 :element-type 'list), the implementation does not make an array of list but rather an array of (upgraded-array-element-type 'list). For every value that this function might return, Baker requires that we store a bit matrix (instead of bit vectors) because of the complex bounds logic of the type specifier. As for the literal type procedure, it seems to be an efficient type representation system, albeit a more complex one, which nonetheless requires an extra registration step and a global state.
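The upgrading of element types can be observed with standard functions only. In the short sketch below, the function name element-type-report is hypothetical; the values it returns depend on the implementation.

```lisp
(defun element-type-report (requested)
  "Return the requested element type, its upgraded type, and the element
type actually recorded in an array created with the requested type."
  (let ((array (make-array 2 :element-type requested)))
    (list requested
          (upgraded-array-element-type requested)
          (array-element-type array))))
```

Calling (element-type-report 'list) returns the symbol list followed by whatever more general type the implementation uses to store such arrays.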
Baker does not mention the cons type specifier family at all in his article because it was introduced after the article was published [4]. An accurate expert sub-procedure for this kingdom would have an exponential complexity. More investigation is needed to determine whether or not that exponential time is “acceptable” (as it is for ranges) before rejecting it. The accuracy of existing subtypep procedures for the cons type specifier also needs to be studied.
6 EARLY RESULTS
Our implementation of subtypep is still in active development and very experimental. No serious optimization work has been done. Nonetheless, Newton has compared in [7] the performance of several algorithms that depend heavily on subtypep, using both the implementation of Sbcl and ours.
These results, shown in Figure 3, are only presented here as complementary information. The horizontal axis shows the size of the type specifiers and the vertical axis shows the measured execution time. Hence, the lower a curve is, the better. As expected, our implementation is often slower, but not dramatically so, which is encouraging.
• Our implementation is overall slower in the range type kingdom.
• Heavy users of member seem to experience slower execution. Perhaps, as predicted by Baker, the reason is that the systematic registration of the elements makes the size of the bit-vectors grow quickly, thus making every subsequent operation slower.
• For the symbolic type specifiers (primitive types, Clos classes and conditions), our implementation already outperforms Sbcl’s.
7 CONCLUSION AND FUTURE WORK
Throughout this article we presented our implementation of Baker’s decision procedure. In Section 2 we introduced the Common Lisp type system, the notion of type specifier and some vocabulary. In Section 4 we explained how to pre-process the caller’s type specifiers to make the work of the expert sub-procedures presented in Section 5 easier. We described our implementation for the symbolic, member, range and logical type specifiers. We also gave some insights about the implementation for the array and cons type specifiers. We finally presented some early efficiency measures, which are encouraging overall.
Our implementation is still a work in progress and highly experimental. But with some cleaning and the implementation of both array and cons expert sub-procedures, it could be a viable alternative to existing subtypep implementations. We will have open sourced its code by then. We still have to find a solution for the satisfies type specifier and the related uncertainty. Indeed, in some situations, subtypep can still answer even though this type specifier is involved. For example, in (subtypep 'string '(and number (satisfies evenp))), as the second operand is guaranteed to be a subtype of number and string is disjoint from number, the predicate can safely return false. Finally, many measurements of accuracy and efficiency are needed to determine whether Baker’s intuition about his procedure was correct or not. Even if, in the future, we are to conclude that our implementation is less efficient than those which already exist, Baker’s algorithm would still be likely to improve the predicate’s accuracy. Lispers would then have the ability to choose whichever subtypep implementation fits their needs best.
REFERENCES
[1] Ansi. American National Standard: Programming Language – Common Lisp. ANSI X3.226:1994 (R1999), 1994.
[2] Ansi. American National Standard: Programming Language – Common Lisp – Type Specifiers (Section 4.2.3). ANSI X3.226:1994 (R1999), 1994. http://www.lispworks.com/documentation/lw50/CLHS/Body/04_bc.htm.
[3] Henry G. Baker. A Decision Procedure for Common Lisp’s SUBTYPEP Predicate. Lisp and Symbolic Computation, 1992.
[4] Paul F. Dietz. “subtypep tests” discussion on gcl-devel, 2005. https://lists.gnu.org/archive/html/gcl-devel/2005-07/msg00038.html.
[5] Graham Hutton. A tutorial on the universality and expressiveness of fold. Journal of Functional Programming, 9(4):355–372, July 1999. URL http://dblp.uni-trier.de/db/journals/jfp/jfp9.html#Hutton99.
[6] Gregor J. Kiczales, Jim des Rivières, and Daniel G. Bobrow. The Art of the Metaobject Protocol. MIT Press, Cambridge, MA, 1991.
[7] Jim Newton. Representing and Computing with Types in Dynamically Typed Languages. PhD thesis, Sorbonne Université, Paris, France, November 2018.
[8] Jim Newton and Didier Verna. Approaches in typecase optimization. In European Lisp Symposium, Marbella, Spain, April 2018.
[9] Peter Norvig. Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp. Morgan Kaufmann, 1992.