Difference between revisions of "Publications/newton.18.phd"

From LRDE

(Created page with "{{Publication | published = true | date = 2018-11-01 | authors = Jim Newton | title = Representing and Computing with Types in Dynamically Typed Languages | school = Sorbonne...")
 
 
(6 intermediate revisions by the same user not shown)
Line 5: Line 5:
 
| title = Representing and Computing with Types in Dynamically Typed Languages
 
| title = Representing and Computing with Types in Dynamically Typed Languages
 
| school = Sorbonne Université
 
| school = Sorbonne Université
  +
| nodoi =
 
| address = Paris, France
 
| address = Paris, France
| abstract = In this report, we present code generation techniques related to run-time type checking of heterogeneous sequences. Traditional regular expressions can be used to recognize well defined sets of character strings called rational languages or sometimes textitregular languages. Newton et al. present an extension whereby a dynamic programming language may recognize a well defined set of heterogeneous sequences, such as lists and vectors. As with the analogous string matching regular expression theorymatching these regular type expressions can also be achieved by using a finite state machine (deterministic finite automata, DFA). Constructing such a DFA can be time consuming. The approach we chose, uses meta-programming to intervene at compile-timegenerating efficient functions specific to each DFA, and allowing the compiler to further optimize the functions if possible. The functions are made available for use at run-time. Without this use of meta-programming, the program might otherwise be forced to construct the DFA at run-time. The excessively high cost of such a construction would likely far outweigh the time needed to match a string against the expression. Our technique involves hooking into the Common Lisp type system via the deftype macro. The first time the compiler encounters a relevant type specifier, the appropriate DFA is created, which may be a <math>\Omega(2^n)</math> operation, from which specific low-level code is generated to match that specific expression. Thereafter, when the type specifier is encountered again, the same pre-generated function can be used. The code generated is <math>\Theta(n)</math> complexity at run-time. A complication of this approach, which we explain in this report, is that to build the DFA we must calculate a disjoint type decomposition which is time consuming, and also leads to sub-optimal use of typecase in machine generated code. To handle this complication, we use our own macro optimized-typecase in our machine generated code. Uses of this macro are also implicitly expanded at compile time. Our macro expansion uses BDDs (Binary Decision Diagrams) to optimize the optimized-typecase into low level code, maintaining the typecase semantics but eliminating redundant type checks. In the report we also describe an extension of BDDs to accomodate subtyping in the Common Lisp type system as well as an in-depth analysis of worst-case sizes of BDDs.
+
| abstract = In this report, we present code generation techniques related to run-time type checking of heterogeneous sequences. Traditional regular expressions can be used to recognize well defined sets of character strings called rational languages or sometimes regular languages. Newton et al. present an extension whereby a dynamic programming language may recognize a well defined set of heterogeneous sequences, such as lists and vectors. As with the analogous string matching regular expression theory, matching these regular type expressions can also be achieved by using a finite state machine (deterministic finite automata, DFA). Constructing such a DFA can be time consuming. The approach we chose, uses meta-programming to intervene at compile-time, generating efficient functions specific to each DFA, and allowing the compiler to further optimize the functions if possible. The functions are made available for use at run-time. Without this use of meta-programming, the program might otherwise be forced to construct the DFA at run-time. The excessively high cost of such a construction would likely far outweigh the time needed to match a string against the expression. Our technique involves hooking into the Common Lisp type system via the DEFTYPE macro. The first time the compiler encounters a relevant type specifier, the appropriate DFA is created, which may be a Omega(2^n operation, from which specific low-level code is generated to match that specific expression. Thereafterwhen the type specifier is encountered again, the same pre-generated function can be used. The code generated is Theta(n) complexity at run-time. A complication of this approach, which we explain in this report, is that to build the DFA we must calculate a disjoint type decomposition which is time consuming, and also leads to sub-optimal use of TYPECASE in machine generated code. To handle this complication, we use our own macro OPTIMIZED-TYPECASE in our machine generated code. Uses of this macro are also implicitly expanded at compile time. Our macro expansion uses BDDs (Binary Decision Diagrams) to optimize the OPTIMIZED-TYPECASE into low level code, maintaining the TYPECASE semantics but eliminating redundant type checks. In the report we also describe an extension of BDDs to accomodate subtyping in the Common Lisp type system as well as an in-depth analysis of worst-case sizes of BDDs.
| lrdeslides = http://www.lrde.epita.fr/dload/papers/newton.2018.phd.slides.pdf
+
| lrdepaper = http://www.lrde.epita.fr/dload/papers/newton.18.phd.pdf
  +
| lrdeslides = http://www.lrde.epita.fr/dload/papers/newton.18.phd.slides.pdf
 
| lrdekeywords = rational language, automata, lisp, bdd
 
| lrdekeywords = rational language, automata, lisp, bdd
 
| lrdeprojects = Climb
 
| lrdeprojects = Climb
  +
| lrdenewsdate = 2018-11-01
 
| type = phdthesis
 
| type = phdthesis
 
| id = newton.18.phd
 
| id = newton.18.phd
Line 19: Line 22:
 
school = <nowiki>{</nowiki>Sorbonne Universit\'e<nowiki>}</nowiki>,
 
school = <nowiki>{</nowiki>Sorbonne Universit\'e<nowiki>}</nowiki>,
 
year = 2018,
 
year = 2018,
 
nodoi = <nowiki>{</nowiki><nowiki>}</nowiki>,
 
address = <nowiki>{</nowiki>Paris, France<nowiki>}</nowiki>,
 
address = <nowiki>{</nowiki>Paris, France<nowiki>}</nowiki>,
 
month = nov,
 
month = nov,
Line 25: Line 29:
 
sequences. Traditional regular expressions can be used to
 
sequences. Traditional regular expressions can be used to
 
recognize well defined sets of character strings called
 
recognize well defined sets of character strings called
\textit<nowiki>{</nowiki>rational languages<nowiki>}</nowiki> or sometimes \textit<nowiki>{</nowiki>regular
+
rational languages or sometimes regular languages. Newton
languages<nowiki>}</nowiki>. Newton et al. present an extension whereby a
+
et al. present an extension whereby a dynamic programming
dynamic programming language may recognize a well defined
+
language may recognize a well defined set of heterogeneous
set of heterogeneous sequences, such as lists and vectors.
+
sequences, such as lists and vectors. As with the analogous
  +
string matching regular expression theory, matching these
 
 
regular type expressions can also be achieved by using a
As with the analogous string matching regular expression
 
 
finite state machine (deterministic finite automata, DFA).
theory, matching these \textit<nowiki>{</nowiki>regular type expressions<nowiki>}</nowiki>
 
 
Constructing such a DFA can be time consuming. The approach
can also be achieved by using a finite state machine
 
 
we chose, uses meta-programming to intervene at
(deterministic finite automata, DFA). Constructing such a
 
  +
compile-time, generating efficient functions specific to
DFA can be time consuming. The approach we chose, uses
 
  +
each DFA, and allowing the compiler to further optimize the
meta-programming to intervene at compile-time, generating
 
efficient functions specific to each DFA, and allowing the
+
functions if possible. The functions are made available for
 
use at run-time. Without this use of meta-programming, the
compiler to further optimize the functions if possible. The
 
  +
program might otherwise be forced to construct the DFA at
functions are made available for use at run-time. Without
 
 
run-time. The excessively high cost of such a construction
this use of meta-programming, the program might otherwise
 
be forced to construct the DFA at run-time. The excessively
+
would likely far outweigh the time needed to match a string
 
against the expression. Our technique involves hooking into
high cost of such a construction would likely far outweigh
 
the time needed to match a string against the expression.
+
the Common Lisp type system via the DEFTYPE macro. The
 
first time the compiler encounters a relevant type
 
 
specifier, the appropriate DFA is created, which may be a
Our technique involves hooking into the Common Lisp type
 
  +
Omega(2^n operation, from which specific low-level code is
system via the \texttt<nowiki>{</nowiki>deftype<nowiki>}</nowiki> macro. The first time the
 
 
generated to match that specific expression. Thereafter,
compiler encounters a relevant type specifier, the
 
 
when the type specifier is encountered again, the same
appropriate DFA is created, which may be a $\Omega(2^n)$
 
operation, from which specific low-level code is generated
+
pre-generated function can be used. The code generated is
 
Theta(n) complexity at run-time. A complication of this
to match that specific expression. Thereafter, when the
 
  +
approach, which we explain in this report, is that to build
type specifier is encountered again, the same pre-generated
 
  +
the DFA we must calculate a disjoint type decomposition
function can be used. The code generated is $\Theta(n)$
 
  +
which is time consuming, and also leads to sub-optimal use
complexity at run-time.
 
 
of TYPECASE in machine generated code. To handle this
 
A complication of this approach, which we explain in this
+
complication, we use our own macro OPTIMIZED-TYPECASE in
 
our machine generated code. Uses of this macro are also
report, is that to build the DFA we must calculate a
 
 
implicitly expanded at compile time. Our macro expansion
disjoint type decomposition which is time consuming, and
 
 
uses BDDs (Binary Decision Diagrams) to optimize the
also leads to sub-optimal use of \texttt<nowiki>{</nowiki>typecase<nowiki>}</nowiki> in
 
  +
OPTIMIZED-TYPECASE into low level code, maintaining the
machine generated code. To handle this complication, we use
 
  +
TYPECASE semantics but eliminating redundant type checks.
our own macro \texttt<nowiki>{</nowiki>optimized-typecase<nowiki>}</nowiki> in our machine
 
 
In the report we also describe an extension of BDDs to
generated code. Uses of this macro are also implicitly
 
 
accomodate subtyping in the Common Lisp type system as well
expanded at compile time. Our macro expansion uses BDDs
 
 
as an in-depth analysis of worst-case sizes of BDDs. <nowiki>}</nowiki>
(Binary Decision Diagrams) to optimize the
 
\texttt<nowiki>{</nowiki>optimized-typecase<nowiki>}</nowiki> into low level code,
 
maintaining the \texttt<nowiki>{</nowiki>typecase<nowiki>}</nowiki> semantics but eliminating
 
redundant type checks. In the report we also describe an
 
extension of BDDs to accomodate subtyping in the Common
 
Lisp type system as well as an in-depth analysis of
 
worst-case sizes of BDDs. <nowiki>}</nowiki>
 
 
<nowiki>}</nowiki>
 
<nowiki>}</nowiki>
   

Latest revision as of 10:40, 22 April 2023

Abstract

In this report, we present code generation techniques related to run-time type checking of heterogeneous sequences. Traditional regular expressions can be used to recognize well defined sets of character strings called rational languages or sometimes regular languages. Newton et al. present an extension whereby a dynamic programming language may recognize a well defined set of heterogeneous sequences, such as lists and vectors. As with the analogous string matching regular expression theory, matching these regular type expressions can also be achieved by using a finite state machine (deterministic finite automata, DFA). Constructing such a DFA can be time consuming. The approach we chose, uses meta-programming to intervene at compile-time, generating efficient functions specific to each DFA, and allowing the compiler to further optimize the functions if possible. The functions are made available for use at run-time. Without this use of meta-programming, the program might otherwise be forced to construct the DFA at run-time. The excessively high cost of such a construction would likely far outweigh the time needed to match a string against the expression. Our technique involves hooking into the Common Lisp type system via the DEFTYPE macro. The first time the compiler encounters a relevant type specifier, the appropriate DFA is created, which may be a Omega(2^n operation, from which specific low-level code is generated to match that specific expression. Thereafterwhen the type specifier is encountered again, the same pre-generated function can be used. The code generated is Theta(n) complexity at run-time. A complication of this approach, which we explain in this report, is that to build the DFA we must calculate a disjoint type decomposition which is time consuming, and also leads to sub-optimal use of TYPECASE in machine generated code. To handle this complication, we use our own macro OPTIMIZED-TYPECASE in our machine generated code. Uses of this macro are also implicitly expanded at compile time. Our macro expansion uses BDDs (Binary Decision Diagrams) to optimize the OPTIMIZED-TYPECASE into low level code, maintaining the TYPECASE semantics but eliminating redundant type checks. In the report we also describe an extension of BDDs to accomodate subtyping in the Common Lisp type system as well as an in-depth analysis of worst-case sizes of BDDs.

Documents

Bibtex (lrde.bib)

@PhDThesis{	  newton.18.phd,
  author	= {Jim Newton},
  title		= {Representing and Computing with Types in Dynamically Typed
		  Languages},
  school	= {Sorbonne Universit\'e},
  year		= 2018,
  nodoi		= {},
  address	= {Paris, France},
  month		= nov,
  abstract	= {In this report, we present code generation techniques
		  related to run-time type checking of heterogeneous
		  sequences. Traditional regular expressions can be used to
		  recognize well defined sets of character strings called
		  rational languages or sometimes regular languages. Newton
		  et al. present an extension whereby a dynamic programming
		  language may recognize a well defined set of heterogeneous
		  sequences, such as lists and vectors. As with the analogous
		  string matching regular expression theory, matching these
		  regular type expressions can also be achieved by using a
		  finite state machine (deterministic finite automata, DFA).
		  Constructing such a DFA can be time consuming. The approach
		  we chose, uses meta-programming to intervene at
		  compile-time, generating efficient functions specific to
		  each DFA, and allowing the compiler to further optimize the
		  functions if possible. The functions are made available for
		  use at run-time. Without this use of meta-programming, the
		  program might otherwise be forced to construct the DFA at
		  run-time. The excessively high cost of such a construction
		  would likely far outweigh the time needed to match a string
		  against the expression. Our technique involves hooking into
		  the Common Lisp type system via the DEFTYPE macro. The
		  first time the compiler encounters a relevant type
		  specifier, the appropriate DFA is created, which may be a
		  Omega(2^n operation, from which specific low-level code is
		  generated to match that specific expression. Thereafter,
		  when the type specifier is encountered again, the same
		  pre-generated function can be used. The code generated is
		  Theta(n) complexity at run-time. A complication of this
		  approach, which we explain in this report, is that to build
		  the DFA we must calculate a disjoint type decomposition
		  which is time consuming, and also leads to sub-optimal use
		  of TYPECASE in machine generated code. To handle this
		  complication, we use our own macro OPTIMIZED-TYPECASE in
		  our machine generated code. Uses of this macro are also
		  implicitly expanded at compile time. Our macro expansion
		  uses BDDs (Binary Decision Diagrams) to optimize the
		  OPTIMIZED-TYPECASE into low level code, maintaining the
		  TYPECASE semantics but eliminating redundant type checks.
		  In the report we also describe an extension of BDDs to
		  accomodate subtyping in the Common Lisp type system as well
		  as an in-depth analysis of worst-case sizes of BDDs. }
}