Difference between revisions of "Publications/newton.18.phd"
From LRDE
(Created page with "{{Publication | published = true | date = 2018-11-01 | authors = Jim Newton | title = Representing and Computing with Types in Dynamically Typed Languages | school = Sorbonne...") |
|||
(6 intermediate revisions by the same user not shown) | |||
Line 5: | Line 5: | ||
| title = Representing and Computing with Types in Dynamically Typed Languages |
| title = Representing and Computing with Types in Dynamically Typed Languages |
||
| school = Sorbonne Université |
| school = Sorbonne Université |
||
+ | | nodoi = |
||
| address = Paris, France |
| address = Paris, France |
||
− | | abstract = In this report, we present code generation techniques related to run-time type checking of heterogeneous sequences. |
+ | | abstract = In this report, we present code generation techniques related to run-time type checking of heterogeneous sequences. Traditional regular expressions can be used to recognize well defined sets of character strings called rational languages or sometimes regular languages. Newton et al. present an extension whereby a dynamic programming language may recognize a well defined set of heterogeneous sequences, such as lists and vectors. As with the analogous string matching regular expression theory, matching these regular type expressions can also be achieved by using a finite state machine (deterministic finite automata, DFA). Constructing such a DFA can be time consuming. The approach we chose, uses meta-programming to intervene at compile-time, generating efficient functions specific to each DFA, and allowing the compiler to further optimize the functions if possible. The functions are made available for use at run-time. Without this use of meta-programming, the program might otherwise be forced to construct the DFA at run-time. The excessively high cost of such a construction would likely far outweigh the time needed to match a string against the expression. Our technique involves hooking into the Common Lisp type system via the DEFTYPE macro. The first time the compiler encounters a relevant type specifier, the appropriate DFA is created, which may be a Omega(2^n operation, from which specific low-level code is generated to match that specific expression. Thereafterwhen the type specifier is encountered again, the same pre-generated function can be used. The code generated is Theta(n) complexity at run-time. A complication of this approach, which we explain in this report, is that to build the DFA we must calculate a disjoint type decomposition which is time consuming, and also leads to sub-optimal use of TYPECASE in machine generated code. To handle this complication, we use our own macro OPTIMIZED-TYPECASE in our machine generated code. Uses of this macro are also implicitly expanded at compile time. Our macro expansion uses BDDs (Binary Decision Diagrams) to optimize the OPTIMIZED-TYPECASE into low level code, maintaining the TYPECASE semantics but eliminating redundant type checks. In the report we also describe an extension of BDDs to accomodate subtyping in the Common Lisp type system as well as an in-depth analysis of worst-case sizes of BDDs. |
− | | |
+ | | lrdepaper = http://www.lrde.epita.fr/dload/papers/newton.18.phd.pdf |
+ | | lrdeslides = http://www.lrde.epita.fr/dload/papers/newton.18.phd.slides.pdf |
||
| lrdekeywords = rational language, automata, lisp, bdd |
| lrdekeywords = rational language, automata, lisp, bdd |
||
| lrdeprojects = Climb |
| lrdeprojects = Climb |
||
+ | | lrdenewsdate = 2018-11-01 |
||
| type = phdthesis |
| type = phdthesis |
||
| id = newton.18.phd |
| id = newton.18.phd |
||
Line 19: | Line 22: | ||
school = <nowiki>{</nowiki>Sorbonne Universit\'e<nowiki>}</nowiki>, |
school = <nowiki>{</nowiki>Sorbonne Universit\'e<nowiki>}</nowiki>, |
||
year = 2018, |
year = 2018, |
||
⚫ | |||
address = <nowiki>{</nowiki>Paris, France<nowiki>}</nowiki>, |
address = <nowiki>{</nowiki>Paris, France<nowiki>}</nowiki>, |
||
month = nov, |
month = nov, |
||
Line 25: | Line 29: | ||
sequences. Traditional regular expressions can be used to |
sequences. Traditional regular expressions can be used to |
||
recognize well defined sets of character strings called |
recognize well defined sets of character strings called |
||
− | + | rational languages or sometimes regular languages. Newton |
|
− | + | et al. present an extension whereby a dynamic programming |
|
− | + | language may recognize a well defined set of heterogeneous |
|
− | + | sequences, such as lists and vectors. As with the analogous |
|
+ | string matching regular expression theory, matching these |
||
− | |||
⚫ | |||
− | As with the analogous string matching regular expression |
||
⚫ | |||
− | theory, matching these \textit<nowiki>{</nowiki>regular type expressions<nowiki>}</nowiki> |
||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
+ | compile-time, generating efficient functions specific to |
||
⚫ | |||
+ | each DFA, and allowing the compiler to further optimize the |
||
⚫ | |||
− | + | functions if possible. The functions are made available for |
|
⚫ | |||
− | compiler to further optimize the functions if possible. The |
||
+ | program might otherwise be forced to construct the DFA at |
||
− | functions are made available for use at run-time. Without |
||
⚫ | |||
⚫ | |||
− | + | would likely far outweigh the time needed to match a string |
|
⚫ | |||
⚫ | |||
− | the |
+ | the Common Lisp type system via the DEFTYPE macro. The |
⚫ | |||
− | |||
⚫ | |||
⚫ | |||
+ | Omega(2^n operation, from which specific low-level code is |
||
− | system via the \texttt<nowiki>{</nowiki>deftype<nowiki>}</nowiki> macro. The first time the |
||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
− | + | pre-generated function can be used. The code generated is |
|
⚫ | |||
⚫ | |||
+ | approach, which we explain in this report, is that to build |
||
⚫ | |||
+ | the DFA we must calculate a disjoint type decomposition |
||
− | function can be used. The code generated is $\Theta(n)$ |
||
+ | which is time consuming, and also leads to sub-optimal use |
||
⚫ | |||
⚫ | |||
− | |||
− | + | complication, we use our own macro OPTIMIZED-TYPECASE in |
|
⚫ | |||
− | report, is that to build the DFA we must calculate a |
||
⚫ | |||
− | disjoint type decomposition which is time consuming, and |
||
⚫ | |||
− | also leads to sub-optimal use of \texttt<nowiki>{</nowiki>typecase<nowiki>}</nowiki> in |
||
+ | OPTIMIZED-TYPECASE into low level code, maintaining the |
||
⚫ | |||
+ | TYPECASE semantics but eliminating redundant type checks. |
||
− | our own macro \texttt<nowiki>{</nowiki>optimized-typecase<nowiki>}</nowiki> in our machine |
||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
⚫ | |||
− | maintaining the \texttt<nowiki>{</nowiki>typecase<nowiki>}</nowiki> semantics but eliminating |
||
⚫ | |||
⚫ | |||
− | Lisp type system as well as an in-depth analysis of |
||
⚫ | |||
<nowiki>}</nowiki> |
<nowiki>}</nowiki> |
||
Latest revision as of 10:40, 22 April 2023
- Authors
- Jim Newton
- Place
- Paris, France
- Type
- phdthesis
- Projects
- Climb
- Keywords
- rational language, automata, lisp, bdd
- Date
- 2018-11-01
Abstract
In this report, we present code generation techniques related to run-time type checking of heterogeneous sequences. Traditional regular expressions can be used to recognize well defined sets of character strings called rational languages or sometimes regular languages. Newton et al. present an extension whereby a dynamic programming language may recognize a well defined set of heterogeneous sequences, such as lists and vectors. As with the analogous string matching regular expression theory, matching these regular type expressions can also be achieved by using a finite state machine (deterministic finite automata, DFA). Constructing such a DFA can be time consuming. The approach we chose, uses meta-programming to intervene at compile-time, generating efficient functions specific to each DFA, and allowing the compiler to further optimize the functions if possible. The functions are made available for use at run-time. Without this use of meta-programming, the program might otherwise be forced to construct the DFA at run-time. The excessively high cost of such a construction would likely far outweigh the time needed to match a string against the expression. Our technique involves hooking into the Common Lisp type system via the DEFTYPE macro. The first time the compiler encounters a relevant type specifier, the appropriate DFA is created, which may be a Omega(2^n operation, from which specific low-level code is generated to match that specific expression. Thereafterwhen the type specifier is encountered again, the same pre-generated function can be used. The code generated is Theta(n) complexity at run-time. A complication of this approach, which we explain in this report, is that to build the DFA we must calculate a disjoint type decomposition which is time consuming, and also leads to sub-optimal use of TYPECASE in machine generated code. To handle this complication, we use our own macro OPTIMIZED-TYPECASE in our machine generated code. Uses of this macro are also implicitly expanded at compile time. Our macro expansion uses BDDs (Binary Decision Diagrams) to optimize the OPTIMIZED-TYPECASE into low level code, maintaining the TYPECASE semantics but eliminating redundant type checks. In the report we also describe an extension of BDDs to accomodate subtyping in the Common Lisp type system as well as an in-depth analysis of worst-case sizes of BDDs.
Documents
Bibtex (lrde.bib)
@PhDThesis{ newton.18.phd, author = {Jim Newton}, title = {Representing and Computing with Types in Dynamically Typed Languages}, school = {Sorbonne Universit\'e}, year = 2018, nodoi = {}, address = {Paris, France}, month = nov, abstract = {In this report, we present code generation techniques related to run-time type checking of heterogeneous sequences. Traditional regular expressions can be used to recognize well defined sets of character strings called rational languages or sometimes regular languages. Newton et al. present an extension whereby a dynamic programming language may recognize a well defined set of heterogeneous sequences, such as lists and vectors. As with the analogous string matching regular expression theory, matching these regular type expressions can also be achieved by using a finite state machine (deterministic finite automata, DFA). Constructing such a DFA can be time consuming. The approach we chose, uses meta-programming to intervene at compile-time, generating efficient functions specific to each DFA, and allowing the compiler to further optimize the functions if possible. The functions are made available for use at run-time. Without this use of meta-programming, the program might otherwise be forced to construct the DFA at run-time. The excessively high cost of such a construction would likely far outweigh the time needed to match a string against the expression. Our technique involves hooking into the Common Lisp type system via the DEFTYPE macro. The first time the compiler encounters a relevant type specifier, the appropriate DFA is created, which may be a Omega(2^n operation, from which specific low-level code is generated to match that specific expression. Thereafter, when the type specifier is encountered again, the same pre-generated function can be used. The code generated is Theta(n) complexity at run-time. A complication of this approach, which we explain in this report, is that to build the DFA we must calculate a disjoint type decomposition which is time consuming, and also leads to sub-optimal use of TYPECASE in machine generated code. To handle this complication, we use our own macro OPTIMIZED-TYPECASE in our machine generated code. Uses of this macro are also implicitly expanded at compile time. Our macro expansion uses BDDs (Binary Decision Diagrams) to optimize the OPTIMIZED-TYPECASE into low level code, maintaining the TYPECASE semantics but eliminating redundant type checks. In the report we also describe an extension of BDDs to accomodate subtyping in the Common Lisp type system as well as an in-depth analysis of worst-case sizes of BDDs. } }