Derived-Term Automata of Multitape Expressions with Composition

This page is a complement to the paper Derived-Term Automata of Multitape Expressions with Composition. This page exists in several forms:

More information is available here:

You may change the cells, and run then. To run a cell in a notebook, hit "Control-Enter" (in which case the focus stays on the same cell) or "Shift-Enter" (focus goes to the next cell). Beware that depending on the requested operations, Vcsn may generate and compile code, which may be a really slow process on small machines (about a minute): be patient! However, the code is compiled only once: successive uses will be way faster.

To run all the cells anew, select "Restart & Run All" in the "Kernel" menu above. $\newcommand{\Ed}{\mathsf{E}} \newcommand{\Fd}{\mathsf{F}} \newcommand{\coloneqq}{\mathrel{\vcenter{:}}=}$

In [1]:
import vcsn
vcsn.setenv(SEED=1)
vcsn.table = vcsn.ipython.table
# There will be examples with benchmarks, tell the test suite to ignore the durations.
# global ignore: \d+ms

Introduction: Motivating Examples

First we introduce the context we are interested in: labels are letter-or-empty-word (lan), two tapes (lat), weights are in the tropical semiring $\mathbb{Z}_{\min} := \langle \mathbb{Z}, \min, +, \infty, 0 \rangle$.

In [2]:
zmin2 = vcsn.context('lat<lan, lan>, zmin')
zmin2
Out[2]:
$(\{\ldots\})^? \times (\{\ldots\})^?\to\mathbb{Z}_{\text{min}}$

The following examples show how the multiplication and addition of $\mathbb{Z}_{\min}$ behave:

In [3]:
zmin2.weight('1') * zmin2.weight('3')
Out[3]:
$4$
In [4]:
zmin2.weight('1') + zmin2.weight('3')
Out[4]:
$1$

The first automaton, $\mathcal{A}$, is obtained from the following rational expression (the empty word is noted \e in Vcsn, and rendered $\varepsilon$ even as an expression, whereas in the paper it is written $\mathsf{1}$).

In [5]:
zmin2.expression(r'(a|a+b|b+⟨1⟩(\e|(a+b)+(a+b)|\e))∗')
Out[5]:
$\left(a|a + b|b + \left\langle 1 \right\rangle \,\left( \left. \varepsilon \middle| \left(a + b\right) \right. + \left. \left(a + b\right) \middle| \varepsilon \right. \right)\right)^{*}$

Or, using syntactic sugar ($a \coloneqq a|a$, $[ab] \coloneqq a+b$):

In [6]:
e1 = zmin2.expression(r'([ab] + ⟨1⟩(\e|[ab] + [ab]|\e))∗')
e1
Out[6]:
$\left(a|a + b|b + \left\langle 1 \right\rangle \,\left( \left. \varepsilon \middle| \left(a + b\right) \right. + \left. \left(a + b\right) \middle| \varepsilon \right. \right)\right)^{*}$
In [7]:
a1 = e1.derived_term()
a1
Out[7]:
%3 I0 0 (a|a+b|b+⟨1⟩(ε|(a+b)+(a+b)|ε))* I0->0 ⟨0⟩ F0 0->F0 ⟨0⟩ 0->0 ⟨1⟩[ε|a-a|εb|ε], ⟨0⟩a|a, ⟨0⟩b|b

Admittedly, the display of multitape labels on transition could be improved. The rather cryptic [ε|a-a|εb|ε] denotes ε|a+ε|b+a|ε+b|ε. This will be addressed in future versions of Vcsn.

Ng's automaton is:

In [8]:
e2 = zmin2.expression('[ab]∗ (⟨1⟩(\e|[ab] + [ab]|\e) + ⟨2⟩(a|b + b|a))∗')
e2
Out[8]:
$\left(a|a + b|b\right)^{*} \, \left( \left\langle 2 \right\rangle \,\left(a|b + b|a\right) + \left\langle 1 \right\rangle \,\left( \left. \varepsilon \middle| \left(a + b\right) \right. + \left. \left(a + b\right) \middle| \varepsilon \right. \right)\right)^{*}$
In [9]:
e2.derived_term()
Out[9]:
%3 I0 0 (a|a+b|b)*(⟨2⟩(a|b+b|a)+⟨1⟩(ε|(a+b)+(a+b)|ε))* I0->0 ⟨0⟩ F0 F1 0->F0 ⟨0⟩ 0->0 ⟨0⟩a|a, ⟨0⟩b|b 1 (⟨2⟩(a|b+b|a)+⟨1⟩(ε|(a+b)+(a+b)|ε))* 0->1 ⟨1⟩[ε|a-a|εb|ε], ⟨2⟩a|b, ⟨2⟩b|a 1->F1 ⟨0⟩ 1->1 ⟨1⟩[ε|a-a|εb|ε], ⟨2⟩a|b, ⟨2⟩b|a

Or, cleaned from state decorations:

In [10]:
a2 = e2.automaton('expansion')
a2
Out[10]:
%3 I0 0 0 I0->0 ⟨0⟩ F0 F1 0->F0 ⟨0⟩ 0->0 ⟨0⟩a|a, ⟨0⟩b|b 1 1 0->1 ⟨1⟩[ε|a-a|εb|ε], ⟨2⟩a|b, ⟨2⟩b|a 1->F1 ⟨0⟩ 1->1 ⟨1⟩[ε|a-a|εb|ε], ⟨2⟩a|b, ⟨2⟩b|a

Mohri factors his automaton $\mathcal{A}$ in the composition of $\mathcal{A}_1$ and $\mathcal{A}_2$:

In [11]:
f1 = zmin2.expression('([ab] + ⟨1⟩(\e|i + [ab]|s))∗')
f2 = zmin2.expression('([ab] + i|[ab] + s|\e)∗')
f = f1.compose(f2)
vcsn.table([["Name", "Expression", "Automaton"],
            ["$\mathcal{A}_1$", f1, f1.derived_term()],
            ["$\mathcal{A}_2$", f2, f2.derived_term()],
            ["$\mathcal{A}$",   f,  f.derived_term()]])
Out[11]:
Name Expression Automaton
$\mathcal{A}_1$ $\left(a|a + b|b + \left\langle 1 \right\rangle \,\left(\varepsilon|i + \left. \left(a + b\right) \middle| s \right. \right)\right)^{*}$ %3 I0 0 (a|a+b|b+⟨1⟩(ε|i+(a+b)|s))* I0->0 ⟨0⟩ F0 0->F0 ⟨0⟩ 0->0 ⟨1⟩[ε|ia|sb|s], ⟨0⟩a|a, ⟨0⟩b|b
$\mathcal{A}_2$ $\left(a|a + b|b + s|\varepsilon + \left. i \middle| \left(a + b\right) \right. \right)^{*}$ %3 I0 0 (a|a+b|b+s|ε+i|(a+b))* I0->0 ⟨0⟩ F0 0->F0 ⟨0⟩ 0->0 ⟨0⟩[a|ab|bi|ai|bs|ε]
$\mathcal{A}$ $\left(a|a + b|b + \left\langle 1 \right\rangle \,\left(\varepsilon|i + \left. \left(a + b\right) \middle| s \right. \right)\right)^{*}@\left(a|a + b|b + s|\varepsilon + \left. i \middle| \left(a + b\right) \right. \right)^{*}$ %3 I0 0 (a|a+b|b+⟨1⟩(ε|i+(a+b)|s))*@(a|a+b|b+s|ε+i|(a+b))* I0->0 ⟨0⟩ F0 0->F0 ⟨0⟩ 0->0 ⟨1⟩[ε|aε|ba|εb|ε], ⟨0⟩a|a, ⟨0⟩b|b

The resulting automaton is exactly the automaton $\mathcal{A}$.

Section 2.2: Weighted Rational Expressions

The set of operators supported by Vcsn is quite large, see the documentation for details. We will show a few simple examples.

We start with a simple example of a single tape expression whose labels are letters-or-empty-word (lan), and weights are rational numbers. We first introduce what in Vcsn is called a context which is simply an object to type automata, expressions, expansions, etc.

In [12]:
q = vcsn.context('lan, q')
q
Out[12]:
$(\{\ldots\})^?\to\mathbb{Q}$

From this context, we can build (typed) expressions:

In [13]:
e = q.expression('(<1/6>a*+<1/3>b*)*')
e
Out[13]:
$\left( \left\langle \frac{1}{6} \right\rangle \,{a}^{*} + \left\langle \frac{1}{3} \right\rangle \,{b}^{*}\right)^{*}$

Let's introduce qexp as a shorthand to build an expression from a string.

In [14]:
qexp = q.expression
qexp('(<1/6>a*+<1/3>b*)*')
Out[14]:
$\left( \left\langle \frac{1}{6} \right\rangle \,{a}^{*} + \left\langle \frac{1}{3} \right\rangle \,{b}^{*}\right)^{*}$

The usual ${}^+$ quantifier ($\Ed^+ \coloneqq \Ed\Ed^*$) is written {+}, to disambiguate from the binary operator ($\Ed + \Fd$):

In [15]:
qexp('a{+}+b{+}')
Out[15]:
$a \, {a}^{*} + b \, {b}^{*}$

The following examples show the trivial identities in action.

In [16]:
def example(*es):
    import pandas as pd
    pd.options.display.max_colwidth = 0
    ids = ['none', 'trivial']
    res = [[e] + ['${:x}$'.format(qexp(e, id)) for id in ids]
           for e in es]
    return pd.DataFrame(res, columns=['.' + ' ' * 20 + id.title() for id in ['input'] + ids])
example('\z+a', 'a+a', '\za', '\ea', '(\e\za+\z)*')
Out[16]:
.                    Input .                    None .                    Trivial
0 \z+a $\emptyset + a$ $a$
1 a+a $a + a$ $a + a$
2 \za $\emptyset \, a$ $\emptyset$
3 \ea $\varepsilon \, a$ $a$
4 (\e\za+\z)* $\left(\varepsilon \, \left(\emptyset \, a\right) + \emptyset\right)^{*}$ $\varepsilon$

Vcsn also provides Python operators to assemble expressions.

In [17]:
('1/6' * qexp('a').star() + '1/3' * qexp('b').star()).star()
Out[17]:
$\left( \left\langle \frac{1}{6} \right\rangle \,{a}^{*} + \left\langle \frac{1}{3} \right\rangle \,{b}^{*}\right)^{*}$

The Python operator | builds a multitape expression from simple tape ones, and the Python operator @ denotes composition. As a matter of fact, the reason why composition is denoted $@$ in Vcsn is because that is the only remaining operator available in Python. Besides, more beautiful symbols, such as $\circ$ for instance, are technical to type.

In [18]:
qexp('a*') | qexp('b*')
Out[18]:
$ \left. {a}^{*} \middle| {b}^{*} \right. $
In [19]:
(qexp('a*') | qexp('b')) @ (qexp('b') | qexp('c*'))
Out[19]:
$ \left. {a}^{*} \middle| b \right. @ \left. b \middle| {c}^{*} \right. $

Section 3: Rational Expansions

We start with a simple example of a single tape expression whose labels are letters-or-empty-word (lan), and weights are rational numbers.

In [20]:
e = qexp('a+<2>bc*')
e
Out[20]:
$a + \left\langle 2 \right\rangle \,\left(b \, {c}^{*}\right)$

Its expansion is:

In [21]:
e.expansion()
Out[21]:
$a \odot \left[\varepsilon\right] \oplus b \odot \left[\left\langle 2\right\rangle {c}^{*}\right]$

And its derived-term automaton (contrary to the paper, in Vcsn the empty expression is denoted $\varepsilon$ instead of $\mathsf{1}$):

In [22]:
e.derived_term()
Out[22]:
%3 I0 0 a+⟨2⟩(bc*) I0->0 F1 F2 1 ε 0->1 a 2 c* 0->2 ⟨2⟩b 1->F1 2->F2 2->2 c

which is smaller than its standard automaton:

In [23]:
e.standard()
Out[23]:
%3 I0 0 0 I0->0 F1 F3 F5 1 1 0->1 a 3 3 0->3 ⟨2⟩b 1->F1 3->F3 5 5 3->5 c 5->F5 5->5 c

The following example demonstrates how successors of a state are computed. It uses the shuffle operator, denoted :, for illustration.

In [24]:
qexp('ab:xy').derived_term()
Out[24]:
%3 I0 0 ab:xy I0->0 F6 1 b:xy 0->1 a 2 ab:y 0->2 x 3 b:y 1->3 x 8 xy 1->8 b 2->3 a 4 ab 2->4 y 5 b 3->5 y 7 y 3->7 b 4->5 a 6 ε 5->6 b 6->F6 7->6 y 8->7 x

Examples 1 to 4: A Simple Multitape Expression (also Figure 1)

Labels are pairs of letter-or-empty-word (lat<lan, lan>), and weights are rational numbers (q).

In [25]:
q2 = vcsn.context('lat<lan(abcde), lan(xy)>, q')
q2
Out[25]:
$(\{a, b, c, d, e\})^? \times (\{x, y\})^?\to\mathbb{Q}$

The expression $\mathsf{E}_1$ is:

In [26]:
e1 = q2.expression('⟨4⟩ade*|x + ⟨3⟩bde*|x + ⟨2⟩ace*|xy + ⟨6⟩bce*|xy')
e1
Out[26]:
$ \left. \left\langle 4 \right\rangle \,\left(a \, d \, {e}^{*}\right) \middle| x \right. + \left. \left\langle 3 \right\rangle \,\left(b \, d \, {e}^{*}\right) \middle| x \right. + \left. \left\langle 2 \right\rangle \,\left(a \, c \, {e}^{*}\right) \middle| x \, y \right. + \left. \left\langle 6 \right\rangle \,\left(b \, c \, {e}^{*}\right) \middle| x \, y \right. $

Its expansion is:

In [27]:
e1.expansion()
Out[27]:
$a|x \odot \left[\left\langle 2\right\rangle \left. c \, {e}^{*} \middle| y \right. \oplus \left\langle 4\right\rangle \left. d \, {e}^{*} \middle| \varepsilon \right. \right] \oplus b|x \odot \left[\left\langle 6\right\rangle \left. c \, {e}^{*} \middle| y \right. \oplus \left\langle 3\right\rangle \left. d \, {e}^{*} \middle| \varepsilon \right. \right]$

The derived-term automaton of $\mathsf{E}_1$, $\mathcal{A}_{\mathsf{E}_1}$, is:

In [28]:
a1 = e1.derived_term()
a1
Out[28]:
%3 I0 0 ⟨4⟩(ade*)|x+⟨3⟩(bde*)|x+⟨2⟩(ace*)|xy+⟨6⟩(bce*)|xy I0->0 F3 1 ce*|y 0->1 ⟨2⟩a|x, ⟨6⟩b|x 2 de*|ε 0->2 ⟨4⟩a|x, ⟨3⟩b|x 3 e*|ε 1->3 c|y 2->3 d|ε 3->F3 3->3 e|ε

The ten shortest accepted multitape words are:

In [29]:
a1.shortest(10)
Out[29]:
$\left\langle 2\right\rangle \mathit{ac}|\mathit{xy} \oplus \left\langle 4\right\rangle \mathit{ad}|\mathit{x} \oplus \left\langle 6\right\rangle \mathit{bc}|\mathit{xy} \oplus \left\langle 3\right\rangle \mathit{bd}|\mathit{x} \oplus \left\langle 2\right\rangle \mathit{ace}|\mathit{xy} \oplus \left\langle 4\right\rangle \mathit{ade}|\mathit{x} \oplus \left\langle 6\right\rangle \mathit{bce}|\mathit{xy} \oplus \left\langle 3\right\rangle \mathit{bde}|\mathit{x} \oplus \left\langle 2\right\rangle \mathit{acee}|\mathit{xy} \oplus \left\langle 4\right\rangle \mathit{adee}|\mathit{x}$

Example 5: An Exponential Number of States

We introduce a three-tape context. Unfortunately the automatic graphical rendering is poor.

In [30]:
q3 = vcsn.context('lat<lan, lan, lan>, q')
a3 = q3.expression('a*|b*|c*').derived_term()
a3
Out[30]:
%3 I0 0 a*|b*|c* I0->0 F0 F1 F2 F3 F4 F5 F6 0->F0 0->0 a|b|c 1 ε|ε|c* 0->1 ε|ε|c 2 ε|b*|ε 0->2 ε|b|ε 3 ε|b*|c* 0->3 ε|b|c 4 a*|ε|ε 0->4 a|ε|ε 5 a*|ε|c* 0->5 a|ε|c 6 a*|b*|ε 0->6 a|b|ε 1->F1 1->1 ε|ε|c 2->F2 2->2 ε|b|ε 3->F3 3->1 ε|ε|c 3->2 ε|b|ε 3->3 ε|b|c 4->F4 4->4 a|ε|ε 5->F5 5->1 ε|ε|c 5->4 a|ε|ε 5->5 a|ε|c 6->F6 6->2 ε|b|ε 6->4 a|ε|ε 6->6 a|b|ε

From multitape automata currently Vcsn extracts only simple-tape expressions over multitape generators (e.g. $(\varepsilon | a)^*$) instead of general mutitape expressions (e.g. $\varepsilon | a^*$):

In [31]:
a3.expression()
Out[31]:
$\left(a|b|c\right)^{*} \, \left(\varepsilon + \left(\varepsilon|\varepsilon|c\right) \, \left(\varepsilon|\varepsilon|c\right)^{*} + \left(\varepsilon|b|\varepsilon\right) \, \left(\varepsilon|b|\varepsilon\right)^{*} + \left(a|\varepsilon|\varepsilon\right) \, \left(a|\varepsilon|\varepsilon\right)^{*} + \left(\varepsilon|b|c\right) \, \left(\varepsilon|b|c\right)^{*} \, \left(\varepsilon + \left(\varepsilon|\varepsilon|c\right) \, \left(\varepsilon|\varepsilon|c\right)^{*} + \left(\varepsilon|b|\varepsilon\right) \, \left(\varepsilon|b|\varepsilon\right)^{*}\right) + \left(a|\varepsilon|c\right) \, \left(a|\varepsilon|c\right)^{*} \, \left(\varepsilon + \left(\varepsilon|\varepsilon|c\right) \, \left(\varepsilon|\varepsilon|c\right)^{*} + \left(a|\varepsilon|\varepsilon\right) \, \left(a|\varepsilon|\varepsilon\right)^{*}\right) + \left(a|b|\varepsilon\right) \, \left(a|b|\varepsilon\right)^{*} \, \left(\varepsilon + \left(\varepsilon|b|\varepsilon\right) \, \left(\varepsilon|b|\varepsilon\right)^{*} + \left(a|\varepsilon|\varepsilon\right) \, \left(a|\varepsilon|\varepsilon\right)^{*}\right)\right)$

In the case of the corresponding five-tape expression, the generated graphical rendering is so messy that it is totally useless (try it if you want!). So instead of displaying the automaton, we list its states.

In [32]:
q5 = vcsn.context('lat<lan(a), lan(b), lan(c), lan(d), lan(e)>, q')
e5 = q5.expression('a*|b*|c*|d*|e*')
e5
Out[32]:
$ \left. {a}^{*} \middle| {b}^{*} \middle| {c}^{*} \middle| {d}^{*} \middle| {e}^{*} \right. $
In [33]:
e5.expansion()
Out[33]:
$\left\langle 1\right\rangle \oplus \varepsilon|\varepsilon|\varepsilon|\varepsilon|e \odot \left[ \left. \varepsilon \middle| \varepsilon \middle| \varepsilon \middle| \varepsilon \middle| {e}^{*} \right. \right] \oplus \varepsilon|\varepsilon|\varepsilon|d|\varepsilon \odot \left[ \left. \varepsilon \middle| \varepsilon \middle| \varepsilon \middle| {d}^{*} \middle| \varepsilon \right. \right] \oplus \varepsilon|\varepsilon|\varepsilon|d|e \odot \left[ \left. \varepsilon \middle| \varepsilon \middle| \varepsilon \middle| {d}^{*} \middle| {e}^{*} \right. \right] \oplus \varepsilon|\varepsilon|c|\varepsilon|\varepsilon \odot \left[ \left. \varepsilon \middle| \varepsilon \middle| {c}^{*} \middle| \varepsilon \middle| \varepsilon \right. \right] \oplus \varepsilon|\varepsilon|c|\varepsilon|e \odot \left[ \left. \varepsilon \middle| \varepsilon \middle| {c}^{*} \middle| \varepsilon \middle| {e}^{*} \right. \right] \oplus \varepsilon|\varepsilon|c|d|\varepsilon \odot \left[ \left. \varepsilon \middle| \varepsilon \middle| {c}^{*} \middle| {d}^{*} \middle| \varepsilon \right. \right] \oplus \varepsilon|\varepsilon|c|d|e \odot \left[ \left. \varepsilon \middle| \varepsilon \middle| {c}^{*} \middle| {d}^{*} \middle| {e}^{*} \right. \right] \oplus \varepsilon|b|\varepsilon|\varepsilon|\varepsilon \odot \left[ \left. \varepsilon \middle| {b}^{*} \middle| \varepsilon \middle| \varepsilon \middle| \varepsilon \right. \right] \oplus \varepsilon|b|\varepsilon|\varepsilon|e \odot \left[ \left. \varepsilon \middle| {b}^{*} \middle| \varepsilon \middle| \varepsilon \middle| {e}^{*} \right. \right] \oplus \varepsilon|b|\varepsilon|d|\varepsilon \odot \left[ \left. \varepsilon \middle| {b}^{*} \middle| \varepsilon \middle| {d}^{*} \middle| \varepsilon \right. \right] \oplus \varepsilon|b|\varepsilon|d|e \odot \left[ \left. \varepsilon \middle| {b}^{*} \middle| \varepsilon \middle| {d}^{*} \middle| {e}^{*} \right. \right] \oplus \varepsilon|b|c|\varepsilon|\varepsilon \odot \left[ \left. \varepsilon \middle| {b}^{*} \middle| {c}^{*} \middle| \varepsilon \middle| \varepsilon \right. \right] \oplus \varepsilon|b|c|\varepsilon|e \odot \left[ \left. \varepsilon \middle| {b}^{*} \middle| {c}^{*} \middle| \varepsilon \middle| {e}^{*} \right. \right] \oplus \varepsilon|b|c|d|\varepsilon \odot \left[ \left. \varepsilon \middle| {b}^{*} \middle| {c}^{*} \middle| {d}^{*} \middle| \varepsilon \right. \right] \oplus \varepsilon|b|c|d|e \odot \left[ \left. \varepsilon \middle| {b}^{*} \middle| {c}^{*} \middle| {d}^{*} \middle| {e}^{*} \right. \right] \oplus a|\varepsilon|\varepsilon|\varepsilon|\varepsilon \odot \left[ \left. {a}^{*} \middle| \varepsilon \middle| \varepsilon \middle| \varepsilon \middle| \varepsilon \right. \right] \oplus a|\varepsilon|\varepsilon|\varepsilon|e \odot \left[ \left. {a}^{*} \middle| \varepsilon \middle| \varepsilon \middle| \varepsilon \middle| {e}^{*} \right. \right] \oplus a|\varepsilon|\varepsilon|d|\varepsilon \odot \left[ \left. {a}^{*} \middle| \varepsilon \middle| \varepsilon \middle| {d}^{*} \middle| \varepsilon \right. \right] \oplus a|\varepsilon|\varepsilon|d|e \odot \left[ \left. {a}^{*} \middle| \varepsilon \middle| \varepsilon \middle| {d}^{*} \middle| {e}^{*} \right. \right] \oplus a|\varepsilon|c|\varepsilon|\varepsilon \odot \left[ \left. {a}^{*} \middle| \varepsilon \middle| {c}^{*} \middle| \varepsilon \middle| \varepsilon \right. \right] \oplus a|\varepsilon|c|\varepsilon|e \odot \left[ \left. {a}^{*} \middle| \varepsilon \middle| {c}^{*} \middle| \varepsilon \middle| {e}^{*} \right. \right] \oplus a|\varepsilon|c|d|\varepsilon \odot \left[ \left. {a}^{*} \middle| \varepsilon \middle| {c}^{*} \middle| {d}^{*} \middle| \varepsilon \right. \right] \oplus a|\varepsilon|c|d|e \odot \left[ \left. {a}^{*} \middle| \varepsilon \middle| {c}^{*} \middle| {d}^{*} \middle| {e}^{*} \right. \right] \oplus a|b|\varepsilon|\varepsilon|\varepsilon \odot \left[ \left. {a}^{*} \middle| {b}^{*} \middle| \varepsilon \middle| \varepsilon \middle| \varepsilon \right. \right] \oplus a|b|\varepsilon|\varepsilon|e \odot \left[ \left. {a}^{*} \middle| {b}^{*} \middle| \varepsilon \middle| \varepsilon \middle| {e}^{*} \right. \right] \oplus a|b|\varepsilon|d|\varepsilon \odot \left[ \left. {a}^{*} \middle| {b}^{*} \middle| \varepsilon \middle| {d}^{*} \middle| \varepsilon \right. \right] \oplus a|b|\varepsilon|d|e \odot \left[ \left. {a}^{*} \middle| {b}^{*} \middle| \varepsilon \middle| {d}^{*} \middle| {e}^{*} \right. \right] \oplus a|b|c|\varepsilon|\varepsilon \odot \left[ \left. {a}^{*} \middle| {b}^{*} \middle| {c}^{*} \middle| \varepsilon \middle| \varepsilon \right. \right] \oplus a|b|c|\varepsilon|e \odot \left[ \left. {a}^{*} \middle| {b}^{*} \middle| {c}^{*} \middle| \varepsilon \middle| {e}^{*} \right. \right] \oplus a|b|c|d|\varepsilon \odot \left[ \left. {a}^{*} \middle| {b}^{*} \middle| {c}^{*} \middle| {d}^{*} \middle| \varepsilon \right. \right] \oplus a|b|c|d|e \odot \left[ \left. {a}^{*} \middle| {b}^{*} \middle| {c}^{*} \middle| {d}^{*} \middle| {e}^{*} \right. \right]$
In [34]:
a5 = e5.derived_term()
a5.states()
Out[34]:
['a*|b*|c*|d*|e*',
 'a*|b*|c*|d*|ε',
 'a*|b*|c*|ε|e*',
 'a*|b*|c*|ε|ε',
 'a*|b*|ε|d*|e*',
 'a*|b*|ε|d*|ε',
 'a*|b*|ε|ε|e*',
 'a*|b*|ε|ε|ε',
 'a*|ε|c*|d*|e*',
 'a*|ε|c*|d*|ε',
 'a*|ε|c*|ε|e*',
 'a*|ε|c*|ε|ε',
 'a*|ε|ε|d*|e*',
 'a*|ε|ε|d*|ε',
 'a*|ε|ε|ε|e*',
 'a*|ε|ε|ε|ε',
 'ε|b*|c*|d*|e*',
 'ε|b*|c*|d*|ε',
 'ε|b*|c*|ε|e*',
 'ε|b*|c*|ε|ε',
 'ε|b*|ε|d*|e*',
 'ε|b*|ε|d*|ε',
 'ε|b*|ε|ε|e*',
 'ε|b*|ε|ε|ε',
 'ε|ε|c*|d*|e*',
 'ε|ε|c*|d*|ε',
 'ε|ε|c*|ε|e*',
 'ε|ε|c*|ε|ε',
 'ε|ε|ε|d*|e*',
 'ε|ε|ε|d*|ε',
 'ε|ε|ε|ε|e*']

Example 6: A Sed-like Substitution

In [35]:
e2 = q2.expression('(a{+}|x + b{+}|y)*')
e2
Out[35]:
$\left( \left. a \, {a}^{*} \middle| x \right. + \left. b \, {b}^{*} \middle| y \right. \right)^{*}$
In [36]:
e2.expansion()
Out[36]:
$\left\langle 1\right\rangle \oplus a|x \odot \left[\left( \left. {a}^{*} \middle| \varepsilon \right. \right) \, \left( \left. a \, {a}^{*} \middle| x \right. + \left. b \, {b}^{*} \middle| y \right. \right)^{*}\right] \oplus b|y \odot \left[\left( \left. {b}^{*} \middle| \varepsilon \right. \right) \, \left( \left. a \, {a}^{*} \middle| x \right. + \left. b \, {b}^{*} \middle| y \right. \right)^{*}\right]$
In [37]:
a2 = e2.derived_term()
a2
Out[37]:
%3 I0 0 (aa*|x+bb*|y)* I0->0 F0 F1 F2 0->F0 1 (a*|ε)(aa*|x+bb*|y)* 0->1 a|x 2 (b*|ε)(aa*|x+bb*|y)* 0->2 b|y 1->F1 1->1 a|ε, a|x 1->2 b|y 2->F2 2->1 a|x 2->2 b|ε, b|y

Again, the extracted expression is less readable than the one we started from.

In [38]:
a2.expression()
Out[38]:
$\varepsilon + \left(a|x\right) \, \left(a|\varepsilon + a|x\right)^{*} + \left(b|y + \left(a|x\right) \, \left(a|\varepsilon + a|x\right)^{*} \, \left(b|y\right)\right) \, \left(b|\varepsilon + b|y + \left(a|x\right) \, \left(a|\varepsilon + a|x\right)^{*} \, \left(b|y\right)\right)^{*} \, \left(\varepsilon + \left(a|x\right) \, \left(a|\varepsilon + a|x\right)^{*}\right)$

A More Complex Expression

The previous examples often looked like sed-like substitutions, in the sense that the first tape was often a composite expression, but the second tape a simple label. There is no such limitation.

In [39]:
e = q2.expression('(<2>[ab])* | (<3>[xy])*')
e
Out[39]:
$ \left. \left( \left\langle 2 \right\rangle \,\left(a + b\right)\right)^{*} \middle| \left( \left\langle 3 \right\rangle \,\left(x + y\right)\right)^{*} \right. $
In [40]:
a = e.derived_term()
a
Out[40]:
%3 I0 0 (⟨2⟩(a+b))*|(⟨3⟩(x+y))* I0->0 F0 F1 F2 0->F0 0->0 ⟨6⟩[a|xa|yb|xb|y] 1 ε|(⟨3⟩(x+y))* 0->1 ⟨3⟩ε|x, ⟨3⟩ε|y 2 (⟨2⟩(a+b))*|ε 0->2 ⟨2⟩a|ε, ⟨2⟩b|ε 1->F1 1->1 ⟨3⟩ε|x, ⟨3⟩ε|y 2->F2 2->2 ⟨2⟩a|ε, ⟨2⟩b|ε
In [41]:
a.shortest(20)
Out[41]:
$\varepsilon|\varepsilon \oplus \left\langle 3\right\rangle \varepsilon|\mathit{x} \oplus \left\langle 3\right\rangle \varepsilon|\mathit{y} \oplus \left\langle 2\right\rangle \mathit{a}|\varepsilon \oplus \left\langle 6\right\rangle \mathit{a}|\mathit{x} \oplus \left\langle 6\right\rangle \mathit{a}|\mathit{y} \oplus \left\langle 2\right\rangle \mathit{b}|\varepsilon \oplus \left\langle 6\right\rangle \mathit{b}|\mathit{x} \oplus \left\langle 6\right\rangle \mathit{b}|\mathit{y} \oplus \left\langle 9\right\rangle \varepsilon|\mathit{xx} \oplus \left\langle 9\right\rangle \varepsilon|\mathit{xy} \oplus \left\langle 9\right\rangle \varepsilon|\mathit{yx} \oplus \left\langle 9\right\rangle \varepsilon|\mathit{yy} \oplus \left\langle 18\right\rangle \mathit{a}|\mathit{xx} \oplus \left\langle 18\right\rangle \mathit{a}|\mathit{xy} \oplus \left\langle 18\right\rangle \mathit{a}|\mathit{yx} \oplus \left\langle 18\right\rangle \mathit{a}|\mathit{yy} \oplus \left\langle 18\right\rangle \mathit{b}|\mathit{xx} \oplus \left\langle 18\right\rangle \mathit{b}|\mathit{xy} \oplus \left\langle 18\right\rangle \mathit{b}|\mathit{yx}$

Section 5.3: Derived-Term Automaton with Composition

In [42]:
f1 = zmin2.expression('([ab] + ⟨1⟩(\e|I + [ab]|S))∗')
f2 = zmin2.expression('([ab] + I|[ab] + S|\e)∗')
f = f1 @ f2
f.derived_term()
Out[42]:
%3 I0 0 (a|a+b|b+⟨1⟩(ε|I+(a+b)|S))*@(S|ε+a|a+b|b+I|(a+b))* I0->0 ⟨0⟩ F0 0->F0 ⟨0⟩ 0->0 ⟨1⟩[ε|aε|ba|εb|ε], ⟨0⟩a|a, ⟨0⟩b|b

Here we show, as will be discussed in Section 6.2, the role played by identities. Here, they are fully disabled (via the 'none' argument).

In [43]:
f1 = zmin2.expression('([ab] + ⟨1⟩(\e|I + [ab]|S))∗', 'none')
f2 = zmin2.expression('([ab] + I|[ab] + S|\e)∗', 'none')
f = f1 @ f2
f.derived_term()
Out[43]:
%3 I0 0 ((a|a+b|b)+⟨1⟩(ε|I+(a+b)|S))*@(((a|a+b|b)+I|(a+b))+S|ε)* I0->0 ⟨0⟩ F0 F1 0->F0 ⟨0⟩ 1 ε((a|a+b|b)+⟨1⟩(ε|I+(a+b)|S))*@ε(((a|a+b|b)+I|(a+b))+S|ε)* 0->1 ⟨1⟩[ε|aε|ba|εb|ε], ⟨0⟩a|a, ⟨0⟩b|b 1->F1 ⟨0⟩ 1->1 ⟨1⟩[ε|aε|ba|εb|ε], ⟨0⟩a|a, ⟨0⟩b|b

Then we show an example with the endmarker (as discussed in Section 6.4).

In [44]:
f1 = zmin2.expression('([ab] + ⟨1⟩(\e|I + [ab]|S))∗   ($|$)')
f2 = zmin2.expression('([ab] + S|[ab] + S|\e)∗       ($|$)')
f = f1 @ f2
f.derived_term()
Out[44]:
%3 I0 0 (a|a+b|b+⟨1⟩(ε|I+(a+b)|S))*($|$)@(S|ε+a|a+b|b+S|(a+b))*($|$) I0->0 ⟨0⟩ F1 0->0 ⟨1⟩[a|εa|bb|εb|a], ⟨0⟩a|a, ⟨0⟩b|b 1 ε 0->1 ⟨0⟩$|$ 1->F1 ⟨0⟩

Example 8

In this example we used symbolic weights, $k, h$. To simulate this we introduce expressions whose weights are expressions (whose weights are in $\mathbb{Q}$):

In [45]:
eset2 = vcsn.context('lat<lan(ab), lan(ab)>, expressionset<lan(kh), q>')
eset2
Out[45]:
$(\{a, b\})^? \times (\{a, b\})^?\to\mathsf{RatE}[(\{h, k\})^?\to\mathbb{Q}]$

Then we build and display the expression, its expansion, and its derived-term automaton.

In [46]:
e = eset2.expression('(⟨k⟩\e|a)∗ @ (⟨h⟩aa|\e)∗', 'trivial')
vcsn.display(e, e.expansion(), e.derived_term())
$\left( \left. \left\langle k \right\rangle \,\varepsilon \middle| a \right. \right)^{*}@\left( \left. \left\langle h \right\rangle \,a \, a \middle| \varepsilon \right. \right)^{*}$
$\left\langle \varepsilon\right\rangle \oplus \varepsilon|\varepsilon \odot \left[\left\langle k \, h\right\rangle \left( \left. \left\langle k \right\rangle \,\varepsilon \middle| a \right. \right)^{*}@\left(a|\varepsilon\right) \, \left( \left. \left\langle h \right\rangle \,a \, a \middle| \varepsilon \right. \right)^{*}\right]$
%3 I0 0 (⟨k⟩ε|a)*@(⟨h⟩aa|ε)* I0->0 F0 0->F0 1 (⟨k⟩ε|a)*@(a|ε)(⟨h⟩aa|ε)* 0->1 ⟨kh⟩ε|ε 1->0 ⟨k⟩ε|ε
In [47]:
vcsn.setenv(OLDWAY=0, DENORM=1)
e = eset2.expression('(⟨k⟩\e|a)∗ @ (⟨h⟩aa|\e)∗', 'trivial')
vcsn.display(e, e.expansion(), e.derived_term())
$\left( \left. \left\langle k \right\rangle \,\varepsilon \middle| a \right. \right)^{*}@\left( \left. \left\langle h \right\rangle \,a \, a \middle| \varepsilon \right. \right)^{*}$
$\left\langle \varepsilon\right\rangle \oplus \varepsilon|\varepsilon \odot \left[\left\langle k\right\rangle \left(\varepsilon|a\right) \, \left( \left. \left\langle k \right\rangle \,\varepsilon \middle| a \right. \right)^{*}@\varepsilon \oplus \left\langle h\right\rangle \varepsilon@\left(a|\varepsilon\right) \, \left(\left(a|\varepsilon\right) \, \left( \left. \left\langle h \right\rangle \,a \, a \middle| \varepsilon \right. \right)^{*}\right) \oplus \left\langle k \, h\right\rangle \left( \left. \left\langle k \right\rangle \,\varepsilon \middle| a \right. \right)^{*}@\left(a|\varepsilon\right) \, \left( \left. \left\langle h \right\rangle \,a \, a \middle| \varepsilon \right. \right)^{*}\right]$
%3 I0 0 (⟨k⟩ε|a)*@(⟨h⟩aa|ε)* I0->0 F0 0->F0 1 (ε|a)(⟨k⟩ε|a)*@ε 0->1 ⟨k⟩ε|ε 2 ε@(a|ε)((a|ε)(⟨h⟩aa|ε)*) 0->2 ⟨h⟩ε|ε 3 (⟨k⟩ε|a)*@(a|ε)(⟨h⟩aa|ε)* 0->3 ⟨kh⟩ε|ε 1->1 ε|ε 2->2 ε|ε 3->0 ⟨k⟩ε|ε 4 ε@(a|ε)(⟨h⟩aa|ε)* 3->4 ε|ε 4->4 ε|ε

Section 6.1: Discussion on the Validity of Automata

The following function takes a rational expression, and generates derived-term automata from denormalized expansions (simpler equation, but generates many spontaneous transitions), and from "normal" expansions. The former are often simpler, but may generate invalid automata.

In [48]:
vcsn.setenv(DENORM=0, OLDWAY=0)
import vcsn
def compare(e, name=None, ctx='lan, q'):
    if not isinstance(e, vcsn.expression):
        e = vcsn.context(ctx).expression(e, identities='trivial')
    if name:
        e_named = e.name(name)
    else:
        e_named = e
    a_norm = e_named.derived_term()
    vcsn.setenv(NAIVE_MUL=1, NAIVE_STAR=1)
    a_denorm = e_named.derived_term()
    vcsn.setenv(NAIVE_MUL=0, NAIVE_STAR=0)
    return vcsn.table([['Expression', 'Denormalized', 'Normal'], [e, a_denorm, a_norm]])
compare('a*b*c*')
Out[48]:
Expression Denormalized Normal
${a}^{*} \, \left({b}^{*} \, {c}^{*}\right)$ %3 I0 0 a*(b*c*) I0->0 F2 0->0 a 1 b*c* 0->1 ε 1->1 b 2 c* 1->2 ε 2->F2 2->2 c %3 I0 0 a*(b*c*) I0->0 F0 F1 F2 0->F0 0->0 a 1 b*c* 0->1 b 2 c* 0->2 c 1->F1 1->1 b 1->2 c 2->F2 2->2 c
In [49]:
compare('(<1/2>\e)*')
Out[49]:
Expression Denormalized Normal
$\left( \left\langle \frac{1}{2} \right\rangle \,\varepsilon\right)^{*}$ %3 I0 0 (⟨1/2⟩ε)* I0->0 F0 0->F0 0->0 ⟨1/2⟩ε %3 I0 0 (⟨1/2⟩ε)* I0->0 F0 0->F0 ⟨2⟩
In [50]:
compare('(a*b* + <-1>\e)*', name='E')
Out[50]:
Expression Denormalized Normal
$\left({a}^{*} \, {b}^{*} + \left\langle -1 \right\rangle \,\varepsilon\right)^{*}$ %3 I0 0 E I0->0 F0 0->F0 0->0 ⟨-1⟩ε 1 b*E 0->1 ε 2 (a*b*)E 0->2 a 1->0 ε 1->1 b 2->1 ε 2->2 a %3 I0 0 E I0->0 F0 F1 F2 0->F0 1 (a*b*)E 0->1 a 2 b*E 0->2 b 1->F1 1->1 ⟨2⟩a 1->2 ⟨2⟩b 2->F2 2->1 a 2->2 ⟨2⟩b
In [51]:
compare('((<1/2>\e)* + <-2>\e)*', name='E')
Out[51]:
Expression Denormalized Normal
$\left(\left( \left\langle \frac{1}{2} \right\rangle \,\varepsilon\right)^{*} + \left\langle -2 \right\rangle \,\varepsilon\right)^{*}$ %3 I0 0 E I0->0 F0 0->F0 0->0 ⟨-1⟩ε 1 (⟨1/2⟩ε)*E 0->1 ⟨1/2⟩ε 1->0 ε 1->1 ⟨1/2⟩ε %3 I0 0 E I0->0 F0 0->F0
In [52]:
compare('((\e|ab @ ab|\e) + <-1>\e)*', ctx='lat<lan, lan>, q', name='E')
Out[52]:
Expression Denormalized Normal
$\left( \left. \varepsilon \middle| a \, b \right. @ \left. a \, b \middle| \varepsilon \right. + \left\langle -1 \right\rangle \,\varepsilon\right)^{*}$ %3 I0 0 E I0->0 F0 0->F0 0->0 ⟨-1⟩ε|ε 1 (ε|b@b|ε)E 0->1 ε|ε 1->0 ε|ε %3 I0 0 E I0->0 F0 F1 0->F0 0->0 ⟨-1⟩ε|ε 1 (ε|b@b|ε)E 0->1 ε|ε 1->F1 1->0 ⟨-1⟩ε|ε 1->1 ε|ε

Section 6.2: Discussion on Identities

In [53]:
vcsn.setenv(DENORM=0, OLDWAY=0)
def dterm(e):
    '''The derived-term of `e`, or the first line of the error if
    it fails.'''
    try:
        return e.derived_term()
    except RuntimeError as e:
        return 'Error: ' + str(e).split('\n')[0]
def compare(e, ctx='lan, q'):
    exp = vcsn.context(ctx).expression
    return vcsn.table([['No identities: Expression', 'No identities: Automaton'],
                       [exp(e, 'none'), dterm(exp(e, 'none'))],
                       ['Trivial identities: Expression', 'Trivial identities: Automaton'],
                       [exp(e, 'trivial'), dterm(exp(e, 'trivial'))]])
compare('a*')
Out[53]:
No identities: Expression No identities: Automaton
${a}^{*}$ %3 I0 0 a* I0->0 F0 F1 0->F0 1 εa* 0->1 a 1->F1 1->1 a
Trivial identities: Expression Trivial identities: Automaton
${a}^{*}$ %3 I0 0 a* I0->0 F0 0->F0 0->0 a
In [54]:
compare('a + ⟨0⟩\e* + \z\e*')
Out[54]:
No identities: Expression No identities: Automaton
$\left(a + \left\langle 0 \right\rangle \,{\varepsilon}^{*}\right) + \emptyset \, {\varepsilon}^{*}$ Error: Q: value is not starrable: 1
Trivial identities: Expression Trivial identities: Automaton
$a$ %3 I0 0 a I0->0 F1 1 ε 0->1 a 1->F1
In [55]:
compare('abc\z')
Out[55]:
No identities: Expression No identities: Automaton
$a \, \left(b \, \left(c \, \emptyset\right)\right)$ %3 I0 0 a(b(c∅)) I0->0 1 ε(b(c∅)) 0->1 a 2 ε(c∅) 1->2 b 3 ε∅ 2->3 c
Trivial identities: Expression Trivial identities: Automaton
$\emptyset$ %3 I0 0 I0->0

Runtime Performances

We compare the quality of the generated automata (smaller is better) and the duration of the generation procedure (again, smaller is better). Vcsn feature two generation procedures for multitape expressions: derived-term, and "inductive".

"Inductive" is a simple recursive translation of the rational expression that applies the corresponding operation on the automata. For single-tape expressions, it uses the "standard" operations on automata, which are the ones used to implement the expression.standard algorithm. The comparison will be performed on twenty randomly generated expressions.

In [56]:
zmin2 = vcsn.context('lat<lan, lan>, zmin')

from vcsn.tools import _timeit
def timeit(fun):
    'Run `fun` a number of times, and return its fastest run in milliseconds.'
    return '{}ms'.format(round(_timeit(lambda: fun())[1]))

title = ['Rational Expression',
         'DT: transitions', 'Ind: transitions',
         'DT: states', 'Ind: states',
         'DT: duration', 'Ind: duration']

def compare(e):
    # Make sure this is an object expression, not just a string.
    if not isinstance(e, vcsn.expression):
        e = zmin2.expression(e)
    # Compute its derived-term automaton.
    a1 = e.automaton('expansion')
    t1 = timeit(lambda: e.automaton('expansion'))
    # Compute another automaton by mapping operators to one of their natural implementation.
    # In the case of single-tape automata, this yield the standard (aka Glushkov) automaton.
    a2 = e.automaton('inductive')
    t2 = timeit(lambda: e.automaton('inductive'))
    # Return their numbers of states.
    return [e,
            a1.info('number of transitions'), a2.info('number of transitions'), 
            a1.state_number(), a2.state_number(),
            t1, t2]
In [57]:
# ignore cell
zmin2 = vcsn.context('lat<lan(ab), lan(ab)>, zmin')
res = [title]
for i in range(20):
    e = zmin2.random_expression('+, ., *=.2, w., .w, w="min=0, max=3", |=.5, @', length=20)
    c = compare(e)
    res.append(c)
vcsn.table(res)
Out[57]:
Rational Expression DT: transitions Ind: transitions DT: states Ind: states DT: duration Ind: duration
$\left(\varepsilon|b\right) \, \left(\varepsilon|a\right) + \left. \left\langle 3 \right\rangle \,a \middle| a \right. $ 3 3 3 4 35ms 29ms
$\left(\varepsilon + \varepsilon|b + a|a\right) \, \left(b|\varepsilon\right)@ \left\langle 1 \right\rangle \,\left( \left. \left\langle 2 \right\rangle \,\left(a \, b\right) \middle| \left\langle 2 \right\rangle \,\left(\varepsilon + a\right) \right. \right)$ 0 5 0 6 45ms 85ms
$\left(b|b\right) \, \left(\left(\varepsilon|b\right) \, \left(\left(\varepsilon + \varepsilon|b\right)@\varepsilon|a\right)@\left(\varepsilon|a + b|a\right)\right)$ 0 3 0 4 29ms 70ms
$ \left\langle 6 \right\rangle \,\left( \left. \left( \left\langle 5 \right\rangle \,\left(b \, a\right)\right)^{*} \middle| \left(\varepsilon + {a}^{*}\right) \right. \right)$ 11 11 6 6 154ms 43ms
$ \left\langle 4 \right\rangle \,\left(\left(\varepsilon|b\right) \, \left( \left\langle 2 \right\rangle \,\varepsilon + \left. \left\langle 1 \right\rangle \,\left(a \, a\right) \middle| \left( \left\langle 1 \right\rangle \,\varepsilon + a\right) \right. \right)^{*}\right)$ 4 7 3 5 47ms 61ms
$ \left\langle 3 \right\rangle \,\left(\left(\varepsilon@\left(a|\varepsilon\right) \, \left(\varepsilon|a\right)\right) \, \left(\left( \left\langle 3 \right\rangle \,\left(a|\varepsilon\right) + b|b\right)@\left(\varepsilon + b|\varepsilon\right)\right)\right)$ 0 0 0 3 15ms 73ms
$ \left\langle 3 \right\rangle \,\left( \left\langle 1 \right\rangle \,\left( \left\langle 2 \right\rangle \,\varepsilon + a|b\right)@ \left. \left\langle 2 \right\rangle \,\varepsilon \middle| \left(\varepsilon + b\right) \right. \right)$ 1 1 2 2 36ms 49ms
$ \left. \left( \left\langle 6 \right\rangle \,\left(\varepsilon + a + b\right)\right)^{*} \middle| \left\langle 2 \right\rangle \,\left(a + \left( \left\langle 3 \right\rangle \,b\right)^{*}\right) \right. $ 16 27 5 9 150ms 58ms
$ \left\langle 2 \right\rangle \,\left( \left. b \middle| \left\langle 3 \right\rangle \,b \right. @\left(\varepsilon + b|b\right) \, \left(b|\varepsilon\right)\right)$ 1 2 2 3 35ms 52ms
$ \left\langle 5 \right\rangle \,\left( \left. \left(\varepsilon + \left\langle 8 \right\rangle \,b\right) \middle| \left\langle 4 \right\rangle \,\left(\varepsilon + a\right) \right. \right)$ 3 3 2 4 40ms 34ms
$ \left. \varepsilon \middle| \left\langle 5 \right\rangle \,\left(a \, b \, a\right) \right. $ 3 3 4 4 48ms 29ms
$ \left. \left\langle 3 \right\rangle \,\left( \left\langle 1 \right\rangle \,a + {\varepsilon}^{*}\right) \middle| \left(\varepsilon + b\right) \right. $ 3 3 2 4 39ms 30ms
$ \left\langle 6 \right\rangle \,\left( \left\langle 3 \right\rangle \,\left(b|a\right) + \left\langle 5 \right\rangle \,\left(\left(\varepsilon|a\right)^{2} \, \left(b|a\right)\right)\right)$ 4 4 4 5 29ms 39ms
$ \left\langle 3 \right\rangle \,\left( \left. \left( \left\langle 3 \right\rangle \,b\right)^{*} \middle| \varepsilon \right. @ \left. \left\langle 2 \right\rangle \,\varepsilon \middle| \left\langle 8 \right\rangle \,a \right. \right)$ 3 4 3 4 70ms 47ms
$ \left\langle 4 \right\rangle \,\left(b|\varepsilon\right) + \left\langle 8 \right\rangle \,\left(\left(a|b\right) \, \left(\varepsilon|b + a|\varepsilon\right)\right)$ 4 4 3 5 25ms 36ms
$\emptyset$ 0 0 0 1 12ms 2ms
$ \left. \left\langle 5 \right\rangle \,\varepsilon \middle| {b}^{*} \right. $ 2 2 2 2 49ms 15ms
$\varepsilon$ 0 0 1 1 17ms 3ms
$\varepsilon + a|b + \left\langle 7 \right\rangle \,\left(\varepsilon + \left(b|\varepsilon\right) \, \left(\varepsilon + \varepsilon|a\right)^{*}\right)$ 3 4 3 4 26ms 56ms
$ \left\langle 3 \right\rangle \,\left(\varepsilon|a + a|a + \left. \left\langle 1 \right\rangle \,\varepsilon \middle| \left\langle 1 \right\rangle \,a \right. \right)$ 2 3 2 4 42ms 39ms

Unfortunately, random expressions are rarely interesting (as can be seen by the number of transitions).

Now we compare the behavior of both algorithms on a familly of rational expressions: $n \mapsto [ab]^* (a|b)^n [ab]^*$.

In [58]:
res = [title]
for i in range(20):
    e = zmin2.expression('[ab]*  (a|b){{{}}}  [ab]*'.format(i))
    res.append(compare(e))
vcsn.table(res)
Out[58]:
Rational Expression DT: transitions Ind: transitions DT: states Ind: states DT: duration Ind: duration
${\left(a|a + b|b\right)^{*}}^{2}$ 6 16 2 5 39ms 46ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right) \, \left(a|a + b|b\right)^{*}$ 5 15 2 6 34ms 55ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{2} \, \left(a|a + b|b\right)^{*}$ 6 16 3 7 37ms 67ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{3} \, \left(a|a + b|b\right)^{*}$ 7 17 4 8 41ms 80ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{4} \, \left(a|a + b|b\right)^{*}$ 8 18 5 9 46ms 94ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{5} \, \left(a|a + b|b\right)^{*}$ 9 19 6 10 51ms 108ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{6} \, \left(a|a + b|b\right)^{*}$ 10 20 7 11 55ms 116ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{7} \, \left(a|a + b|b\right)^{*}$ 11 21 8 12 58ms 146ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{8} \, \left(a|a + b|b\right)^{*}$ 12 22 9 13 64ms 150ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{9} \, \left(a|a + b|b\right)^{*}$ 13 23 10 14 72ms 166ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{10} \, \left(a|a + b|b\right)^{*}$ 14 24 11 15 74ms 193ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{11} \, \left(a|a + b|b\right)^{*}$ 15 25 12 16 78ms 189ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{12} \, \left(a|a + b|b\right)^{*}$ 16 26 13 17 86ms 200ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{13} \, \left(a|a + b|b\right)^{*}$ 17 27 14 18 89ms 229ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{14} \, \left(a|a + b|b\right)^{*}$ 18 28 15 19 91ms 250ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{15} \, \left(a|a + b|b\right)^{*}$ 19 29 16 20 95ms 263ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{16} \, \left(a|a + b|b\right)^{*}$ 20 30 17 21 101ms 278ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{17} \, \left(a|a + b|b\right)^{*}$ 21 31 18 22 103ms 299ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{18} \, \left(a|a + b|b\right)^{*}$ 22 32 19 23 110ms 328ms
$\left(a|a + b|b\right)^{*} \, \left(a|b\right)^{19} \, \left(a|a + b|b\right)^{*}$ 23 33 20 24 110ms 348ms

Section 6.6: Discussion on the Performances

In this section, we compare the size of automata generated either using the "standard automaton" construction (aka, Glushkov automata), or the expansion-based derived-term automaton as exposed in the paper.

In [59]:
def exp(e, ctx=None):
    '''Make an expression from `e`, using the context `ctx`
    if `e` is not already an expression.'''
    if isinstance(e, vcsn.expression):
        return e
    else:
        if not isinstance(ctx, vcsn.context):
            ctx = vcsn.context(ctx)
        return ctx.expression(e)

def compare_one(e):
    '''Compare the sizes of the derived-term, denormalized derived-term, and inductive
    automata from exprssion `e`.'''
    e = exp(e, ctx=zmin2)
    ind = e.inductive()
    dte = e.derived_term()
    vcsn.setenv(OLDWAY=1, DENORM=1)
    den = e.derived_term()
    vcsn.setenv(OLDWAY=0, DENORM=0)
    return [e, 
            dte.info('number of states'), dte.info('number of transitions'),
            den.info('number of states'), den.info('number of transitions'),
            ind.info('number of states'), ind.info('number of transitions')]

def compare(*es):
    '''Compare the sizes of the derived-term, denormalized derived-term, and inductive
    automata for each expression in `es`.'''
    res = [['Expression',
            'DTerm states', 'Dterm transitions',
            'Denorm DTerm states', 'Denorm Dterm transitions',
            'Inductive states', 'Inductive transitions']]
    for e in es:
        res.append(compare_one(e))
    return res

zmin = vcsn.context('lan, zmin')
zmin2 = zmin|zmin
eset2 = vcsn.context('lat<lan, lan>, expressionset<lan, q>')
q = vcsn.context('lan, q')
q3 = vcsn.context('lat<lan, lan, lan>, q')

res = compare(# Introduction
              exp('([ab] + ⟨1⟩(\e|[ab] + [ab]|\e))∗', zmin2),
              exp('[ab]∗  (⟨1⟩(\e|[ab] + [ab]|\e) + ⟨2⟩(a|b + b|a))∗', zmin2),
              exp('([ab] + ⟨1⟩(\e|I + [ab]|S))∗', zmin2),
              exp('([ab] + I|[ab] + S|\e)∗', zmin2),
              exp('(([ab] + ⟨1⟩(\e|I + [ab]|S))∗)  @  (([ab] + I|[ab] + S|\e)∗)', zmin2),
              exp('⟨4⟩ade*|x + ⟨3⟩bde*|x + ⟨2⟩ace*|xy + ⟨6⟩bce*|xy', zmin2),
              exp('a + ⟨2⟩bc∗', ctx=zmin), # Section 3.
              exp('a*|b*|c*', ctx=q3), # Example 4.
              exp('a*|b*|c*|d*|e*', ctx='lat<lan(a), lan(b), lan(c), lan(d), lan(e)>, q'), # Example 4
              exp('(a{+}|x + b{+}|y)*', zmin2),  # Example 5
              exp('(⟨k⟩\e|a)∗ @ (⟨h⟩aa|\e)∗', ctx=eset2), # Example 8, Section 5.3,
              exp('a*b*c*', ctx=q), # Section 6.1
              exp('(<1/2>\\e)*', ctx=q), # Section 6.1
              exp('(a*b* + <-1>\\e)*', ctx=q), # Section 6.1
              exp('((<1/2>\\e)* + <-2>\\e)*', ctx=q), # Section 6.1
#              exp('((\\e|ab @ ab|\\e) + <-1>\\e)*', ctx='lat<lan, lan>, q'), # Section 6.1, invalid automaton.
             )
t = vcsn.table(res)
t
Out[59]:
Expression DTerm states Dterm transitions Denorm DTerm states Denorm Dterm transitions Inductive states Inductive transitions
$\left(a|a + b|b + \left\langle 1 \right\rangle \,\left( \left. \varepsilon \middle| \left(a + b\right) \right. + \left. \left(a + b\right) \middle| \varepsilon \right. \right)\right)^{*}$ 1 6 1 6 7 42
$\left(a|a + b|b\right)^{*} \, \left( \left\langle 2 \right\rangle \,\left(a|b + b|a\right) + \left\langle 1 \right\rangle \,\left( \left. \varepsilon \middle| \left(a + b\right) \right. + \left. \left(a + b\right) \middle| \varepsilon \right. \right)\right)^{*}$ 2 14 2 14 9 60
$\left(a|a + b|b + \left\langle 1 \right\rangle \,\left(\varepsilon|I + \left. \left(a + b\right) \middle| S \right. \right)\right)^{*}$ 1 5 1 5 6 30
$\left(S|\varepsilon + a|a + b|b + \left. I \middle| \left(a + b\right) \right. \right)^{*}$ 1 5 1 5 6 30
$\left(a|a + b|b + \left\langle 1 \right\rangle \,\left(\varepsilon|I + \left. \left(a + b\right) \middle| S \right. \right)\right)^{*}@\left(S|\varepsilon + a|a + b|b + \left. I \middle| \left(a + b\right) \right. \right)^{*}$ 1 6 11 26 7 42
$ \left. \left\langle 4 \right\rangle \,\left(a \, d \, {e}^{*}\right) \middle| x \right. + \left. \left\langle 3 \right\rangle \,\left(b \, d \, {e}^{*}\right) \middle| x \right. + \left. \left\langle 2 \right\rangle \,\left(a \, c \, {e}^{*}\right) \middle| x \, y \right. + \left. \left\langle 6 \right\rangle \,\left(b \, c \, {e}^{*}\right) \middle| x \, y \right. $ 4 7 4 7 13 16
$a + \left\langle 2 \right\rangle \,\left(b \, {c}^{*}\right)$ 3 3 3 3 4 4
$ \left. {a}^{*} \middle| {b}^{*} \middle| {c}^{*} \right. $ 7 19 7 19 8 26
$ \left. {a}^{*} \middle| {b}^{*} \middle| {c}^{*} \middle| {d}^{*} \middle| {e}^{*} \right. $ 31 211 31 211 32 242
$\left( \left. a \, {a}^{*} \middle| x \right. + \left. b \, {b}^{*} \middle| y \right. \right)^{*}$ 3 8 3 8 5 14
$\left( \left. \left\langle k \right\rangle \,\varepsilon \middle| a \right. \right)^{*}@\left( \left. \left\langle h \right\rangle \,a \, a \middle| \varepsilon \right. \right)^{*}$ 2 2 5 8 3 3
${a}^{*} \, {b}^{*} \, {c}^{*}$ 3 6 3 6 4 9
$\left( \left\langle \frac{1}{2} \right\rangle \,\varepsilon\right)^{*}$ 1 0 1 0 1 0
$\left( \left\langle -1 \right\rangle \,\varepsilon + {a}^{*} \, {b}^{*}\right)^{*}$ 3 6 3 6 3 6
$\left( \left\langle -2 \right\rangle \,\varepsilon + \left( \left\langle \frac{1}{2} \right\rangle \,\varepsilon\right)^{*}\right)^{*}$ 1 0 1 0 1 0