Rational expressions, or expressions for short, denote (rational) languages in a compact way. Since Vcsn supports weighted expressions, they actually can denoted rational series.

This page documents the syntax and transformations (called identities) that are applied to every expression. The list of available algorithms using expression is in the Algorithms page.


The syntax for rational expressions is as follows (with increasing precedence):

  • \z, the empty expression.
  • \e, the empty word.
  • a, the letter a.
  • 'l', the label l (useful, for instance, when labels are words, or to denote a letter which is an operator: '+' denotes the + letter).
  • [abcd], letter class, equivalent to (a+b+c+d).
  • [a-d], one letter of the current alphabet between a and d. If the alphabet is $\{a, d, e\}$, [a-d] denotes [ad], not [abcd].
  • [^a-dz], one letter of the current alphabet that is not part of [a-dz].
  • [^], any letter of the current alphabet ("any letter other that none").
  • (e), e.
  • e+f, the addition (disjunction, union) of e and f (note the use of +, | is not accepted).
  • e&f, the conjunction (intersection) of e and f.
  • e:f, the shuffle product (interleaving) of e and f.
  • e&:f, the infiltration of e and f.
  • ef and e.f, the multiplication (concatenation) of e and f.
  • <k>e, the left exterior product (left-scalar product) of e by k.
  • e<k>, the right exterior product (right-scalar product) of e by k.
  • e* and e{*}, any number of repetitions of e (the Kleene closure of e).
  • e{n}, the power (repeated multiplication) of e n times: ee ... e.
  • e{n,m}, any repetition of e between n and m, i.e., the sum of the powers of e between n and m: e{n}+e{n+1}+ ... +e{m}.
  • e{n,}, the sum of powers of e at least n times: e{n}e*.
  • e{,m}, at most m repetitions of e: e{0,m}.
  • e{+}, at least one e: e{1,}.
  • e?, e{?}, e optional: e{0,1}.
  • e{c}, the complement of e.

where e and f denote expressions, a a label, k a weight, and n and m natural numbers.

Please note that contrary to "regexps" (as in grep, perl, etc.):

  • spaces are ignored
  • + denotes the choice, not |
  • . denotes the concatenation, use [^] to mean "any letter"


Some rewriting rules are applied on the expressions to "simplify" them. The strength of this simplification is graduated.

  • none: no identities at all. Some algorithms, such as derived_term, might not terminate.
  • trivial: the trivial identities only are applied.
  • associative: the associative identities are added.
  • linear: the linear identities are added. This is the default.
  • distributive: the distributive identities are added.

Trivial Identities

$$ \newcommand{\eword}{\varepsilon} \newcommand{\lmul}[2]{\bra{#1}{#2}} \newcommand{\rmul}[2]{#1\bra{#2}} \newcommand{\lmulq}[2]{\bra{#1}^?{#2}} \newcommand{\rmulq}[2]{#1\bra{#2}^?} \newcommand{\bra}[1]{\langle#1\rangle} \newcommand{\K}{\mathbb{K}} \newcommand{\zed}{\mathsf{0}} \newcommand{\und}{\mathsf{1}} \newcommand{\zeK}{0_{\K}} \newcommand{\unK}{1_{\K}} \newcommand{\Ed}{\mathsf{E}} \newcommand{\Fd}{\mathsf{F}} \newcommand{\Gd}{\mathsf{G}} \newcommand{\AND}{\mathbin{\&}} \begin{gather*} % \tag{add} \Ed+\zed \Rightarrow \Ed \quad \zed+\Ed \Rightarrow \Ed \\[.7ex] %\tag{kmul} \begin{aligned}[t] \lmul{\zeK}{\Ed} & \Rightarrow \zed & \lmul{\unK}{\Ed} & \Rightarrow \Ed & \lmul{k}{\zed} & \Rightarrow \zed & \lmul{k}{\lmul{h}{\Ed}} &\Rightarrow \lmul{kh}{\Ed} \\ \rmul{\Ed}{\zeK} & \Rightarrow \zed & \rmul{\Ed}{\unK} & \Rightarrow \Ed & \rmul{\zed}{k} & \Rightarrow \zed & \rmul{\rmul{\Ed}{k}}{h} &\Rightarrow \rmul{\Ed}{kh} \end{aligned}\\ \rmul{(\lmul{k}{\Ed})}{h} \Rightarrow \lmul{k}{(\rmul{\Ed}{h})} \quad \rmul{\ell}{k} \Rightarrow \lmul{k}{\ell} \\ %\tag{mul} \Ed \cdot \zed \Rightarrow \zed \quad \zed \cdot \Ed \Rightarrow \zed \\ (\lmulq{k}{\und}) \cdot \Ed \Rightarrow \lmul{k}{\Ed} \quad \Ed \cdot (\lmulq{k}{\und}) \Rightarrow \rmul{\Ed}{k} \\ %\tag{star} \zed^\star \Rightarrow \und \\ % \tag{and} \zed \AND \Ed \Rightarrow \zed \quad \Ed \AND \zed \Rightarrow \zed \\ \zed^c \AND \Ed \Rightarrow \Ed \quad \Ed \AND \zed^c \Rightarrow \Ed \\ (\lmulq{k}{\ell}) \AND (\lmulq{h}{\ell}) \Rightarrow \lmul{kh}{\ell} \quad (\lmulq{k}{\ell}) \AND (\lmulq{h}{\ell'}) \Rightarrow \zed \\ (\lmul{k}{\Ed})^{c} \Rightarrow \Ed^{c} \quad (\rmul{\Ed}{k})^{c} \Rightarrow \Ed^{c} \\ {\Ed^c}^c \Rightarrow \Ed \text{ if the weights are Boolean ($\mathbb{B}$ or $\mathbb{F}_2$)} \end{gather*} $$

where $\Ed$ stands for any rational expression, $a \in A$ is any letter, $\ell, \ell'\in A \cup \{\eword\}$ denote two different labels, $k, h\in \K$ are weights, and $\lmulq{k}{\ell}$ denotes either $\lmul{k}{\ell}$, or $\ell$ in which case $k = \unK$ in the right-hand side. Any subexpression of a form listed to the left of a '$\Rightarrow$' is rewritten as indicated on the right.

Associative Identities

In addition to the trivial identities, the binary operators (addition, conjunction, multiplication) are made associative. Actually, they become variadic instead of being strictly binary. $$ \begin{align} \Ed + (\Fd + \Gd) & \Rightarrow \Ed+\Fd+\Gd\\ \Ed(\Fd\Gd) & \Rightarrow \Ed\Fd\Gd\\ \Ed\AND(\Fd\AND\Gd) & \Rightarrow \Ed\AND\Fd\AND\Gd\\ \end{align} $$

Linear Identities

In addition to the associative identities, the addition is made commutative. Actually, members of sums are now sorted, and weights of equal terms are added. Some identities requires the weightset to be a commutative semiring (i.e., the product of scalars is commutative). $$ \begin{align} \Fd+\Ed & \Rightarrow \Ed+\Fd && \text{if $\Ed < \Fd$} \\ \lmul{k}{\Ed}+\lmul{h}{\Ed} & \Rightarrow \lmul{k+h}{\Ed}\\ \rmul{\Ed}{k} & \Rightarrow \lmul{k}{\Ed} && \text{if commutative} \\ \lmul{k}{\Ed}\lmul{h}{\Fd} & \Rightarrow \lmul{kh}{(\Ed\Fd)} && \text{if commutative} \\ \end{align} $$

Distributive Identities

In addition to the linear identities, the multiplication and multiplication by a scalar are distributed on the sum. $$ \begin{gather*} \lmul{k}{(\Ed+\Fd)} \Rightarrow \lmul{k}{\Ed} + \lmul{k}{\Fd} \\ \Ed(\Fd+\Gd) \Rightarrow \Ed\Fd + \Ed\Gd \qquad (\Ed+\Fd)\Gd \Rightarrow \Ed\Gd + \Fd\Gd \\ \end{gather*} $$


In [1]:
import vcsn
import pandas as pd
pd.options.display.max_colwidth = 0

The following helping routine takes a list of expressions as text (*es), converts them into genuine expression objects (ctx.expression(e, id)) for each id, formats them into LaTeX, and puts them in a DataFrame for display.

In [2]:
ids = ['trivial', 'associative', 'linear', 'distributive']
ctx = vcsn.context('lal_char(a-z), b')
def example(*es):
    res = []
    for e in es:
        res.append([e] + ['$' + ctx.expression(e, id).format('latex') + '$' for id in ids])
    return pd.DataFrame(res, columns=['Input'] + list(map(str.title, ids)))
In [3]:
example('a', 'a+b+c', 'a+(b+c)', 'a+b+c+d', 'b+a', '[ab][ab]')
Input Trivial Associative Linear Distributive
0 a $a$ $a$ $a$ $a$
1 a+b+c $\left(a + b\right) + c$ $a + b + c$ $a + b + c$ $a \oplus b \oplus c$
2 a+(b+c) $a + \left(b + c\right)$ $a + b + c$ $a + b + c$ $a \oplus b \oplus c$
3 a+b+c+d $\left(\left(a + b\right) + c\right) + d$ $[a\textrm{-}d]$ $[a\textrm{-}d]$ $[a\textrm{-}d]$
4 b+a $b + a$ $b + a$ $a + b$ $a \oplus b$
5 [ab][ab] $\left(a + b\right)^{2}$ $\left(a + b\right)^{2}$ $\left(a + b\right)^{2}$ $a \, a \oplus a \, b \oplus b \, a \oplus b \, b$

A few more examples, with weights in $\mathbb{Q}$:

In [4]:
ctx = vcsn.Q
example('a', 'a+a+a', 'a+a+b', 'a+b+a', '<2>(a+b)', '([ab]+[ab]){2}', '<2>ab<3>cd<5>')
Input Trivial Associative Linear Distributive
0 a $a$ $a$ $a$ $a$
1 a+a+a $\left(a + a\right) + a$ $a + a + a$ $ \left\langle 3 \right\rangle \,a$ $ \left\langle 3 \right\rangle \,a$
2 a+a+b $\left(a + a\right) + b$ $a + a + b$ $ \left\langle 2 \right\rangle \,a + b$ $ \left\langle 2 \right\rangle \,a \oplus b$
3 a+b+a $\left(a + b\right) + a$ $a + b + a$ $ \left\langle 2 \right\rangle \,a + b$ $ \left\langle 2 \right\rangle \,a \oplus b$
4 <2>(a+b) $ \left\langle 2 \right\rangle \,\left(a + b\right)$ $ \left\langle 2 \right\rangle \,\left(a + b\right)$ $ \left\langle 2 \right\rangle \,\left(a + b\right)$ $ \left\langle 2 \right\rangle \,a \oplus \left\langle 2 \right\rangle \,b$
5 ([ab]+[ab]){2} $\left(\left(a + b\right) + \left(a + b\right)\right)^{2}$ $\left(a + b + a + b\right)^{2}$ $\left( \left\langle 2 \right\rangle \,a + \left\langle 2 \right\rangle \,b\right)^{2}$ $ \left\langle 4 \right\rangle \,\left(a \, a\right) \oplus \left\langle 4 \right\rangle \,\left(a \, b\right) \oplus \left\langle 4 \right\rangle \,\left(b \, a\right) \oplus \left\langle 4 \right\rangle \,\left(b \, b\right)$
6 <2>ab<3>cd<5> $ \left\langle 2 \right\rangle \,a \, \left(b \, \left( \left\langle 3 \right\rangle \,c \, \left\langle 5 \right\rangle \,d\right)\right)$ $ \left\langle 2 \right\rangle \,a \, b \, \left\langle 3 \right\rangle \,c \, \left\langle 5 \right\rangle \,d$ $ \left\langle 30 \right\rangle \,\left(a \, b \, c \, d\right)$ $ \left\langle 30 \right\rangle \,\left(a \, b \, c \, d\right)$

Try it!

The following piece of code defines an interactive function for you to try your own expression. Enter an expression in the text area, then click on the "Run" button.

In [5]:
from ipywidgets import interact_manual
from IPython.display import display
es = []
def interactive_example(expression = "[ab]{3,}"):
Input Trivial Associative Linear Distributive
0 [ab]{3,} $\left(\left(\left(a + b\right)^{2}\right) \, \left(a + b\right)\right) \, \left(a + b\right)^{*}$ $\left(a + b\right)^{3} \, \left(a + b\right)^{*}$ $\left(a + b\right)^{3} \, \left(a + b\right)^{*}$ ${a}^{3} \, \left(a \oplus b\right)^{*} \oplus a \, a \, b \, \left(a \oplus b\right)^{*} \oplus a \, b \, a \, \left(a \oplus b\right)^{*} \oplus a \, b \, b \, \left(a \oplus b\right)^{*} \oplus b \, a \, a \, \left(a \oplus b\right)^{*} \oplus b \, a \, b \, \left(a \oplus b\right)^{*} \oplus b \, b \, a \, \left(a \oplus b\right)^{*} \oplus {b}^{3} \, \left(a \oplus b\right)^{*}$