Questions taken from Stackoverflow

This page lists questions about automata and other rational/regular expressions that were asked on Stackoverflow, and where Vcsn seems to be an appropriate tool to compute the answer.

The set of all strings beginning with 101 and ending with 01010.

First, let's define our "context": we work with "labels are letters" (lal), on the alphabet $\{0, 1\}$. We don't use weights, or rather, we use the traditional Boolean weights: $\mathbb{B}$.

In [1]:
import vcsn
c = vcsn.context('lal_char(01), b')
c
Out[1]:
$\{0, 1\}\to\mathbb{B}$

Then, we build our expression using an unusual operator: $\mathsf{E} \& \mathsf{F}$ denotes the conjunction of expressions $\mathsf{E}$ and $\mathsf{F}$. In this case (unweighted/Boolean automata), it denotes exactly the intersection of languages.

In [2]:
e = c.expression('(101[01]*)&([01]*01010)')
e
Out[2]:
$1 \, 0 \, 1 \, \left(0 + 1\right)^{*} \& \left(0 + 1\right)^{*} \, 0 \, 1 \, 0 \, 1 \, 0$

We want to normalize this extended expression (it has conjunction and complement operators) into a basic expression. To this end, we first convert it to an automaton.

In [3]:
a = e.automaton()
a
Out[3]:
%3 I0 0 0 I0->0 F9 1 1 0->1 1 2 2 1->2 0 3 3 1->3 0 6 6 2->6 1 4 4 3->4 1 4->4 0, 1 5 5 4->5 0 5->6 1 7 7 6->7 0 8 8 7->8 1 9 9 8->9 0 9->F9

and then we convert this automaton into a basic expression:

In [4]:
a.expression()
Out[4]:
$1 \, \left(0 \, 1 + 0 \, 1 \, \left(0 + 1\right)^{*} \, 0 \, 1\right) \, 0 \, 1 \, 0$

Or, in ASCII:

In [5]:
print(a.expression())
1(01+01(0+1)*01)010

Regular expression to match text that doesn't contain a word?

I'd like to know if it's possible to match lines that don't contain a specific word (e.g. hede) using a regular expression?

First, let's define that alphabet we work on: from $a$ to $z$ for instance.

In [6]:
import vcsn
c = vcsn.context('lal_char(a-z), b')
c
Out[6]:
$\{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z\}\to\mathbb{B}$

Then we define our expression, which is extended (it uses the complement operator), so to normalize it, we first convert it into automaton (with _expression_.automaton), from which we extract a basic expression (with _automaton_.expresion).

In [7]:
e = c.expression('(hede){c}')
e
Out[7]:
$\left(h \, e \, d \, e\right)^{c}$
In [8]:
a = e.automaton()
a
Out[8]:
%3 I0 0 0 I0->0 F0 F1 F2 F3 F4 0->F0 1 1 0->1 [^h] 2 2 0->2 h 1->F1 1->1 [^] 2->F2 2->1 [^e] 3 3 2->3 e 3->F3 3->1 [^d] 4 4 3->4 d 4->F4 4->1 [^e] 5 5 4->5 e 5->1 [^]
In [9]:
a.expression()
Out[9]:
$\varepsilon + h \, \left(\varepsilon + e \, \left(\varepsilon + d\right)\right) + \left([\hat{}h] + h \, \left([\hat{}e] + e \, \left([\hat{}d] + d \, \left([\hat{}e] + e \, [\hat{}]\right)\right)\right)\right) \, {[\hat{}]}^{*}$

Or, in ASCII (+ is usually denoted |; \e denotes the empty word; and [^] denotes any character, usually written .):

In [10]:
print(a.expression())
\e+h(\e+e(\e+d))+([^h]+h([^e]+e([^d]+d([^e]+e[^]))))[^]*

Convert finite state machine to regular expression

Is there a tool (or an algorithm) to convert a finite state machine into a regular expression?

Vcsn is tool for rational expressions and automata. See http://vcsn.lrde.epita.fr.

In [11]:
import vcsn

Build a random automaton $a$ of 4 states, labeled by letters to choose in ${a, b, c}$. Then "lift" it into an automaton $b$ labeled by expressions.

In [12]:
a = vcsn.context('lal(abc), b').random_automaton(4)
b = a.lift()
b
Out[12]:
%3 I0 0 0 I0->0 F1 2 2 0->2 ⟨c⟩ 1 1 1->F1 1->1 ⟨a⟩ 3 3 2->3 ⟨a+b⟩ 3->1 ⟨b⟩

Eliminate state 2, then 3, etc. The order does not matter for correction (any order gives a correct result), but the quality of the result does depend on this order.

In [13]:
b = b.eliminate_state(2)
b
Out[13]:
%3 I0 0 0 I0->0 F1 3 3 0->3 ⟨c(a+b)⟩ 1 1 1->F1 1->1 ⟨a⟩ 3->1 ⟨b⟩
In [14]:
b = b.eliminate_state(3); b
Out[14]:
%3 I0 0 0 I0->0 F1 1 1 0->1 ⟨c(a+b)b⟩ 1->F1 1->1 ⟨a⟩
In [15]:
b = b.eliminate_state(1); b
Out[15]:
%3 I0 0 0 I0->0 F0 0->F0 ⟨c(a+b)ba*⟩

Eliminate the last state, and read the answer.

In [16]:
b = b.eliminate_state(0); b
Out[16]:
%3 Ipost Fpre Ipost->Fpre ⟨c(a+b)ba*⟩

Alternatively, you can use the method automaton.expression to ask for the result.

In [17]:
a.expression()
Out[17]:
$c \, \left(a + b\right) \, b \, {a}^{*}$

You may see this algorithm run "interactively" using %demo.

In [18]:
a = vcsn.context('lal(abc), b').random(10)
b = a.lift()
%demo b eliminate_state
%3 I0 0 0 I0->0 F8 1 1 0->1 ⟨b⟩ 3 3 0->3 ⟨a+b+c⟩ 5 5 1->5 ⟨b⟩ 6 6 1->6 ⟨c⟩ 9 9 1->9 ⟨b+c⟩ 2 2 7 7 2->7 ⟨b+c⟩ 8 8 2->8 ⟨a+c⟩ 4 4 3->4 ⟨b⟩ 4->7 ⟨a+b+c⟩ 5->2 ⟨a+c⟩ 5->7 ⟨a+c⟩ 6->6 ⟨c⟩ 7->1 ⟨b⟩ 7->7 ⟨b+c⟩ 8->F8 8->4 ⟨a+c⟩ 9->0 ⟨b⟩ 9->9 ⟨a+b⟩

Constructing a Regular Expression from a Finite Automata

I'm trying to construct a regular expression from a Finite Automaton but found my self completely stuck with this one.

In [19]:
%%automaton a
$  -> q3
q1 -> q2 a
q1 -> q3 b
q2 -> q2 a, b
q3 -> q4 a
q3 -> q3 b
q4 -> q4 a
q4 -> q1 b
q3 -> $
q4 -> $
%3 I0 0 q3 I0->0 F0 F3 0->F0 0->0 b 3 q4 0->3 a 1 q1 1->0 b 2 q2 1->2 a 2->2 a, b 3->F3 3->1 b 3->3 a
In [20]:
a.expression()
Out[20]:
$\left(b + a \, {a}^{*} \, b \, b\right)^{*} \, \left(\varepsilon + a \, {a}^{*}\right)$