Questions taken from Stackoverflow

This page lists questions about automata and other rational/regular expressions that were asked on Stackoverflow, and where Vcsn seems to be an appropriate tool to compute the answer.

In [1]:
import os
# This trick ensures that we always use the same random seed,
# hence running this documentation always gives the same result.
os.environ['VCSN_SEED'] = '1'

Build a Regular Expression and Finite Automata

The set of all strings beginning with 101 and ending with 01010.

First, let's define our "context": we work with "labels are letters" (lal), on the alphabet $\{0, 1\}$. We don't use weights, or rather, we use the traditional Boolean weights: $\mathbb{B}$.

In [2]:
import vcsn
c = vcsn.context('lal_char(01), b')
c
Out[2]:
$\{0, 1\}\to\mathbb{B}$

Then, we build our expression using an unusual operator: $\mathsf{E} \& \mathsf{F}$ denotes the conjunction of expressions $\mathsf{E}$ and $\mathsf{F}$. In this case (unweighted/Boolean automata), it denotes exactly the intersection of languages.

In [3]:
e = c.expression('(101[01]*)&([01]*01010)')
e
Out[3]:
$1 \, 0 \, 1 \, \left(0 + 1\right)^{*} \& \left(0 + 1\right)^{*} \, 0 \, 1 \, 0 \, 1 \, 0$

We want to normalize this extended expression (it has conjunction and complement operators) into a basic expression. To this end, we first convert it to an automaton.

In [4]:
a = e.automaton()
a
Out[4]:
%3 I0 0 0 I0->0 F9 1 1 0->1 1 2 2 1->2 0 3 3 1->3 0 6 6 2->6 1 4 4 3->4 1 4->4 0, 1 5 5 4->5 0 5->6 1 7 7 6->7 0 8 8 7->8 1 9 9 8->9 0 9->F9

and then we convert this automaton into a basic expression:

In [5]:
a.expression()
Out[5]:
$1 \, \left(0 \, 1 + 0 \, 1 \, \left(0 + 1\right)^{*} \, 0 \, 1\right) \, 0 \, 1 \, 0$

Or, in ASCII:

In [6]:
print(a.expression())
1(01+01(0+1)*01)010

Regular expression to match text that doesn't contain a word?

I'd like to know if it's possible to match lines that don't contain a specific word (e.g. hede) using a regular expression?

First, let's define that alphabet we work on: from $a$ to $z$ for instance.

In [7]:
import vcsn
c = vcsn.context('lal_char(a-z), b')
c
Out[7]:
$\{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z\}\to\mathbb{B}$

Then we define our expression, which is extended (it uses the complement operator), so to normalize it, we first convert it into automaton (with _expression_.automaton), from which we extract a basic expression (with _automaton_.expresion).

In [8]:
e = c.expression('(hede){c}')
e
Out[8]:
$\left(h \, e \, d \, e\right)^{c}$
In [9]:
a = e.automaton()
a
Out[9]:
%3 I0 0 0 I0->0 F0 F1 F2 F3 F4 0->F0 1 1 0->1 [^h] 2 2 0->2 h 1->F1 1->1 [^] 2->F2 2->1 [^e] 3 3 2->3 e 3->F3 3->1 [^d] 4 4 3->4 d 4->F4 4->1 [^e] 5 5 4->5 e 5->1 [^]
In [10]:
a.expression()
Out[10]:
$\varepsilon + h \, \left(\varepsilon + e \, \left(\varepsilon + d\right)\right) + \left([\hat{}h] + h \, \left([\hat{}e] + e \, \left([\hat{}d] + d \, \left([\hat{}e] + e \, [\hat{}]\right)\right)\right)\right) \, {[\hat{}]}^{*}$

Or, in ASCII (+ is usually denoted |; \e denotes the empty word; and [^] denotes any character, usually written .):

In [11]:
print(a.expression())
\e+h(\e+e(\e+d))+([^h]+h([^e]+e([^d]+d([^e]+e[^]))))[^]*

Convert finite state machine to regular expression

Is there a tool (or an algorithm) to convert a finite state machine into a regular expression?

Vcsn is tool for rational expressions and automata. See http://vcsn.lrde.epita.fr.

In [12]:
import vcsn

Build a random automaton $a$ of 4 states, labeled by letters to choose in ${a, b, c}$. Then "lift" it into an automaton $b$ labeled by expressions.

In [13]:
a = vcsn.context('lal(abc), b').random_automaton(4, num_final=2)
b = a.lift()
b
Out[13]:
%3 I0 0 0 I0->0 F0 F3 0->F0 0->0 ⟨b+c⟩ 2 2 0->2 ⟨c⟩ 1 1 3 3 1->3 ⟨b⟩ 2->0 ⟨b⟩ 2->1 ⟨b⟩ 2->2 ⟨b+c⟩ 3->F3 3->2 ⟨a+b⟩

Eliminate state 2, then 3, etc. The order does not matter for correction (any order gives a correct result), but the quality of the result does depend on this order.

In [14]:
b = b.eliminate_state(2)
b
Out[14]:
%3 I0 0 0 I0->0 F0 F3 0->F0 0->0 ⟨b+c+c(b+c)*b⟩ 1 1 0->1 ⟨c(b+c)*b⟩ 3 3 1->3 ⟨b⟩ 3->F3 3->0 ⟨(a+b)(b+c)*b⟩ 3->1 ⟨(a+b)(b+c)*b⟩
In [15]:
b = b.eliminate_state(3); b
Out[15]:
%3 I0 0 0 I0->0 F0 F1 0->F0 0->0 ⟨b+c+c(b+c)*b⟩ 1 1 0->1 ⟨c(b+c)*b⟩ 1->F1 ⟨b⟩ 1->0 ⟨b(a+b)(b+c)*b⟩ 1->1 ⟨b(a+b)(b+c)*b⟩
In [16]:
b = b.eliminate_state(1); b
Out[16]:
%3 I0 0 0 I0->0 F0 0->F0 ⟨ε+c(b+c)*b(b(a+b)(b+c)*b)*b⟩ 0->0 ⟨b+c+c(b+c)*b+c(b+c)*b(b(a+b)(b+c)*b)*b(a+b)(b+c)*b⟩

Eliminate the last state, and read the answer.

In [17]:
b = b.eliminate_state(0); b
Out[17]:
%3 Ipost Fpre Ipost->Fpre ⟨(b+c+c(b+c)*b+c(b+c)*b(b(a+b)(b+c)*b)*b(a+b)(b+c)*b)*(ε+c(b+c)*b(b(a+b)(b+c)*b)*b)⟩

Alternatively, you can use the method automaton.expression to ask for the result.

In [18]:
a.expression()
Out[18]:
$\left(b + c + c \, \left(b + c + b \, b \, \left(a + b\right)\right)^{*} \, b\right)^{*} \, \left(\varepsilon + c \, \left(b + c + b \, b \, \left(a + b\right)\right)^{*} \, b \, b\right)$

You may see this algorithm run "interactively" using %demo.

In [19]:
a = vcsn.context('lal(abc), b').random_automaton(10)
b = a.lift()
%demo b eliminate_state

Constructing a Regular Expression from a Finite Automata

I'm trying to construct a regular expression from a Finite Automaton but found my self completely stuck with this one.

In [20]:
import vcsn
In [21]:
%%automaton a
$  -> q3
q1 -> q2 a
q1 -> q3 b
q2 -> q2 a, b
q3 -> q4 a
q3 -> q3 b
q4 -> q4 a
q4 -> q1 b
q3 -> $
q4 -> $
%3 I0 0 q3 I0->0 F0 F3 0->F0 0->0 b 3 q4 0->3 a 1 q1 1->0 b 2 q2 1->2 a 2->2 a, b 3->F3 3->1 b 3->3 a
In [22]:
a.expression()
Out[22]:
$\left(b + a \, {a}^{*} \, b \, b\right)^{*} \, \left(\varepsilon + a \, {a}^{*}\right)$

How to find the right quotient of a language given two languages?

(This is the question with typos in $L_1$ and $L_2$ fixed.)

If $L_1= \{a^n b^m \mid n \geqslant 1, m \geqslant 0 \} \cup \{ba\}$ and $L_2= \{b^m \mid m \geqslant 1 \}$. I am not getting how the DFA for $L_1/L_2$ is constructed in the second figure using the DFA for $L_1$, please tell me the approach.

First, let's define our "context": we work with "labels are letters" (lal), on the alphabet $\{a, b\}$. We don't use weights, or rather, we use the traditional Boolean weights: $\mathbb{B}$.

In [23]:
import vcsn
c = vcsn.context('lal(ab), b')
c
Out[23]:
$\{a, b\}\to\mathbb{B}$

Then define the first automaton, Figure 4.1.

In [24]:
%%automaton a1
$  -> q0
q1 -> $
q0 -> q1 a
q0 -> q3 b
q1 -> q1 a
q1 -> q2 b
q2 -> $
q2 -> q2 b
q2 -> q5 a
q3 -> q4 a
q3 -> q5 b
q4 -> $
q4 -> q5 a, b
q5 -> q5 a, b
%3 I0 0 q0 I0->0 F1 F3 F5 1 q1 0->1 a 2 q3 0->2 b 1->F1 1->1 a 3 q2 1->3 b 4 q5 2->4 b 5 q4 2->5 a 3->F3 3->3 b 3->4 a 4->4 a, b 5->F5 5->4 a, b
In [25]:
a1.is_deterministic() and a1.is_complete()
Out[25]:
True

Automaton $\mathcal{A}_2$ represents the language $L_2$:

In [26]:
a2 = c.expression('b{+}').automaton()
a2
Out[26]:
%3 I0 0 0 I0->0 F1 1 1 0->1 b 1->F1 1->1 b

We may now compute the quotient, which is indeed the automaton of Figure 4.2. It is better looking once trimmed.

In [27]:
a1 / a2
Out[27]:
%3 I0 0 0 I0->0 F1 F3 1 1 0->1 a 2 2 0->2 b 1->F1 1->1 a 3 3 1->3 b 4 4 2->4 b 5 5 2->5 a 3->F3 3->3 b 3->4 a 4->4 a, b 5->4 a, b
In [28]:
(a1 / a2).trim()
Out[28]:
%3 I0 0 0 I0->0 F1 F3 1 1 0->1 a 1->F1 1->1 a 3 3 1->3 b 3->F3 3->3 b