# Questions taken from Stackoverflow¶

This page lists questions about automata and other rational/regular expressions that were asked on Stackoverflow, and where Vcsn seems to be an appropriate tool to compute the answer.

## Build a Regular Expression and Finite Automata¶

The set of all strings beginning with 101 and ending with 01010.

First, let's define our "context": we work with "labels are letters" (lal), on the alphabet $\{0, 1\}$. We don't use weights, or rather, we use the traditional Boolean weights: $\mathbb{B}$.

In [1]:
import vcsn
c = vcsn.context('lal_char(01), b')
c

Out[1]:
$\{0, 1\}\to\mathbb{B}$

Then, we build our expression using an unusual operator: $\mathsf{E} \& \mathsf{F}$ denotes the conjunction of expressions $\mathsf{E}$ and $\mathsf{F}$. In this case (unweighted/Boolean automata), it denotes exactly the intersection of languages.

In [2]:
e = c.expression('(101[01]*)&([01]*01010)')
e

Out[2]:
$1 \, 0 \, 1 \, \left(0 + 1\right)^{*} \& \left(0 + 1\right)^{*} \, 0 \, 1 \, 0 \, 1 \, 0$

We want to normalize this extended expression (it has conjunction and complement operators) into a basic expression. To this end, we first convert it to an automaton.

In [3]:
a = e.automaton()
a

Out[3]:

and then we convert this automaton into a basic expression:

In [4]:
a.expression()

Out[4]:
$1 \, \left(0 \, 1 + 0 \, 1 \, \left(0 + 1\right)^{*} \, 0 \, 1\right) \, 0 \, 1 \, 0$

Or, in ASCII:

In [5]:
print(a.expression())

1(01+01(0+1)*01)010


## Regular expression to match text that doesn't contain a word?¶

I'd like to know if it's possible to match lines that don't contain a specific word (e.g. hede) using a regular expression?

First, let's define that alphabet we work on: from $a$ to $z$ for instance.

In [6]:
import vcsn
c = vcsn.context('lal_char(a-z), b')
c

Out[6]:
$\{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z\}\to\mathbb{B}$

Then we define our expression, which is extended (it uses the complement operator), so to normalize it, we first convert it into automaton (with _expression_.automaton), from which we extract a basic expression (with _automaton_.expresion).

In [7]:
e = c.expression('(hede){c}')
e

Out[7]:
$\left(h \, e \, d \, e\right)^{c}$
In [8]:
a = e.automaton()
a

Out[8]:
In [9]:
a.expression()

Out[9]:
$\varepsilon + h \, \left(\varepsilon + e \, \left(\varepsilon + d\right)\right) + \left([\hat{}h] + h \, \left([\hat{}e] + e \, \left([\hat{}d] + d \, \left([\hat{}e] + e \, [\hat{}]\right)\right)\right)\right) \, {[\hat{}]}^{*}$

Or, in ASCII (+ is usually denoted |; \e denotes the empty word; and [^] denotes any character, usually written .):

In [10]:
print(a.expression())

\e+h(\e+e(\e+d))+([^h]+h([^e]+e([^d]+d([^e]+e[^]))))[^]*


## Convert finite state machine to regular expression¶

Is there a tool (or an algorithm) to convert a finite state machine into a regular expression?

Vcsn is tool for rational expressions and automata. See http://vcsn.lrde.epita.fr.

In [11]:
import vcsn


Build a random automaton $a$ of 4 states, labeled by letters to choose in ${a, b, c}$. Then "lift" it into an automaton $b$ labeled by expressions.

In [12]:
a = vcsn.context('lal(abc), b').random(4)
b = a.lift()
b

Out[12]:

Eliminate state 2, then 3, etc. The order does not matter for correction (any order gives a correct result), but the quality of the result does depend on this order.

In [13]:
b = b.eliminate_state(2)
b

Out[13]:
In [14]:
b = b.eliminate_state(3); b

Out[14]:
In [15]:
b = b.eliminate_state(1); b

Out[15]:

In [16]:
b = b.eliminate_state(0); b

Out[16]:

Alternatively, you can use the method automaton.expression() to ask for the result.

In [17]:
a.expression()

Out[17]:
$\left(a + c\right) \, c \, \left(b + c\right) \, \left(a + c + a \, \left(a + c\right) \, c \, \left(b + c\right)\right)^{*}$

## Constructing a Regular Expression from a Finite Automata¶

I'm trying to construct a regular expression from a Finite Automaton but found my self completely stuck with this one.

In [18]:
%%automaton a
$-> q3 q1 -> q2 a q1 -> q3 b q2 -> q2 a, b q3 -> q4 a q3 -> q3 b q4 -> q4 a q4 -> q1 b q3 ->$
q4 -> $ In [19]: a.expression()  Out[19]:$\left(b + a \, {a}^{*} \, b \, b\right)^{*} \, \left(\varepsilon + a \, {a}^{*}\right)\$