context.levenshtein¶

Generate a transducer encoding the Levenshtein distance for two alphabets.

The resulting transducer can be composed with transducers representing two languages to compute the distance between the two languages (among other things, like the edit-distance automaton).

Preconditions:

• the labelset has exactly two tapes
• the labelsets must have the empty word
• the weightset must be nmin ($\langle \mathbb{N}, \min, +\rangle$)

References:

Examples¶

In [1]:
import vcsn
c = vcsn.context('lat<lan_char(abc), lan_char(bce)>, nmin')
l = c.levenshtein()
l

Out[1]:

The Levenshtein automaton only has one state, but has $n^2$ transitions, for a common alphabet between tapes of size $n$.

To show its use, we will create automata for two languages, a1 and a2, and compute the edit-distance automaton.

In [2]:
a1 = vcsn.context('lan_char(abc), b').expression("bac+cab").derived_term().strip().partial_identity()
a1

Out[2]:
In [3]:
a2 = vcsn.context('lan_char(bce), b').expression("bec+bebe").automaton().cominimize().strip().partial_identity()
a2

Out[3]:
In [4]:
edit = a1.compose(l).compose(a2).trim()
edit

Out[4]:

The automaton can be evaluated on one tape to get the edit distance between a word and a language.

In [5]:
exp = edit.lift(1).proper().eval("bac")
exp

Out[5]:
$\left\langle 3 \right\rangle \,\left(\left(b\right) \, \left(e\right) \, \left(b\right) \, \left(e\right)\right) + \left\langle 1 \right\rangle \,\left(\left(b\right) \, \left(e\right) \, \left(c + \left\langle 2 \right\rangle \,\left(\left(b\right) \, \left(e\right)\right)\right)\right) + \left\langle 2 \right\rangle \,\left(\left(b\right) \, \left(e\right) \, \left( \left\langle 1 \right\rangle \,\left(c\right) + \left\langle 2 \right\rangle \,\left(\left(b\right) \, \left(e\right)\right)\right)\right) + \left\langle 1 \right\rangle \,\left( \left\langle 3 \right\rangle \,\left(\left(b\right) \, \left(e\right) \, \left(b\right) \, \left(e\right)\right) + \left\langle 1 \right\rangle \,\left(\left(b\right) \, \left(e\right) \, \left( \left\langle 1 \right\rangle \,\left(c\right) + \left\langle 2 \right\rangle \,\left(\left(b\right) \, \left(e\right)\right)\right)\right)\right)$
In [6]:
vcsn.context("lan_char(bce), nmin").series(exp.format('text'))

Out[6]:
$\left\langle 1 \right\rangle \,\left(b \, e \, c\right) \oplus \left\langle 3 \right\rangle \,\left(b \, e \, b \, e\right)$