`context`.`levenshtein`¶

Generate a transducer encoding the Levenshtein distance for two alphabets.

The resulting transducer can be composed with transducers representing two languages to compute the distance between the two languages (among other things, like the edit-distance automaton).

Preconditions:

the labelset has exactly two tapes
the labelsets must have the empty word
the weightset must be nmin ($\langle \mathbb{N}, \min, +\rangle$)

References:

mohri.2002.ciaa for more details on the edit-distance automaton

Examples¶

import vcsn
c = vcsn.context('lat<lan_char(abc), lan_char(bce)>, nmin')
l = c.levenshtein()
l

The Levenshtein automaton only has one state, but has $n^2$ transitions, for a common alphabet between tapes of size $n$.

To show its use, we will create automata for two languages, a1 and a2, and compute the edit-distance automaton.

a1 = vcsn.context('lan_char(abc), b').expression("bac+cab").derived_term().strip().partial_identity()
a1

a2 = vcsn.context('lan_char(bce), b').expression("bec+bebe").automaton().cominimize().strip().partial_identity()
a2

edit = a1.compose(l).compose(a2).trim()
edit

The automaton can be evaluated on one tape to get the edit distance between a word and a language.

exp = edit.lift(1).proper().evaluate("bac")
exp

vcsn.context("lan_char(bce), nmin").expression(exp.format('text'), 'distributive')

context.levenshtein¶

Examples¶

`context`.`levenshtein`¶