Personal tools

Vcsn/News File


Jump to: navigation, search


This file describes user visible changes in the course of the development of Vcsn, in reverse chronological order. On occasions, significant changes in the internal API may also be documented.

Vcsn 2.5 (2017-01-28)

The Vcsners are proud to announce the release of Vcsn 2.5, aka the k-lightest release!

Noteworthy changes include:

  • two new implementations to compute the k-lightest paths (aka "best" paths: with the smallest weights) in tropical automata.
  • several new demos showing how to use the C++ library, including a translator from (French) text messages (aka SMS) into French.
  • a new means for the users to configure Vcsn. This adds a new prerequisite to build Vcsn: libyaml-cpp.
  • a much better and faster caching system of runtime generated C++ code. Users of dyn (and therefore of Python) should note an improved amortization of the compilation costs.
  • several portability issues reported by users were fixed.

For more information, please, see the detailed news below.

People who worked on this release:

  • Akim Demaille
  • Clément Démoulins
  • Clément Gillard
  • Sarasvati Moutoucomarapoulé
  • Sébastien Piat
  • Younes Khoudli


Improved error messages

Parse errors were improved with caret-style reports for expressions and automata in dot (Graphviz) and daut syntax.

In [5]: vcsn.Q.expression('<1/2>a + <1/0>b')
RuntimeError: 1.10-14: Q: null denominator
<1/2>a + <1/0>b
  while reading expression: <1/2>a + <1/0>b

In [6]: vcsn.automaton('''
   ...: $ -> 0
   ...: 0 -> 1 <1/2>a, b
   ...: 1 -> $''')
RuntimeError: 3.1-16: B: unexpected trailing characters: /2
  while reading: 1/2
  while reading: <1/2>a, b
0 -> 1 <1/2>a, b
  while reading automaton

In [7]: vcsn.automaton('digraph{vcsn_context="lal, b" 0->1[label="<2>a"]}')
RuntimeError: 1.35-48: B: invalid value: 2
  while reading: 2
  while reading: <2>a
digraph{vcsn_context="lal, b" 0->1[label="<2>a"]}
  while reading automaton


"auto" automaton file format

Pass "auto" to read_automaton (in dyn, static, Python or the Tools) to let the system guess the automaton file format (daut, dot, etc.).



New Natural Language Processing demonstration of the computations of the lightest paths. This application is a translator from SMS (i.e., text messages) to proper French. The implementation is based on Rewriting the orthography of SMS messages, François Yvon, In Natural Language Engineering, volume 16, 2010. The translator uses pre-trained automata and through compositions with the automaton representing the text message, generates all possible translations of the word. The best solution is then found with a shortest path algorithm. An example is given in the Sms2fr.ipynb notebook.


A richer dyn

The vcsn::dyn API was enriched. All the dyn types now support the usual operators: comparisons (==, !=, <, <=, >, >=), and compositions (+, *, &). New functions facilitate the creation of dyn values from strings (make_word, etc.). The file tests/demos/ shows several examples, explained in the C++-Library.ipynb notebook. For instance:

// A simple automaton.
auto a1 = make_automaton("context = lal, q\n"
                         "$ 0 <1/2>\n"
                         "0 1 <2>a, <6>b\n"
                         "1 $\n", "daut");
// Its context.
auto ctx = context_of(a1);

// Evaluate it.
assert(evaluate(a1, make_word(ctx, "a")) == make_weight(ctx, "1"));
assert(evaluate(a1, make_word(ctx, "b")) == make_weight(ctx, "3"));

// Concatenate to itself.
auto a2 = a1 * a1;
assert(evaluate(a2, make_word(ctx, "ab")) == make_weight(ctx, "3"));

// Self-conjunction, aka "power 2".
auto a3 = a1 & a1;
assert(evaluate(a3, make_word(ctx, "b")) == make_weight(ctx, "9"));



Vcsn now supports configuration files. They will be used to provide users with a means to customize Vcsn, for instance to tune the graphical rendering of the automata, to tailor the display of expressions, etc.

For a start, it provides a simple means to get the configuration information, including from Vcsn Tools.

$ vcsn ipython --no-banner
In [1]: import vcsn

In [2]: vcsn.config('configuration.cxxflags')
Out[2]: '-Qunused-arguments -O3 -g -std=c++1z'

$ vcsn configuration configuration.cxxflags
-Qunused-arguments -O3 -g -std=c++1z

This adds a new dependency: libyaml-cpp. Beware that version 0.5.2 is buggy and will not work properly. Use 0.5.1, or 0.5.3 or more recent.


compare: new algorithm

Three-way comparison is now available in all the layers, as compare, for automata, expressions, labels, polynomials and weights. This is used in Python to implement the comparisons (<, <=, >, =>, ==, !=) of expressions, and of automata (<, <=, >, =>).

However, since the comparison on automata is performed on the list of transitions, automata that are "very much alike" (i.e., different only by superficial details) will be considered different, so == and != are still using a safer, but much slower, comparison.

In [2]: exp = vcsn.Q.expression

In [3]: exp('a').compare(exp('b'))
Out[3]: -1

In [4]: exp('a').compare(exp('a'))
Out[4]: 0

In [5]: exp('<3>a').compare(exp('a'))
Out[5]: 1


k shortest paths algorithms

Two algorithms for searching k shortest paths in graphs have been implemented: "Eppstein" and "Yen". Hence, we can now search for the k lightest (smallest with respect to weights, aka "best" paths) paths in an automaton through the "lightest" algorithm. "lightest" used to compute these paths with an inefficient heap-based implementation:

aut.lightest(5, "auto")

Note that "auto" does not use the latter algorithm when returning only one path, as simpler shortest path algorithms would apply to the situation and be more efficient. It then uses Dijkstra's algorithm. It can now be called with the given Yen and Eppstein algorithms as follows:

aut.lightest(5, "eppstein")
aut.lightest(5, "yen")

Yen's algorithm requires the given automaton to have no cycles, while Eppstein supports any type of automaton. For small values of k, Yen's algorithm has better performances than Eppstein, but with increasing values of k, Eppstein is always more efficient.

Vcsn 2.4 (2016-11-16)

The Vcsn team is happy to announce the release of Vcsn 2.4, code-named "the quotient tools"!

Noteworthy changes include, besides a few bug fixes:

  • an overhaul of the "Vcsn Tools" (previously known as TAF-Kit). Because the tools are now automatically generated, they are much more extensive than previously: (almost) all of the dyn algorithms are now available from the shell. It now also supports the 'daut' format for automata.

    $ vcsn thompson -Ee '[ab]*c' | vcsn proper | vcsn determinize | vcsn minimize | vcsn to-expression
    $ vcsn random-expression -C 'lal(abc), z' '+, ., w., length=20, w="min=-2, max=10"'
  • an new method to construct an automaton from an extended expression: expression.inductive. This provides an alternative to expression.derived_term. Currently provides a single flavor: generation of standard automata.

    In [2]: vcsn.B.expression('! [ab]*a[ab]*').inductive().expression()
    Out[2]: +bb*
    In [3]: vcsn.B.expression('! [ab]*a[ab]*').automaton().expression()
    Out[3]: b*
  • full support for quotient operators on all entities: labels, expressions, automata, expansions, etc.

    In [2]: c = vcsn.context('lan, q')
       ...: c
    Out[2]: {...}? -> Q
    In [3]: label = vcsn.context('law, q').label
       ...: label('abc') / label('c')
    Out[3]: ab
    In [4]: exp = c.expression
    In [5]: exp('ab').ldivide(exp('ab*c'))
    Out[5]: ab{\}ab*c
    In [6]: e = exp('ab').ldivide(exp('ab*c'))
       ...: e
    Out[6]: ab{\}ab*c
    In [7]: e.automaton().expression()
    Out[7]: b*c

Operators {\} (left quotient) and {/} (right quotient) are available in the rational expressions:

In [8]: e = exp('ab {\} abc*')
   ...: e
Out[8]: ab{\}abc*

In [9]: e.expansion()
Out[9]: \e.[b{\}bc*]

In [10]: e.derived_term().expression()
Out[10]: c*

In [11]: e.inductive().expression()
Out[11]: \e+cc*
  • automaton.evaluate works properly on non-free automata, including multitape automata:

    In [2]: c = vcsn.context('lan(a-z), nmin')
       ...: a = (c|c).levenshtein()
       ...: a('foo|bar')
    Out[2]: 3
  • input/output support for FAdo's format for transducers, and improved compatibility with OpenFST.

For more information, please, see the detailed news below.

People who worked on this release:

  • Akim Demaille
  • Clément Gillard
  • Lucien Boillod
  • Sarasvati Moutoucomarapoulé
  • Sébastien Piat
  • Younes Khoudli

People who have influenced this release:

  • Alexandre Duret-Lutz
  • Jacques Sakarovitch
  • Luca Saiu
  • Sylvain Lombardy


random_automaton now generates weights

context.random_automaton now takes an optional weights parameter, allowing to set how the weights are generated. The syntax is the same as the param string of random_weights.

In [1]: import vcsn
        ctx = vcsn.context('lal_char(ab), z')
        a = ctx.random_automaton(3, weights='min=0, max=20')
context = letterset<char_letters(ab)>, z
$ -> 0
0 -> $
0 -> 2 <17>b
1 -> 1 <13>b
1 -> 2 <11>b
2 -> 0 <18>a, <13>b
2 -> 1 <12>a


eval is renamed evaluate

For consistency with the remainder of the API, we use the full, unabbreviated, name: evaluate.


weight_zero and weight_one are now available in Python

These methods return the "zero" and "one" weights of a context.

In [1]: import vcsn
        ctx = vcsn.context('lal_char, zmin')
In [2]: ctx.weight_one()
Out[2]: 0

In [3]: ctx.weight_zero()
Out[3]: ∞


Left and right divisions are now supported on labels

It is now possible to call left and right divisions on labels from Python, using // and / operators (respectively).

In [1]: import vcsn
        ctx = vcsn.context('law_char, b')
In [2]: l = ctx.label('a')
        r = ctx.label('abc')
        l // r # == l.ldivide(r)
Out[2]: bc

In [3]: l = ctx.label('abc')
        r = ctx.label('bc')
        l / r # == l.rdivide(r)
Out[3]: a


TAF-Kit is replaced by Tools

The new command line interface is now automatically generated from the algorithms in dyn, allowing it to support a lot more functions than previously.

The supported algorithms are:

accessible add ambiguous-word are-equivalent are-isomorphic cat
cerny coaccessible codeterminize cominimize complement complete
component compose concatenate condense conjugate conjunction
constant-term context-of copy costandard cotrie de-bruijn
delay-automaton derivation derived-term determinize difference
divkbaseb eliminate-state eval expand expression-one expression-zero
factor focus get-format has-bounded-lag has-lightening-cycle
has-twins-property identities-of inductive infiltrate insplit
is-accessible is-ambiguous is-coaccessible is-codeterministic
is-complete is-costandard is-cycle-ambiguous is-deterministic
is-empty is-eps-acyclic is-functional is-letterized is-normalized
is-out-sorted is-partial-identity is-proper is-realtime is-standard
is-synchronized is-synchronized-by is-synchronizing is-trim
is-useless is-valid join ladybird ldivide less-than letterize
levenshtein lgcd lift lightest lightest-automaton lweight
make-context make-word-context minimize multiply normalize
num-components num-tapes pair partial-identity prefix project proper
push-weights quotkbaseb random-automaton
random-automaton-deterministic random-expression random-weight
rdivide realtime reduce rweight scc set-format shortest shuffle sort
split standard star star-height star-normal-form strip subword
suffix synchronize synchronizing-word thompson to-automaton
to-expansion to-expression transpose transposition trie trim tuple
type u universal weight-series zpc

To get more information about a particular algorithm, you can type vcsn COMMAND -h or --help:

$ vcsn eval --help
usage: vcsn eval [OPTIONS...] [ARGS...]

Available versions:
 eval: AUT:automaton P:polynomial -> weight
    Evaluate P on AUT.

 eval: AUT:automaton L:word -> weight
    Evaluate L on AUT.

Try 'vcsn tools --help' for more information.

You can for example generate the Thompson automaton that accepts ab*:

$ vcsn thompson 'ab*' -O daut
$ -> 0
0 -> 1 a
1 -> 4 \e
2 -> 3 b
3 -> 2 \e
3 -> 5 \e
4 -> 2 \e
4 -> 5 \e
5 -> $

For more information, please see the Executables documentation page, and vcsn tools -h.


FAdo: transducers and comments support

It is now possible to read and produce transducers in FAdo format. Comments are also supported in the parser.

In [1]: a = vcsn.automaton(data='''
        @Transducer 0 2 * 0 # Final * Initial
        0 0 @epsilon 1
        0 0 0 0
        0 1 @epsilon 1
        0 1 1 0
        1 @epsilon 0 2
        1 @epsilon 1 2
        1 0 0 1
        1 1 1 1''')

In [2]: print(a.format('fado'))
@Transducer 0 2 * 0
0 0 @epsilon 1
0 0 0 0
0 1 @epsilon 1
0 1 1 0
1 @epsilon 0 2
1 @epsilon 1 2
1 0 0 1
1 1 1 1


daut native parser and producer

Daut is a simplified Dot syntax for automata. This format was only available in Python. It is now possible to read and produce it in C++.

$ vcsn cat -A -I daut -O daut -f lal_char_q.daut
context = letterset<char_letters(abc)>, q
$ -> 0 <3>
0 -> 1 <1/2>a, <1/3>b
1 -> $ <2>


Improved compatibility with newer OpenFST

As OpenFST only supports a single initial state, pre is showed in case of several ones, with spontaneous transitions to them. Pre was represented by a very large integer which was read as a negative one in newer version of OpenFST, thus raising an error. The state number immediately after the highest state number is now used.


automaton.eval supports polynomials

It is now possible to evaluate polynomials of words on automata.


make_word_context is exposed in Python

It is now possible to call context.word_context() to get the context of the words of any context.


automaton.eval supports non-free labelsets

It is now possible to evaluate words on automata with non-free labelsets.

For example, we can compute the edit distance between two words:

In [2]: c = vcsn.context('lan(a-z), nmin')
        a = (c|c).levenshtein()
Out[2]: 3

In [3]: a('bar|baz')
Out[3]: 1

In [4]: a('qux|quuux')
Out[4]: 2


expression: inductive

Implemented as a hidden feature in Vcsn 2.3, expression.inductive is a new way of constructing automata from expressions, based on the algorithm given as argument. The only algorithm implemented yet is "standard" which uses standard operations to construct a standard automaton. The difference with expression.standard is that it handles extended expressions.

For example, we can compute the automaton equivalent of such expressions with the inductive method whereas we cannot with the standard one:

In [2]: vcsn.B.expression('! [ab]*a[ab]*').inductive().expression()
Out[2]: \e+bb*

expression.derived_term supports multitape expressions

Vcsn 2.3 already supports multitape expressions with the derived-term algorithm, but it was restricted to the expansion-based computation. The derivative-based computation is now also supported.

This is only a proof of concept: the implementation is more complex and much slower than the expansion-based approach.


expression.derivation works on multitape expressions

It is now possible to compute derivatives wrt labels such as a|x, a|\e or \e|x. It is however forbidden wrt \e|\e.

2016-07-23 levels of detail

Instead of a Boolean argument detailed, info now features an integer argument details, defaulting to 2. A new level, 1, contains only basic information (number of states and transitions).


expression: partial_identity

The partial_identity algorithm is now available on expressions too.

tuple: applies to contexts

It is now possible from dyn and Python to tuple several contexts. For instance vcsn.B | vcsn.Q is lat<lal, lal>, q.

Vcsn 2.3 (2016-07-08)

About four hundred commits and five months after Vcsn 2.2, we are proud to announce the release of Vcsn 2.3, code-named "the tuple release"!

As usual, many bugs were fixed (some quite old yet unnoticed so far!). Noteworthy changes include:

  • a particular effort was put on the documentation: there are thirty-five new documentation pages, and about forty others were improved.

  • full support for a "tuple" operator on all entities: expressions, polynomials, automata, etc.

    In [13]: aut = lambda e: vcsn.context('lan, q').expression(e).automaton()
    In [14]: a = aut('[ab]*') | aut('x')
    In [15]: a.shortest(6)
    Out[15]: |x + a|x + b|x + aa|x + ab|x + ba|x
  • It is also available in the rational expressions themselves:

    In [16]: c = vcsn.context('lat<lan, lan>, q'); c
    Out[16]: {...}? x {...}? -> Q
    In [17]: e = c.expression('[ab]*|x'); e
    Out[17]: (a+b)*|x
    In [18]: e.shortest(6)
    Out[18]: \e|x + a|x + b|x + aa|x + ab|x + ba|x

    The derived-term algorithm supports this operator, and generates equivalent multitape automata.

  • many error messages were improved, to help users understand their mistakes. For instance, instead of</p>
    In [2]: vcsn.Q.expression('a**').derivation('a')
    RuntimeError: q: star: invalid value: 1
  • we now display:

    In [2]: vcsn.Q.expression('a**').derivation('a')
    RuntimeError: Q: value is not starrable: 1
      while computing derivative of: a**
                    with respect to: a
  • in addition to %automaton a, which allows interactive edition of automata, the notebooks now feature two new interactive editors: %context c to edit/create context c, and %expression e for expressions (with an interactive display of the generated automata).
  • one may now generate random rational expressions and control the operators and their probabilities.
  • a lot of code improvement and consistency enforcement, both in C++ and in Python.

People who worked on this release:

  • Akim Demaille
  • Clément Gillard
  • Lucien Boillod
  • Raoul Billion
  • Sébastien Piat
  • Thibaud Michaud

People who have influenced this release:

  • Alexandre Duret-Lutz
  • Jacques Sakarovitch
  • Luca Saiu
  • Sylvain Lombardy


Command line executables

The shell tools (formerly known as TAF-Kit) such as vcsn standard, vcsn determinize, etc. have finally been documented! Several issues have been fixed too.


expression widget

You can now use %expression in the notebook to interactively edit an expression and its context while seeing the automaton it builds. You are also able to chose the identities you want to use for the expression, and the automaton generating algorithm used to render the automaton.


proper: more consistent signatures

The Python binding of automaton.proper was different from the static and dyn:: versions for no sound reason: instead of a direction argument taking backward or forward, it had a backward argument taking an Boolean, and the prune and direction (well, backward) arguments were swapped.

The signature in Python is now consistent: aut.proper(direction="backward", prune=False, algo="auto", lazy=False).



The following operations have been renamed, in all APIs (vcsn::, dyn::, Python, and TAF-Kit), and on all applicable types (automaton, expansion, expression, polynomial, weight).

infiltration -> infiltrate
ldiv         -> ldivide
left_mult    -> lweight
rdiv         -> rdivide
right_mult   -> rweight
sum          -> add


random_automaton: new name for random

Random generation of automata is now named random_automaton.


polynomials: more basic operations

Polynomials now support (in the three layers) the addition, multiplication, exterior products, and conjunction.

In [4]: p = vcsn.context('law, q').polynomial

In [5]: p0 = p('<2>a + <3>b + <4>c'); p1 = p('<5>a + <6>b + <7>d')

In [6]: p0 + p1
Out[6]: <7>a + <9>b + <4>c + <7>d

In [7]: p0 * p1
Out[7]: <10>aa + <12>ab + <14>ad + <15>ba + <18>bb + <21>bd + <20>ca + <24>cb + <28>cd

In [8]: p0 * 2
Out[8]: <4>a + <6>b + <8>c

In [9]: 2 * p0
Out[9]: <4>a + <6>b + <8>c

In [10]: p0 & p1
Out[10]: <10>a + <18>b


tuple: applies to more types

The Cartesian product, dubbed "tuple" in Vcsn, was already available on expansions and expressions in the three layers (static, dyn, Python as |). It is now also available on automata and on polynomials.

In [2]: exp = vcsn.context('lan, q').expression

In [3]: a = exp('(<2>a)*').automaton()
In [4]: b = exp('x+y').automaton()

In [5]: (a|b).shortest(8)
Out[5]: \e|x + \e|y + <2>a|x + <2>a|y + <4>aa|x + <4>aa|y + <8>aaa|x + <8>aaa|y

In [6]: a.shortest(4) | b.shortest(2)
Out[6]: \e|x + \e|y + <2>a|x + <2>a|y + <4>aa|x + <4>aa|y + <8>aaa|x + <8>aaa|y


quotkbaseb: new algorithm

From a context, a divisor k, and a base b, gives a transducer that, when given a number in b divisible by k, outputs the quotient of the division of that number by k in b.

In [2]: c = vcsn.context('lat<lal_char(0-9), lal_char(0-9)>, b')

In [3]: c.quotkbaseb(3, 2).shortest(10)
Out[3]: \e|\e + 0|0 + 00|00 + 11|01 + 000|000 + 011|001 + 110|010 + 0000|0000 + 0011|0001 + 0110|0010

In [4]: c.quotkbaseb(7, 10).shortest(10)
Out[4]: \e|\e + 0|0 + 7|1 + 00|00 + 07|01 + 14|02 + 21|03 + 28|04 + 35|05 + 42|06


expansions: support for &

In addition to automata, expressions and polynomials, the conjunction can now be performed on expansions.


sum: optimized version for deterministic automata

The sum of deterministic Boolean automata is now based on a synchronized product-like approach.

dyn: vast overhaul and factoring

The implementation of dyn:: values was revised and factored. No visible change.



One may now generate random expressions.

In [2]: rand = vcsn.context('lal(abc), b').random_expression

In [3]: rand('+,.', length=10)
Out[3]: c(b+c+ca)c

In [4]: rand('+,.', length=10)
Out[4]: c+abcac

In [5]: rand('+=1, .=1, *=.2', length=10)
Out[5]: (b(a+b+c))*

In [6]: rand('+=1, .=1, *=.2', length=10)
Out[6]: bb*ca

In [7]: rand('+=1, .=1, *=.2, &=1', length=10)
Out[7]: c

In [8]: rand('+=1, .=1, *=.2, &=1', length=10)
Out[8]: a*cb*

In [9]: rand('+=1, .=1, *=.2, &=1', length=10)
Out[9]: a((a+a*)&b*)*


project: support for expressions and expansions

One may now project (multitape) automata, contexts, expansions, expressions, labels and polynomials.


expression: a dot output

Expressions now feature a "dot" format to display graphically the structure of the expression. There are actually two flavors: "dot,logical" (the default) which shows the semantic tree, and "dot,physical" which shows the DAG that is used to implement the expression (i.e., nodes used multiple times are displayed only once).

Under IPython, experiment exp.SVG() (logical) and exp.SVG(True) (physical).

Python: format

The Python objects now support the format protocol. For instance:

In [2]: c = vcsn.context('lal(abc), q')

In [3]: for i in range(4):
            e = c.expression('[ab]*a[ab]{{{i}}}'.format(i=i))
            print('{i:3d} | {e:t:20} | {e:u:20} | {n:3d}'
                  .format(i=i, e=e,
  0 | (a+b)*a              | (a+b)*a              |   2
  1 | (a+b)*a(a+b)         | (a+b)*a(a+b)         |   4
  2 | (a+b)*a(a+b){2}      | (a+b)*a(a+b)²        |   8
  3 | (a+b)*a(a+b){3}      | (a+b)*a(a+b)³        |  16


ldiv, rdiv: compute quotient between automata

automaton.ldiv and automaton.rdiv compute the left and right quotients of two automata.

In [1]: import vcsn
        ctx = vcsn.context('lal_char, b')
        aut = lambda e: ctx.expression(e).automaton()

In [2]: aut('ab').ldiv(aut('abcd')).expression()
Out[2]: cd

In [3]: (aut('abcd') / (aut('cd')).expression()
Out[3]: ab

Vcsn 2.2 (2016-02-19)

We are very happy to announce the release of Vcsn 2.2! This version, code-named "the lazy release", concludes the work from Antoine and Valentin, who left EPITA for their final internship.

In addition to the usual load of improvements (more doc and less bugs), this version features some noteworthy changes:

  • several algorithms now offer a lazy variant: compose, conjunction, derived_term, determinize, insplit, and proper. Instead of completing the construction on invocation, the result is built incrementally, on demand, e.g., when requested by an evaluation.

This is especially useful for large computations a fraction of which is actually needed (e.g., composition of two large automata and then with a small one), or for computations that would not terminate (e.g., determinization of some weighted automata).

  • the functions automaton.lightest and automaton.lightest_automaton explore the computations (i.e., paths of accepted words) with the smallest weights (dubbed "shortest paths" for tropical-min semirings). They feature several implementations controlled via the algo argument.
  • rational expressions now support UTF-8 operators in input and output. They also learned a few tricks to be better looking (e.g., aaa => ).
  • several new algorithms or improvements or generalizations of existing ones.
  • a number of performance improvements.

Please, see the details below.

People who worked on this release:

  • Akim Demaille
  • Antoine Pietri
  • Lucien Boillod
  • Nicolas Barray
  • Raoul Billion
  • Sébastien Piat
  • Thibaud Michaud
  • Valentin Tolmer

People who have influenced this release:

  • Alexandre Duret-Lutz
  • Jacques Sakarovitch
  • Luca Saiu
  • Sylvain Lombardy


operations on automata: deterministic versions

In addition to the "general", "standard" and "auto" variants, multiply, sum and star now support a "deterministic" variant, which guarantees a deterministic result.

Beware that with infinite semirings, some (deterministic) operations might not terminate.


conjugate: new algorithm

The automaton.conjugate function builds an automaton whose language is the set of conjugates (i.e., "rotations", or "circular permutations") of the words accepted by the input automaton.

In [3]: vcsn.context('lan_char, b')         \
            .expression('(ab)*')            \
            .automaton()                    \
            .conjugate()                    \
Out[3]: (ab)*+(ab)*{2}+(ba)*ba(ba)*

demangle: add a gdb pretty-printer for Vcsn types

vcsn gdb runs gdb with a pretty-printer, which will automatically Vcsn's demangle() function when a variable with a vcsn type is printed. Symbols are then much easier to read.

insplit: support for on-the-fly construction

The implementation of insplit now supports a lazy argument. This is especially useful when composing a small transducer with a big one, to avoid computing the insplit of the right transducer for many states, that are not going to be useful.

It is used in the composition algorithm for this very reason.

compose: support for on-the-fly construction

The implementation of compose now supports a lazy argument. This is especially useful when composing two big transducers, and then compose with several small ones. The result of the first composition, although huge, will never be completely computed, but the required states are computed and cached.


determinize: support for on-the-fly construction

The implementation of determinize now supports a lazy argument. This is especially useful when working on automata that do not admit a (finite) deterministic automaton.

For instance:

In [2]: a = vcsn.Z.expression('a*+(<2>a)*').automaton().determinize(lazy=True)

In [3]: print(a.as_boxart())
──> │ 0 │

In [4]: a('')
Out[4]: 2

In [5]: print(a.as_boxart())
    ╭───╮  a   ╭─────────╮
──> │ 0 │ ───> │ 1, ⟨2⟩2 │
    ╰───╯      ╰─────────╯
      │ ⟨2⟩

In [6]: a('aa')
Out[6]: 5

In [7]: print(a.as_boxart())
    ╭───╮  a   ╭─────────╮  a   ╭─────────╮  a   ╭─────────╮
──> │ 0 │ ───> │ 1, ⟨2⟩2 │ ───> │ 1, ⟨4⟩2 │ ───> │ 1, ⟨8⟩2 │
    ╰───╯      ╰─────────╯      ╰─────────╯      ╰─────────╯
      │            │                │
      │ ⟨2⟩        │ ⟨3⟩            │ ⟨5⟩
      ∨            ∨                ∨


lightest: algorithms for "shorter paths"

The automaton.lightest looks for words (possibly several) whose evaluation is the smallest one in the automaton. In the case of ℕmin and other tropical semiring, this is often referred to as "shortest paths", but it applies to other semirings as well. It features the same interface as automaton.shortest (which looks for shortest accepted words), but offers several variants, such as "dijkstra", "bellman-ford", "auto"...

The automaton.lightest_automaton algorithm returns a slice of the automaton corresponding to the evaluation with the small weight. It also offers several variants.


derived-term: support for on-the-fly construction

The implementation of derived_term now supports a lazy argument. This is especially useful when working on "infinite derived-term automata" (which happens when requesting a deterministic automaton, or when using the complement operator).

For instance:

In [2]: a = vcsn.Z.expression('a*+(<2>a)*').derived_term(deterministic=True, lazy=True)

In [3]: print(a.as_boxart())
 ──> │ a*+(⟨2⟩a)* │

In [4]: a('')
Out[4]: 2

In [5]: print(a.as_boxart())
    ╭────────────╮  a   ╭───────────────╮
──> │ a*+(⟨2⟩a)* │ ───> │ a*+⟨2⟩(⟨2⟩a)* │
    ╰────────────╯      ╰───────────────╯
      │ ⟨2⟩

In [6]: a('aa')
Out[6]: 5

In [7]: print(a.as_boxart())
    ╭────────────╮  a   ╭───────────────╮  a   ╭───────────────╮  a   ╭───────────────╮
──> │ a*+(⟨2⟩a)* │ ───> │ a*+⟨2⟩(⟨2⟩a)* │ ───> │ a*+⟨4⟩(⟨2⟩a)* │ ───> │ a*+⟨8⟩(⟨2⟩a)* │
    ╰────────────╯      ╰───────────────╯      ╰───────────────╯      ╰───────────────╯
      │                      │                      │
      │ ⟨2⟩                  │ ⟨3⟩                  │ ⟨5⟩
      ∨                      ∨                      ∨


project is available on more structures

One may now project not only multitape automata, but also contexts, labels and polynomials.


has-lightening-cycle: new name for has-negative-cycle

In static, dyn:: and Python, has-negative-cycle is renamed has-lightening-cycle. It makes more sense as we consider (<1/2>a)* to be a lightening cycle.


Operators on expressions and expansions

The dyn::complement function is now available on expansions, in addition to expressions and automata. It is bound to the prefix ~ operator in Python. The dyn::tuple function is available for expressions and expansions. It is bound to | in Python.


Exponents in expressions

In addition to "ASCII exponents" in input (e.g., (ab){4}), we now support them in output, possibly in UTF-8:

In [7]: vcsn.B.expression('aa{2}a²')
Out[7]: a{5}
In [8]: print(vcsn.B.expression('aa{2}a²').format('utf8'))

This is especially nice in derived-term automata.

Tag based dispatch in vcsn::

The treatments for which several algorithms exist (e.g., minimize/cominimize ---hopcropft, moore, weighted, signature, brzozowski, auto---, determinize/codeterminize ---boolean, weighted, auto---, sum/multiply/star ---standard, general, auto---, etc.) now offer a cleaner tag-based interface in vcsn::, the static library. For instance, instead of:

auto m = minimize_moore(a);


auto m = minimize(a, moore_tag{});

dot format

The HTML/XML style strings are now properly supported.

oneset and proper

The oneset labelset has a single label: one. It is used to denote automata with spontaneous transitions only. Applying proper on such automata resulted in an ill-formed automaton (with a single subliminal transition from pre to post). This is now fixed.

Flex 2.6

We are now compatible with the new Flex, which made backward incompatible changes in its API. Previous versions are supported too.

Text format

When displaying value sets, the text format was improved, and the new format utf8 improves on top of it. An new format name sname, replaces previous uses of text:

In [2]: c = vcsn.context('lal_char(abc), expressionset<law_char(xyz), q>')

In [3]: c.format('sname')
Out[3]: 'letterset<char_letters(abc)>, expressionset<wordset<char_letters(xyz)>, q>'

In [4]: c.format('text')
Out[4]: '{abc} -> RatE[{xyz}* -> Q]'

In [5]: c.format('utf8')
Out[5]: '{abc} → RatE[{xyz}* → ℚ]'

When displaying values, the new format utf8 improves the result.

In [6]: e = c.expression('!(<x>a)*')

In [7]: e.format('text')
Out[8]: '(<x>a)*{c}'

In [9]: e.format('utf8')
Out[9]: '(⟨x⟩a)*ᶜ'

In [10]: e.expansion().format('text')
Out[10]: 'a.[(<x>a)*{c}] + b.[\\z{c}] + c.[\\z{c}]'

In [11]: e.expansion().format('utf8')
Out[11]: 'a⊙[(⟨x⟩a)*ᶜ] ⊕ b⊙[∅ᶜ] ⊕ c⊙[∅ᶜ]'


polynomial: conjunction

Conjunction of polynomials is now available in dyn. For polynomials of expressionsets, compute the conjunction of the expressions. Otherwise, keep the common labels and multiply their weights.


Syntactic sugar in expressions

One may now use UTF-8 when entering expressions. The negation may also be denoted by a prefix !, which binds weakly (less than concatenation).

|Sugar     |Plain ASCII          |
|`∅`       |`\z`                 |
|`ε`       |`\e`                 |
|`⟨2⟩a`    |`<2>a`               |
|`a∗`      |`a*`                 |
|`!a`      |`a{c}`               |
|`¬a`      |                     |
|`aᶜ`      |                     |
|`aᵗ`      |`a{T}`               |


weight_series: fix general case

This algorithm can now be applied to any automaton. In case of ℕmin weightset the implementation does not change (shortest path). Otherwise, the result is computed by applying proper on the labels_are_one version of the automaton.

New trivial identities

Two new trivial identities have been added (at the "trivial" level): neutrality of the universal language for conjunction ( & E => E, E & => E), and involutivity of complement on 𝔹 and 𝔽₂ (E{c}{c} => E). It is not applied in the other case, since in ℤ (<2>a)*{c}{c} is actually a*, not (<2>a)*.


left-mult and right-mult

The scalar product now also accept an argument to select the exact algorithm: "standard", "general", "auto". See 2015-11-13 for more details.


to-automaton: trim automata

Now to_automaton (or expression.automaton under Python) always produces a trim automaton.


Improved display of letter classes

So far disjunction of letters were displayed as classes only if all the letters had the same weight. So <1>a + <2>[^a] in an automaton resulted in an explicit list of letters. This is especially inappropriate for Levenshtein automata. We now display <1>a + <2>[^a].


weight_series: new algorithm

Compute the sum of all the weights of the accepted words (the sum of the image of the behavior of the automaton). This algorithm can be applied to any Nmin automaton, but requires acyclic automata for other weightsets.


trie and cotrie are enhanced

Both context.trie and context.cotrie accept data, format and filename arguments. In addition to filename, one can now pass directly a list of words as a data string. Use format to specify whether the lines are monomials or plain words --- for instance whether <2>a is the word a with weight 2, or a four-letter word.

Minimize: new algorithm: hopcroft

There is now a new implementation of the minimization, based on the algorithm of Hopcroft. It can only be used on Boolean automata with a free labelset. On many examples, this algorithm shows better performances than "signature" but is still less efficient than "moore".


Automata: operations now works on all sorts of automata

Usual operations (sum, multiply --which includes the case for repeated multiplication--, star) used to apply only to standard automata. In addition to sum (for standard automata), there was union, which applied to any automata, but nothing for multiplication and star.

Now sum, star and multiply accept an algo argument:

requires standard input automata, builds a standard automaton
applies to any kind of automaton, does not guarantee a standard automaton. In the static API, might require a nullable labelset, in the dyn:: API and Python, might turn the labelset into a nullable one.
"auto" (default)
same as "standard" if input automata are standard, otherwise same as "general".

In Python, the operators +, * and ** use the "auto" strategy.

The union algorithms are removed, and in Python | no longer denotes it. The left and right multiplication by a scalar were already implemented to adapt standard or non standard automata.


Expansions: more operations

Usual operations (addition, multiplication, multiplication by a scalar) are now available on expansions.


has_negative_cycles: new algorithm

Use automaton.has_negative_cycle to check whether an automaton has cycles with negative weights.

Minor bug fixes and improvements

  • Spaces are now ignored in context names (e.g., lal < char (ab) >, b).
  • Expressions that have a single tape but several were expected are now properly rejected.
  • Tuples now display parentheses only when needed. For instance with weights in ℤ ⨉ ℤ, instead of <(1, 2)>a, we display <1, 2>a.

Vcsn 2.1 (2015-10-11)

About 10,000 hours (on the calendar, not of work!) after its first public release, the Vcsn team is very happy to announce the release of Vcsn 2.1!

It is quite hard to cherry-pick a few new features that have been added in Vcsn 2.1, as shown by the 4k+ lines of messages below since 2.0. However, here are a few headlines:

  • Many pages of documentation and examples have been written (see
  • Now provides a live demo.
  • Transducers are much better supported, with improved syntax and several algorithms (e.g., letterize, synchronize, partial_identity, is_functional, etc.)
  • Expressions now offer several sets of identities specifying how they should be normalized. More generally, input/output of expressions have been improved to match most users' expectations. New operators are accepted: & for conjunction, : for shuffle, &: for infiltration, {c} (postfix) for complement, and <+ for deterministic choice.
  • When entering an automaton (e.g., with %%automaton in IPython) user state names are preserved.
  • Of course, many bugs were fixed, many algorithms were sped up, and internal details have been cleaned up.
  • As Easter eggs, many features have also been added, but not advertised, until we are sure of how we want them to look like.

People who worked on this release:

  • Akim Demaille
  • Antoine Pietri
  • Canh Luu
  • Clément Démoulins
  • Lucien Boillod
  • Nicolas Barray
  • Sébastien Piat
  • Sylvain Lombardy
  • Valentin Tolmer
  • Yann Bourgeois--Copigny

People who have influenced this release:

  • Alexandre Duret-Lutz
  • Jacques Sakarovitch
  • Luca Saiu

Vcsn 2's repository has moved

To update your existing repository, run a command similar to:

$ git remote set-url origin


shortest: applies to any automaton

The restriction to free-labelsets is lifted: shortest applies to automata labeled with words, to transducers, etc.


project: new algorithm

Select a single tape from a multitape automaton (transducer).


expressions: spaces are now ignored

In an attempt to look like regexps, spaces were considered so far as characters.


derived-term: deterministic automata

One can now require the construction of a deterministic automaton; there are two new algorithms: expansion,deterministic and derivation,deterministic.


Named automata

It is (finally) possible to keep user state names thanks to name_automaton<Aut>. Now dyn::read_automaton and (in Python) vcsn.automaton accept an additional strip argument (defaulting to true): if set, the user names will be stripped once the automaton is loaded. This applies to all the supported formats: Daut, Dot, Efsm, and FAdo.

However using %%automaton names are kept by default. Pass -s/--strip to strip it.


Build fixes

Fixed (again) nasty shared-library issues at runtime on Ubuntu due to their use of -Wl,--as-needed. This affected Python only.

2015-08-31 more information

Now atom (the number of occurrences of letters) is also reported as width, and the depth (also known as the height: the height of the tree) is now included.


vcsn.automaton can guess the format

The vcsn.automaton function now accepts "auto" as format, which means "try to guess from the content".


levenshtein: new algorithm

The levenshtein algorithm allows to build a transducer encoding the Levenshtein distance between two alphabets. This transducer can then be composed with two languages to obtain the edit-distance algorithm, containing much information on the distance of the languages and their words.

partial_identity: new algorithm

The partial identity turns an automaton into a transducer realizing a partial identity. I.e., for each accepted input, the output will be the same as the input (plus the origin weight from the input automaton).


to_automaton: more algorithms

One can chose between both implementation of derived_term (stripped), using "derivation" or "expansion".


to_expression: more heuristics

There are three new heuristics:

select a state whose removal would contribute a small expression (number of symbols, including +, etc.).
likewise, but count only the number of labels in the expression.
run all the heuristics, and return the shortest result.

The default algorithm remains "auto", which now denotes "best".


to_automaton: conversion from expression to automaton

Vcsn offers several means to build an automaton from an expression: thompson, zpc, standard and derived-term. The latter even builds a decorated automaton, which, often, is not desired.

In C++ dyn::to_automaton and in Python expression.automaton() allow to build a simple automaton from an expression. It accepts an optional string argument to select the conversion algorithm. It defaults to "auto", which currently means the stripped derived-automaton, but eventually, it will pick either "standard" for basic expressions (as it's the fastest algorithm), or "derived-term" for extended expressions (as "standard" does not support these additional operators).


I/O in EFSM format are safer

We now correctly ensure the correspondence between our weightsets and OpenFST's arc-type, and input and output.

As a consequence, we no longer support exchange of "traditional numerical" weightsets (such as "Z", "R", etc.), since, as far as we know, they don't feature a vis-à-vis in OpenFST. It used to be accepted, but was meaningless.

Boolean weights are mapped to "standard", OpenFST's tropical semiring.

When reading EFSM files, we still try to use the smallest corresponding weightset. For instance, if the arc type is "standard" and and the weights are integral we use "zmin", otherwise, we use "rmin". Errors on reading the weight of final states have also been fixed.


automaton.expression, automaton.lift

They now accept an optional argument to specify the desired identities for the expressions.


power, chain have been renamed

Before, "chain" denoted the repeated concatenation for automata and expressions, and "power" meant repeated conjunction for automata and expressions. However, while aut ** 2 meant aut & aut (conjunction), exp ** 2 meant exp * exp (concatenation).

Clearly, from the Python point of view, ** is repeated *, which is what most people would understand as "power".

So, to enforce consistency, and to avoid bad surprises, these functions were renamed.

|Python    |dyn::                |Comment                             |
|`a * b`   |`multiply(a, b)`     |Multiplication for automata and     |
|          |                     |expressions (concatenation),        |
|          |                     |polynomials, weights, labels etc.   |
|`a ** n`  |`multiply(a, n)`     |Repeated multiplication.            |
|`a & b`   |`conjunction(a, b)`  |Conjunction for automata and        |
|          |                     |expressions.                        |
|`a & n`   |`conjunction(a, n)`  |Repeated conjunction.               |


eliminate_state: fixes

Sometimes there was a mismatch between the state numbers as displayed, and as expected by eliminate_state. This is fixed.

The rendering was nonsensical when all the states were removed. This is now fixed.

Finally, passing -1 as argument, or no argument at all, delegates the choice of the state to eliminate to a heuristic.


expressions: new identities sets

Rational expressions in Vcsn are "normalized" according to a set of identities (such that <0>E => \z whatever the expression E is).

Vcsn now supports four different sets of identities:

a minimum set of transformations are applied.
sum and product are made associative, so a+(b+c) and (a+b)+c are equal.
sum is made commutative, and weights are factored, so a+b+a is equal to a+b in B, and to <2>a+b in Z.
"distributive" (or "series")
product and exterior/scalar products are distributed over sum, so [ab]a is equal to aa+ba, and <2>[ab] is equal to <2>a+<2>b.

Previously the default identities were "associative". They are now "linear", to match most users' expectations.

So, for instance we used to report:

In [2]: c = vcsn.context('lal_char(a-z), z')

In [3]: c.expression('r+[a-q]')
Out[3]: r + [a-q]

In [4]: c.expression('[a-q]+r+r')
Out[4]: [^s-z] + r

we now report:

In [3]: c.expression('r+[a-q]')
Out[3]: [^s-z]

In [4]: c.expression('[a-q]+r+r')
Out[4]: [a-q] + <2>r


nmin: new weightset

The new tropical semiring ⟨ℕ, +, min⟩ has been introduced. Compared to zmin, some optimizations can be done, for example in evaluation or in node distance where the absence of negative weights allows to trim some branches.


cotrie: new algorithm

In additional to trie, which builds a deterministic tree-like automaton (single initial state, multiple final states), vcsn now supports cotrie which builds a "reversed trie": a codeterministic reversed tree-like automaton (single final state, multiple initial states).

The main feature of cotrie is that its result is codeterministic, so it only takes a determinization to minimize it. It turns out that in Vcsn determinization is more efficient than minimization:

In [13]: %timeit c.trie('/usr/share/dict/words').minimize()
         1 loops, best of 3: 18.8 s per loop

In [14]: %timeit c.cotrie('/usr/share/dict/words').determinize()
         1 loops, best of 3: 7.54 s per loop

These automata are isomorphic.


expressions: output may use label classes

Rational expressions have long supported label classes in input, e.g., [abc0-9]. Polynomials also support them in output. However expressions never used classes, which may seriously hinder their readability. For instance, to compute an expression describe all words on {a,..., z} except 'hehe', we had:

In [2]: c = vcsn.context('lal_char(a-z), b')
Out[2]: \e+a(\e+b(\e+c))+(b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z+a(a+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z+b(a+b+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z+c(a+b+c+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z+d(a+b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z)))))(a+b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z)*

Now expressions also support letter classes in output:

Out[2]: \e+a(\e+b(\e+c))+([^a]+a([^b]+b([^c]+c([^d]+d[^]))))[^]*

Classes are used only on ranges of at least four (unweighted) letters in strictly increasing order:

In [3]: c.expression('a+a+b+c+d+e+f+x+y+z+w')
Out[3]: a+[a-fx-z]+w

Negated classes are issued only if the letters are in strictly increasing alphabetical order, and are more than two thirds of the whole alphabet (otherwise a regular class is preferred).

In [4]: c.expression('[a-q]')
Out[4]: [a-q]

In [5]: c.expression('[a-q]+r')
Out[5]: [^s-z]

In [6]: c.expression('r+[a-q]')
Out[6]: r + [a-q]

In [7]: c.expression('[a-q]+r+r')
Out[7]: [^s-z] + r


multiply: new name for 'concatenate'

In static, dyn:: and Python, concatenate is renamed multiply.

The "concatenation" of automata or rational expressions would never be called the concatenation in the case of series or polynomials etc. If we were to support automata weighted by automata, the weightset product would be this multiply.


trie: read directly from a file

Building a polynomial from a dictionary stored on disk, and then building the trie is a waste of time. It is now possible to build it directly from a file.

In [2]: t = vcsn.B.trie('/usr/share/dict/words')

In [3]:
Out[3]: {'is codeterministic': False,
         'is complete': True,
         'is deterministic': True,
         'is empty': False,
         'is eps-acyclic': True,
         'is normalized': False,
         'is proper': True,
         'is standard': True,
         'is trim': True,
         'is useless': False,
         'is valid': True,
         'number of accessible states': 792777,
         'number of coaccessible states': 792777,
         'number of codeterministic states': 792777,
         'number of deterministic states': 792777,
         'number of eps transitions': 0,
         'number of final states': 235886,
         'number of initial states': 1,
         'number of states': 792777,
         'number of transitions': 792776,
         'number of useful states': 792777,
         'type': 'mutable_automaton<letterset<char_letters()>, b>'}


trie: new algorithm

From a finite language/series represented as a polynomial (of words), build an automaton that has the shape of a trie (a prefix tree).

In [2]: series = '<2>\e+<3>a+<4>b+<5>abc+<6>abcd+<7>abdc'

In [3]: p = vcsn.context('law_char, z').polynomial(series); p
Out[3]: <2>\e + <3>a + <4>b + <5>abc + <6>abcd + <7>abdc

In [4]: a = p.trie(); a
Out[4]: mutable_automaton<letterset<char_letters(abcd)>, z>

In [5]: a.shortest(100)
Out[5]: <2>\e + <3>a + <4>b + <5>abc + <6>abcd + <7>abdc


delay_automaton: new automaton type

This algorithm transforms the automaton into an equivalent one, but where each state has been split depending on the delay between the tape, i.e. the difference of input length for each of the tapes.

is-synchronized: new algorithm

This algorithm checks whether a transducer is synchronized, i.e. that the input is read on every tape at the same rate for as long as possible.

synchronize: new algorithm

This new algorithm allows to synchronize the tapes of a k-tape transducer, i.e. "push" the letters towards the beginning of the transducer so that the input is read along every tape at the same rate for as long as possible.

has-bounded-lag: new algorithm

This algorithm checks whether a k-tape transducer has a bounded lag, i.e. if there exists a constant D such that the length of the input for each tape differs by at most D; or the output of the evaluation of a word through the transducer differs in length by at most D.


shortest now subsumes enumerate

One can now specify both a maximum number of words, and a maximum size. It is also significantly faster. The Python binding automaton.enumerate was removed.

In [4]: a = vcsn.Z.expression('[01]*1(<2>[01])*').standard()

In [5]: a.shortest()
Out[5]: 1

In [6]: a.shortest(3)
Out[6]: 1 + 01 + <2>10

In [7]: a.shortest(len = 3)
Out[7]: 1 + 01 + <2>10 + <3>11 + 001 + <2>010 + <3>011 + <4>100 + <5>101 + <6>110 + <7>111

In [8]: a.shortest(len = 3, num = 5)
Out[8]: 1 + 01 + <2>10 + <3>11 + 001

In [9]: a.shortest(len = 30, num = 5)
Out[9]: 1 + 01 + <2>10 + <3>11 + 001


product is renamed conjunction

After many hesitations, we finally decided to rename the synchronized product from product to conjunction. There are several reasons in favor of this change.

First product is not adequate:

  • we have several products: the Cauchy product (concatenation), the Hadamard product (synchronized product), the shuffle product, the infiltration product, and there are certainly more.
  • product is already used in the case of words and rational expressions to denote the concatenation, so, to be consistent, product will denote the concatenation of automata too.
  • * is naturally associated to concatenation for words and rational expressions, it was associated to product for automata. It is more consistent to use concatenate in each case.

Second conjunction is a reasonable choice:

  • intersection is acceptable for Boolean automata, but it is not satisfying in the case of weighted automata.
  • synchronized_product is too long, likewise for hadamard_product, etc.
  • hadamard is a proper name, which we try to avoid.
  • the etymology of conjunction is a perfect match with the semantics of the operation.
  • computer scientists are used to the correspondence between conjunction and &.

The preference for verbs as algorithm names leads naturally to conjoin. However conjunction still seems more natural.


are-equivalent: now uses realtime

One may, finally, compare LAL, LAN, and LAW automata.

join: improvements

The "join" of tuplesets now works properly, so one may, for instance, add automata on A? x B and A x B? to get an automata on A? x B?.

In order to facilitate the experiments with join, it is now provided as an algorithm. In Python, it is also available as the infix or operator.

In [2]: c1 = vcsn.context('lat<lal_char(a), lan_char(x)>, z'); c1
Out[2]: {a} × ({x})? → ℤ

In [3]: c2 = vcsn.context('lat<lan_char(b), lal_char(y)>, q'); c2
Out[3]: ({b})? × {y} → ℚ

In [4]: c1 | c2
Out[4]: ({a,b})? × ({x,y})? → ℚ


infiltration: fix long standing bugs

Optimizations in the computation of the shuffle-product broke the infiltration-product. This is fixed. However, the performances of infiltration are then degraded (about 2.5x). Conjunction (synchronized product) and shuffle are unaffected.

Besides, the support for variadic infiltration products was naive, and produced incorrect results. This is fixed by simply repeating the binary infiltration.


is_functional: new algorithm

Whether a transducer is functional, i.e., whether each input words maps to (at most) a single word.

Actually, was implemented in September 2014, but was not registered here.


Python 3 is now required

We have tried hard to remain compatible with Python 2, but support for Unicode is just too hard to get to work properly with both Python 2 and Python 3. Since there were constantly problems arising in one whose fixes break the other, we decided to drop support for Python 2. Given that all major platforms ship Python 3, we don't expect this to be a real problem.


Repeated minimization

Now, minimization and cominimization return automata of the same type --- cominimization used to return an automaton whose type reveals the double transposition. In actually, calling several times the minimization and/or cominimization no longer generates decorators of decorators of etc.: the result is always a single layer decorator. Not only is this, in general, the desired result, it's also more 'economic' as it uses fewer automaton types (hence less runtime compilations).


realtime: new algorithm

Compute the letterized, proper equivalent of an automaton. The result automaton will have only letter transitions (no words, no spontaneous transitions). It comes with the is_realtime algorithm, to check if an automaton is realtime or not.

is_letterized: new algorithm

This algorithm checks whether an automaton is letterized, i.e. whether each transition's label is a single letter (in the sense of the labelset).

In [2]: ctx = vcsn.context("law_char, b")

In [3]: ctx.expression("abc").standard().is_letterized()
Out[3]: False

In [4]: ctx.expression("a*(b+c)").standard().is_letterized()
Out[4]: True


zpc: new algorithm

This algorithm generates the ZPC automaton from an expression. It has a single initial state (whose final weight is the constant term of the expression), a unique distinct final state (which cannot be reached from the initial state via spontaneous transitions), and no cycles of spontaneous transitions.

The expression.zpc function features an optional argument, which, when set to "compact", enables a variant, more compact, construction.

In [2]: vcsn.b.expression('ab').zpc()
    ╭───╮  \e   ╭───╮  a   ╭───╮  \e   ╭───╮  b   ╭───╮  \e   ╭───╮
──> │ 0 │ ────> │ 1 │ ───> │ 2 │ ────> │ 3 │ ───> │ 4 │ ────> │ 5 │ ──>
    ╰───╯       ╰───╯      ╰───╯       ╰───╯      ╰───╯       ╰───╯

In [3]: vcsn.b.expression('ab').zpc('compact')
    ╭───╮  a   ╭───╮  \e   ╭───╮  b   ╭───╮
──> │ 0 │ ───> │ 1 │ ────> │ 2 │ ───> │ 3 │ ──>
    ╰───╯      ╰───╯       ╰───╯      ╰───╯


letterize: new algorithm

Create the equivalent automaton, but with only single-letter transitions. Basically, do the conversion from law to lan. It also works recursively with multitape transducers.

In [2]: c = vcsn.context('lat<law_char, lal_char>, z')

In [3]: a = c.expression("<2>'(abc,x)'").derived_term()

In [4]: print(a.format('daut'))
context = "lat<wordset<char_letters(abc)>, letterset<char_letters(x)>>, z"
$ -> 0
0 -> 1 <2>(abc,x)
1 -> $

In [5]: print(a.letterize().format('daut'))
context = "lat<nullableset<letterset<char_letters(abc)>>, nullableset<letterset<char_letters(x)>>>, z"
$ -> 0
0 -> 2 <2>(a,x)
1 -> $
2 -> 3 (b,\e)
3 -> 1 (c,\e)


Multitape expressions

When typing multitape transducers, it is now no longer necessary to explicit the parenthesis inside (multi-)letters.

In [1]: import vcsn

In [2]: c = vcsn.context('lat<lan_char, lan_char>, b')

In [3]: r = c.expression(r"'a,\e'+'(b,c)'+'d,f'")

has_bounded_lag: new algorithm

This algorithm checks if a transducer has a bounded lag, i.e. if there is a maximum difference of length between the input words and their corresponding outputs.

In [1]: import vcsn

In [2]: c = vcsn.context('lat<lan_char, lan_char>, b')

In [3]: c.expression(r"'a,\e'").standard().has_bounded_lag()
Out[3]: True

In [4]: c.expression(r"'a,\e'*").standard().has_bounded_lag()
Out[4]: False



The Flex program (a scanner generator) should no longer be required for builds from a tarball.

blind -> focus

The blind algorithm (and the blind_automaton structure) has been renamed focus (and focus_automaton) as it is much clearer on its purpose.


scc: new algorithm

Create an automaton whose states correspond with a strongly connected component of the input automaton.


ratexp -> expression

The name "ratexp" is an unattractive jargon, and since Vcsn will not feature other kinds of "expressions", it is hardly justified: it is the only abbreviation we use (automaton, context, expansion, polynomial, etc.).

So everywhere (static, dyn::, Python) "ratexp" was replaced by "expression" (including "make_ratexpset" -> "make_expressionset", etc.).


thompson: the labelset no longer needs to feature a "one"

The Thompson automata count many spontaneous transitions, so require a labelset which supports \e (e.g., lan, law). This is annoying, especially when teaching about Automata Theory: one does not want to dive into the arcane details of contexts.

So now, if the context of the input expression does not support \e, the Thompson automaton will be built with a generalized context which does support it:

In [1]: import vcsn

In [2]: vcsn.context('lan_char, b').ratexp('a').thompson().context()
Out[2]: lan<letterset<char_letters(a)>>, b

In [3]: vcsn.context('law_char, b').ratexp('a').thompson().context()
Out[3]: wordset<char_letters(a)>, b

In [4]: vcsn.context('lal_char, b').ratexp('a').thompson().context()
Out[4]: lan<letterset<char_letters(a)>>, b


filter: new algorithm

Hide states of another automaton, revealing only selected ones. This does not copy the original automaton.


New project name: Vcsn

The Vaucanson 2 project, which was funded by a (French) program (ANR), is now "closed". Of course, it will continue to exist, but now there will be two Vaucansa! One will be led by Sylvain Lombardy and Jacques Sakarovitch, and the other by Alexandre Duret-Lutz and Akim Demaille from EPITA.

This file is about the latter, named "Vcsn".


WARNING: resyntaxed contexts

The syntax for contexts has changed: the separator between the labelset and the weightset is now a comma (possibly with spaces) instead of an underscore.

So for instance:

 -> lal_char(abc), b

 -> lat<lal_char(ab), lal_char(xy)>, lat<q, r>

 -> law_char(a-z), ratexpset<law_char(A-Z), b>

This syntax is not (and has never been) the intended one, it should be considered as the "internal" syntax. However, since the intended syntax is still not implemented, one, unfortunately, still has to deal with this inner syntax.


proper: strip nullableset from labelsets

Our (current) implementation of proper is in-place, and as a consequence, the result features the same context as the input automaton, with a nullable labelset.

Now, proper copies the result in a context with a de-nullabled labelset. So for instance:

In [2]: vcsn.context('lan_char(ab)_b').ratexp('(ab)*').thompson().proper()
Out[2]: mutable_automaton<lal_char(ab)_b>

There is no measurable performance regression. However state numbers are now unrelated to the input automaton.


Python: loading automata from files

The vcsn.automaton constructor now support a filename named argument to load an automaton from a file.

vcsn.automaton(filename = 'a.gv')
vcsn.automaton(filename = 'a.efsm', format = 'efsm')

minimize: default algorithm is "auto"

Now minimize and cominimize both support the "auto" algorithm, which is the default value. It uses the "signature" algorithm for the Boolean automata, otherwise the "weighted" algorithm.

proper: forward closure available in dyn:: and Python

It is now possible to require a forward elimination of spontaneous transitions. Calling a.proper() is equivalent to calling a.proper(prune = True, backward = True).



The static and dyn:: algorithm that was named aut_to_exp is now named to_expression. In Python it is still automaton.ratexp().

costandard, is_costandard: new algorithms

Calling aut.costandard() is equivalent to calling aut.transpose().standard().transpose(), and aut.is_costandard() is the same with aut.transpose().is_standard().

normalize: new algorithm

Composes standard and costandard on an automaton.


Strip vs. transpose

Up to now, stripping a transposed automaton returned the non-transposed automaton, without even stripping it. This has finally been addressed, and a.determinize().transpose().strip() is now strictly equivalent to a.determinize().strip().transpose().

codeterminize, cominimize, is_codeterministic: new algorithms.

Calling aut.cominimize() is equivalent to calling aut.transpose().minimize().transpose(), and likewise for codeterminize.

Available in dyn and Python.


split: now works on polynomials

In addition to splitting (aka, breaking) a ratexp, it is now possible to split a polynomial. This, for instance, provides another way to compute the breaking derivation of a ratexp:

In [1]: r = vcsn.context('lal_char_z').ratexp('a(b+a)+a(a+b)')

In [2]: p = r.derivation('a'); p
Out[2]: a+b + b+a

In [3]: p.split()
Out[3]: <2>a + <2>b

In [4]: r.derivation('a', True)
Out[4]: <2>a + <2>b

derivation: fix a Python bug

ratexp.derivation features a breaking optional argument, which defaulted to True instead of False.

transpose: fix the number of final and initial states

Because num_initials and num_finals were not "transposed", on occasions automaton.is_standard could crash (when it thinks there is one initial state while there is none, but it still wants to check that its initial weight is one).


Automaton conjunction

On occasions, in Python, expressions such as (a & b)('a') failed to behave properly. It does now, and does correspond to the evaluation of the word "a" on the conjunction (synchronized product) of automata a and b.

TikZ: state names

Conversion of automata into the TikZ format now includes the state "names" (e.g., derived-term automata now show the states' rational expressions).


determinize: Empty-In, Empty-Out

Now when given an empty automaton (no states), determinize returns an empty automaton, whereas it used to return an automaton with one state, initial. This is more consistent (e.g., when given a deterministic automaton, determinize now consistently returns the accessible part of its input).


is_cycle_ambiguous: new algorithm

Whether an automaton is cycle-ambiguous (or "exponentially ambiguous").


Build fixes

Portability issues with Mac OS X's own version of Flex are fixed.

Fixed runtime compilations failing to find Boost headers in some cases.

Fixed nasty shared-library issues at runtime on Ubuntu due to their use of -Wl,--as-needed.



I/O with OpenFST now properly supports initial weights and multiple initial states.

Dot I/O of multiple-tape automata with initial weights is fixed.

Dot parse errors now provide locations.

The preconditions of reduce have been relaxed from labels-are-letters to labelset-is-free.


determinize provides a better support for Z

The normalization of states now uses the GCD of the weights, which means that now all the Z-automata that are determinizable can be determinized in Z, without having to convert them as Q-automata.

New weightset: qmp

Initial support for multiprecision rational numbers.

>>> a = vcsn.context('lal_char(abc)_qmp').ratexp('<2/3>a').standard() & 70
>>> a('a')

Requires the GMP library.


compatibility with Boost 1.56

Some changes in Boost require adjustments.


ambiguous_word: new algorithm

Returns an ambiguous word of an automaton, or raises if the automaton is unambiguous.


Several fixes

The reduce algorithm applied only to not-decorated automata. Support for F2 was broken.


Error messages about failed algorithm instantiations are much clearer: they display failed preconditions.

Blind is no longer limited to tape 0 or tape 1.

Vaucanson 2.0 (2014-07-25)

More than two years after the initial commit in our repository, the Vaucanson team is happy to announce the first release of Vaucanson 2!

People who worked on this release:

  • Akim Demaille
  • Alexandre Duret-Lutz
  • Alfred M. Szmidt
  • Antoine Pietri
  • Canh Luu
  • Jacques Sakarovitch
  • Luca Saiu
  • Sylvain Lombardy
  • Valentin Tolmer


"series" are now supported (static, dyn, Python)

Ratexpsets now take a constructor parameter specifying which set of identities to enforce, currently with two possible values: "trivial" and "series". Old-style ratexps only support trivial identities, while series also support sum commutativity and product distributivity over sum. Heterogeneous operations involving both trivial ratexps and series ratexps are possible.

In case of a ratexpset belonging to a context, the default value for the optional parameter is always trivial. The "seriesset" alias, which does not accept an identity parameter, is also supported.

"Series" is technically a misnomer: the data structure is a better approximation of the mathematical concept of series and in particular yields a semiring structure, but actual series equality remains undecidable in the general case.

Series currently depend on the commutativity of their weightset, and only support a subset of the available operations.

>>> vcsn.context('lal_char(a)_ratexpset<lal_char(x)_b>(series)') # series
>>> vcsn.context('lal_char(a)_seriesset<lal_char(x)_b>')         # series
>>> vcsn.context(ratexpset<lal_char(x)_b>_z')                    # trivial
>>> vcsn.context('lal_char(a)_ratexpset<lal_char(x)_b>(trivial)')# trivial
>>> e = vcsn.context('lal_char(a-c)_z').ratexp('b+a+b'); e
>>> s = vcsn.context('lal_char(a-c)_z').series('b+a+b'); s
>>> (e+s).is_series()

has_twins_property: new algorithm

Computes whether an automaton has the twins property.


determinize supports weighted automata

The determinize algorithm (dyn and Python) now accepts an optional 'algo' argument.

When 'algo' is:

the fast Boolean-only implementation is used. It always terminates.
any weightset is supported. On some inputs, it may not terminate.
"auto" (default value)
"boolean" if the automaton is Boolean, "weighted" otherwise.


are-equivalent is generalized

The 'are_equivalent' algorithm (a1.is_equivalent(a2) in Python) now supports weights in fields (e.g., Q or R), but also on Z. The labelset still must be free.


reduce: new algorithm on automata

Implements the Schützenberger algorithm for reduction of representations for any skew field. As a special case, automata with weights in Z are also supported.


determinization is more robust to large alphabets

In order to optimize (Boolean) determinization, 'determinize' no longer accepts an optional 'complete' argument to require a deterministic complete automaton. One must now call 'complete()' afterwards.

The speed improvements are (erebus: OS X i7 2.9GHz 8GB, Clang 3.5 -O3 -DNDEBUG):

  (1)   (2)   (3)
 7.93s 7.80s 7.43s: a.determinize()      # a = ladybird(21)
 6.64s 6.45s 0.84s: a.determinize()      # a = lal(a-zA-Z0-9).ladybird(18)

where (1) is the "original" version, (2) is the version without the optional completion of the determinized automaton (yes, it was set to False in the bench of (1)), and (3) is the current version, which avoids considering unused letters.


prefix, suffix, factor, subword: new algorithms

These four new functions (static, dyn and Python) take an automaton and return another that accepts a superset of its language:

  • suffix makes each accessible state final (with unit weight)
  • prefix makes each coaccessible state initial
  • factor makes each useful state initial and final
  • subword applies the Magnus transform: for each non spontaneous transition (src, label, weight, dst), it adds a spontaneous transition (src, one, weight, dst).


minimize: new subset decorators, except for the "brzozowski" variant

Minimizing an automaton now yields a decorated automaton keeping track of source state names. The new "subset decorator" code is decoupled from minimization and is intended to be used for other algorithms as well.

>>> a = vcsn.context('lal_char(a-z)_b').ratexp('a+b*e+c+dc').standard(); a
>>> a.minimize()

The minimize algorithm no longer recognizes the "brzozowski" variant at the Static level, as it would require a very different, and likely uninteresting, decorator; the user can still directly call minimize_brzozowski at the Static level. We still support "brzozowski" as a variant at the Dyn and Python levels.


efsm: support for transducers

Exchange with OpenFST via efstcompile/efstdecompile now supports transducers.


TAF-Kit is phased out

Existing commands for TAF-Kit are left and are occasionally useful. However, today the last existing TAF-Kit test has been converted to Python: TAF-Kit is no longer checked at all by the test suite.

lift: now bound in Python

One can now lift automata and ratexps from Python.


Minimize: extend 'weighted' and 'signature' variants

Lift arbitrary restrictions on the labelset of the 'weighted' and 'signature' minimization variants.

>>> ctx = vcsn.context('law_char(a-z)_b')
>>> ctx.ratexp('ab+<3>cd+ac').standard().minimize('weighted')


More systematic use of decorators

In addition to the venerable transpose_automaton, several automaton decorator types have been introduced (blind_automaton, determinized_automaton, pair_automaton, product_automaton, ratexp_automaton), and are now used in many algorithms, including: compose, determinize, product, shuffle, infiltration, synchronizing_word, etc.

Coupled with the fact that automata can now display state names (as opposed to state numbers) in Dot output, one gets rich displays of automata. For instance:

>>> ctx = vcsn.context('lal_char(ab)_b')
>>> ctx.ratexp('aa+ab').derived_term().determinize()

now displays an automaton whose states are labeled as sets of ratexps: "{aa+ab}", "{a, b}", and "{}".

Use automaton.strip() to remove state names.


dot2tex format

Automata now support for "dot2tex" format, meant to be used with the dot2tex program. It allows to combine TikZ's nice rendering of automata, LaTeX's math mode to render state and transition labels, with dot automatic layout.

IPython: an interactive display

Starting with IPython 2.0, running 'aut.display()' provides an interactive display of an automaton, with means to select the display mode (e.g., "dot", "dot2tex", etc.).

shuffle: now available on ratexps

Available in static, dyn, and Python.


Optimization: composing two automata transposition yields the identity

Transposing an automata twice now yields the original automata, instead of an automata wrapped by two decorator layers.

is-isomorphic: extend to any context and any automata

Lift the previous limitation of is-isomorphic to deterministic lal automata. The sequential case keeps its linear complexity, but the new generic code has a worst-case complexity of O((n+1)!); however the common case is much faster, as we heuristically classify states according to in- and out-transitions, restricting brute-force search to states which are possible candidates for isomorphism.

>>> ctx = vcsn.context('lal_char(a-z)_b')
>>> a = ctx.ratexp('ab+<3>ab+ab+ac').standard()
>>> b = ctx.ratexp('ab+ac+<3>ab+ab').standard()
>>> a.is_isomorphic(b)
>>> at = ctx.ratexp('abc').standard().transpose()
>>> c = ctx.ratexp('cba').standard()
>>> at.is_isomorphic(c)


Automata are now handled via shared pointers

Types such as vcsn::mutable_automaton<Ctx> used to support the so called "move semantics" only. In particular, simple assignment between automata was not possible. This intentional limitation was introduced to avoid accidental lose of performance by unexpected deep copies of automata, while keeping partly "value semantics".

This, however, resulted in too many constraints, especially when one wants to embed automata in other structures (which is for instance the case of transpose_automaton which wraps an automaton).

To address these issues, types such as vcsn::mutable_automaton<Ctx> are now std::shared_ptr. This does change the programming style, both when instantiating an automaton (typically now using make_mutable_automaton), and when using it (with -> instead of .):

Instead of:

automaton_t aut{ctx};
auto s1 = aut.new_state();


auto aut = vcsn::make_mutable_automaton(ctx);
// or: auto aut = vcsn::make_shared_ptr<automaton_t>(ctx);
auto s1 = aut->new_state();


Polynomials are now usable as weights

Polynomialsets are now usable as a generic weightset. Polynomials are mostly useful on law and ratexpset.

>>> ctx = vcsn.context('lal_char(abc)_polynomialset<law_char(xyz)_z>')
>>> ctx.ratexp('<x + xy + x + \e>a')
<\e + <2>x + xy>a


ratexps: support for negated letter classes

We may now use '[^...]' to denote a letter other than the listed ones. The special case '[^]' denotes any character of the alphabet.

>>> c = vcsn.context('lal_char(0-9)_b')
>>> c.ratexp('0+[^0][^]*')

ratexps: invalid letter classes are rejected

Instead of being ignored, invalid intervals, or empty classes, are now rejected.

>>> c.ratexp('[9-0]')
RuntimeError: invalid letter interval: 9-0
>>> c.ratexp('[]')
RuntimeError: invalid empty letter class

ratexps: improved support for letter classes

Previously letter classes were supported only for context on top of a simple alphabet (LAL, LAN and LAW). Generators of more complex contexts such as LAL x LAN are now supported:

>>> c = vcsn.context('lat<lal_char(abc),lan_char(xyz)>_b')
>>> c.ratexp("['(a,x)''(c,z)']")

>>> c.ratexp("[^'(a,x)''(c,z)']")

>>> c.ratexp("['(a,x)'-'(a,z)']")

>>> c.ratexp("[^]")


compose: new algorithm (static, dyn, Python)

A new algorithm has been introduced to allow the composition of two transducers. It computes the accessible part of the transducer resulting from the composition of the second tape of the first transducer with the first tape of the second one.

>>> c1 = vcsn.context('lat<lan_char(abc),lan_char(ijk)>_b')
>>> c2 = vcsn.context('lat<lan_char(ijk),lan_char(xyz)>_b')
>>> t1 = c1.ratexp("('(a,i)'+'(b,j)'+'(c,k)')*").thompson()
>>> t2 = c2.ratexp("('(i,x)'+'(j,y)'+'(k,z)')*").standard()
>>> t1.compose(t2).proper().shortest(8)
(\e,\e) + (a,x) + (b,y) + (c,z) + (aa,xx) + (ab,xy) + (ac,xz) + (ba,yx)


insplit: new algorithm (static, dyn, Python)

It is now possible to do the in-splitting of an automaton, i.e. to get the equivalent automaton such that each state have either only incoming epsilon-transitions or no incoming epsilon-transitions. This is the first step in a product algorithm that supports epsilon-transitions.


Multiplication by a weight no longer requires a standard automaton

The right-multiplication by a weight uselessly required a standard automaton; it now accepts any automaton. The left-multiplication keeps its specification for standard automata, but now supports non-standard automata, in which case the weight is put on the initial arrows.

Multiplication by 0 is fixed

The multiplication of an automaton by the null weight results in the "zero automaton": a single state, initial, and no transitions.


Python syntactic sugar

Multiplication by a scalar on the left, and on the right, can be performed with implicit conversion of the weights. Instead of

>>> z = vcsn.context('lal_char(ab)_z')
>>> r = z.ratexp('[ab]*')
>>> z.weight(2) * r * z.weight(3)

one can now write

>>> 2 * r * 3

and similarly for automata.

Besides, repeated &-product can be denoted with the same symbol, &: 'a & 3' denotes 'a & a & a'.


Vaucanson 2 has moved

Vaucanson 2 is now also hosted on It is also renamed vaucanson.git, rather than vaucanson2.git. To update your existing repository, run a command similar to:

$ git remote set-url origin

or edit your .git/config file to update the URL.

You may also visit

dyn::label is born

The family of dynamic object (which includes dyn::automaton, dyn::weight, dyn::polynomial, dyn::ratexp, and others) now features a dyn::label. Several algorithms dealing with labels used the "std::string" C++ type to this end. This was inadequate, and the corresponding signatures have been cleaned bottom up.

For instance "derivation" used to have the following signatures


template <typename RatExpSet>
derivation(const RatExpSet& rs, const typename RatExpSet::value_t& e,
           const std::string& word, bool breaking = false)


polynomial derivation(const ratexp& exp, const std::string& s,
                      bool breaking = false);

These signatures are now:

template <typename RatExpSet>
derivation(const RatExpSet& rs, const typename RatExpSet::value_t& e,
           const typename RatExpSet::labelset_t::word_t& word,
           bool breaking = false)

polynomial derivation(const ratexp& exp, const label& l,
                      bool breaking = false);

At the Python level, derivation was adjusted so that one may still pass a string, and see it upgraded, so both these calls work:

>>> ctx = vcsn.context('lal_char(ab)_z')
>>> r = ctx.ratexp('(<2>a)*')
>>> r.derivation(ctx.word('aa'))
>>> r.derivation('aa')

Context extraction

It is now possible to obtain an automaton or a ratexp's context. In dyn, use "dyn::context_of(obj)", in Python use "obj.context()".

Copy can now change the type of the automaton

vcsn::copy used to support only "homogeneous" duplications. It also offers access to the the origins (a map from resulting states to origin states).

Variadic product of automata

The synchronized product of automata is now variadic: the product of n-automata directly builds an automaton labeled with n-tuples of original states. The Python operator, &, is modified to pretend it is variadic (rather than binary): it delays the computation until the result is needed. The ".value()" method allows to force the evaluation.

>>> ctx = vcsn.context('lal_char(ab)_z')
>>> a1 = ctx.de_bruijn(5)
>>> a = ctx.ratexp('a{5}').derived_term()
>>> import timeit
>>> timeit.timeit(lambda: (((a1&a1).value() & a1).value() & a).value(),
>>> timeit.timeit(lambda: (a1 & a1 & a1 & a).value(), number=1000)

Be aware that if the result is not needed, then it is simply not computed at all (hence, appears to be blazingly fast):

>>> timeit.timeit(lambda: a1 & a1 & a1 & a, number=1000)

"Fine grain" runtime compilation works

So far dyn:: provided support for "context" runtime compilation, i.e., if a context is unknown (e.g., "lat<lal_char(ab), lal_char(xy)>_b"), C++ code is emitted, compiled, and loaded. Now Vaucanson also supports per-algorithm runtime compilation.

Brzozowski minimization.

It is now possible to ask for the Brzozowski's minimization:

>>> a = vcsn.context('lal_char(ab)_b').ratexp('a+b').standard()
>>> a.minimize('moore').info()['number of states']
>>> a.minimize('signature').info()['number of states']
>>> a.minimize('brzozowski').info()['number of states']


GCC is back in business

GCC compiles Vaucanson incorrectly (i.e., Vaucanson appears to behave incorrectly, but it is actually the compiler that is incorrect). For this reason, some features were disabled with such a compiler, and recently GCC support was completely dropped.

Efforts were put in finding reasonable workarounds for these bugs, and now GCC is, again, supported. Parts that used to be disabled are now supported.

For more information on this bug, see the following problem report


Vaucanson 1 has moved

Vaucanson 1 is now hosted on It is also renamed vaucanson1.git, rather than just vaucanson.git. To update your existing repository, run a command similar to:

$ git remote set-url origin

or edit your .git/config file to update the URL.

You may also visit


conjunctions of ratexps

The intersection operation on ratexps is renamed as conjunction.


Improve treatment of nullable labelsets

As introduced earlier, nullablesets are now written lan<...>, for instance lan<lal_char(ab)>_b. The former syntax, e.g., lan_char(ab)_b, is kept for backward compatibility and ease of use.

Up to now, lan<> would only work on lal and labelsets with already a one provided, making the improvement quite useless. Now lan<...> will manufacture a one for the labelsets that don't have one, so it can be wrapped around anything.

In addition to that, simplifications are applied; for instance lan<lan<lan_char(ab)>>_b actually builds lan<lal_char(ab)>_b. Similarly, lan<lat<lan_char(ab), law_char(ab), ratexpset<lal_char(ab)_b>>>_b generates lat<lan<lal_char(ab)>, law_char(ab), ratexpset<lal_char(ab)_b>>_b, since the tuple wrapped in the outer lan already has a one, given that all of its components have one.


concatenation and conjunction of ratexps accept more heterogeneous arguments

Conversions of both labels and weights are performed if needed.

>>> a = vcsn.context('ratexpset<lal_char(xy)_b>_z').ratexp("<2>'x*'")
>>> b = vcsn.context('lal_char(b)_q')              .ratexp('<1/3>b')
>>> a * b
>>> (a*b).info()['type']

>>> ab = vcsn.context('lal_char(ab)_z').ratexp('(a+b)*')
>>> bc = vcsn.context('lal_char(bc)_b').ratexp('(b+c)*')
>>> ab & bc
>>> (ab & bc).info()['type']


left-mult and right-mult are bound in Python

Both work on automata and ratexps. Left multiplication now has its arguments in a more natural order in the C++ API: (weight, automaton), instead of the converse previously. TAF-Kit still has it the old way. The Python operator * (left-associative) is overloaded to provide syntactic sugar.

>>> z = vcsn.context('lal_char(abc)_z')
>>> a = z.weight("12") * z.ratexp('ab').standard() * z.weight("23")
>>> a.ratexp()

>>> z.weight("12") * z.ratexp('ab') * z.weight("23")


sum of rational expressions accepts more heterogeneous arguments

Conversions of both labels and weights are performed if needed.

>>> a = vcsn.context('ratexpset<lal_char(xy)_b>_z').ratexp("<2>'x*'")
>>> b = vcsn.context('lal_char(b)_q')              .ratexp('<1/3>b')
>>> a + b

concatenation accepts more heterogeneous arguments

As for products and union, it is now possible to compute the concatenation of automata with different types:

>>> z = vcsn.context('lal_char(a)_z').ratexp('<2>a')  .derived_term()
>>> q = vcsn.context('lal_char(b)_q').ratexp('<1/3>b').derived_term()
>>> r = vcsn.context('lal_char(c)_r').ratexp('<.4>c') .derived_term()
>>> (z*q*r).ratexp()


Bug: standardization of automata

The standardization of an automaton no longer leaves former initial states that became non-accessible.

Bug: right-mult on automata

Its behavior was completely wrong.


parse and display letter classes in transitions

It is now possible to directly write labels of transitions with letter classes. For instance '[a-ky]' denotes every letter between a and k in the alphabet, or y.

Transition labels with equal weights are displayed this way. For instance 'g, a, b, d, c, f' becomes [a-dfg] and '<2>a, <2>b, <2>c" is displayed '<2>[abc]', but 'a, b' stays in this form.


union accepts more heterogeneous arguments

As for products (Hadamard, shuffle, infiltration), it is now possible to compute the union of automata with different types:

>>> z = vcsn.context('lal_char(a)_z').ratexp('<2>a')  .derived_term()
>>> q = vcsn.context('lal_char(b)_q').ratexp('<1/3>b').derived_term()
>>> r = vcsn.context('lal_char(c)_r').ratexp('<.4>c') .derived_term()
>>> (z|q|r).ratexp()

automaton product optimization.

An optimization enabled by state renumbering.

Score changes on Luca's workstation (before/after):

4.60s  2.87s: a.product(a)         # a = std([a-e]?{50})
2.34s  2.37s: a.shuffle(a)         # a = std([a-e]?{50})
4.01s  2.58s: a.infiltration(a)    # a = std([a-e]?{30})
4.17s  2.96s: a**12                # a = std([a-e]*b(<2>[a-e])*)


state renumbering is not automatic any longer

In view of several forthcoming in-place algorithms on automata including edit, we changed the automaton output code not to automatically renumber states at the static, dyn and Python levels. A new "sort" algorithm is available to renumber states (breadth-first and then by outgoing transition label) and reorder transitions (by label) in a predictable way, when explicitly requested. TAF-Kit output is automatically sorted.

Having a predictable numbering will enable future optimizations on automaton product and other algorithms accessing all the transition from a given state by label.


double_ring: New Python binding

Returns a double ring automaton.


ratexpset can be used for LabelSet

Contexts such as ratexpset<lal_char(ab)_b>_b are now valid. The eliminate_state algorithm works properly on automata of such type.

2014-02-10 New Python binding

Like, returns a dictionary of properties of the ratexp.


are-isomorphic/is-isomorphic: new algorithm (static, dyn, TAF-Kit, Python)

Given two automata, check whether they are isomorphic to one another. Currently implemented in the deterministic case only, for lal contexts.

>>> ctx = vcsn.context('lal_char(ab)_z')
>>> ctx.ratexp('a+b*').standard().is_isomorphic(ctx.ratexp('b*+a').standard())

weighted minimization (static, dyn, tafkit, Python)

A third implementation of the minimization algorithm named "weighted" is now available, supporting any lal context.

The new variant is a more widely applicable generalization of the "signature" implementation. The new variant is the default for non-boolean weightsets; it requires a trim automaton but does not rely on determinism or other properties. Preliminary measures show performance to be close to "signature", or even clearly superior in the case of sparse automata such as dictionaries.

Vaucanson 2b.3 (2014-02-03)

Release of our fourth beta, vaucanson-2b.3. Available on MacPorts as "vaucanson".


A Virtual Machine for easy experiments

Installing Vaucanson 2 on some platforms can be tedious. Clément Démoulins contributed a Virtual Machine that makes it easy to experiment with Vaucanson 2 without having to compile it. He also contributed a Vagrantfile to make it even easier to deploy.

To install a Vaucanson virtual machine, please follow this procedure:

1. Install VirtualBox
   From your distro, or from

2. Install Vagrant
   From your distro, or from

3. Download this Vagrantfile and save it somewhere.

   For instance
   $ mkdir ~/src/vcsn2
   $ cd ~/src/vcsn2
   $ wget

4. Run Vagrant (first time will be slow: let it download the VM)

   $ cd ~/src/vcsn2
   $ vagrant up

   Vaucanson is running!

5. Open http://localhost:8888 in your favorite browser.

6. Experiment!  (Hit Shift-Enter to evaluate):
   import vcsn

7. Turn your VM off when you are done
   $ vagrant halt


ratexp difference: new algorithm (static, dyn, Python)

The "difference" algorithms generates a ratexp that accepts words of the left-hand side that are not accepted by the right-hand side ratexp. Also bound as the "%" operator in Python.

>>> ctx = vcsn.context('lal_char(abc)_b')
>>> l = ctx.ratexp('[abc]*')
>>> r = ctx.ratexp('[ab]*')
>>> l.difference(r)
>>> l % r


ratexp intersection: new algorithm (static, dyn, Python)

The "intersection" algorithm computes a ratexp that denotes the Hadamard product of two rational expressions. Also bound as the "&" operator in Python.

>>> ctx = vcsn.context('lal_char(abc)_b')
>>> r = ctx.ratexp('[abc]*')
>>> r.intersection(r)
>>> r & r

ratexp concatenation: new algorithm (static, dyn, Python)

The "concatenate" algorithm computes a ratexp that denotes the concatenation of two rational expressions. Also bound as the "*" operator in Python.

>>> ctx = vcsn.context('lal_char(abc)_b')
>>> r = ctx.ratexp('[abc]*')
>>> r.concatenate(r)
>>> r * r

Python API: Operators overloading on automata

The Python API now overloads the following operators for automata:

  • +, sum of two automata
  • &, product of two automata
  • ~, complement of an automaton
  • *, concatenation of two automata
  • %, difference between two automata
  • |, union of two automata
  • **, power of an automaton


ratexp sum: new algorithm (static, dyn, Python)

Compute the sum of two rational expressions. Also bound as the + operator in Python.

>>> ctx = vcsn.context('lal_char(abc)_b')
>>> r = ctx.ratexp('[abc]*')
>>> r.sum(r)
>>> r + r

proper: an optional argument to avoid state deletion

The proper algorithm eliminates the states that become inaccessible in the course of spontaneous-transition elimination. This can be disabled by passing "false" as additional argument to proper (static, dyn, Python).


star_height: new algorithm (static, dynamic, Python)

Computes the star-height of an expression.

>>> vcsn.context('lal_char(ab)_b').ratexp('(a***+a**+a*)*').star_height()

Bison is no longer needed

To install Vaucanson from a tarball, Bison is no longer needed. Of course, modifying one of grammar files will fail, unless Bison 3.0 is installed.


Letter classes in context specifications

It is now possible to use ranges to define alphabets. For instance in Python,


builds a context whose alphabet covers letters, digits, and underscore.

Vaucanson 2b.2 (2014-01-10)

Release of our third beta, vaucanson-2b.2. Available on MacPorts as "vaucanson".

Letter classes

Letter classes are available in ratexps: [a-d0-9_] is expanded into (a+b+c+d+0+1+2+3+4+5+6+7+8+9+_). The negation, [^...], is not supported.


The '.' operator is no longer printed

The pretty-printing of (non-LAW) ratexps is simplified.


$ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp
$ vcsn derived-term -Ee 'a:b:c' | vcsn aut-to-exp


$ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp
$ vcsn derived-term -Ee 'a:b:c' | vcsn aut-to-exp

Shortlex order is now used for ratexps

This changes the pretty-printing of polynomials of ratexps, for instance the result of derivations.


$ vcsn derivation -e '(a*+b*)a(a*+b*)' aa
a*+b* + a*.a.(a*+b*) + a*


$ vcsn derivation -e '(a*+b*)a(a*+b*)' aa
a* + a*+b* + a*a(a*+b*)

ratexp implementation overhaul

Although internal, this change is documented as it deeply changes the way ratexps are handled in C++ code.

The abstract-syntax tree of the rational expressions now matches the usual (abstract) grammar:

E ::= \z | \e | a | E+F | E.F | E* | kE | Ek       k is a weight

In other words, there are now 'left-weight' and 'right-weight' nodes that exist, whereas before, the six first cases carried left and right weights. Trivial identities are enforced, and, for instance, no tree for 'a k' exist: it is converted to 'k a'.


Many of these listed features were actually developed over the last month.

products accept more heterogeneous arguments

It is already possible to compute the (regular, infiltration and shuffle) products of automata with different types of "basic" weightsets (e.g. B, Z, Q, R).

It is now also possible when weights are ratexps. For instance the product of an automaton with weights in Q and an automaton with weights in RatE yields an automaton with weights in Q-RatE.

New ratexp operators: & (intersection) and : (shuffle)

The operator & denotes the intersection in the case of Boolean weights, or more generally, the Hadamard product. Only "derived_term" can compute an automaton from it, in which case:

derived_term(E & F) = product(derived_term(E), derived_term(F))

The operator : denotes the shuffle product, aka interleave. For instance a:b:c denotes the language of the permutations of "abc". Only "derived_term" can compute an automaton from it, in which case:

derived_term(E : F) = shuffle(derived_term(E), derived_term(F))

minimization is much faster

There are now two different implementations of the minimization, namely "moore" and "signature". Both require a trim automaton as input, and "signature" accepts non-deterministic automata.

The "moore" minimization is currently the fastest.

context run-time compilation

When dyn::make_context is presented with an unknown but valid context, it is compiled and loaded dynamically. For instance:

$ vcsn cat -C 'lao_r' -W -e 3.14
# Wait for the context to be compiled...

The compiled context is currently left in /tmp for forthcoming runs.

$ ls /tmp/lao*
/tmp/   /tmp/lao_r.o   /tmp/

Python binding

It is now possible to use Vaucanson from Python. The Python binding is on top of the dyn API, and inherits all its features.

This Python API is object-oriented, contrary to dyn which is a list of types (context, ratexp, automaton, etc.) and functions (derived_term, determinize, etc.):

 dyn:: functions                    |  Python methods
derived_term(ratexp) -> automaton   | ratexp.derived_term() -> automaton
determinize(automaton) -> automaton | automaton.determinize() -> automaton
etc.                                | etc.

Documentation is forthcoming, but for instance:

$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import vcsn
>>> b = vcsn.context('lal_char(abc)_b')
>>> b
>>> r1 = b.ratexp('(a+b)*')
>>> r1
>>> a1 = r1.derived_term()
>>> a2 = b.ratexp('(b+c)*').standard()
>>> a1.product(a2).ratexp()


products accept heterogeneous arguments

It is now possible to compute the (regular, infiltration and shuffle) products of a (B, Z, Q, R) automaton with a (B, Z, Q, R) automaton.

$ vcsn derived-term -C 'lal_char(ab)_q' -e '(<1/2>a+<1/3>b)*' > q.gv
$ vcsn derived-term -C 'lal_char(ab)_z' -e '(<2>a+<3>b)*'     > z.gv
$ vcsn derived-term -C 'lal_char(ab)_r' -e '(<.2>a+<.3>b)*'   > r.gv

$ vcsn product -f z.gv q.gv | vcsn aut-to-exp
$ vcsn product -f q.gv z.gv | vcsn aut-to-exp
$ vcsn product -f q.gv z.gv q.gv | vcsn aut-to-exp
$ vcsn product -f q.gv z.gv z.gv | vcsn aut-to-exp
$ vcsn product -f q.gv r.gv | vcsn aut-to-exp
$ vcsn product -f z.gv r.gv | vcsn aut-to-exp
$ vcsn product -f z.gv r.gv q.gv | vcsn aut-to-exp


TikZ output now uses standalone

The TikZ output can now both be compiled as a standalone document, and be used inlined (as LaTeX source) in another document. Read the document of the "standalone" package (e.g., run 'texdoc standalone').

Data library path

TAF-Kit is now able to find installed files. Control the search path via the environment variable VCSN_DATA_PATH (a ":"-separated list of directories).

$ vcsn aut-to-exp -f lal_char_z/c1.gv

dot format parsing is much faster

We now require Bison 3.0 to build Vaucanson. In exchange, we get is very significant speed-up. On erebus (Mac OS X i7 2.9GHz 8GB, GCC 4.8 -O3) reading standard('a?{500}') goes from 30s to 1.2s.


minimize: new algorithm (static, dynamic, TAF-Kit)

Given a deterministic Boolean lal automaton, compute its minimal equivalent automaton using Moore's algorithm.


$ vcsn standard -e '(a+b+c+d){100}' -o 100.gv
$ vcsn cat -f 100.gv -O info | grep 'number of states'
number of states: 401
$ vcsn minimize -f 100.gv -o 100min.gv
$ vcsn cat -f 100min.gv -O info | grep 'number of states'
number of states: 101
$ vcsn are-equivalent -f 100.gv 100min.gv

fraction parsing bug fixes

Negative denominators used to be silently mangled, and zero denominators were accepted without failing.


split: new algorithm (static, dynamic, TAF-Kit)

Implements the breaking of a ratexp into a polynomial of ratexps, as required by breaking derivation, and broken derived terms. Not named "break", as this is a C++ keyword.


products accept heterogeneous arguments

It is now possible to compute the (regular, infiltration and shuffle) products of a B-automaton with any other kind of automaton.

$ vcsn standard -C 'lal_char(ab)_b' -e 'a' -o a.gv
$ vcsn standard -C 'lal_char(ab)_ratexpset<lal_char(uv)_b>' \
                -e '(<u>a+<v>b)*' -o ab.gv
$ vcsn product -f ab.gv a.gv | vcsn shortest -f - 4
$ vcsn shuffle -f ab.gv a.gv | vcsn shortest -f - 4
a + <u+u>aa + <v>ab + <v>ba
$ vcsn infiltration -f ab.gv a.gv | vcsn shortest -f - 4
<u+\e>a + <u.u+u+u>aa + <u.v+v>ab + <v.u+v>ba


shortest: accepts the number of words as argument (defaults to 1)

$ vcsn shortest -O text -Ee '(a+b)*' 2
\e + a
$ vcsn enumerate -O text -Ee '(a+b)*' 2
\e + a + b + aa + ab + ba + bb

difference: new algorithm (static, dynamic, TAF-Kit)

Computes the difference between an automaton and a B-automaton (on LAL only):

$ vcsn derived-term -e '(?@lal_char(ab)_z)(<2>a+<3>b)*' -o lhs.gv
$ vcsn derived-term -e '(a+b)*b(a+b)*' -o rhs.gv
$ vcsn difference -f lhs.gv rhs.gv | vcsn aut-to-exp
$ vcsn difference -f rhs.gv rhs.gv | vcsn aut-to-exp

product accepts heterogeneous arguments

It is now possible to compute the product of (for instance) a Z-automaton with a B-automaton.

$ vcsn standard -C 'lal_char(abc)_z' -e '(<2>a+<3>b+<5>c)*' -o 1.gv
$ vcsn standard -C 'lal_char(b)_b' -e 'b*' -o 2.gv
$ vcsn product -f 1.gv 2.gv | vcsn aut-to-exp
$ vcsn product -f 2.gv 1.gv | vcsn aut-to-exp

Vaucanson 2b.1 (2013-11-13)

Release of our second beta, vaucanson-2b.1. Available on MacPorts as "vaucanson".


TAF-Kit: infiltration, product, shuffle: accept multiple arguments

You may pass several (one or more) arguments to vcsn product, infiltration and shuffle.

infiltration, product, shuffle: accept non commutative semirings

Beware that the (well defined) behavior of the resulting automata is no longer what one might expect.

$ vcsn standard -C 'lal_char(ab)_ratexpset<lal_char(uv)_b>' \
                -e '<u>a<v>b' -o uv.gv
$ vcsn standard -C 'lal_char(ab)_ratexpset<lal_char(xy)_b>' \
                -e '<x>a<y>b' -o xy.gv
$ vcsn product -f uv.gv xy.gv | vcsn enumerate -f - 4
$ vcsn shuffle -f uv.gv xy.gv | vcsn enumerate -f - 4
$ vcsn infiltration -f uv.gv xy.gv | vcsn enumerate -f - 4

Input/Output fixes

Support for EFSM in input/output is improved to support weights. Output was significantly sped up.


0.64s: standard -Ee 'a?{500}' -O dot  >a500.gv
4.47s: standard -Ee 'a?{500}' -O efsm >a500.efsm
4.04s: standard -Ee 'a?{500}' -O fado >a500.fado
4.83s: standard -Ee 'a?{500}' -O grail>a500.grail
1.03s: standard -Ee 'a?{500}' -O tikz >a500.tikz


0.76s: standard -Ee 'a?{500}' -O dot  >a500.gv
0.37s: standard -Ee 'a?{500}' -O efsm >a500.efsm
0.35s: standard -Ee 'a?{500}' -O fado >a500.fado
0.35s: standard -Ee 'a?{500}' -O grail>a500.grail
0.68s: standard -Ee 'a?{500}' -O tikz >a500.tikz


boolean weight parsing bug fix

The weight parser used to ignore all characters after the first one, so that for example "1-q" was considered valid and equivalent to "1".


product and infiltration-product are much faster

On erebus (MacBook Pro i7 2.9GHz 8GB, GCC 4.8 -O3), with "vcsn standard -E -e 'a?70' -o a70.gv":


   13.01s: product -q -f a70.gv a70.gv
    0.23s: shuffle -q -f a70.gv a70.gv
   16.64s: infiltration -q -f a70.gv a70.gv


    0.77s: product -q -f a70.gv a70.gv
    0.19s: shuffle -q -f a70.gv a70.gv
    0.87s: infiltration -q -f a70.gv a70.gv


TAF-Kit: option name changes

The former options -W and -L (to specify the weightset and labelset) are removed, definitively replaced by -C, to specify the context.

In addition to -A and -E (input is an automaton/ratexp), option -W now replaces option -w (input is a weight), and option -P denotes "polynomials". Note that "polynomials" are actually linear combinations of labels, weighted by weights, so for instance "LAL" contexts accept only single-letter labels, used LAW to accept words.

$ vcsn cat -C 'law_char(ab)_z' -P -e 'a+ab + <-1>a+ab+<10>bb'
<2>ab + <10>bb

internal overhaul

Better names were chosen for the various details of the dyn:: value support:

           abstract_automaton -> automaton_base

             abstract_context -> context_base

          abstract_polynomial -> polynomial_base
 concrete_abstract_polynomial -> polynomial_wrapper

              abstract_ratexp -> ratexp_base
     concrete_abstract_ratexp -> ratexp_wrapper

           abstract_ratexpset -> ratexpset_base
  concrete_abstract_ratexpset -> ratexpset_wrapper

              abstract_weight -> weight_base
     concrete_abstract_weight -> weight_wrapper

Note that _base is slightly misleading, as it actually applies to the wrappers, and not to the static objects. For instance, weightset_wrapper derives from weightset_base, but b, z, and the other (static) weightsets do not derive from weightset_base. It would therefore be more precise to name weightset_base as weightset_wrapper_base, but that might be uselessly long. Names may change again in the future.

Because dyn::polynomial and dyn::weight store the corresponding polynomialset/weightset, there is no need (so far?) for a dyn::polynomialset/dyn::weightset. Therefore, abstract_polynomialset and abstract_weightset were removed.

For consistency (and many other benefits), automaton_wrapper and context_wrapper were introduced. Now, the whole static API is completely independent of the dyn API, and the dyn API consists only of wrappers of "static"-level values---instead of inheritance for automata and contexts.


renamings (static, dynamic, TAF-Kit)

derive        -> derivation
derived-terms -> derived-term
infiltrate    -> infiltration


taf-kit: option -q for "quiet"

Equivalent to "-O null": do not generate output.

star-normal-form: new algorithm (static, dynamic, TAF-Kit)

Compute an equivalent rational expression where starred subexpressions have a non null constant term. Yields the same standard automaton, but built faster.

$ vcsn star-normal-form -e '(a*b*)*'

Valid only for Boolean automata.

ratexp printing: minimize parentheses

Print rational expressions with the minimum required number of parentheses, making some large expressions considerably easier to read. For example


$ vcsn expand -C 'law_char(abc)_z' -e '(a+b)?{2}*'
$ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp


$ vcsn expand -C 'law_char(abc)_z' -e '(a+b)?{2}*'
$ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp

expand: new algorithm (static, dynamic, TAF-Kit)

Given a rational expression, distribute product over addition (recursively under the starred subexpressions) group and sort the equal monomials.

$ vcsn expand -C 'lal_char(abc)_z' -e '(a+b)?{2}*'

Vaucanson 2b.0 (2013-10-22)

Release of our first beta, vaucanson-2b.0.


shuffle, infiltrate: new algorithms (static, dynamic, TAF-Kit)

New algorithms on LAL automata: computes the shuffle-product and infiltration of two automata. These (and now product as well) require weightsets to be commutative semirings.

$ vcsn standard -C 'lal_char(ab)_z' -E -e 'ab' -o ab.gv
$ vcsn product    -f ab.gv ab.gv | vcsn enumerate -O text -f - 4
$ vcsn shuffle    -f ab.gv ab.gv | vcsn enumerate -O text -f - 4
<4>aabb + <2>abab
$ vcsn infiltrate -f ab.gv ab.gv | vcsn enumerate -O text -f - 4
ab + <2>aab + <2>abb + <4>aabb + <2>abab


derived-terms: new algorithm (static, dynamic, TAF-Kit)

In addition to thompson and standard, this is a third means to build an automaton from a (weighted) rational expression. It corresponds to the Antimirov definition of derivatives (implemented by "derive"). It requires LAL rational expressions.

$ vcsn derived-terms -C 'lal_char(ab)_q' -e '(<1/6>a*+<1/3>b*)*'
  vcsn_context = "lal_char(ab)_q"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> F0 [label = "<2>"]
  0 -> 1 [label = "<1/3>a"]
  0 -> 2 [label = "<2/3>b"]
  1 -> F1 [label = "<2>"]
  1 -> 1 [label = "<4/3>a"]
  1 -> 2 [label = "<2/3>b"]
  2 -> F2 [label = "<2>"]
  2 -> 1 [label = "<1/3>a"]
  2 -> 2 [label = "<5/3>b"]

vcsn eliminate-state

This tool allows one to eliminate states one after the other.

$ vcsn eliminate-state -f lao.gv 2

Its interface is likely to change, or to be completely removed.

Smaller libraries

Some compiler magic was used to reduce the size of the libraries (about 20% on Mac OS X using GCC 4.8). TAF-Kit might start faster.


determinize: by default no longer forces the result to be complete

The determinization (static, dynamic, TAF-Kit) now admits an optional second argument (defaulting to '0'), a Boolean stating whether the result must be complete.


polynomialset fully replaces entryset

The 'entryset' type is removed, as polynomialset now provides a strict superset of its features.


derive: new algorithm (static, dynamic, TAF-Kit)

New algorithm on LAL automata: computes the derivation of rational expressions with respect to a word.

$ vcsn derive -C 'lal_char(a)_z' -e '(<2>a)*' a
$ vcsn derive -C 'lal_char(a)_z' -e '(<2>a)*' aa
$ vcsn derive -C 'lal_char(a)_z' -e '(<2>a)*' aaaa
$ vcsn derive -C 'lal_char(a)_z' -e '(<2>a)*' b

$ vcsn derive -C 'lal_char(ab)_q' -e '(<1/6>a*+<1/3>b*)*' a
$ vcsn derive -C 'lal_char(ab)_q' -e '(<1/6>a*+<1/3>b*)*' aa


enumerate returns a dyn::polynomial

The 'enumerate' algorithm use to cheat, and returned a std::vector<string> (each string being a pretty-printing of the monomial, e.g., "<2>a"). It now returns a dyn::polynomial.

vcsn-enumerate is biased to prefer the 'list' output format:

$ vcsn enumerate -Ee 'a*' 3

however the proper syntax for polynomials can be asked for.

$ vcsn enumerate -O text -Ee 'a*' 3
\e + a + aa + aaa

New dyn:: type: polynomial

A new member joins the dyn:: family of types (i.e., automaton, ratexp, ratexpset, weight, weightset). Currently there are no means to read such a value from TAF-Kit, but there is output support with two different output format:

  • 'text' (aka 'default') Prints the polynomial this usual way, e.g., "<2>a+<3>b".
  • 'list' Prints one monomial per line, e.g. '<2>a <3>b'


mutable_automaton: speed improvement

"set_transition" used to invoke twice "get_transition", which had a serious performance impact on some algorithms. This is fixed.


 9.08s (0.33s+8.75s): ladybird 21 | determinize -O null
16.88s (0.85s+16.03s): thompson -C "lan_char(a)_b" -Ee "a?{2000}" | proper -O null
21.30s (9.76s+11.54s): standard -Ee "(a+b+c+d)?{100}" | aut-to-exp -O null
 7.29s (0.04s+7.25s): standard -C "lal_char(ab)_z" -Ee "(a+b)*b(<2>a+<2>b)*" | power -O null -f- 10
 0.04s (0.04s): standard -E -e "(a?){70}" -o a70.gv
24.20s (24.20s): product -O null -f a70.gv a70.gv


 8.54s (0.13s+8.41s): ladybird 21 | determinize -O null
11.87s (0.86s+11.01s): thompson -C "lan_char(a)_b" -Ee "a?{2000}" | proper -O null
21.03s (9.40s+11.63s): standard -Ee "(a+b+c+d)?{100}" | aut-to-exp -O null
 6.58s (0.06s+6.52s): standard -C "lal_char(ab)_z" -Ee "(a+b)*b(<2>a+<2>b)*" | power -O null -f- 10
 0.07s (0.07s): standard -E -e "(a?){70}" -o a70.gv
13.16s (13.16s): product -O null -f a70.gv a70.gv

Note in particular that the spontaneous transition elimination algorithm is faster, going from 16s to 11s on a MacBook Pro i7 2.9GHz 8GB RAM, on the following sequence.

$ vcsn thompson -C 'lan_char(a)_b' -Ee 'a?{2000}' | vcsn proper -O null


random: new algorithm (static, dynamic, TAF-Kit)

Random generation of automata. Subject to changes. Accepts four arguments: number of states, density (of transitions, defaults to .1, 1 generates a clique), number of initial states (defaults to 1), and number of final states (defaults to 1).

So far LAL and LAN only, no support for random weights.

$ vcsn random -O fado 3
@DFA 0
0 d 1
1 c 2
2 d 2
$ vcsn random -O fado 3
@DFA 2
0 c 1
1 d 2
2 c 0
$ vcsn random -C 'lan_char(ab)_b' -O fado 3
@NFA 1 * 0
0 @epsilon 1
1 b 2
2 b 1


Overhaul of the package

Thanks to Automake 1.14 features, there is now a single Makefile, which significantly speeds up the compilation of the package. The test-suite now uses the Test Anything Protocol, which results in more verbose results.

Beware that because of subtle issues (in the generated Makefile snippets that track dependencies), you are highly recommend to "make clean" after upgrading from the Git repository, and then "make" as usual.


proper: faster implementation

The spontaneous transition elimination algorithm is now faster, going from 102s to 15s on a MacBook Pro i7 2.9GHz 8GB RAM, on the following sequence.

$ vcsn thompson -C 'lan_char(a)_b' -Ee 'a?{2000}' | vcsn proper -O null



New algorithm (static, dynamic, TAF-Kit) on LAL automata: whether some word is the label of at least two successful computations.

$ vcsn is-ambiguous <<\EOF
  I -> 0
  0 -> 1 [label = "a"]
  0 -> 2 [label = "a"]
  1 -> F

$ vcsn is-ambiguous <<\EOF
  I -> 0
  0 -> 1 [label = "a"]
  0 -> 2 [label = "a"]
  1 -> F
  2 -> F

Also reported in the 'info' format for automata.

product: recover the original states

Similarly to 'determinize', the (static version of) 'product' can now be queried to get a map from states of the result to pairs of original states.


Sum of standard automata is fixed

See the previous entry: the computation of the initial transition was wrong, which resulted in the production of non-standard automata. This is fixed:

$ vcsn standard -C 'lal_char(a)_ratexpset<lal_char(x)_b>' -e '<x>a*' > 1.gv
$ vcsn standard -C 'lal_char(b)_ratexpset<lal_char(y)_b>' -e '<y>b*' > 2.gv
$ vcsn sum -f 1.gv 2.gv
  vcsn_context = "lal_char(ab)_ratexpset<lal_char(xy)_b>"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> F0 [label = "<x+y>"]
  0 -> 1 [label = "<x>a"]
  0 -> 2 [label = "<y>b"]
  1 -> F1
  1 -> 1 [label = "a"]
  2 -> F2
  2 -> 2 [label = "b"]


Operations on automata are generalized

Operations (union, sum, concatenate, chain) were uselessly restricted to LAL. Besides, the contexts were improperly computed (both labelset and weightset). This is fixed.

$ vcsn standard -C 'lal_char(a)_ratexpset<lal_char(x)_b>' -e '<x>a*' > 1.gv
$ vcsn standard -C 'lal_char(b)_ratexpset<lal_char(y)_b>' -e '<y>b*' > 2.gv
$ vcsn sum -f 1.gv 2.gv
  vcsn_context = "lal_char(ab)_ratexpset<lal_char(xy)_b>"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0 [label = "<\\e+\\e+\\e>"]
  0 -> F0 [label = "<x+y>"]
  0 -> 1 [label = "<x>a"]
  0 -> 2 [label = "<y>b"]
  1 -> F1
  1 -> 1 [label = "a"]
  2 -> F2
  2 -> 2 [label = "b"]



New algorithm (static, dynamic, TAF-Kit).

$ vcsn double-ring -C 'lal_char(ab)_b' 6 1 3 4 5
  vcsn_context = "lal_char(ab)_b"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> 1 [label = "a"]
  0 -> 5 [label = "b"]
  1 -> F1
  1 -> 0 [label = "b"]
  1 -> 2 [label = "a"]
  2 -> 1 [label = "b"]
  2 -> 3 [label = "a"]
  3 -> F3
  3 -> 2 [label = "b"]
  3 -> 4 [label = "a"]
  4 -> F4
  4 -> 3 [label = "b"]
  4 -> 5 [label = "a"]
  5 -> F5
  5 -> 0 [label = "a"]
  5 -> 4 [label = "b"]

It is advised to pass "layout = circo" to Dot for the rendering.

concatenation, chain

New algorithm on standard automata (static, dynamic, TAF-Kit).


New algorithm on standard automata (static, dynamic, TAF-Kit). Same limitations as left-mult, see below.



New algorithm on standard automata (static, dynamic, TAF-Kit).

The TAF-Kit version is (currently) troublesome, as it does not infer the context to parse the weight from the automaton: be sure to specify -C:

$ vcsn standard -C 'lal_char(a)_z' -e a | vcsn left-mult 12
vcsn-left-mult: left_mult: no implementation available for mutable_automaton<lal_char_z> x b
$ vcsn standard -C 'lal_char(a)_z' -e a | vcsn left-mult -C 'lal_char(a)_z' 12
  vcsn_context = "lal_char(a)_z"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> 1 [label = "<12>a"]
  1 -> F1

taf-kit: option -w for weights as input

Currently only vcsn-cat supports it.

$ vcsn cat -w -e 2
vcsn-cat: invalid Boolean: 2
$ vcsn cat -w -e 1
$ vcsn cat -C 'lal_char(a)_z' -w -e 2

concatenate, star, sum

New algorithms on standard automata (static, dynamic, TAF-Kit).



New algorithm on automata (static as "union_a", dynamic as "union_a", TAF-Kit as "union"). Plain graph union.


New algorithm on rational expressions (static, dynamic, TAF-Kit).

$ vcsn is-valid -C 'lal_char(a)_r' -E -e '(<.5>\e)*'
$ vcsn is-valid -C 'lal_char(a)_r' -E -e '\e*'

For consistency, is-valid is now also available for automata in dyn:: and TAF-Kit (it used to be static only, visible from "info" output).

$ vcsn thompson -C 'lan_char(a)_r' -e '(<.5>\e)*' | vcsn is-valid
$ vcsn thompson -C 'lan_char(a)_r' -e '\e*' | vcsn is-valid


More RatExp quantifiers

In addition to "", there is "?"/"{?}" and "{+}". Support for "{}" is added for symmetry.

$ vcsn cat -Ee 'a?'
$ vcsn cat -Ee 'a?{3}'
$ vcsn cat -Ee '(a+b){+}'


I/O EFSM format support

We are now able to produce and read EFSM format (designed as an interface to OpenFST). This is not (yet) thoroughly tested for weighted automata.

$ vcsn ladybird -O efsm 4 |
   efstcompile | fstdeterminize | efstdecompile |
   vcsn cat -I efsm -O info
type: mutable_automaton<lal_char(abc)_b>
number of states: 15
number of initial states: 1
number of final states: 8
number of accessible states: 15
number of coaccessible states: 15
number of useful states: 15
number of transitions: 43
number of deterministic states: 15
number of eps transitions: 0
is complete: 0
is deterministic: 1
is empty: 0
is eps-acyclic: 1
is normalized: 0
is proper: 1
is standard: 0
is trim: 1
is useless: 0
is valid: 1

Output in EFSM format is improved

State numbers now start appropriately at 0 (instead of 2), and when there is a single initial state, no "pre-initial state" is output; this avoids the introduction of spontaneous transitions in deterministic automata.

Before (see news of 2013-07-01):

$ vcsn ladybird -O efsm 2
#! /bin/sh

cat >transitions.fsm <<\EOFSM
0       2       \e
2       3       a
3       3       b
3       3       c
3       2       c
3       2       a

cat >isymbols.txt <<\EOFSM
\e      0
a       1
b       2
c       3

fstcompile --acceptor --keep_isymbols --isymbols=isymbols.txt transitions.fsm


$ vcsn ladybird -O efsm 2
#! /bin/sh

cat >isymbols.txt <<\EOFSM
\e      0
a       1
b       2
c       3

cat >transitions.fsm <<\EOFSM
0       1       a
1       0       a
1       1       b
1       0       c
1       1       c

fstcompile --acceptor --keep_isymbols --isymbols=isymbols.txt transitions.fsm

I/O FAdo format support

We are now able to produce and read FAdo format.

$ vcsn ladybird -O fado 4 | \
  python -c "from FAdo import fa
nfa = fa.readFromFile('/dev/stdin')[0]
dfa = nfa.toDFA()
fa.saveToFile('dl4.fado', dfa)"

$ vcsn cat -I fado -O info -f dl4.fado
type: mutable_automaton<lal_char(abc)_b>
number of states: 15
number of initial states: 1
number of final states: 8
number of accessible states: 15
number of coaccessible states: 15
number of useful states: 15
number of transitions: 43
number of deterministic states: 15
number of eps transitions: 0
is complete: 0
is deterministic: 1
is empty: 0
is eps-acyclic: 1
is normalized: 0
is proper: 1
is standard: 0
is trim: 1
is useless: 0
is valid: 1


New algorithm on rational expressions (static, dynamic, TAF-Kit).

$ vcsn constant-term -e '(?@lal_char(a)_b)(\e)*'
$ vcsn constant-term -e '(?@lal_char(a)_z)(\e)*'
vcsn-constant-term: z: star: invalid value: 1
$ vcsn constant-term -C 'law_char(ab)_ratexpset<law_char(wxyz)_b>' \
                     -e '<w>(<x>a*+<y>b*)*<z>'


Automaton library

The set of library automata for existing contexts is now complete with respect to what Vaucanson 1 provided. Automata families are not, and will not, be part of this library ; for instance, instead of looking for ladybird-6.gv, use 'vcsn ladybird 6'.

lal_char_b: a1.gv b1.gv evena.gv oddb.gv
lal_char_z: b1.gv binary.gv c1.gv d1.gv
lal_char_zmin: minab.gv minblocka.gv slowgrow.gv

Currently one must specify their path:

$ vcsn evaluate -f share/vcsn/lal_char_zmin/minab.gv aabbba


TAF-Kit: works on standard input by default

The very frequent "-f -" sequence is no longer required: by default the input is stdin.

$ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp



New algorithm on automata (static, dynamic, TAF-Kit). Corresponds to "standardize" in Vaucanson 1.


New algorithm on automata (static, dynamic, TAF-Kit). Also reported in "info" format.


New automata factory (static, dynamic, TAF-Kit): Brzozowski's universal witness.


New formats: grail and fado

$ vcsn standard -O fado -e 'a+b'
@DFA 1 2
0       a       1
0       b       2
$ vcsn standard -O fado -e 'a+ab'
@NFA 1 3 * 0
0       a       1
2       b       3
0       a       2
$ vcsn standard -O grail -e 'a+ab'
(START)     |-      0
0       a       1
2       b       3
0       a       2
1       -|      (FINAL)
3       -|      (FINAL)



New algorithm on automata (static, dynamic, TAF-Kit).

$ vcsn standard -e '(?@lal_char(01)_z)(0+1)*1(<2>0+<2>1)*' >binary.gv
$ vcsn power -f binary.gv 0 | vcsn enumerate -f- 2
$ vcsn power -f binary.gv 1 | vcsn enumerate -f- 2
$ vcsn power -f binary.gv 2 | vcsn enumerate -f- 2
$ vcsn power -f binary.gv 4 | vcsn enumerate -f- 2
$ vcsn power -f binary.gv 8 | vcsn enumerate -f- 2



New algorithm (static, dynamic, TAF-Kit).

$ vcsn divkbaseb -C 'lal_char(01)_b' 3 2
  vcsn_context = "lal_char(01)_b"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> F0
  0 -> 0 [label = "0"]
  0 -> 1 [label = "1"]
  1 -> 0 [label = "1"]
  1 -> 2 [label = "0"]
  2 -> 1 [label = "0"]
  2 -> 2 [label = "1"]


enumerate produces a list of weighted words

enumerate now also provides the weight of the words. It is also fixed: it no longer reports words with nul weight (e.g., 'a' in 'a+<-1>a').

$ vcsn standard -e '(?@lal_char(01)_z)(0+1)*1(<2>0+<2>1)*' \
    | vcsn enumerate -f - 3

shortest is fixed and modified similarly.

$ vcsn standard -e '(?@lal_char(ab)_z)(a+<5>bb+<-1>a)' \
      | vcsn shortest -f -


WeightSet: added support for rational weights

We can now use automata with rational numbers as their weights.

$ vcsn standard -e "(?@lal_char(ab)_q)(<1/2>a+<2>b)*" \
  | vcsn evaluate -f - aaabbbb

$ vcsn standard -e "(?@lal_char(ab)_q)(<1/2>a+<2>b)*"\
  | vcsn evaluate -f - aaab


new format: efsm (and new tool: efstcompile)

The former output format ("fsm") is dropped, replaced by an adhoc "extended FSM" format: "efsm". The FSM format focuses only on the transitions, and lacks information about the labels (which are expected to be mapped to numbers), weights, whether it's an acceptor or transducer, etc.

The efsm format is designed to be simple to use with OpenFST: just run "efstcompile" instead of "fstcompile". As a matter fact, the new "efstcompile" tool is rather dumb, as it simply executes the "efsm" file.

$ vcsn ladybird -O efsm 2
#! /bin/sh

cat >transitions.fsm <<\EOFSM
0       2       \e
2       3       a
3       3       b
3       3       c
3       2       c
3       2       a

cat >isymbols.txt <<\EOFSM
\e      0
a       1
b       2
c       3

fstcompile --acceptor --keep_isymbols --isymbols=isymbols.txt transitions.fsm

$ vcsn ladybird -O efsm 8 | efstcompile | fstdeterminize | fstinfo \
    | grep '# of states'
# of states                                       256


new binary: vcsn

To give a flavor of what TAF-Kit should be (a single tool instead of one per command), the new "vcsn" script bounces to the vcsn-* tools. It does not support | as TAK-Kit 1 did.

$ vcsn are-equivalent -Ee '(a*b*)*' '(a+b)*'



Now removes states to which no transition arrive after spontaneous transitions removal.


Only the concatenation yielded an automaton whose projection on Boolean weights was different from the Thompson of the projection of the rational expression on Boolean. This is now fixed.


New algorithm (static, dynamic, TAF-Kit).


identity, unit => one

Labels new define one() and is_one() instead of identity() and is_identity().

We now use LAO, "labels are one", instead of LAU, "labels are unit".

WeightSets now define one() and is_one() instead of unit() and is_unit().


shortest, enumerate

New algorithms (static, dynamic, TAF-Kit).



Now accepts (LAL Boolean) rational expressions.


Now accepts (LAL Boolean) rational expressions. It cannot compare a rational expression with an automaton (or vice-versa). This is a temporary defect which shell be addressed once TAF-Kit is properly developed.

$ vcsn-are-equivalent -Ee '(a*b*)*' '(a+b)*'
$ vcsn-are-equivalent -Ee '(a*b)*' '(a+b)*'

dyn: overhaul

Consistency is enforced in dyn. In particular the very first dynamic/static bridge (which was not identified as such), vcsn::rat::abstract_ratexpset, is now part of dyn::.

dyn::weight now aggregates its WeightSet instead of a Context. More similar conversions were performed, and other are to come.



The output now shows useless states (and their transitions) in gray.



New algorithm (static, dynamic, TAF-Kit). Currently works only for LAL Boolean automata.

$ vcsn-standard -e 'a(ba)*' -o a1.gv
$ vcsn-standard -e '(ab)*a' -o a2.gv
$ vcsn-are-equivalent -f a1.gv a2.gv


is_trim, is_useless, is_empty

New algorithms (static, dynamic, TAF-Kit).

accessible_states, coaccessible_states, useful_states

New algorithms, static only.

num_accessible_states, num_coaccessible_states, num_useful_states

New algorithms, static only. Available in the "info" display.

copy accepts a predicate

It is now possible to filter the states to keep. Either as a predicate, or a set of states. For instance:

template <typename Aut>
Aut trim(const Aut& a)
  return vcsn::copy(a, useful_states(a));


New algorithm (static, dynamic, and TAF-Kit). Requires a LAL Boolean automaton.

$ vcsn-universal -f a1.gv
  vcsn_context = "lal_char(ab)_b"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  { 0 1 2 }
  I0 -> 0
  0 -> 0 [label = "a, b"]
  0 -> 1 [label = "a"]
  1 -> 0 [label = "a, b"]
  1 -> 1 [label = "a, b"]
  1 -> 2 [label = "b"]
  2 -> F2
  2 -> 0 [label = "a, b"]
  2 -> 1 [label = "a, b"]
  2 -> 2 [label = "a, b"]


New algorithm (static, dynamic, and TAF-Kit). Requires a complete deterministic LAL Boolean automaton.

$ vcsn-standard -e '(?@lal_char(ab)_b)a' \
    | vcsn-determinize -f- \
    | vcsn-complement -f-  \
    | vcsn-aut-to-exp -f-



It is conforming with the specifications: all the states must be deterministic, not just the reachable ones.

info output adjustments

It now displays whether the automaton "is complete" and the number of deterministic states if LAL, otherwise "N/A". It also reports "is deterministic: N/A" for non-LAL.

Invalid labels are rejected

Under some circumstances, some invalid transitions could be accepted by the Dot parser (e.g., "aa" or "" in LAL). This is fixed.


Support for entries is removed.

Member types, functions, and variables, about entries, have been removed. The services provided by entries in Vaucanson 1 are provided by LAU automata, so entries are not as useful in Vaucanson 2. And anyway, if needed, it should rather be a set of free standing functions.


Refactoring: eps-removal => proper

Eps-removal is renamed to Proper.


Syntax for rational-expressions has changed!

Angular brackets are now used for weights. Instead of

$ vcsn-standard -e '(?@lal_char(01)_z)(0+1)*1({2}1+{2}0)*' -o


$ vcsn-standard -e '(?@lal_char(01)_z)(0+1)*1(<2>1+<2>0)*' -o

Braces are now used instead of (*...) for generalized quantifiers.

a{0} => \e
a{1} => a
a{2} => a.a
a{5} => a.a.a.a.a

a{0,1} => \e+a
a{0,2} => \e+a+a.a
a{0,3} => \e+a+a.a+a.a.a

a{1,2} => a.(\e+a
a{1,3} => a.(\e+a+a.a)

a{2,5} => a.a.(\e+a+a.a+a.a.a)

a{0,} => a*
a{1,} => a.(a*)
a{4,} => a.a.a.a.(a*)


info output

Now displays the number of spontaneous transitions (0 for LAL).


Digits as letters

Recently broken by accident, support for digits as letters is restored.

$ vcsn-standard -e '(?@lal_char(01)_z)(0+1)*1({2}1+{2}0)*' -o
$ vcsn-evaluate -f '11111111'
$ vcsn-evaluate -f '101010'


Dot parser

This parser is now stricter than it used to be: be sure to escape backslashes in input Dot files, i.e.,

  0 -> 2 [label = "\e"]

is now invalid, write

  0 -> 2 [label = "\\e"]

This change was made because Graphviz treats "" (and renders it) exactly like "e".


New algorithm: Thompson

Conversion from rational expression to automata. Requires lan or law. The handling of weights might be changed in the near future.

$ vcsn-thompson -Ee '(?@lan_char(abc)_z){2}(a+{3}b+c)*{5}'
  vcsn_context = "lan_char(abc)_z"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  { 0 1 2 3 4 5 6 7 8 9 }
  I8 -> 8
  0 -> 2 [label = "\\e"]
  0 -> 4 [label = "\\e"]
  0 -> 6 [label = "\\e"]
  1 -> 0 [label = "\\e"]
  1 -> 9 [label = "{5}\\e"]
  2 -> 3 [label = "a"]
  3 -> 1 [label = "\\e"]
  4 -> 5 [label = "{3}b"]
  5 -> 1 [label = "\\e"]
  6 -> 7 [label = "c"]
  7 -> 1 [label = "\\e"]
  8 -> 0 [label = "{2}\\e"]
  8 -> 9 [label = "{10}\\e"]
  9 -> F9


standard is the new name for standard-of

The "of" is useless, inconsistent with the other algorithms, and with the TAF-Kit v1 name.


Evaluation is fixed

Several initializations were incorrect, expecting the zero to be 0 (which is not the case for zmin for instance). There might also be some speed up.


New output format: tikz

Layout is dumb, yet this is useful to prepare LaTeX documents.


New output format: info

This pseudo format displays facts about the automaton (number of states and so on) or rational expressions (number of nodes).


Can now be called on LAL automata, for consistency with is-proper and is-valid.



The states are now numbered from 0.

Clang Compatibility

Clang++ (3.2 and 3.3) can now compile Vaucanson.


New LabelSet: nullableset

Initial support for "Labels are nullable".

$ cat
digraph lan
  vcsn_context = "lan_char(a)_b"
  I1 -> 1
  2 -> F2
  1 -> 2 [label = "a"]
  1 -> 2 [label = "\e"]
$ vcsn-cat -Af
  vcsn_context = "lan_char(a)_b"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  { 1 2 }
  I1 -> 1
  1 -> 2 [label = "\\e, a"]
  2 -> F2
$ vcsn-eps-removal -Af
  vcsn_context = "lan_char(a)_b"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  { 1 2 }
  I1 -> 1
  1 -> F1
  1 -> 2 [label = "a"]
  2 -> F2


New dyn algorithm and tool (vcsn-is-proper).


determinize speed-up

Determinization algorithm is five times faster.


$ time bin/vcsn-de-bruijn 18 | bin/vcsn-determinize -Af - -Onull
real    0m13.049s
user    0m12.773s
sys     0m0.276s
$ time bin/vcsn-de-bruijn 20 | bin/vcsn-determinize -Af - -Onull
real    0m56.780s
user    0m55.775s
sys     0m0.988s


$ time bin/vcsn-de-bruijn 18 | bin/vcsn-determinize -Af - -Onull
real     0m2.181s
user     0m2.048s
sys      0m0.132s
$ time bin/vcsn-de-bruijn 20 | bin/vcsn-determinize -Af - -Onull
real     0m8.653s
user     0m8.177s
sys      0m0.476s


hierarchy and API clean up

dyn:: was cleaned. Some headers were renamed:

vcsn/ctx/abstract_context.hh -> vcsn/dyn/context.hh
vcsn/algos/dyn.hh            -> vcsn/dyn/algos.hh
vcsn/core/automaton.hh       -> vcsn/dyn/automaton.hh

dyn::make_automaton was introduced to hide implementation details.


LabelSet renamings

The name LetterSet, UnitSet, and WordSet were not compliant. They have been renamed as letterset, unitset, and wordset.


Generalized quantifier as syntactic sugar for rational expressions

The new "(* min, max)" quantifier (postfix, like "*") allows to specify "powers" of an expression. For instance:

({a}b)(*0) => \e
({a}b)(*1) => {a}b
({a}b)(*2) => {a}b.{a}b
({a}b)(*5) => {a}b.{a}b.{a}b.{a}b.{a}b

({a}b)(*0,1) => \e+{a}b
({a}b)(*0,2) => \e+{a}b+({a}b.{a}b)
({a}b)(*0,3) => \e+{a}b+({a}b.{a}b)+({a}b.{a}b.{a}b)

({a}b)(*1,2) => {a}b.(\e+{a}b)
({a}b)(*1,3) => {a}b.(\e+{a}b+({a}b.{a}b))

({a}b)(*2,5) => {a}b.{a}b.(\e+{a}b+({a}b.{a}b)+({a}b.{a}b.{a}b))

({a}b)(*0,) => ({a}b)*
({a}b)(*1,) => {a}b.(({a}b)*)
({a}b)(*4,) => {a}b.{a}b.{a}b.{a}b.(({a}b)*)


Comment in rational expressions

The (?#...) construct allows to embed comments in rational expressions. They are discarded. There is no means to include a closing parenthesis in this construct.

Context in rational expressions

The (?@...) allows a rational expression to "carry" its context. Contrast for instance the two following runs.

$ vcsn-cat -C 'lal_char(xyz)_z'  -Ee '{42}x+{51}z'
$ vcsn-cat -Ee '(?@lal_char(xyz)_z){42}x+{51}z'

This will be used, eventually, so that TAF-Kit-like tools propagate the context in runs such as:

$ vcsn-cat -Ee '(?@lal_char(xyz)_z){42}x+{51}z' | vcsn-transpose -Ef -
1.1-5: invalid Boolean: 42

Currently it fails, as the "default" context is "lal_char(abc)_b".

New output format: fsm

An initial, and rough, support for Open FSM's format is provided. Currently there is no support at all for the weights.

$ vcsn-de-bruijn -O fsm 12 | fstcompile | fstdeterminize | wc -l

New output format: null

The output is discarded. This is useful for benching.


New algorithms: is-deterministic and complete

is-deterministic takes an automaton as argument and exits with code status 0 if the given automaton is deterministic, 2 otherwise.

complete also takes an automaton as argument and make it complete. If the given automaton is already complete, then it is unchanged.


Overhaul of the LAU, LAL, LAW implementation

So far a context was a triple: <Kind, LabelSet, WeightSet>, where (for instance), Kind is labels_are_letters, LabelSet is set_alphabet<char_letters>, and WeightSet is zmin). This is troublesome on several regards, the clearest being that the LabelSet makes no sense for LAU.

Now contexts are pairs: <LabelSet, WeightSet>, where this time the LabelSet (same name as before, different concept) can be a instance of UnitSet for LAU, WordSet for LAW, or LetterSet for LAL. These structures, in turn, are parameterized by the effective set of generators to use: for instance, LetterSet<set_alphabet<char_letters>>. Of course UnitSet is not parameterized.

As a first visible consequence, the name of the LAU contexts has changed:

lau_char_br => lau_br
lau_char(xyz)_ratexpset<lal_char(abc)_b> => lau_ratexpset<lal_char(abc)_b>


options renamed

The options to select the input and output format are renamed -I and -O (instead of -i and -o).

new option: -o for output file

The vcsn-* tools now support '-o FILE' to save the output in FILE.


product: strengthened preconditions

The product of automata requires LAL automata. The output alphabet is the intersection of the ones of the operands. This works as expected when the automata have disjoint alphabets.


dyn::product and vcsn-product

They compute the product of automata.


vcsn-determinize and vcsn-evaluate use the common command line options

These tools support -C, -g, etc. like the other tools. See "vcsn-<tool> -h".

context names are now complete

Context names used to describe the "static" structure only (e.g., lal_char_ratexpset<law_char_b>). It now includes the "dynamic" part, currently only the list of generators (e.g., lal_char(abc)_ratexpset<law_char(xyz)_b>).

This context strings are both printed and read by the various tools. For instance:

$ vcsn-standard-of -C 'law_char(xyz)_ratexpset<law_char(abc)_z>' \
      -e '{abc}xyz' | vcsn-aut-to-exp -f -

Note that the second tool, vcsn-aut-to-exp, found the context in its input, the standard automaton in Dot format.


dyn::ratexpset, dyn::context

These are now handled by shared pointer, consistently with dyn::automaton and dyn::ratexp.

Nasty memory management issues have been fixed.


It now supports the same arguments as the other vcsn-* tools. It also no longer requires 'a' and 'b' to be accepted letters, and it uses the whole alphabet. For instance "vcsn-de-bruijn -g 'xyz' 3" generates an automaton for "(x+y+z)x(x+y+z)^3".

dyn::ladybird, vcsn-ladybird

New dynamic algorithm, and new tool (which also supports the common command line options).


contexts are renamed

Contexts have both a name and an identifier. The name is used to display in a readable form the nature of the context, for instance in Dot output. The identifier is used for instance headers, or predefined contexts, and libraries.

So far names and identifiers are equal, but this will change.

As a first step, identifiers/names are now <Kind><LabelSet><WeightSet> instead of <LabelSet><WeightSet><Kind>. For instance:

char_b_lal  => lal_char_b
char_br_lal => lal_char_br
char_zr_lal => lal_char_zr
char_br_lau => lau_char_br
char_br_law => law_char_br
char_zr_law => law_char_zr


pprat is removed

It was designed for the test suite. The vcsn-* tools are now sufficient for the test suite, and are exposed to the user.

vcsn-*: option overhaul The different tools had already too many different calling conventions. They are now (quite) consistent.


Calls the default implementation of aut-to-exp.

$ vcsn-standard-of -Wz     -e '{2}(ab){3}' | vcsn-aut-to-exp -Af -
$ vcsn-standard-of -Wz -Lw -e '{2}(ab){3}' | vcsn-aut-to-exp -Af -


It now also supports lifting rational expressions.

$ vcsn-lift -C char_b_lal -Ee 'abc'


dotty -> dot

The name "dotty" was incorrect (as it denotes a program instead of the format). Therefore, every occurrence of "dotty" is now mapped to "dot".


The Dot output (input and output) now uses ", " as a label separator, instead of " + ".


dyn: input/output

Routines: dyn::read_(automaton|ratexp)_(file|string) and dyn::print take the input format. Available input/output are: - "dotty". - "text". - "xml".


genset is replaced by labelset

Through out the code.


dyn: input

New routines: dyn::read_(automaton|ratexp)_(file|string).

dyn: output

New routines: dyn::print, for both automata and RatExps.

bin: new tools

vcsn-cat, vcsn-transpose (both on RatExps only currently). vcsn-standard-of. vcsn-lift.


krat -> rat

kratexp, kratexpset etc. are renamed as ratexp, ratexpset, etc.

labels are unit

labels-are-empty/lae were mapped to labels-are-unit/lau.



For consistency with dyn::automaton, vcsn::ctx::abstract_context is renamed vcsn::dyn::context. Eventually, we might turn it into a shared pointer too.

dyn::de_bruijn, bin/vcsn-de-bruijn

New tools, useful for tests for instance.

$ vcsn-de-bruijn char_b_lal 2
  vcsn_context = "char_b_lal"
  vcsn_genset = "ab"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  I1 -> 1
  1 -> 1 [label = "a + b"]
  1 -> 2 [label = "a"]
  2 -> 3 [label = "a + b"]
  3 -> 4 [label = "a + b"]
  4 -> F4


dyn::parse_file and parse_string

They construct dyn::automaton's.


For bad reasons, currently works only for char_b_lal

vcsn-determinize and vcsn-evaluate

Two new shell commands to write tests.


pprat works with abstract algorithms

pprat now uses only abstract (aka, dynamic) algorithms! On OS X, it is now a 77KB program; it was 11MB before.

This schedules its death: either it will be replaced by a set of smaller grain commands (vcsn-determinize, vcsn-standard-of, etc.) from which test cases will be easier to write, or it will be replaced by some early implementation of a TAF-Kit-like unified program (instead of one per context).


automata provide a vname

To dispatch algorithms such as dotty, we not only need to know the context type name, but also the automaton type name, as mutable_automaton and transpose_automaton are two different types for instance.



Because it uses only algorithms made abstract (make_context, make_automaton_editor, and dotty), the dot parser now works for any of the precompiled contexts!



In addition to the add_entry method of mutable_automaton, there is now an add_entry algorithm, which is templated by the automaton type. This algorithm provides an abstract interface to an unknown type of automaton.



For consistency, polynomials is renamed polynomialset.

mutable_automaton::add_entry and del_entry

The first of these new functions allows to add directly a list of transition between two states by passing the corresponding entry_t (this is most useful when reading an automaton with entries, such as with the Dot parser). The second one removes every existing transition between two states.


labels are empty

Initial work on labels-are-empty automata. See the unit/char_z_lae test. The labels are not displayed, but the "{...}" to denote the weights, are kept:

  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1
  2 -> F2 [label="{10}"]
  1 -> 2 [label="{51}"]
  2 -> 3 [label="{3}"]
  2 -> 1
  1 -> 1 [label="{42}"]
  1 -> 3

In that case, the transitions do not store labels.

lift now returns a labels-are-empty automaton/ratexp

Accordingly, pprat -l (lift) now displays:

$ pprat -Lw -l 'ab+cd'
  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> 2 [label="{ab}"]
  2 -> F2
  I1 -> 1
  3 -> F3
  1 -> 3 [label="{cd}"]



It is now possible to load an automaton from its dotty output. Actually, it is possible to write simpler automata. This is no yet fully generic: it works properly only for char_b_lal.

The test program unit/parse-dot gives access to it. When fed with the following input file:

  vcsn_context=char_b_lal vcsn_genset="a"
  {1} -> {2 3} -> {4 5 6} [label=a]
  I -> 1
  {4 5 6} -> F

it produces an automaton, and dumps it using the dotty algorithm:

  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> 2 [label="a"]
  1 -> 3 [label="a"]
  2 -> 4 [label="a"]
  2 -> 5 [label="a"]
  2 -> 6 [label="a"]
  3 -> 4 [label="a"]
  3 -> 5 [label="a"]
  3 -> 6 [label="a"]
  I1 -> 1
  4 -> F4
  5 -> F5
  6 -> F6


pprat uses -L for labels instead of -A

For consistency, since we now also use the name "labels" to denote the leaves of rational expressions (others that and ), -A is renamed -L.

Metadata are embedded in the Dot file

The pseudo name "A" which was used in every dotty output is no longer defined, as it is both optional and useless. The context name and the alphabet are also provided. For instance:

$ ./tests/unit/ladybird-b 2 | sed 4q
$ pprat -s -L z 'abc' | sed 4q
$ pprat -s -L zr -A w 'abc' | sed 4q

This is an experimentation, and the current choice is somewhat unsatisfactory. Instead of


it is probably more sensible to use


or maybe


so that when weightsets depend for instance upon an alphabet, it can be specified too. Instead of


(which does not define the alphabet used for the weightset), one would expect:


or maybe


Also, the name "kratexpset" is of course open to discussion:



It is now possible to read back polynomials such as "a+b+{2}a".


It is used more extensively to forbid meaningless calls, such as determinizing a law automaton.


Precompiled contexts

Several predefined contexts come with their own header (e.g., "ctx/char_b_lal"), and their own library (e.g., "libchar_b_lal"). This is provide for char_{b,z,zmin}_{lal,law}.

z_min renamed zmin

Consistently with Vaucanson 1.4.



Transposition on automaton is a read/write view: operations such as del_state, add_transition, etc. on a transposed automaton actually modify the wrapped automaton: set_final calls set_initial and so forth.

As an extreme example, the following snippet:

using context_t = vcsn::ctx::char_b;
using automaton_t = vcsn::mutable_automaton<context_t>;
using tr_automaton_t = vcsn::details::transpose_automaton<automaton_t>;
context_t ctx{{'a', 'b'}};
auto ks = ctx.make_kratexpset();
auto aut = vcsn::standard_of<tr_automaton_t>(ctx, ks.conv("a+a+a+a"));

applies the standard-of algorithm to a transposed mutable_automaton. In other words,


is the transposition of a standard automaton, except that it is a mutable_automaton, not a transpose_automaton<mutable_automaton>.



The "transpose" operation is implemented on words, weights, kratexps, and automata. pprat provides support to transpose on kratexps (option -t):

pprat -W zrr -t {{{2}ab}cd}abcd   =>   {{{2}ba}dc}dcba
pprat -W zrr -t {ab}(abcd)*{cd}   =>   ({dc}(dcba)*{ba})

and on (standard) automata (option -T):

$ pprat -A w -W br    -s '{ab}(\e+a+b({abc}c{bcd})*){cd}' >
$ pprat -A w -W br -T -s '{ab}(\e+a+b({abc}c{bcd})*){cd}' >
$ diff -W80 -t  -y
digraph A {                            digraph A {
  rankdir=LR                             rankdir=LR
  node [shape=circle]                    node [shape=circle]
  {                                      {
    node [style=invis,shape=none,la        node [style=invis,shape=none,la
    I1                                     I1
                                    >      I2
                                    >      I4
    F1                                     F1
    F2                              <
    F4                              <
  }                                      }
  1 -> F1 [label="{(ab).(cd)}"]     |    I1 -> 1 [label="{(dc).(ba)}"]
  I1 -> 1                           |    1 -> F1
  2 -> F2 [label="{cd}"]            |    I2 -> 2 [label="{dc}"]
  1 -> 2 [label="{ab}a"]            |    2 -> 1 [label="{ba}a"]
  4 -> F4 [label="{cd}"]            |    I4 -> 4 [label="{dc}"]
  4 -> 4 [label="{(abc).(bcd)}c"]   |    4 -> 4 [label="{(dcb).(cba)}c"]
  1 -> 3 [label="{ab}b"]            |    3 -> 1 [label="{ba}b"]
  3 -> 4 [label="{(abc).(bcd)}c"]   |    4 -> 3 [label="{(dcb).(cba)}c"]
}                                      }



An initial version of aut_to_exp is available. The new pprat option -a provides an access to this algorithm: apply aut_to_exp to the standard_of an expression.

pprat      -a 'a*'     => \e+(a.(a*))
pprat      -a '(a+b)c' => (a.c)+(b.c)
pprat -W z -a '{2}({3}a+{5}b){7}c{11}' => (({6}a.{7}c)+({10}b.{7}c)){11}

Currently, the only "heuristic" implemented eliminates the states in order. There are probably possible improvements.

pprat -a '(a+b)*' | wc -c => 265

pprat: -a and -w are renamed -A and -W

new factory: de Bruijn

Builds automata for (a+b)a(a+b)^n.


standard_of is part of vcsn::

It used to be in vcsn::rat::.



kratexpset, i.e. the object that provides operation on kratexps (with specified Gen and Weight), used to derive from abstract_kratexp (which is "opaque": it does not know the precise type that is used underneath).

Now, from abstract_kratexp we derive a concrete_abstract_kratexpset which aggregates a kratexpset. This means that kratexpset no longer derives from a weakly-typed ancestor, and can provide simple and strongly-typed routines.


For consistency with weightset/weight, genset/gen, kratexps (note the s) is renamed as kratexpset and std::shared_ptr<const rat::node> as kratexp.


A new algorithm which creates, from an automaton, another one with the same states and transitions, but the new automaton features only spontaneous transitions, whose weights correspond to the labels (and weights) of the initial one.

For instance:

$ pprat -aw -wz  -sl '({2}\e+{3}a){4}'
digraph A {
  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{8}"]
  I1 -> 1
  2 -> F2 [label="{4}"]
  1 -> 2 [label="{3}a"]

digraph A {
  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{{8}\\e}"]
  I1 -> 1
  2 -> F2 [label="{{4}\\e}"]
  1 -> 2 [label="{{3}a}\\e"]

RatExps: fix is_unit

is_unit simply checked that the expression was , but did not check that weight itself was the unit.


This variable allows to force the display of weights.


Contexts aggregate shared pointers

Now contexts are mutable, and hold (shared) pointers to (immutable) gensets and weightsets. This way, we can alter contexts (e.g., the ladybird factory can add the letters it needs in a new genset), yet there is good sharing, and identity can still be used to distinguish, for instance, two gensets defined equally.

It is also simpler to really expose them as pointers, so every "weightset().mul", etc. must be rewritten as "weightset()->mul".



The Kind parameter is now part of the context. The same type of Kind is now use for both RatExps and automata. This results in many significant simplifications.

For instance, again, the test case for product:


 using context_t = vcsn::ctx::char_z;
 context_t ctx { {'a', 'b', 'c'} };
 using automaton_t =
   vcsn::mutable_automaton<context_t, vcsn::labels_are_letters>;
 automaton_t aut1(ctx);


using context_t = vcsn::ctx::char_z;
context_t ctx { {'a', 'b', 'c'} };
using automaton_t = vcsn::mutable_automaton<context_t>;

Or the source of pprat:


using atom_kind_t
  = typename Factory::kind_t;
using label_kind_t
  = typename vcsn::label_kind<atom_kind_t>::type;
using context_t =
  vcsn::ctx::context<typename Factory::genset_t,
                     typename Factory::weightset_t,
context_t ctx{factory.genset(), factory.weightset()};
using automaton_t = vcsn::mutable_automaton<context_t>;
auto aut = vcsn::rat::standard_of<automaton_t>(ctx, e);


using context_t = typename Factory::context_t;
using automaton_t = vcsn::mutable_automaton<context_t>;
auto aut = vcsn::rat::standard_of<automaton_t>(factory.context(), e);



"Contexts" were introduced to factor two aspects that are required through out the library: the GenSet type (i.e., the nature of the generators), and the WeightSet type (i.e., the nature of the weights). Not only do contexts define these types, they must also be instantiated so that "run-time" details be known: for instance the set of generators is dynamic (what are the allowed letters), and on occasion the weightset also needs run-time information (e.g., when RatExp are parameterized by RatExp, what is the alphabet of the latter ones?).

As an example of the changes on the user side, consider the product test-case.


 typedef vcsn::set_alphabet<vcsn::char_letters> alpha_t;
 typedef vcsn::mutable_automaton<alpha_t, vcsn::z,
                                vcsn::labels_are_letters> automaton_t;
 vcsn::z z;
 alpha_t alpha{'a', 'b', 'c'};
 automaton_t aut1(alpha, z);


 using context_t = vcsn::ctx::char_z;
 context_t ctx { {'a', 'b', 'c'} };
 using automaton_t =
   vcsn::mutable_automaton<context_t, vcsn::labels_are_letters>;
 automaton_t aut1(ctx);


RatExp: atoms are words

Expressions such as "(ab)(ab)" used to be equivalent to "(abab)" (a single four-letter atom). Now:

pprat -aw '(ab)(ab)'   => (ab).(ab)
pprat -aw 'abab'       => abab
pprat -aw 'ab.ab'      => (ab).(ab)
pprat -aw 'ab(ab)abc*' => (ab).(ab).(ab).(c*)


New algorithm: eval, evaluates a word over an (weighted) automaton

Defined in vcsn/algos/eval.hh as vcsn::eval.

New algorithm: determinize, Boolean automaton determinization

Defined in vcsn/algos/determinize.hh as vcsn::determinize.


RatExp: changes in the display

Fixed the output of "atoms are words" expressions. For instance "(ab)" used to be displayed as "ab" (which is wrong as it is parsed as "a(b)"). It is now properly displayed as "(ab)".

RatExp: slight changes in the grammar

A star is now valid after a weight:

$ pprat -w z '{2}ab{3}*'


dotty: define initial/final states first

In order to improve readability, instead of

digraph A {
  node [shape=circle];
  F1 [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{a.a.((d.d)*)}"]
  3 -> 2 [label="{(d.d)*}b"]
  I1 [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1
  1 -> 2 [label="{a.a.((d.d)*)}b"]
  F3 [style=invis,shape=none,label="",width=0,height=0]
  3 -> F3 [label="{(d.d)*}"]
  2 -> 3 [label="b"]

we now produce

digraph A {
  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{a.a.((d.d)*)}"]
  3 -> 2 [label="{(d.d)*}b"]
  I1 -> 1
  1 -> 2 [label="{a.a.((d.d)*)}b"]
  3 -> F3 [label="{(d.d)*}"]
  2 -> 3 [label="b"]


RatExp: support for the kind of atoms

As a major overhaul, the rational expressions (vcsn::rat::node) are now parameterized by Atom, which denotes the atom value. The kratexps structure is now parameterized by the Kind, from which it is deduced, from the GenSet parameter, whether we should use word_t or letter_t atoms.

pprat: an option -a

To provide user-access to these feature, pprat now supports an option -a, which accepts "letters" or "words" as argument, with obvious meaning. For instance:

pprat -a letters 'abc' => a.b.c
pprat -a letters '' => a.b.c.a.b.c
pprat -w br -al '{aa}bb{c}dd{a}' => ({a.a}(b.b).{c}(d.d)){a}

pprat -a words 'abc' => abc
pprat -a words '' =>
pprat -w br -aw '{aa}bb{c}dd{a}' => ({aa}bb.{c}dd){a}

Of course, this also works with the "standard-of" option:

$ pprat -w br -al '{aa}({dd}\e+bb)*'
$ pprat -s -w br -al '{aa}({dd}\e+bb)*'
digraph A {
  node [shape=circle];
  F1 [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{a.a.((d.d)*)}"]
  3 -> 2 [label="{(d.d)*}b"]
  I1 [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1
  1 -> 2 [label="{a.a.((d.d)*)}b"]
  F3 [style=invis,shape=none,label="",width=0,height=0]
  3 -> F3 [label="{(d.d)*}"]
  2 -> 3 [label="b"]


$ pprat  -w br -aw '{aa}({dd}\e+bb)*'
$ pprat -s -w br -aw '{aa}({dd}\e+bb)*'
digraph A {
  node [shape=circle];
  F1 [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{aa.(dd*)}"]
  2 -> 2 [label="{dd*}bb"]
  F2 [style=invis,shape=none,label="",width=0,height=0]
  2 -> F2 [label="{dd*}"]
  1 -> 2 [label="{aa.(dd*)}bb"]
  I1 [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1


RatExp: in some case the weights could be lost

"Associativity" was applied too eagerly, which would result in loss of some weights. E.g. {a}bb{c}dd resulted in b.b.d.d, not it evaluates to {a}(b.b).{c}(d.d).


RatExp: improved pretty-printing

The outermost pair of parentheses is removed if useless. For instance:

 (a.b) => a.b
 (a+b+c) => a+b+c
 (a*) => a*

But in the following examples they are kept.


standard-of: star is fixed

Standard-of seems to be correct.


Expressions overhaul

They are immutable: we no longer make side effects on expressions. They are shared_ptr, no longer plain pointers. They can be used like the other values, by value.


standard-of is fully implemented

Support for star was implemented, and checked for B and Z. For implementation reasons, one cannot yet use rational expressions as weights.



RatExp::head and tail


In order to improve the readability of its output, it no longer "defines" the reachable states. See the following diff:

 digraph A {
   node [shape=circle];
-  1
-  2
-  3
-  4
   I1 [style=invis,shape=none,label="",width=0,height=0]
   I1 -> 1 [label="{6}"]
   F1 [style=invis,shape=none,label="",width=0,height=0]
   1 -> F1
   1 -> 2 [label="a"]
   1 -> 3 [label="a"]
   2 -> 4 [label="{3}b"]
   F4 [style=invis,shape=none,label="",width=0,height=0]
   4 -> F4
   4 -> 3 [label="a"]

standard-of: many fixes in the handling of the weights

An expression such as {12}\e used to leave the weight in the initial transition; it is now on the final transition. More generally the initial transition always has unit as weight.

The product and sum of expressions now handle the left and right weights.

Accepting initial states in expressions such as "+a" are no longer lost.


Many renamings

alphabet_t/alphabet() -> genset_t/genset(), etc.
factory -> abstract_kratexp.
factory_ -> kratexp.
initials() -> initial_transitions(), etc.
invalid_state -> null_state, etc.
nb_state() -> num_states(), etc.
polynomial -> polynomials, etc.



An implementation of the product of two automata is available.



An example of tropical semiring, to test show_unit().



This method return a special, reserved character, that is used to
label initial and final transitions.  This character is not part of
the alphabet and is never output.



Initial work on "+".

mutable_automata are implemented using a pre() and post() states

What is missing is the correct '$' letter on the initial and final
transitions.  The current value is the default value for label_t.

The previous interface has been preserved (but maybe we should
clean it) except for one change:

  initial() and final() have been changed to return a pseudo
  container of transitions.  These transitions give us both
  the weight and the initial/final state.

The following methods are new:

  pre(), post()         returns the pre-initial and post-final state.
  all_states()          returns all states, including pre() and post()
  all_transitions()     returns all transitions, including initial and
                          final transitions
  all_entries()         likewise for entries
  all_out(s), all_in(s) likewise for outgoing and ingoing transitions

The methods get_initial_weight(s) and get_final_weight(s) are slower
now, because they need to locate the corresponding initial/final
transition.  For the same reason, is_initial(), is_final() are also



Initial version of standard-of is implemented. Can be tested with pprat's new option -s:

$ pprat -wz -s '{123}a'
digraph A {
  node [shape=circle];
  I1 [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1
  F2 [style=invis,shape=none,label="",width=0,height=0]
  2 -> F2
  1 -> 2 [label="{123}a"]

mutable_automaton uses unsigned for state_t and transition_t.

This allows to store states in a std::vector<stored_state_t>. Likewise for transitions. Erased elements are marked (so they are skipped over during iteration), and added to a free store to be reused later.

mutable_automaton does not store any weight when WeightSet == b.

mutable_automaton has a read-only entry interface

entries() is a pseudo container that filters transitions()
  to see each (src,dst) pair at most once.

entry_at(src, dst) and entry_at(t) return a polynomial describing
  the entry between (src, dst) or (src_of(t), dst_of(t)).

entryset() returns the WeightSet that can be used to manipulate
  these polynomials.

make check-rat, make check-unit

There is a check target for each subdirectory of tests/.


Alphabets are checked

pprat is hard-coded to use a, b, c, d for all the alphabets (including for inner rational expressions):

$ pprat -w b -e 'y'
1.1: invalid word: y: invalid letter: y

Weights are checked

As follows:

$ pprat -w b -e '{12}a'
1.1-5: invalid Boolean: 12

Unfortunately the locations are bad currently for complex weights:

$ pprat -w zr -e '{x}a'
1.1-4: 1.1: invalid word: x: invalid letter: x

To be fixed.


G++ 4.7 is required

We use constructs that are not supported by 4.6 (e.g., constructor delegation).


As a demonstration that rational expressions can be weights of rational expressions, pprat supports '-w zrr' (Rat<Rat<Z>>):

 $ pprat -w zr -e '{{{2}{3}a}u}x{{{4}{5}\e}\e}'