Difference between revisions of "Vcsn/News File"


(Rename Vaucanson namespace into Vcsn)
(Fix vaucanson links to vcsn links)
Line 552: Line 552:
Release of our fourth beta, vaucanson-2b.3. Available on MacPorts as "vaucanson".
Release of our fourth beta, vaucanson-2b.3. Available on MacPorts as "vaucanson".
See https://www.lrde.epita.fr/wiki/Vaucanson/Vaucanson2b3.
See https://www.lrde.epita.fr/wiki/Vcsn/Vaucanson2b3.
=== A Virtual Machine for easy experiments ===
=== A Virtual Machine for easy experiments ===

Revision as of 16:10, 6 November 2014

Vaucanson 2 news

This file describes user visible changes in the course of the development of Vaucanson 2, in reverse chronological order. On occasions, significant changes in the internal API may also be documented.

Vaucanson 2.0


More than two years after the initial commit in our repository, the Vaucanson team is happy the announce the first release of Vaucanson 2!

People who worked on this release:

  • Akim Demaille
  • Alexandre Duret-Lutz
  • Alfred M. Szmidt
  • Antoine Pietri
  • Canh Luu
  • Jacques Sakarovitch
  • Luca Saiu
  • Sylvain Lombardy
  • Valentin Tolmer


"series" are now supported (static, dyn, Python)

Ratexpsets now take a constructor parameter specifying which set of identities to enforce, currently with two possible values: "trivial" and "series". Old-style ratexps only support trivial identities, while series also support sum commutativity and product distributivity over sum. Heterogeneous operations involving both trivial ratexps and series ratexps are possible.

In case of a ratexpset belonging to a context, the default value for the optional parameter is always trivial. The "seriesset" alias, which does not accept an identity parameter, is also supported.

"Series" is technically a misnomer: the data structure is a better approximation of the mathematical concept of series and in particular yields a semiring structure, but actual series equality remains undecidable in the general case.

Series currently depend on the commutativity of their weightset, and only support a subset of the available operations.

>>> vcsn.context('lal_char(a-c)_ratexpset<lal_char(x-z)_b>') # series
>>> vcsn.context('lal_char(a-c)_seriesset<lal_char(x-z)_b>') # series
>>> vcsn.context(ratexpset<lal_char(x-z)_b>_z')              # trivial
>>> vcsn.context('lal_char(a-c)_ratexpset<lal_char(x-z)_b>(trivial)')
>>> e = vcsn.context('lal_char(a-c)_z').ratexp('b+a+b'); e
>>> s = vcsn.context('lal_char(a-c)_z').series('b+a+b'); s
>>> (e+s).is_series()

has_twins_property: new algorithm

Computes whether an automaton has the twins property.


determinize supports weighted automata

The determinize algorithm (dyn and Python) now accepts an optional 'algo' argument.

When 'algo' is:

the fast Boolean-only implementation is used. It always terminates.
any weightset is supported. On some inputs, it may not terminate.
"auto" (default value)
"boolean" if the automaton is Boolean, "weighted" otherwise.


are-equivalent is generalized

The 'are_equivalent' algorithm (a1.is_equivalent(a2) in Python) now supports weights in fields (e.g., Q or R), but also on Z. The labelset still must be free.


reduce: new algorithm on automata

Implements the Schützenberger algorithm for reduction of representations for any skew field. As a special case, automata with weights in Z are also supported.


determinization is more robust to large alphabets

In order to optimize (Boolean) determinization, 'determinize' no longer accepts an optional 'complete' argument to require a deterministic complete automaton. One must now call 'complete()' afterwards.

The speed improvements are (erebus: OS X i7 2.9GHz 8GB, Clang 3.5 -O3 -DNDEBUG):

  (1)   (2)   (3)
 7.93s 7.80s 7.43s: a.determinize()      # a = ladybird(21)
 6.64s 6.45s 0.84s: a.determinize()      # a = lal(a-zA-Z0-9).ladybird(18)

where (1) is the "original" version, (2) is the version without the optional completion of the determinized automaton (yes, it was set to False in the bench of (1)), and (3) is the current version, which avoids considering unused letters.


prefix, suffix, factor, subword: new algorithms

These four new functions (static, dyn and Python) take an automaton and return another that accepts a superset of its language:

- suffix makes each accessible state final (with unit weight)

- prefix makes each coaccessible state initial

- factor makes each useful state initial and final

- subword applies the Magnus transform: for each non spontaneous
  transition (src, label, weight, dst), it adds a spontaneous transition
  (src, one, weight, dst).


minimize: new subset decorators, except for the "brzozowski" variant

Minimizing an automaton now yields a decorated automaton keeping track of source state names. The new "subset decorator" code is decoupled from minimization and is intended to be used for other algorithms as well.

>>> a = vcsn.context('lal_char(a-z)_b').ratexp('a+b*e+c+dc').standard(); a
>>> a.minimize()

The minimize algorithm no longer recognizes the "brzozowski" variant at the Static level, as it would require a very different, and likely uninteresting, decorator; the user can still directly call minimize_brzozowski at the Static level. We still support "brzozowski" as a variant at the Dyn and Python levels.


efsm: support for transducers

Exchange with OpenFST via efstcompile/efstdecompile now supports transducers.


TAF-Kit is phased out

Existing commands for TAF-Kit are left and are occasionally useful. However, today the last existing TAF-Kit test has been converted to Python: TAF-Kit is no longer checked at all by the test suite.

lift: now bound in Python

One can now lift automata and ratexps from Python.


Minimize: extend 'weighted' and 'signature' variants

Lift arbitrary restrictions on the labelset of the 'weighted' and 'signature' minimization variants.

>>> ctx = vcsn.context('law_char(a-z)_b')
>>> ctx.ratexp('ab+<3>cd+ac').standard().minimize('weighted')


More systematic use of decorators

In addition to the venerable transpose_automaton, several automaton decorator types have been introduced (blind_automaton, determinized_automaton, pair_automaton, product_automaton, ratexp_automaton), and are now used in the many algorithms, including: compose, determinize, product, shuffle, infiltration, synchronizing_word, etc.

Coupled with the fact that automata can now display state names (as opposed to numbers) in Dot output, one gets rich displays of automata. For instance:

>>> ctx = vcsn.context('lal_char(ab)_b')
>>> ctx.ratexp('aa+ab').derived_term().determinize()

now displays an automaton whose states are labeled as sets of ratexps: "{aa+ab}", "{a, b}", and "{}".

Use 'automaton.strip()' to remove state names.


dot2tex format

Automata now support for "dot2tex" format, meant to be used with the dot2tex program. It allows to combine TikZ's nice rendering of automat, LaTeX's math mode to render state and transition labels, with dot automatic layout.

IPython: an interactive display

Starting with IPython 2.0, running 'aut.display()' provides an interactive display of an automaton, with means to select the display mode (e.g., "dot", "dot2tex", etc.).

shuffle: now available on ratexps

Available in static, dyn, and Python.


Optimization: composing two automata transposition yields the identity

Transposing an automata twice now yields the original automata, instead of an automata wrapped by two decorator layers.

is-isomorphic: extend to any context and any automata

Lift the previous limitation of is-isomorphic to deterministic lal automata. The sequential case keeps its linear complexity, but the new generic code has a worst-case complexity of O((n+1)!); however the common case is much faster, as we heuristically classify states according to in- and out-transitions, restricting brute-force search to states which are possible candidates for isomorphism.

>>> ctx = vcsn.context('lal_char(a-z)_b')
>>> a = ctx.ratexp('ab+<3>ab+ab+ac').standard()
>>> b = ctx.ratexp('ab+ac+<3>ab+ab').standard()
>>> a.is_isomorphic(b)
>>> at = ctx.ratexp('abc').standard().transpose()
>>> c = ctx.ratexp('cba').standard()
>>> at.is_isomorphic(c)


Automata are now handled via shared pointers

Types such as vcsn::mutable_automaton<Ctx> used to support the so called "move semantics" only. In particular, simple assignment between automata was not possible. This intentional limitation was introduced to avoid accidental lose of performance by unexpected deep copies of automata, while keeping partly "value semantics".

This, however, resulted in too many constraints, especially when one wants to embed automata in other structures (which is for instance the case of transpose_automaton which wraps an automaton).

To address these issues, types such as vcsn::mutable_automaton<Ctx> are now std::shared_ptr. This does change the programming style, both when instantiating an automaton (typically now using make_mutable_automaton), and when using it (with '->' instead of '.'):

Instead of:

automaton_t aut{ctx};
auto s1 = aut.new_state();


auto aut = vcsn::make_mutable_automaton(ctx);
// or: auto aut = vcsn::make_shared_ptr<automaton_t>(ctx);
auto s1 = aut->new_state();


Polynomials are now usable as weights

Polynomialsets are now usable as a generic weightset. Polynomials are mostly useful on law and ratexpset.

>>> ctx = vcsn.context('lal_char(abc)_polynomialset<law_char(xyz)_z>')
>>> ctx.ratexp('<x + xy + x + \e>a')
<\e + <2>x + xy>a


ratexps: support for negated letter classes

We may now use '[^...]' to denote a letter other than the listed ones. The special case '[^]' denotes any character of the alphabet.

>>> c = vcsn.context('lal_char(0-9)_b')
>>> c.ratexp('0+[^0][^]*')

ratexps: invalid letter classes are rejected

Instead of being ignored, invalid intervals, or empty classes, are now rejected.

>>> c.ratexp('[9-0]')
RuntimeError: invalid letter interval: 9-0
>>> c.ratexp('[]')
RuntimeError: invalid empty letter class

ratexps: improved support for letter classes

Previously letter classes were supported only for context on top of a simple alphabet (LAL, LAN and LAW). Generators of more complex contexts such as LAL x LAN are now supported:

>>> c = vcsn.context('lat<lal_char(abc),lan_char(xyz)>_b')
>>> c.ratexp("['(a,x)''(c,z)']")

>>> c.ratexp("[^'(a,x)''(c,z)']")

>>> c.ratexp("['(a,x)'-'(a,z)']")

>>> c.ratexp("[^]")


compose: new algorithm (static, dyn, Python)

A new algorithm has been introduced to allow the composition of two transducers. It computes the accessible part of the transducer resulting from the composition of the second tape of the first transducer with the first tape of the second one.

>>> c1 = vcsn.context('lat<lan_char(abc),lan_char(ijk)>_b')
>>> c2 = vcsn.context('lat<lan_char(ijk),lan_char(xyz)>_b')
>>> t1 = c1.ratexp("('(a,i)'+'(b,j)'+'(c,k)')*").thompson()
>>> t2 = c2.ratexp("('(i,x)'+'(j,y)'+'(k,z)')*").standard()
>>> t1.compose(t2).proper().shortest(8)
(\e,\e) + (a,x) + (b,y) + (c,z) + (aa,xx) + (ab,xy) + (ac,xz) + (ba,yx)


insplit: new algorithm (static, dyn, Python)

It is now possible to do the in-splitting of an automaton, i.e. to get the equivalent automaton such that each state have either only incoming epsilon-transitions or no incoming epsilon-transitions. This is the first step in a product algorithm that supports epsilon-transitions.


Multiplication by a weight no longer requires a standard automaton

The right-multiplication by a weight uselessly required a standard automaton; it now accepts any automaton. The left-multiplication keeps its specification for standard automata, but now supports non-standard automata, in which case the weight is put on the initial arrows.

Multiplication by 0 is fixed

The multiplication of an automaton by the null weight results in the "zero automaton": a single state, initial, and no transitions.


Python syntactic sugar

Multiplication by a scalar on the left, and on the right, can be performed with implicit conversion of the weights. Instead of

>>> z = vcsn.context('lal_char(ab)_z')
>>> r = z.ratexp('[ab]*')
>>> z.weight(2) * r * z.weight(3)

one can now write

>>> 2 * r * 3

and similarly for automata.

Besides, repeated &-product can be denoted with the same symbol, &: 'a & 3' denotes 'a & a & a'.


Vaucanson 2 has moved

Vaucanson 2 is now also hosted on gitlab.lrde.epita.fr. It is also renamed vaucanson.git, rather than vaucanson2.git. To update your existing repository, run a command similar to:

$ git remote set-url origin git@gitlab.lrde.epita.fr:vcsn/vaucanson

or edit your .git/config file to update the URL.

You may also visit http://gitlab.lrde.epita.fr/vcsn/vaucanson.

dyn::label is born

The familly of dynamic object (which includes dyn::automaton, dyn::weight, dyn::polynomial, dyn::ratexp, and others) now features a dyn::label. Several algorithms dealing with labels used the "std::string" C++ type to this end. This was inadequate, and the corresponding signatures have been cleaned bottom up.

For instance "derivation" used to have the following signatures


template <typename RatExpSet>
derivation(const RatExpSet& rs, const typename RatExpSet::ratexp_t& e,
           const std::string& word, bool breaking = false)


polynomial derivation(const ratexp& exp, const std::string& s,
                      bool breaking = false);

These signatures are now:

template <typename RatExpSet>
derivation(const RatExpSet& rs, const typename RatExpSet::ratexp_t& e,
           const typename RatExpSet::labelset_t::word_t& word,
           bool breaking = false)

polynomial derivation(const ratexp& exp, const label& l,
                      bool breaking = false);

At the Python level, derivation was adjusted so that one may still pass a string, and see it upgraded, so both these calls work:

>>> ctx = vcsn.context('lal_char(ab)_z')
>>> r = ctx.ratexp('(<2>a)*')
>>> r.derivation(ctx.word('aa'))
>>> r.derivation('aa')

Context extraction

It is now possible to obtain an automaton or a ratexp's context. In dyn, use "dyn::context_of(obj)", in Python use "obj.context()".

Copy can now change the type of the automaton

vcsn::copy used to support only "homogeneous" duplications. It also offers access to the the origins (a map from resulting states to origin states).

Variadic product of automata

The synchronized product of automata is now variadic: the product of n-automata directly builds an automaton labelled with n-uples of original states. The Python operator, &, is modified to pretend it is variadic (rather than binary): it delays the computation until the result is needed. The ".value()" method allows to force the evaluation.

>>> ctx = vcsn.context('lal_char(ab)_z')
>>> a1 = ctx.de_bruijn(5)
>>> a = ctx.ratexp('a{5}').derived_term()
>>> import timeit
>>> timeit.timeit(lambda: (((a1&a1).value() & a1).value() & a).value(),
>>> timeit.timeit(lambda: (a1 & a1 & a1 & a).value(), number=1000)

Be aware that if the result is not needed, then it is simply not computed at all (hence, appears to be blazzingly fast):

>>> timeit.timeit(lambda: a1 & a1 & a1 & a, number=1000)

"Fine grain" runtime compilation works

So far dyn:: provided support for "context" runtime compilation, i.e., if a context is unknown (e.g., "lat<lal_char(ab), lal_char(xy)>_b"), C++ code is emitted, compiled, and loaded. Now Vaucanson also supports per-algorithm runtime compilation.

Brzozowski minimization.

It is now possible to ask for the Brzozowski's minimization:

>>> a = vcsn.context('lal_char(ab)_b').ratexp('a+b').standard()
>>> a.minimize('moore').info()['number of states']
>>> a.minimize('signature').info()['number of states']
>>> a.minimize('brzozowski').info()['number of states']


GCC is back in business

GCC compiles Vaucanson incorrectly (i.e., Vaucanson appears to behave incorrectly, but it is actually the compiler that is incorrect). For this reason, some features were disabled with such a compiler, and recently GCC support was completely dropped.

Efforts were put in finding reasonable workarounds for these bugs, and now GCC is, again, supported. Parts that used to be disabled are now supported.

For more information on this bug, see the following problem report http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51253.


Vaucanson 1 has moved

Vaucanson 1 is now hosted on gitlab.lrde.epita.fr. It is also renamed vaucanson1.git, rather than just vaucanson.git. To update your existing repository, run a command similar to:

$ git remote set-url origin git@gitlab.lrde.epita.fr:vcsn/vaucanson1

or edit your .git/config file to update the URL.

You may also visit http://gitlab.lrde.epita.fr/vcsn/vaucanson1.


ratexps conjunctions

The "intersection" operation on ratexps is renamed as "conjunction".


Improve treatment of nullable labelsets

As introduced earlier, nullablesets are now written "lan<...>", for instance "lan<lal_char(ab)>b". The former syntax, e.g., "lanchar(ab)_b", is kept for backward compatibility and ease of use.

Up to now, "lan<>" would only work on lal and labelsets with already a "one" provided, making the improvement quite useless. Now "lan<...>" will manufacture a "one" for the labelsets that don't have one, so it can be wrapped around anything.

In addition to that, simplifications are applied; for instance "lan<lan<lan_char(ab)>>b" actually builds "lan<lal_char(ab)>b". Similarly, "lan<lat<lan_char(ab), law_char(ab), ratexpset<lal_char(ab)_b>>>b" generates "lat<lan<lal_char(ab)>, lawchar(ab), ratexpset<lal_char(ab)_b>>_b", since the tuple wrapped in the outer lan already has a "one", given that all of its components have one.


concatenation and conjunction of ratexps accept more heterogenous arguments

Conversions of both labels and weights are performed if needed.

>>> a = vcsn.context('ratexpset<lal_char(xy)_b>_z').ratexp("<2>'x*'")
>>> b = vcsn.context('lal_char(b)_q')              .ratexp('<1/3>b')
>>> a * b
>>> (a*b).info()['type']

>>> ab = vcsn.context('lal_char(ab)_z').ratexp('(a+b)*')
>>> bc = vcsn.context('lal_char(bc)_b').ratexp('(b+c)*')
>>> ab & bc
>>> (ab & bc).info()['type']


left-mult and right-mult are bound in Python

Both work on automata and ratexps. Left multiplication now has its arguments in a more natural order in the C++ API: (weight, automaton), instead of the converse previously. TAF-Kit still has it the old way. The Python operator * (left-associative) is overloaded to provide syntactic sugar.

>>> z = vcsn.context('lal_char(abc)_z')
>>> a = z.weight("12") * z.ratexp('ab').standard() * z.weight("23")
>>> a.ratexp()

>>> z.weight("12") * z.ratexp('ab') * z.weight("23")


sum of rational expressions accepts more heterogenous arguments

Conversions of both labels and weights are performed if needed.

>>> a = vcsn.context('ratexpset<lal_char(xy)_b>_z').ratexp("<2>'x*'")
>>> b = vcsn.context('lal_char(b)_q')              .ratexp('<1/3>b')
>>> a + b

concatenation accepts more heterogenous arguments

As for products and union, it is now possible to compute the concatentation of automata with different types:

>>> z = vcsn.context('lal_char(a)_z').ratexp('<2>a')  .derived_term()
>>> q = vcsn.context('lal_char(b)_q').ratexp('<1/3>b').derived_term()
>>> r = vcsn.context('lal_char(c)_r').ratexp('<.4>c') .derived_term()
>>> (z*q*r).ratexp()


Bug: standardization of automata

The standardization of an automaton no longer leaves former initial states that became non-accessible.

Bug: right-mult on automata

Its behavior was completely wrong.


parse and display letter classes in transitions

It is now possible to directly write labels of transitions with letter classes. For instance '[a-ky]' denotes every letter between a and k in the alphabet, or y.

Transition labels with equal weights are displayed this way. For instance 'g, a, b, d, c, f' becomes [a-dfg] and '<2>a, <2>b, <2>c" is displayed '<2>[abc]', but 'a, b' stays in this form.


union accepts more heterogenous arguments

As for products (Hadamard, shuffle, infiltration), it is now possible to compute the union of automata with different types:

>>> z = vcsn.context('lal_char(a)_z').ratexp('<2>a')  .derived_term()
>>> q = vcsn.context('lal_char(b)_q').ratexp('<1/3>b').derived_term()
>>> r = vcsn.context('lal_char(c)_r').ratexp('<.4>c') .derived_term()
>>> (z|q|r).ratexp()

automaton product optimization.

An optimization enabled by state renumbering.

Score changes on Luca's workstation (before/after):

4.60s  2.87s: a.product(a)         # a = std([a-e]?{50})
2.34s  2.37s: a.shuffle(a)         # a = std([a-e]?{50})
4.01s  2.58s: a.infiltration(a)    # a = std([a-e]?{30})
4.17s  2.96s: a**12                # a = std([a-e]*b(<2>[a-e])*)


state renumbering is not automatic any longer.

In view of several forthcoming in-place algorithms on automata including edit, we changed the automaton output code not to automatically renumber states at the static, dyn and Python levels. A new "sort" algorithm is available to renumber states (breadth-first and then by outgoing transition label) and reorder transitions (by label) in a predictable way, when explicitly requested. Tafkit output is automatically sorted.

Having a predictable numbering will enable future optimizations on automaton product and other algorithms accessing all the transition from a given state by label.


double_ring: New Python binding

Returns a double ring automaton.


ratexpset can be used for LabelSet

Contexts such as 'ratexpset<lal_char(ab)_b>b' are now valid. The eliminatestate algorithm works properly on automata of such type.


ratexp.info: New Python binding

Like automaton.info, returns a dictionary of properties of the ratexp.


are-isomorphic/is-isomorphic: new algorithm (static, dyn, tafkit, Python)

Given two automata, check whether they are isomorphic to one another. Currently implemented in the deterministic case only, for lal contexts.

>>> ctx = vcsn.context('lal_char(ab)_z')
>>> ctx.ratexp('a+b*').standard().is_isomorphic(ctx.ratexp('b*+a').standard())

weighted minimization (static, dyn, tafkit, Python)

A third implementation of the minimization algorithm named "weighted" is now available, supporting any lal context.

The new variant is a more widely applicable generalization of the "signature" implementation. The new variant is the default for non-boolean weightsets; it requires a trim automaton but does not rely on determinism or other properties. Preliminary measures show performance to be close to "signature", or even clearly superior in the case of sparse automata such as dictionaries.

Vaucanson 2b.3


Release of our fourth beta, vaucanson-2b.3. Available on MacPorts as "vaucanson".

See https://www.lrde.epita.fr/wiki/Vcsn/Vaucanson2b3.

A Virtual Machine for easy experiments

Installing Vaucanson 2 on some platforms can be tedious. Clément Démoulins contibuted a Virtual Machine that makes it easy to experiment with Vaucanson 2 without having to compile it. He also contributed a Vagrantfile to make it even easier to deploy.

To install a Vaucanson virtual machine, please follow this procedure:

1. Install VirtualBox
   From your distro, or from https://www.virtualbox.org/wiki/Downloads.

2. Install Vagrant
   From your distro, or from http://www.vagrantup.com/downloads.html

3. Download this Vagrantfile and save it somewhere.

   For instance
   $ mkdir ~/src/vcsn2
   $ cd ~/src/vcsn2
   $ wget https://www.lrde.epita.fr/dload/vaucanson/2.0/Vagrantfile

4. Run Vagrant (first time will be slow: let it download the VM)

   $ cd ~/src/vcsn2
   $ vagrant up

   Vaucanson is runnning!

5. Open http://localhost:8888 in your favorite browser.

6. Experiment!  (Hit Shift-Enter to evaluate):
   import vcsn

7. Turn your VM off when you are done
   $ vagrant halt


ratexp difference: new algorithm (static, dyn, Python)

The "difference" algorithms generates a ratexp that accepts words of the left-hand side that are not accepted by the right-hand side ratexp. Also bound as the "%" operator in Python.

>>> ctx = vcsn.context('lal_char(abc)_b')
>>> l = ctx.ratexp('[abc]*')
>>> r = ctx.ratexp('[ab]*')
>>> l.difference(r)
>>> l % r


ratexp intersection: new algorithm (static, dyn, Python)

The "intersection" algorithm computes a ratexp that denotes the Hadamard product of two rational expressions. Also bound as the "&" operator in Python.

>>> ctx = vcsn.context('lal_char(abc)_b')
>>> r = ctx.ratexp('[abc]*')
>>> r.intersection(r)
>>> r & r

ratexp concatenation: new algorithm (static, dyn, Python)

The "concatenate" algorithm computes a ratexp that denotes the concatenation of two rational expressions. Also bound as the "*" operator in Python.

>>> ctx = vcsn.context('lal_char(abc)_b')
>>> r = ctx.ratexp('[abc]*')
>>> r.concatenate(r)
>>> r * r

Python API: Operators overloading on automata

The Python API now overloads the following operators for automata:

  • "+", sum of two automata
  • "&", product of two automata
  • "~", complement of an automaton
  • "*", concatenation of two automata
  • "%", difference between two automata
  • "|", union of two automata
  • "**", power of an automaton


ratexp sum: new algorithm (static, dyn, Python)

Compute the sum of two rational expressions. Also bound as the "+" operator in Python.

>>> ctx = vcsn.context('lal_char(abc)_b')
>>> r = ctx.ratexp('[abc]*')
>>> r.sum(r)
>>> r + r

proper: an optional argument to avoid state deletion

The proper algorithm eliminates the states that become inaccessible in the course of spontaneous-transition elimination. This can be disabled by passing "false" as additional argument to proper (static, dyn, Python).


star_height: new algorithm (static, dynamic, Python)

Computes the star-height of an expression.

>>> vcsn.context('lal_char(ab)_b').ratexp('(a***+a**+a*)*').star_height()

Bison is no longer needed

To install Vaucanson from a tarball, Bison is no longer needed. Of course, modifying one of grammar files will fail, unless Bison 3.0 is installed.


Letter classes in context specifications

It is now possible to use ranges to define alphabets. For instance in Python,


builds a context whose alphabet covers letters, digits, and underscore.

Vaucanson 2b.2


Release of our third beta, vaucanson-2b.2. Available on MacPorts as "vaucanson".

Letter classes

Letter classes are available in ratexps: '[a-d0-9_]' is expanded into (a+b+c+d+0+1+2+3+4+5+6+7+8+9+_). The negation ([^...]) is not supported.


The '.' operator is no longer printed

The pretty-printing of (non-LAW) ratexps is simplified.

Before: $ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp +a.(b+a.a+c.(a+c).b).(a+c.(a+c)*) $ vcsn derived-term -Ee 'a:b:c' | vcsn aut-to-exp (a.b+b.a).c+(a.c+c.a).b+(b.c+c.b).a

After: $ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp +a(b+aa+c(a+c)b)(a+c(a+c)*) $ vcsn derived-term -Ee 'a:b:c' | vcsn aut-to-exp (ab+ba)c+(ac+ca)b+(bc+cb)a

Shortlex order is now used for ratexps

This changes the pretty-printing of polynomials of ratexps, for instance the result of derivations.

Before: $ vcsn derivation -e '(a+b)a(a+b)' aa a+b + a.a.(a+b) + a

After: $ vcsn derivation -e '(a+b)a(a+b)' aa a* + a+b + aa(a+b*)

ratexp implementation overhaul

Although internal, this change is documented as it deeply changes the way ratexps are handled in C++ code.

The abstract-syntax tree of the rational expressions now matches the usual (abstract) grammar:

E ::= \z | \e | a | E+F | E.F | E* | kE | Ek       k is a weight

In other words, there are now 'left-weight' and 'right-weight' nodes that exist, whereas before, the six first cases carried left and right weights. Trivial identities are enforced, and, for instance, no tree for 'a k' exist: it is converted to 'k a'.


Many of these listed features were actually developped over the last month.

products accept more heterogenous arguments

It is already possible to compute the (regular, infiltration and shuffle) products of automata with different types of "basic" weightsets (e.g. B, Z, Q, R).

It is now also possible when weights are ratexps. For instance the product of automaton with weights in Q and an automaton with weights in RatE yields an automaton with weights in Q-RatE.

New ratexp operators: '&' (intersection) and ':' (shuffle)

The operator '&' denotes the intersection in the case of Boolean weights, or more generally, the Hadamard product. Only "derived_term" can compute an automaton from it, in which case:

derived_term(E & F) = product(derived_term(E), derived_term(F))

The operator ':' denotes the shuffle product, aka interleave. For instance 'a:b:c' denotes the language of the permutations of "abc". Only "derived_term" can compute an automaton from it, in which case:

derived_term(E : F) = shuffle(derived_term(E), derived_term(F))

minimization is much faster

There are now two different implementations of the minimization, namely "moore" and "signature". Both require a trim automaton as input, and "signature" accepts non-deterministic automata.

The "moore" minimization is currently the fastest.

context run-time compilation

When dyn::make_context is presented with an unknown but valid context, it is compiled and loaded dynamically. For instance:

$ vcsn cat -C 'lao_r' -W -e 3.14
# Wait for the context to be compiled...

The compiled context is currently left in /tmp for forthcoming runs.

$ ls /tmp/lao*
/tmp/lao_r.cc   /tmp/lao_r.o   /tmp/lao_r.so

Python binding

It is now possible to use Vaucanson from Python. The Python binding is on top of the dyn API, and inherits all its features.

This Python API is object-oriented, contrary to dyn which is a list of types (context, ratexp, automaton, etc.) and functions (derived_term, determinize, etc.):

 dyn:: functions                    |  Python methods
derived_term(ratexp) -> automaton   | ratexp.derived_term() -> automaton
determinize(automaton) -> automaton | automaton.determinize() -> automaton
etc.                                | etc.

Documentation is forthcoming, but for instance:

$ python
Python 2.7.6 (default, Nov 12 2013, 13:26:39)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import vcsn
>>> b = vcsn.context('lal_char(abc)_b')
>>> b
>>> r1 = b.ratexp('(a+b)*')
>>> r1
>>> a1 = r1.derived_term()
>>> a2 = b.ratexp('(b+c)*').standard()
>>> a1.product(a2).ratexp()


products accept heterogenous arguments

It is now possible to compute the (regular, infiltration and shuffle) products of a (B, Z, Q, R) automaton with a (B, Z, Q, R) automaton.

$ vcsn derived-term -C 'lal_char(ab)_q' -e '(<1/2>a+<1/3>b)*' > q.gv
$ vcsn derived-term -C 'lal_char(ab)_z' -e '(<2>a+<3>b)*'     > z.gv
$ vcsn derived-term -C 'lal_char(ab)_r' -e '(<.2>a+<.3>b)*'   > r.gv

$ vcsn product -f z.gv q.gv | vcsn aut-to-exp
$ vcsn product -f q.gv z.gv | vcsn aut-to-exp
$ vcsn product -f q.gv z.gv q.gv | vcsn aut-to-exp
$ vcsn product -f q.gv z.gv z.gv | vcsn aut-to-exp
$ vcsn product -f q.gv r.gv | vcsn aut-to-exp
$ vcsn product -f z.gv r.gv | vcsn aut-to-exp
$ vcsn product -f z.gv r.gv q.gv | vcsn aut-to-exp


TikZ output now uses standalone

The TikZ output can now both be compiled as a standalone document, and be used inlined (as LaTeX source) in another document. Read the document of the "standalone" package (e.g., run 'texdox standalone').

Data library path

TAF-Kit is now able to find installed files. Control the search path via the environment variable VCSN_DATA_PATH (a ":"-separated list of directories).

$ vcsn aut-to-exp -f lal_char_z/c1.gv

dot format parsing is much faster

We now require Bison 3.0 to build Vaucanson. In exchange, we get is very significant speed-up. On erebus (Mac OS X i7 2.9GHz 8GB, GCC 4.8 -O3) reading standard('a?{500}') goes from 30s to 1.2s.


minimize: new algorithm (static, dynamic, TAF-Kit)

Given a deterministic Boolean lal automaton, compute its minimal equivalent automaton using Moore's algorithm.

Example: $ vcsn standard -e '(a+b+c+d){100}' -o 100.gv $ vcsn cat -f 100.gv -O info | grep 'number of states' number of states: 401 $ vcsn minimize -f 100.gv -o 100min.gv $ vcsn cat -f 100min.gv -O info | grep 'number of states' number of states: 101 $ vcsn are-equivalent -f 100.gv 100min.gv true

fraction parsing bug fixes

Negative denominators used to be silently mangled, and zero denominators were accepted without failing.


split: new algorithm (static, dynamic, TAF-Kit)

Implements the breaking of a ratexp into a polynomial of ratexps, as required by breaking derivation, and broken derived terms. Not named "break", as this is a C++ keyword.


products accept heterogenous arguments

It is now possible to compute the (regular, infiltration and shuffle) products of a B-automaton with any other kind of automaton.

$ vcsn standard -C 'lal_char(ab)_b' -e 'a' -o a.gv
$ vcsn standard -C 'lal_char(ab)_ratexpset<lal_char(uv)_b>' \
                -e '(<u>a+<v>b)*' -o ab.gv
$ vcsn product -f ab.gv a.gv | vcsn shortest -f - 4
$ vcsn shuffle -f ab.gv a.gv | vcsn shortest -f - 4
a + <u+u>aa + <v>ab + <v>ba
$ vcsn infiltration -f ab.gv a.gv | vcsn shortest -f - 4
<u+\e>a + <u.u+u+u>aa + <u.v+v>ab + <v.u+v>ba


shortest: accepts the number of words as argument (defaults to 1)

$ vcsn shortest -O text -Ee '(a+b)*' 2
\e + a
$ vcsn enumerate -O text -Ee '(a+b)*' 2
\e + a + b + aa + ab + ba + bb

difference: new algorithm (static, dynamic, TAF-Kit)

Computes the difference between an automaton and a B-automaton (on LAL only):

$ vcsn derived-term -e '(?@lal_char(ab)_z)(<2>a+<3>b)*' -o lhs.gv
$ vcsn derived-term -e '(a+b)*b(a+b)*' -o rhs.gv
$ vcsn difference -f lhs.gv rhs.gv | vcsn aut-to-exp
$ vcsn difference -f rhs.gv rhs.gv | vcsn aut-to-exp

product accepts heterogenous arguments

It is now possible to compute the product of (for instance) a Z-automaton with a B-automaton.

$ vcsn standard -C 'lal_char(abc)_z' -e '(<2>a+<3>b+<5>c)*' -o 1.gv
$ vcsn standard -C 'lal_char(b)_b' -e 'b*' -o 2.gv
$ vcsn product -f 1.gv 2.gv | vcsn aut-to-exp
$ vcsn product -f 2.gv 1.gv | vcsn aut-to-exp

Vaucanson 2b.1


Release of our second beta, vaucanson-2b.1. Available on MacPorts as "vaucanson".


TAF-Kit: infiltration, product, shuffle: accept multiple arguments

You may pass several (one or more) arguments to vcsn product, infiltration and shuffle.

infiltration, product, shuffle: accept non commutative semirings

Beware that the (well defined) behavior of the resulting automata is no longer what one might expect.

$ vcsn standard -C 'lal_char(ab)_ratexpset<lal_char(uv)_b>' \
                -e '<u>a<v>b' -o uv.gv
$ vcsn standard -C 'lal_char(ab)_ratexpset<lal_char(xy)_b>' \
                -e '<x>a<y>b' -o xy.gv
$ vcsn product -f uv.gv xy.gv | vcsn enumerate -f - 4
$ vcsn shuffle -f uv.gv xy.gv | vcsn enumerate -f - 4
$ vcsn infiltration -f uv.gv xy.gv | vcsn enumerate -f - 4

Input/Output fixes

Support for EFSM in input/output is improved to support weights. Output was significantly sped up.

Before: 0.64s: standard -Ee 'a?{500}' -O dot >a500.gv 4.47s: standard -Ee 'a?{500}' -O efsm >a500.efsm 4.04s: standard -Ee 'a?{500}' -O fado >a500.fado 4.83s: standard -Ee 'a?{500}' -O grail>a500.grail 1.03s: standard -Ee 'a?{500}' -O tikz >a500.tikz

After: 0.76s: standard -Ee 'a?{500}' -O dot >a500.gv 0.37s: standard -Ee 'a?{500}' -O efsm >a500.efsm 0.35s: standard -Ee 'a?{500}' -O fado >a500.fado 0.35s: standard -Ee 'a?{500}' -O grail>a500.grail 0.68s: standard -Ee 'a?{500}' -O tikz >a500.tikz


boolean weight parsing bugix

The weight parser used to ignore all characters after the first one, so that for example "1-q" was considered valid and equivalent to "1".


product and infiltration-product are much faster

On erebus (MacBook Pro i7 2.9GHz 8GB, GCC 4.8 -O3), with "vcsn standard -E -e 'a?70' -o a70.gv":

Before: 13.01s: product -q -f a70.gv a70.gv 0.23s: shuffle -q -f a70.gv a70.gv 16.64s: infiltration -q -f a70.gv a70.gv After: 0.77s: product -q -f a70.gv a70.gv 0.19s: shuffle -q -f a70.gv a70.gv 0.87s: infiltration -q -f a70.gv a70.gv


TAF-Kit: option name changes

The former options -W and -L (to specify the weightset and labelset) are removed, definitively replaced by -C, to specify the context.

In addition to -A and -E (input is an automaton/ratexp), option -W now replaces option -w (input is a weight), and option -P denotes "polynomials". Note that "polynomials" are actually linear combinations of labels, weighted by weights, so for instance "LAL" contexts accept only single-letter labels, used LAW to accept words.

$ vcsn cat -C 'law_char(ab)_z' -P -e 'a+ab + <-1>a+ab+<10>bb'
<2>ab + <10>bb

internal overhaul

Better names were chosen for the various details of the dyn:: value support:

           abstract_automaton -> automaton_base

             abstract_context -> context_base

          abstract_polynomial -> polynomial_base
 concrete_abstract_polynomial -> polynomial_wrapper

              abstract_ratexp -> ratexp_base
     concrete_abstract_ratexp -> ratexp_wrapper

           abstract_ratexpset -> ratexpset_base
  concrete_abstract_ratexpset -> ratexpset_wrapper

              abstract_weight -> weight_base
     concrete_abstract_weight -> weight_wrapper

Note that "base" is slightly misleading, as it actually applies to the wrappers, and not to the static objects. For instance, weightsetwrapper derives from weightset_base, but b, z, and the other (static) weightsets do not derive from weightset_base. It would therefore be more precise to name "weightset_base" as "weightset_wrapper_base", but that might be uselessly long. Names may change again in the future.

Because dyn::polynomial and dyn::weight store the corresponding polynomialset/weightset, there is no need (so far?) for a dyn::polynomialset/dyn::weightset. Therefore, abstract_polynomialset and abstract_weightset were removed.

For consistency (and many other benefits), automaton_wrapper and context_wrapper were introduced. Now, the whole static API is completely independent of the dyn API, and the dyn API consists only of wrappers of "static"-level values---instead of inheritance for automata and contexts.


renamings (static, dynamic, TAF-Kit)

derive -> derivation derived-terms -> derived-term infiltrate -> infiltration


taf-kit: option -q for "quiet"

Equivalent to "-O null": do not generate output.

star-normal-form: new algorithm (static, dynamic, TAF-Kit)

Compute an equivalent rational expression where starred subexpressions have a non null constant term. Yields the same standard automaton, but built faster.

$ vcsn star-normal-form -e '(a*b*)*'

Valid only for Boolean automata.

ratexp printing: minimize parentheses

Print rational expressions with the minimum required number of parentheses, making some large expressions considerably easier to read. For example

Before: $ vcsn expand -C 'law_char(abc)_z' -e '(a+b)?{2}' (+<2>a+<2>b+(aa)+(ab)+(ba)+(bb)) $ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp +(a.((b+(a.a)+(c.((a+c)).b))).(a+(c.((a+c)*))))

After: $ vcsn expand -C 'law_char(abc)_z' -e '(a+b)?{2}' (+<2>a+<2>b+aa+ab+ba+bb) $ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp +a.(b+a.a+c.(a+c).b).(a+c.(a+c)*)

expand: new algorithm (static, dynamic, TAF-Kit)

Given a rational expression, distribute product over addition (recursively under the starred subexpressions) group and sort the equal monomials.

$ vcsn expand -C 'lal_char(abc)_z' -e '(a+b)?{2}*'

Vaucanson 2b.0


Release of our first beta, vaucanson-2b.0.


shuffle, infiltrate: new algorithms (static, dynamic, TAF-Kit)

New algorithms on LAL automata: computes the shuffle-product and infiltration of two automata. These (and now product as well) require weightsets to be commutative semirings.

$ vcsn standard -C 'lal_char(ab)_z' -E -e 'ab' -o ab.gv
$ vcsn product    -f ab.gv ab.gv | vcsn enumerate -O text -f - 4
$ vcsn shuffle    -f ab.gv ab.gv | vcsn enumerate -O text -f - 4
<4>aabb + <2>abab
$ vcsn infiltrate -f ab.gv ab.gv | vcsn enumerate -O text -f - 4
ab + <2>aab + <2>abb + <4>aabb + <2>abab


derived-terms: new algorithm (static, dynamic, TAF-Kit)

In addition to thompson and standard, this is a third means to build an automaton from a (weighted) rational expression. It corresponds to the Antimirov definition of derivatives (implemented by "derive"). It requires LAL rational expressions.

$ vcsn derived-terms -C 'lal_char(ab)_q' -e '(<1/6>a*+<1/3>b*)*'
  vcsn_context = "lal_char(ab)_q"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> F0 [label = "<2>"]
  0 -> 1 [label = "<1/3>a"]
  0 -> 2 [label = "<2/3>b"]
  1 -> F1 [label = "<2>"]
  1 -> 1 [label = "<4/3>a"]
  1 -> 2 [label = "<2/3>b"]
  2 -> F2 [label = "<2>"]
  2 -> 1 [label = "<1/3>a"]
  2 -> 2 [label = "<5/3>b"]

vcsn elimininate-state

This tool allows one to eliminate states one after the other.

$ vcsn eliminate-state -f lao.gv 2

Its interface is likely to change, or to be completely removed.

Smaller libraries

Some compiler magic was used to reduce the size of the libraries (about 20% on Mac OS X using GCC 4.8). TAF-Kit might start faster.


determinize: by default no longer forces the result to be complete

The determinization (static, dynamic, TAF-Kit) now admits an optional second argument (defaulting to '0'), a Boolean stating whether the result must be complete.


polynomialset fully replaces entryset

The 'entryset' type is removed, as polynomialset now provides a strict superset of its features.


derive: new algorithm (static, dynamic, TAF-Kit)

New algorithm on LAL automata: computes the derivation of rational expressions with respect to a word.

$ vcsn derive -C 'lal_char(a)_z' -e '(<2>a)*' a
$ vcsn derive -C 'lal_char(a)_z' -e '(<2>a)*' aa
$ vcsn derive -C 'lal_char(a)_z' -e '(<2>a)*' aaaa
$ vcsn derive -C 'lal_char(a)_z' -e '(<2>a)*' b

$ vcsn derive -C 'lal_char(ab)_q' -e '(<1/6>a*+<1/3>b*)*' a
$ vcsn derive -C 'lal_char(ab)_q' -e '(<1/6>a*+<1/3>b*)*' aa


enumerate returns a dyn::polynomial

The 'enumerate' algorithm use to cheat, and returned a std::vector<string> (each string being a pretty-printing of the monomial, e.g., "<2>a"). It now returns a dyn::polynomial.

vcsn-enumerate is biased to prefer the 'list' output format:

$ vcsn enumerate -Ee 'a*' 3

however the proper syntax for polynomial can be asked for.

$ vcsn enumerate -O text -Ee 'a*' 3
\e + a + aa + aaa

New dyn:: type: polynomial

A new member joins the dyn:: family of types (i.e., automaton, ratexp, ratexpset, weight, weightset). Currently there are no means to read such a value from TAF-Kit, but there is output support with two different output format:

  • 'text' (aka 'default') Prints the polynomial this usual way, e.g., "<2>a+<3>b".
  • 'list' Prints one monomial per line, e.g. '<2>a <3>b'


mutable_automaton: speed improvement

"set_transition" used to invoke twice "get_transition", which had a serious performance impact on some algorithms. This is fixed.

Before: 9.08s (0.33s+8.75s): ladybird 21 | determinize -O null 16.88s (0.85s+16.03s): thompson -C "lan_char(a)b" -Ee "a?{2000}" | proper -O null 21.30s (9.76s+11.54s): standard -Ee "(a+b+c+d)?{100}" | aut-to-exp -O null 7.29s (0.04s+7.25s): standard -C "lalchar(ab)_z" -Ee "(a+b)b(<2>a+<2>b)" | power -O null -f- 10 0.04s (0.04s): standard -E -e "(a?){70}" -o a70.gv 24.20s (24.20s): product -O null -f a70.gv a70.gv

After: 8.54s (0.13s+8.41s): ladybird 21 | determinize -O null 11.87s (0.86s+11.01s): thompson -C "lan_char(a)b" -Ee "a?{2000}" | proper -O null 21.03s (9.40s+11.63s): standard -Ee "(a+b+c+d)?{100}" | aut-to-exp -O null 6.58s (0.06s+6.52s): standard -C "lalchar(ab)_z" -Ee "(a+b)b(<2>a+<2>b)" | power -O null -f- 10 0.07s (0.07s): standard -E -e "(a?){70}" -o a70.gv 13.16s (13.16s): product -O null -f a70.gv a70.gv

Note in particular that the spontaneous transition elimination algorithm is faster, going from 16s to 11s on a MacBook Pro i7 2.9GHz 8GB RAM, on the following sequence.

$ vcsn thompson -C 'lan_char(a)_b' -Ee 'a?{2000}' | vcsn proper -O null


random: new algorithm (static, dynamic, TAF-Kit)

Random generation of automata. Subject to changes. Accepts four arguments: number of states, density (of transitions, defaults to .1, 1 generates a clique), number of initial states (defaults to 1), and number of final states (defaults to 1).

So far LAL and LAN only, no support for random weights.

$ vcsn random -O fado 3
@DFA 0
0 d 1
1 c 2
2 d 2
$ vcsn random -O fado 3
@DFA 2
0 c 1
1 d 2
2 c 0
$ vcsn random -C 'lan_char(ab)_b' -O fado 3
@NFA 1 * 0
0 @epsilon 1
1 b 2
2 b 1


Overhaul of the package

Thanks to Automake 1.14 features, there is now a single Makefile, which significantly speeds up the compilation of the package. The test-suite now uses the Test Anything Protocol, which results in more verbose results.

Beware that because of subtle issues (in the generated Makefile snippets that track dependencies), you are higly recommend to "make clean" after upgrading from the Git repository, and then "make" as usual.


proper: faster implementation

The spontaneous transition elimination algorithm is now faster, going from 102s to 15s on a MacBook Pro i7 2.9GHz 8GB RAM, on the following sequence.

$ vcsn thompson -C 'lan_char(a)_b' -Ee 'a?{2000}' | vcsn proper -O null



New algorithm (static, dynamic, TAF-Kit) on LAL automata: whether some word is the label of at least two successful computations.

$ vcsn is-ambiguous <<\EOF
  I -> 0
  0 -> 1 [label = "a"]
  0 -> 2 [label = "a"]
  1 -> F

$ vcsn is-ambiguous <<\EOF
  I -> 0
  0 -> 1 [label = "a"]
  0 -> 2 [label = "a"]
  1 -> F
  2 -> F

Also reported in the 'info' format for automata.

product: recover the original states

Similarly to 'determinize', the (static version of) 'product' can now be queried to get a map from states of the result to pairs of original states.


Sum of standard automata is fixed

See the previous entry: the computation of the initial transition was wrong, which resulted in the production of non-standard automata. This is fixed:

$ vcsn standard -C 'lal_char(a)_ratexpset<lal_char(x)_b>' -e '<x>a*' > 1.gv
$ vcsn standard -C 'lal_char(b)_ratexpset<lal_char(y)_b>' -e '<y>b*' > 2.gv
$ vcsn sum -f 1.gv 2.gv
  vcsn_context = "lal_char(ab)_ratexpset<lal_char(xy)_b>"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> F0 [label = "<x+y>"]
  0 -> 1 [label = "<x>a"]
  0 -> 2 [label = "<y>b"]
  1 -> F1
  1 -> 1 [label = "a"]
  2 -> F2
  2 -> 2 [label = "b"]


Operations on automata are generalized

Operations (union, sum, concatenate, chain) were uselessly restricted to LAL. Besides, the contexts were improperly computed (both labelset and weightset). This is fixed.

$ vcsn standard -C 'lal_char(a)_ratexpset<lal_char(x)_b>' -e '<x>a*' > 1.gv
$ vcsn standard -C 'lal_char(b)_ratexpset<lal_char(y)_b>' -e '<y>b*' > 2.gv
$ vcsn sum -f 1.gv 2.gv
  vcsn_context = "lal_char(ab)_ratexpset<lal_char(xy)_b>"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0 [label = "<\\e+\\e+\\e>"]
  0 -> F0 [label = "<x+y>"]
  0 -> 1 [label = "<x>a"]
  0 -> 2 [label = "<y>b"]
  1 -> F1
  1 -> 1 [label = "a"]
  2 -> F2
  2 -> 2 [label = "b"]



New algorithm (static, dynamic, TAF-Kit).

$ vcsn double-ring -C 'lal_char(ab)_b' 6 1 3 4 5
  vcsn_context = "lal_char(ab)_b"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> 1 [label = "a"]
  0 -> 5 [label = "b"]
  1 -> F1
  1 -> 0 [label = "b"]
  1 -> 2 [label = "a"]
  2 -> 1 [label = "b"]
  2 -> 3 [label = "a"]
  3 -> F3
  3 -> 2 [label = "b"]
  3 -> 4 [label = "a"]
  4 -> F4
  4 -> 3 [label = "b"]
  4 -> 5 [label = "a"]
  5 -> F5
  5 -> 0 [label = "a"]
  5 -> 4 [label = "b"]

It is advised to pass "layout = circo" to Dot for the rendering.

concatenation, chain

New algorithm on standard automata (static, dynamic, TAF-Kit).


New algorithm on standard automata (static, dynamic, TAF-Kit). Same limitations as left-mult, see below.



New algorithm on standard automata (static, dynamic, TAF-Kit).

The TAF-Kit version is (currently) troublesome, as it does not infer the context to parse the weight from the automaton: be sure to specify -C:

$ vcsn standard -C 'lal_char(a)_z' -e a | vcsn left-mult 12
vcsn-left-mult: left_mult: no implementation available for mutable_automaton<lal_char_z> x b
$ vcsn standard -C 'lal_char(a)_z' -e a | vcsn left-mult -C 'lal_char(a)_z' 12
  vcsn_context = "lal_char(a)_z"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> 1 [label = "<12>a"]
  1 -> F1

taf-kit: option -w for weights as input

Currently only vcsn-cat supports it.

$ vcsn cat -w -e 2
vcsn-cat: invalid Boolean: 2
$ vcsn cat -w -e 1
$ vcsn cat -C 'lal_char(a)_z' -w -e 2

concatenate, star, sum

New algorithms on standard automata (static, dynamic, TAF-Kit).



New algorithm on automata (static as "union_a", dynamic as "union_a", TAF-Kit as "union"). Plain graph union.


New algorithm on rational expressions (static, dynamic, TAF-Kit).

$ vcsn is-valid -C 'lal_char(a)_r' -E -e '(<.5>\e)*'
$ vcsn is-valid -C 'lal_char(a)_r' -E -e '\e*'

For consistency, is-valid is now also available for automata in dyn:: and TAF-Kit (it used to be static only, visible from "info" output).

$ vcsn thompson -C 'lan_char(a)_r' -e '(<.5>\e)*' | vcsn is-valid
$ vcsn thompson -C 'lan_char(a)_r' -e '\e*' | vcsn is-valid


More RatExp quantifiers

In addition to "", there is "?"/"{?}" and "{+}". Support for "{}" is added for symmetry.

$ vcsn cat -Ee 'a?'
$ vcsn cat -Ee 'a?{3}'
$ vcsn cat -Ee '(a+b){+}'


I/O EFSM format support

We are now able to produce and read EFSM format (designed as an interface to OpenFST). This is not (yet) thoroughly tested for weighted automata.

$ vcsn ladybird -O efsm 4 |
   efstcompile | fstdeterminize | efstdecompile |
   vcsn cat -I efsm -O info
type: mutable_automaton<lal_char(abc)_b>
number of states: 15
number of initial states: 1
number of final states: 8
number of accessible states: 15
number of coaccessible states: 15
number of useful states: 15
number of transitions: 43
number of deterministic states: 15
number of eps transitions: 0
is complete: 0
is deterministic: 1
is empty: 0
is eps-acyclic: 1
is normalized: 0
is proper: 1
is standard: 0
is trim: 1
is useless: 0
is valid: 1

Output in EFSM format is improved

State numbers now start appropriately at 0 (instead of 2), and when there is a single initial state, no "pre-initial state" is output; this avoids the introduction of spontaneous transitions in deterministic automata.

Before (see news of 2013-07-01):

$ vcsn ladybird -O efsm 2
#! /bin/sh

cat >transitions.fsm <<\EOFSM
0       2       \e
2       3       a
3       3       b
3       3       c
3       2       c
3       2       a

cat >isymbols.txt <<\EOFSM
\e      0
a       1
b       2
c       3

fstcompile --acceptor --keep_isymbols --isymbols=isymbols.txt transitions.fsm


$ vcsn ladybird -O efsm 2
#! /bin/sh

cat >isymbols.txt <<\EOFSM
\e      0
a       1
b       2
c       3

cat >transitions.fsm <<\EOFSM
0       1       a
1       0       a
1       1       b
1       0       c
1       1       c

fstcompile --acceptor --keep_isymbols --isymbols=isymbols.txt transitions.fsm

I/O FAdo format support

We are now able to produce and read FAdo format.

$ vcsn ladybird -O fado 4 | \
  python -c "from FAdo import fa
nfa = fa.readFromFile('/dev/stdin')[0]
dfa = nfa.toDFA()
fa.saveToFile('dl4.fado', dfa)"

$ vcsn cat -I fado -O info -f dl4.fado
type: mutable_automaton<lal_char(abc)_b>
number of states: 15
number of initial states: 1
number of final states: 8
number of accessible states: 15
number of coaccessible states: 15
number of useful states: 15
number of transitions: 43
number of deterministic states: 15
number of eps transitions: 0
is complete: 0
is deterministic: 1
is empty: 0
is eps-acyclic: 1
is normalized: 0
is proper: 1
is standard: 0
is trim: 1
is useless: 0
is valid: 1


New algorithm on rational expressions (static, dynamic, TAF-Kit).

$ vcsn constant-term -e '(?@lal_char(a)_b)(\e)*'
$ vcsn constant-term -e '(?@lal_char(a)_z)(\e)*'
vcsn-constant-term: z: star: invalid value: 1
$ vcsn constant-term -C 'law_char(ab)_ratexpset<law_char(wxyz)_b>' \
                     -e '<w>(<x>a*+<y>b*)*<z>'


Automaton library

The set of library automata for existing contexts is now complete with respect to what Vaucanson 1 provided. Automata families are not, and will not, be part of this library ; for instance, instead of looking for ladybird-6.gv, use 'vcsn ladybird 6'.

lal_char_b: a1.gv b1.gv evena.gv oddb.gv
lal_char_z: b1.gv binary.gv c1.gv d1.gv
lal_char_zmin: minab.gv minblocka.gv slowgrow.gv

Currently one must specify their path:

$ vcsn evaluate -f share/vcsn/lal_char_zmin/minab.gv aabbba


TAF-Kit: works on standard input by default

The very frequent "-f -" sequence is no longer required: by default the input is stdin.

$ vcsn ladybird 2 | vcsn determinize | vcsn aut-to-exp



New algorithm on automata (static, dynamic, TAF-Kit). Corresponds to "standardize" in Vaucanson 1.


New algorithm on automata (static, dynamic, TAF-Kit). Also reported in "info" format.


New automata factory (static, dynamic, TAF-Kit): Brzozowski's universal witness.


New formats: grail and fado

$ vcsn standard -O fado -e 'a+b'
@DFA 1 2
0   a   1
0   b   2
$ vcsn standard -O fado -e 'a+ab'
@NFA 1 3 * 0
0   a   1
2   b   3
0   a   2
$ vcsn standard -O grail -e 'a+ab'
(START) |-  0
0   a   1
2   b   3
0   a   2
1   -|  (FINAL)
3   -|  (FINAL)



New algorithm on automata (static, dynamic, TAF-Kit).

$ vcsn standard -e '(?@lal_char(01)_z)(0+1)*1(<2>0+<2>1)*' >binary.gv
$ vcsn power -f binary.gv 0 | vcsn enumerate -f- 2
$ vcsn power -f binary.gv 1 | vcsn enumerate -f- 2
$ vcsn power -f binary.gv 2 | vcsn enumerate -f- 2
$ vcsn power -f binary.gv 4 | vcsn enumerate -f- 2
$ vcsn power -f binary.gv 8 | vcsn enumerate -f- 2



New algorithm (static, dynamic, TAF-Kit).

$ vcsn divkbaseb -C 'lal_char(01)_b' 3 2
  vcsn_context = "lal_char(01)_b"
  rankdir = LR
    node [style = invis, shape = none, label = "", width = 0, height = 0]
    node [shape = circle]
  I0 -> 0
  0 -> F0
  0 -> 0 [label = "0"]
  0 -> 1 [label = "1"]
  1 -> 0 [label = "1"]
  1 -> 2 [label = "0"]
  2 -> 1 [label = "0"]
  2 -> 2 [label = "1"]


enumerate produces a list of weighted words

enumerate now also provides the weight of the words. It is also fixed: it no longer reports words with nul weight (e.g., 'a' in 'a+<-1>a').

$ vcsn standard -e '(?@lal_char(01)_z)(0+1)*1(<2>0+<2>1)*' \
    | vcsn enumerate -f - 3

shortest is fixed and modified similarly.

$ vcsn standard -e '(?@lal_char(ab)_z)(a+<5>bb+<-1>a)' \
      | vcsn shortest -f -


WeightSet: added support for rational weights

We can now use automata with rational numbers as their weights.

$ vcsn standard -e "(?@lal_char(ab)_q)(<1/2>a+<2>b)*" \
  | vcsn evaluate -f - aaabbbb

$ vcsn standard -e "(?@lal_char(ab)_q)(<1/2>a+<2>b)*"\
  | vcsn evaluate -f - aaab


new format: efsm (and new tool: efstcompile)

The former output format ("fsm") is dropped, replaced by an adhoc "extended FSM" format: "efsm". The FSM format focuses only on the transitions, and lacks information about the labels (which are expected to be mapped to numbers), weights, whether it's an acceptor or transducer, etc.

The efsm format is designed to be simple to use with OpenFST: just run "efstcompile" instead of "fstcompile". As a matter fact, the new "efstcompile" tool is rather dumb, as it simply executes the "efsm" file.

$ vcsn ladybird -O efsm 2
#! /bin/sh

cat >transitions.fsm <<\EOFSM
0       2       \e
2       3       a
3       3       b
3       3       c
3       2       c
3       2       a

cat >isymbols.txt <<\EOFSM
\e      0
a       1
b       2
c       3

fstcompile --acceptor --keep_isymbols --isymbols=isymbols.txt transitions.fsm

$ vcsn ladybird -O efsm 8 | efstcompile | fstdeterminize | fstinfo \
    | grep '# of states'
# of states                                       256


new binary: vcsn

To give a flavor of what TAF-Kit should be (a single tool instead of one per command), the new "vcsn" script bounces to the vcsn-* tools. It does not support | as TAK-Kit 1 did.

$ vcsn are-equivalent -Ee '(a*b*)*' '(a+b)*'



Now removes states to which no transition arrive after spontaneous transitions removal.


Only the concatenation yielded an automaton whose projection on Boolean weights was different from the Thompson of the projection of the rational expression on Boolean. This is now fixed.


New algorithm (static, dynamic, TAF-Kit).


identity, unit => one

Labels new define one() and is_one() instead of identity() and is_identity().

We now use LAO, "labels are one", instead of LAU, "labels are unit".

WeightSets now define one() and is_one() instead of unit() and is_unit().


shortest, enumerate

New algorithms (static, dynamic, TAF-Kit).



Now accepts (LAL Boolean) rational expressions.


Now accepts (LAL Boolean) rational expressions. It cannot compare a rational expression with an automaton (or vice-versa). This is a temporary defect which shell be addressed once TAF-Kit is properly developed.

$ vcsn-are-equivalent -Ee '(a*b*)*' '(a+b)*'
$ vcsn-are-equivalent -Ee '(a*b)*' '(a+b)*'

dyn: overhaul

Consistency is enforced in dyn. In particular the very first dynamic/static bridge (which was not identified as such), vcsn::rat::abstract_ratexpset, is now part of dyn::.

dyn::weight now aggregates its WeightSet instead of a Context. More similar conversions were performed, and other are to come.



The output now shows useless states (and their transitions) in gray.



New algorithm (static, dynamic, TAF-Kit). Currently works only for LAL Boolean automata.

$ vcsn-standard -e 'a(ba)*' -o a1.gv
$ vcsn-standard -e '(ab)*a' -o a2.gv
$ vcsn-are-equivalent -f a1.gv a2.gv


is_trim, is_useless, is_empty

New algorithms (static, dynamic, TAF-Kit).

accessible_states, coaccessible_states, useful_states

New algorithms, static only.

num_accessible_states, num_coaccessible_states, num_useful_states

New algorithms, static only. Available in the "info" display.

copy accepts a predicate

It is now possible to filter the states to keep. Either as a predicate, or a set of states. For instance:

template <typename Aut>
Aut trim(const Aut& a)
  return vcsn::copy(a, useful_states(a));


New algorithm (static, dynamic, and TAF-Kit). Requires a LAL Boolean automaton.

$ vcsn-universal -f a1.gv
  vcsn_context = "lal_char(ab)_b"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  { 0 1 2 }
  I0 -> 0
  0 -> 0 [label = "a, b"]
  0 -> 1 [label = "a"]
  1 -> 0 [label = "a, b"]
  1 -> 1 [label = "a, b"]
  1 -> 2 [label = "b"]
  2 -> F2
  2 -> 0 [label = "a, b"]
  2 -> 1 [label = "a, b"]
  2 -> 2 [label = "a, b"]


New algorithm (static, dynamic, and TAF-Kit). Requires a complete deterministic LAL Boolean automaton.

$ vcsn-standard -e '(?@lal_char(ab)_b)a' \
    | vcsn-determinize -f- \
    | vcsn-complement -f-  \
    | vcsn-aut-to-exp -f-



It is conforming with the specifications: all the states must be deterministic, not just the reachable ones.

info output adjustments

It now displays whether the automaton "is complete" and the number of deterministic states if LAL, otherwise "N/A". It also reports "is deterministic: N/A" for non-LAL.

Invalid labels are rejected

Under some circumstances, some invalid transitions could be accepted by the Dot parser (e.g., "aa" or "" in LAL). This is fixed.


Support for entries is removed.

Member types, functions, and variables, about entries, have been removed. The services provided by entries in Vaucanson 1 are provided by LAU automata, so entries are not as useful in Vaucanson 2. And anyway, if needed, it should rather be a set of free standing functions.


Refactoring: eps-removal => proper

Eps-removal is renamed to Proper.


Syntax for rational-expressions has changed!

Angular brackets are now used for weights. Instead of

$ vcsn-standard -e '(?@lal_char(01)_z)(0+1)*1({2}1+{2}0)*' -o binary.dot


$ vcsn-standard -e '(?@lal_char(01)_z)(0+1)*1(<2>1+<2>0)*' -o binary.dot

Braces are now used instead of (*...) for generalized quantifiers.

a{0} => \e
a{1} => a
a{2} => a.a
a{5} => a.a.a.a.a

a{0,1} => \e+a
a{0,2} => \e+a+a.a
a{0,3} => \e+a+a.a+a.a.a

a{1,2} => a.(\e+a
a{1,3} => a.(\e+a+a.a)

a{2,5} => a.a.(\e+a+a.a+a.a.a)

a{0,} => a*
a{1,} => a.(a*)
a{4,} => a.a.a.a.(a*)


info output

Now displays the number of spontaneous transitions (0 for LAL).


Digits as letters

Recently broken by accident, support for digits as letters is restored.

$ vcsn-standard -e '(?@lal_char(01)_z)(0+1)*1({2}1+{2}0)*' -o binary.dot
$ vcsn-evaluate -f binary.dot '11111111'
$ vcsn-evaluate -f binary.dot '101010'


Dot parser

This parser is now stricter than it used to be: be sure to escape backslashes in input Dot files, i.e.,

  0 -> 2 [label = "\e"]

is now invalid, write

  0 -> 2 [label = "\\e"]

This change was made because Graphviz treats "" (and renders it) exactly like "e".


New algorithm: Thompson

Conversion from rational expression to automata. Requires lan or law. The handling of weights might be changed in the near future.

$ vcsn-thompson -Ee '(?@lan_char(abc)_z){2}(a+{3}b+c)*{5}'
  vcsn_context = "lan_char(abc)_z"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  { 0 1 2 3 4 5 6 7 8 9 }
  I8 -> 8
  0 -> 2 [label = "\\e"]
  0 -> 4 [label = "\\e"]
  0 -> 6 [label = "\\e"]
  1 -> 0 [label = "\\e"]
  1 -> 9 [label = "{5}\\e"]
  2 -> 3 [label = "a"]
  3 -> 1 [label = "\\e"]
  4 -> 5 [label = "{3}b"]
  5 -> 1 [label = "\\e"]
  6 -> 7 [label = "c"]
  7 -> 1 [label = "\\e"]
  8 -> 0 [label = "{2}\\e"]
  8 -> 9 [label = "{10}\\e"]
  9 -> F9


standard is the new name for standard-of

The "of" is useless, inconsistent with the other algorithms, and with the TAF-Kit v1 name.


Evaluation is fixed

Several initializations were incorrect, expecting the zero to be 0 (which is not the case for zmin for instance). There might also be some speed up.


New output format: tikz

Layout is dumb, yet this is useful to prepare LaTeX documents.


New output format: info

This pseudo format displays facts about the automaton (number of states and so on) or rational expressions (number of nodes).


Can now be called on LAL automata, for consistency with is-proper and is-valid.



The states are now numbered from 0.

Clang Compatibility

Clang++ (3.2 and 3.3) can now compile Vaucanson.


New LabelSet: nullableset

Initial support for "Labels are nullable".

$ cat lan.dot
digraph lan
  vcsn_context = "lan_char(a)_b"
  I1 -> 1
  2 -> F2
  1 -> 2 [label = "a"]
  1 -> 2 [label = "\e"]
$ vcsn-cat -Af lan.dot
  vcsn_context = "lan_char(a)_b"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  { 1 2 }
  I1 -> 1
  1 -> 2 [label = "\\e, a"]
  2 -> F2
$ vcsn-eps-removal -Af lan.dot
  vcsn_context = "lan_char(a)_b"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  { 1 2 }
  I1 -> 1
  1 -> F1
  1 -> 2 [label = "a"]
  2 -> F2


New dyn algorithm and tool (vcsn-is-proper).


determinize speed-up

Determinization algorithm is five times faster.

Before: $ time bin/vcsn-de-bruijn 18 | bin/vcsn-determinize -Af - -Onull real 0m13.049s user 0m12.773s sys 0m0.276s $ time bin/vcsn-de-bruijn 20 | bin/vcsn-determinize -Af - -Onull real 0m56.780s user 0m55.775s sys 0m0.988s

Now: $ time bin/vcsn-de-bruijn 18 | bin/vcsn-determinize -Af - -Onull real 0m2.181s user 0m2.048s sys 0m0.132s $ time bin/vcsn-de-bruijn 20 | bin/vcsn-determinize -Af - -Onull real 0m8.653s user 0m8.177s sys 0m0.476s


hierarchy and API clean up

dyn:: was cleaned. Some headers were renamed:

vcsn/ctx/abstract_context.hh -> vcsn/dyn/context.hh
vcsn/algos/dyn.hh            -> vcsn/dyn/algos.hh
vcsn/core/automaton.hh       -> vcsn/dyn/automaton.hh

dyn::make_automaton was introduced to hide implementation details.


LabelSet renamings

The name LetterSet, UnitSet, and WordSet were not compliant. They have been renamed as letterset, unitset, and wordset.


Generalized quantifier as syntactic sugar for rational expressions

The new "(* min, max)" quantifier (postfix, like "*") allows to specify "powers" of an expression. For instance:

({a}b)(*0) => \e
({a}b)(*1) => {a}b
({a}b)(*2) => {a}b.{a}b
({a}b)(*5) => {a}b.{a}b.{a}b.{a}b.{a}b

({a}b)(*0,1) => \e+{a}b
({a}b)(*0,2) => \e+{a}b+({a}b.{a}b)
({a}b)(*0,3) => \e+{a}b+({a}b.{a}b)+({a}b.{a}b.{a}b)

({a}b)(*1,2) => {a}b.(\e+{a}b)
({a}b)(*1,3) => {a}b.(\e+{a}b+({a}b.{a}b))

({a}b)(*2,5) => {a}b.{a}b.(\e+{a}b+({a}b.{a}b)+({a}b.{a}b.{a}b))

({a}b)(*0,) => ({a}b)*
({a}b)(*1,) => {a}b.(({a}b)*)
({a}b)(*4,) => {a}b.{a}b.{a}b.{a}b.(({a}b)*)


Comment in rational expressions

The (?#...) construct allows to embed comments in rational expressions. They are discarded. There is no means to include a closing parenthesis in this construct.

Context in rational expressions

The (?@...) allows a rational expression to "carry" its context. Contrast for instance the two following runs.

$ vcsn-cat -C 'lal_char(xyz)_z'  -Ee '{42}x+{51}z'
$ vcsn-cat -Ee '(?@lal_char(xyz)_z){42}x+{51}z'

This will be used, eventually, so that TAF-Kit-like tools propagate the context in runs such as:

$ vcsn-cat -Ee '(?@lal_char(xyz)_z){42}x+{51}z' | vcsn-transpose -Ef -
1.1-5: invalid Boolean: 42

Currently it fails, as the "default" context is "lal_char(abc)_b".

New output format: fsm

An initial, and rough, support for Open FSM's format is provided. Currently there is no support at all for the weights.

$ vcsn-de-bruijn -O fsm 12 | fstcompile | fstdeterminize | wc -l

New output format: null

The output is discarded. This is useful for benching.


New algorithms: is-deterministic and complete

is-deterministic takes an automaton as argument and exits with code status 0 if the given automaton is deterministic, 2 otherwise.

complete also takes an automaton as argument and make it complete. If the given automaton is already complete, then it is unchanged.


Overhaul of the LAU, LAL, LAW implementation

So far a context was a triple: <Kind, LabelSet, WeightSet>, where (for instance), Kind is labels_are_letters, LabelSet is set_alphabet<char_letters>, and WeightSet is zmin). This is troublesome on several regards, the clearest being that the LabelSet makes no sense for LAU.

Now contexts are pairs: <LabelSet, WeightSet>, where this time the LabelSet (same name as before, different concept) can be a instance of UnitSet for LAU, WordSet for LAW, or LetterSet for LAL. These structures, in turn, are parameterized by the effective set of generators to use: for instance, LetterSet<set_alphabet<char_letters>>. Of course UnitSet is not parameterized.

As a first visible consequence, the name of the LAU contexts has changed:

lau_char_br => lau_br
lau_char(xyz)_ratexpset<lal_char(abc)_b> => lau_ratexpset<lal_char(abc)_b>


options renamed

The options to select the input and output format are renamed -I and -O (instead of -i and -o).

new option: -o for output file

The vcsn-* tools now support '-o FILE' to save the output in FILE.


product: strengthened preconditions

The product of automata requires LAL automata. The output alphabet is the intersection of the ones of the operands. This works as expected when the automata have disjoint alphabets.


dyn::product and vcsn-product

They compute the product of automata.


vcsn-determinize and vcsn-evaluate use the common command line options

These tools support -C, -g, etc. like the other tools. See "vcsn-<tool> -h".

context names are now complete

Context names used to describe the "static" structure only (e.g., lal_char_ratexpset<law_char_b>). It now includes the "dynamic" part, currently only the list of generators (e.g., lal_char(abc)_ratexpset<law_char(xyz)_b>).

This context strings are both printed and read by the various tools. For instance:

$ vcsn-standard-of -C 'law_char(xyz)_ratexpset<law_char(abc)_z>' \
      -e '{abc}xyz' | vcsn-aut-to-exp -f -

Note that the second tool, vcsn-aut-to-exp, found the context in its input, the standard automaton in Dot format.


dyn::ratexpset, dyn::context

These are now handled by shared pointer, consistently with dyn::automaton and dyn::ratexp.

Nasty memory management issues have been fixed.


It now supports the same arguments as the other vcsn-* tools. It also no longer requires 'a' and 'b' to be accepted letters, and it uses the whole alphabet. For instance "vcsn-de-bruijn -g 'xyz' 3" generates an automaton for "(x+y+z)x(x+y+z)^3".

dyn::ladybird, vcsn-ladybird

New dynamic algorithm, and new tool (which also supports the common command line options).


contexts are renamed

Contexts have both a name and an identifier. The name is used to display in a readable form the nature of the context, for instance in Dot output. The identifier is used for instance headers, or predefined contexts, and libraries.

So far names and identifiers are equal, but this will change.

As a first step, identifiers/names are now <Kind><LabelSet><WeightSet> instead of <LabelSet><WeightSet><Kind>. For instance:

char_b_lal  => lal_char_b
char_br_lal => lal_char_br
char_zr_lal => lal_char_zr
char_br_lau => lau_char_br
char_br_law => law_char_br
char_zr_law => law_char_zr


pprat is removed

It was designed for the test suite. The vcsn-* tools are now sufficient for the test suite, and are exposed to the user.

vcsn-*: option overhaul The different tools had already too many different calling conventions. They are now (quite) consistent.


Calls the default implementation of aut-to-exp.

$ vcsn-standard-of -Wz     -e '{2}(ab){3}' | vcsn-aut-to-exp -Af -
$ vcsn-standard-of -Wz -Lw -e '{2}(ab){3}' | vcsn-aut-to-exp -Af -


It now also supports lifting rational expressions.

$ vcsn-lift -C char_b_lal -Ee 'abc'


dotty -> dot

The name "dotty" was incorrect (as it denotes a program instead of the format). Therefore, every occurrence of "dotty" is now mapped to "dot".


The Dot output (input and output) now uses ", " as a label separator, instead of " + ".


dyn: input/output

Routines: dyn::read_(automaton|ratexp)_(file|string) and dyn::print take the input format. Available input/output are: - "dotty". - "text". - "xml".


genset is replaced by labelset

Through out the code.


dyn: input

New routines: dyn::read_(automaton|ratexp)_(file|string).

dyn: output

New routines: dyn::print, for both automata and RatExps.

bin: new tools

vcsn-cat, vcsn-transpose (both on RatExps only currently). vcsn-standard-of. vcsn-lift.


krat -> rat

kratexp, kratexpset etc. are renamed as ratexp, ratexpset, etc.

labels are unit

labels-are-empty/lae were mapped to labels-are-unit/lau.



For consistency with dyn::automaton, vcsn::ctx::abstract_context is renamed vcsn::dyn::context. Eventually, we might turn it into a shared pointer too.

dyn::de_bruijn, bin/vcsn-de-bruijn

New tools, useful for tests for instance.

$ vcsn-de-bruijn char_b_lal 2
  vcsn_context = "char_b_lal"
  vcsn_genset = "ab"
  rankdir = LR
  node [shape = circle]
    node [style = invis, shape = none, label = "", width = 0, height = 0]
  I1 -> 1
  1 -> 1 [label = "a + b"]
  1 -> 2 [label = "a"]
  2 -> 3 [label = "a + b"]
  3 -> 4 [label = "a + b"]
  4 -> F4


dyn::parse_file and parse_string

They construct dyn::automaton's.


For bad reasons, currently works only for char_b_lal

vcsn-determinize and vcsn-evaluate

Two new shell commands to write tests.


pprat works with abstract algorithms

pprat now uses only abstract (aka, dynamic) algorithms! On OS X, it is now a 77KB program; it was 11MB before.

This schedules its death: either it will be replaced by a set of smaller grain commands (vcsn-determinize, vcsn-standard-of, etc.) from which test cases will be easier to write, or it will be replaced by some early implementation of a TAF-Kit-like unified program (instead of one per context).


automata provide a vname

To dispatch algorithms such as dotty, we not only need to know the context type name, but also the automaton type name, as mutable_automaton and transpose_automaton are two different types for instance.



Because it uses only algorithms made abstract (make_context, make_automaton_editor, and dotty), the dot parser now works for any of the precompiled contexts!



In addition to the add_entry method of mutable_automaton, there is now an add_entry algorithm, which is templated by the automaton type. This algorithm provides an abstract interface to an unknown type of automaton.



For consistency, polynomials is renamed polynomialset.

mutable_automaton::add_entry and del_entry

The first of these new functions allows to add directly a list of transition between two states by passing the corresponding entry_t (this is most useful when reading an automaton with entries, such as with the Dot parser). The second one removes every existing transition between two states.


labels are empty

Initial work on labels-are-empty automata. See the unit/char_z_lae test. The labels are not displayed, but the "{...}" to denote the weights, are kept:

  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1
  2 -> F2 [label="{10}"]
  1 -> 2 [label="{51}"]
  2 -> 3 [label="{3}"]
  2 -> 1
  1 -> 1 [label="{42}"]
  1 -> 3

In that case, the transitions do not store labels.

lift now returns a labels-are-empty automaton/ratexp

Accordingly, pprat -l (lift) now displays:

$ pprat -Lw -l 'ab+cd'
  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> 2 [label="{ab}"]
  2 -> F2
  I1 -> 1
  3 -> F3
  1 -> 3 [label="{cd}"]



It is now possible to load an automaton from its dotty output. Actually, it is possible to write simpler automata. This is no yet fully generic: it works properly only for char_b_lal.

The test program unit/parse-dot gives access to it. When fed with the following input file:

  vcsn_context=char_b_lal vcsn_genset="a"
  {1} -> {2 3} -> {4 5 6} [label=a]
  I -> 1
  {4 5 6} -> F

it produces an automaton, and dumps it using the dotty algorithm:

  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> 2 [label="a"]
  1 -> 3 [label="a"]
  2 -> 4 [label="a"]
  2 -> 5 [label="a"]
  2 -> 6 [label="a"]
  3 -> 4 [label="a"]
  3 -> 5 [label="a"]
  3 -> 6 [label="a"]
  I1 -> 1
  4 -> F4
  5 -> F5
  6 -> F6


pprat uses -L for labels instead of -A

For consistency, since we now also use the name "labels" to denote the leaves of rational expressions (others that and ), -A is renamed -L.

Metadata are embedded in the Dot file

The pseudo name "A" which was used in every dotty output is no longer defined, as it is both optional and useless. The context name and the alphabet are also provided. For instance:

$ ./tests/unit/ladybird-b 2 | sed 4q
$ pprat -s -L z 'abc' | sed 4q
$ pprat -s -L zr -A w 'abc' | sed 4q

This is an experimentation, and the current choice is somewhat unsatisfactory. Instead of


it is probably more sensible to use


or maybe


so that when weightsets depend for instance upon an alphabet, it can be specified too. Instead of


(which does not define the alphabet used for the weightset), one would expect:


or maybe


Also, the name "kratexpset" is of course open to discussion:



It is now possible to read back polynomials such as "a+b+{2}a".


It is used more extensively to forbid meaningless calls, such as determinizing a law automaton.


Precompiled contexts

Several predefined contexts come with their own header (e.g., "ctx/char_b_lal"), and their own library (e.g., "libchar_b_lal"). This is provide for char_{b,z,zmin}_{lal,law}.

z_min renamed zmin

Consistently with Vaucanson 1.4.



Transposition on automaton is a read/write view: operations such as del_state, add_transition, etc. on a transposed automaton actually modify the wrapped automaton: set_final calls set_initial and so forth.

As an extreme example, the following snippet:

using context_t = vcsn::ctx::char_b;
using automaton_t = vcsn::mutable_automaton<context_t>;
using tr_automaton_t = vcsn::details::transpose_automaton<automaton_t>;
context_t ctx{{'a', 'b'}};
auto ks = ctx.make_kratexpset();
auto aut = vcsn::standard_of<tr_automaton_t>(ctx, ks.conv("a+a+a+a"));

applies the standard-of algorithm to a transposed mutable_automaton. In other words,


is the transposition of a standard automaton, except that it is a mutable_automaton, not a transpose_automaton<mutable_automaton>.



The "transpose" operation is implemented on words, weights, kratexps, and automata. pprat provides support to transpose on kratexps (option -t):

pprat -W zrr -t {{{2}ab}cd}abcd   =>   {{{2}ba}dc}dcba
pprat -W zrr -t {ab}(abcd)*{cd}   =>   ({dc}(dcba)*{ba})

and on (standard) automata (option -T):

$ pprat -A w -W br    -s '{ab}(\e+a+b({abc}c{bcd})*){cd}' > forward.dot
$ pprat -A w -W br -T -s '{ab}(\e+a+b({abc}c{bcd})*){cd}' > transpose.dot
$ diff -W80 -t  -y forward.dot transpose.dot
digraph A {                            digraph A {
  rankdir=LR                             rankdir=LR
  node [shape=circle]                    node [shape=circle]
  {                                      {
    node [style=invis,shape=none,la        node [style=invis,shape=none,la
    I1                                     I1
                                    >      I2
                                    >      I4
    F1                                     F1
    F2                              <
    F4                              <
  }                                      }
  1 -> F1 [label="{(ab).(cd)}"]     |    I1 -> 1 [label="{(dc).(ba)}"]
  I1 -> 1                           |    1 -> F1
  2 -> F2 [label="{cd}"]            |    I2 -> 2 [label="{dc}"]
  1 -> 2 [label="{ab}a"]            |    2 -> 1 [label="{ba}a"]
  4 -> F4 [label="{cd}"]            |    I4 -> 4 [label="{dc}"]
  4 -> 4 [label="{(abc).(bcd)}c"]   |    4 -> 4 [label="{(dcb).(cba)}c"]
  1 -> 3 [label="{ab}b"]            |    3 -> 1 [label="{ba}b"]
  3 -> 4 [label="{(abc).(bcd)}c"]   |    4 -> 3 [label="{(dcb).(cba)}c"]
}                                      }



An initial version of aut_to_exp is available. The new pprat option -a provides an access to this algorithm: apply aut_to_exp to the standard_of an expression.

pprat      -a 'a*'     => \e+(a.(a*))
pprat      -a '(a+b)c' => (a.c)+(b.c)
pprat -W z -a '{2}({3}a+{5}b){7}c{11}' => (({6}a.{7}c)+({10}b.{7}c)){11}

Currently, the only "heuristic" implemented eliminates the states in order. There are probably possible improvements.

pprat -a '(a+b)*' | wc -c => 265

pprat: -a and -w are renamed -A and -W

new factory: de Bruijn

Builds automata for (a+b)a(a+b)^n.


standard_of is part of vcsn::

It used to be in vcsn::rat::.



kratexpset, i.e. the object that provides operation on kratexps (with specified Gen and Weight), used to derive from abstract_kratexp (which is "opaque": it does not know the precise type that is used underneath).

Now, from abstract_kratexp we derive a concrete_abstract_kratexpset which aggregates a kratexpset. This means that kratexpset no longer derives from a weakly-typed ancestor, and can provide simple and strongly-typed routines.


For consistency with weightset/weight, genset/gen, kratexps (note the s) is renamed as kratexpset and std::shared_ptr<const rat::node> as kratexp.


A new algorithm which creates, from an automaton, another one with the same states and transitions, but the new automaton features only spontaneous transitions, whose weights correspond to the labels (and weights) of the initial one.

For instance:

$ pprat -aw -wz  -sl '({2}\e+{3}a){4}'
digraph A {
  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{8}"]
  I1 -> 1
  2 -> F2 [label="{4}"]
  1 -> 2 [label="{3}a"]

digraph A {
  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{{8}\\e}"]
  I1 -> 1
  2 -> F2 [label="{{4}\\e}"]
  1 -> 2 [label="{{3}a}\\e"]

RatExps: fix is_unit

is_unit simply checked that the expression was , but did not check that weight itself was the unit.


This variable allows to force the display of weights.


Contexts aggregate shared pointers

Now contexts are mutable, and hold (shared) pointers to (immutable) gensets and weightsets. This way, we can alter contexts (e.g., the ladybird factory can add the letters it needs in a new genset), yet there is good sharing, and identity can still be used to distinguish, for instance, two gensets defined equally.

It is also simpler to really expose them as pointers, so every "weightset().mul", etc. must be rewritten as "weightset()->mul".



The Kind parameter is now part of the context. The same type of Kind is now use for both RatExps and automata. This results in many significant simplifications.

For instance, again, the test case for product:


 using context_t = vcsn::ctx::char_z;
 context_t ctx { {'a', 'b', 'c'} };
 using automaton_t =
   vcsn::mutable_automaton<context_t, vcsn::labels_are_letters>;
 automaton_t aut1(ctx);


using context_t = vcsn::ctx::char_z;
context_t ctx { {'a', 'b', 'c'} };
using automaton_t = vcsn::mutable_automaton<context_t>;

Or the source of pprat:


using atom_kind_t
  = typename Factory::kind_t;
using label_kind_t
  = typename vcsn::label_kind<atom_kind_t>::type;
using context_t =
  vcsn::ctx::context<typename Factory::genset_t,
                     typename Factory::weightset_t,
context_t ctx{factory.genset(), factory.weightset()};
using automaton_t = vcsn::mutable_automaton<context_t>;
auto aut = vcsn::rat::standard_of<automaton_t>(ctx, e);


using context_t = typename Factory::context_t;
using automaton_t = vcsn::mutable_automaton<context_t>;
auto aut = vcsn::rat::standard_of<automaton_t>(factory.context(), e);



"Contexts" were introduced to factor two aspects that are required through out the library: the GenSet type (i.e., the nature of the generators), and the WeightSet type (i.e., the nature of the weights). Not only do contexts define these types, they must also be instantiated so that "run-time" details be known: for instance the set of generators is dynamic (what are the allowed letters), and on occasion the weightset also needs run-time information (e.g., when RatExp are parameterized by RatExp, what is the alphabet of the latter ones?).

As an example of the changes on the user side, consider the product test-case.


 typedef vcsn::set_alphabet<vcsn::char_letters> alpha_t;
 typedef vcsn::mutable_automaton<alpha_t, vcsn::z,
                                vcsn::labels_are_letters> automaton_t;
 vcsn::z z;
 alpha_t alpha{'a', 'b', 'c'};
 automaton_t aut1(alpha, z);


 using context_t = vcsn::ctx::char_z;
 context_t ctx { {'a', 'b', 'c'} };
 using automaton_t =
   vcsn::mutable_automaton<context_t, vcsn::labels_are_letters>;
 automaton_t aut1(ctx);


RatExp: atoms are words

Expressions such as "(ab)(ab)" used to be equivalent to "(abab)" (a single four-letter atom). Now:

pprat -aw '(ab)(ab)'   => (ab).(ab)
pprat -aw 'abab'       => abab
pprat -aw 'ab.ab'      => (ab).(ab)
pprat -aw 'ab(ab)abc*' => (ab).(ab).(ab).(c*)


New algorithm: eval, evaluates a word over an (weighted) automaton

Defined in vcsn/algos/eval.hh as vcsn::eval.

New algorithm: determinize, Boolean automaton determinization

Defined in vcsn/algos/determinize.hh as vcsn::determinize.


RatExp: changes in the display

Fixed the output of "atoms are words" expressions. For instance "(ab)" used to be displayed as "ab" (which is wrong as it is parsed as "a(b)"). It is now properly displayed as "(ab)".

RatExp: slight changes in the grammar

A star is now valid after a weight:

$ pprat -w z '{2}ab{3}*'


dotty: define initial/final states first

In order to improve readability, instead of

digraph A {
  node [shape=circle];
  F1 [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{a.a.((d.d)*)}"]
  3 -> 2 [label="{(d.d)*}b"]
  I1 [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1
  1 -> 2 [label="{a.a.((d.d)*)}b"]
  F3 [style=invis,shape=none,label="",width=0,height=0]
  3 -> F3 [label="{(d.d)*}"]
  2 -> 3 [label="b"]

we now produce

digraph A {
  node [shape=circle]
    node [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{a.a.((d.d)*)}"]
  3 -> 2 [label="{(d.d)*}b"]
  I1 -> 1
  1 -> 2 [label="{a.a.((d.d)*)}b"]
  3 -> F3 [label="{(d.d)*}"]
  2 -> 3 [label="b"]


RatExp: support for the kind of atoms

As a major overhaul, the rational expressions (vcsn::rat::node) are now parameterized by Atom, which denotes the atom value. The kratexps structure is now parameterized by the Kind, from which it is deduced, from the GenSet parameter, whether we should use word_t or letter_t atoms.

pprat: an option -a

To provide user-access to these feature, pprat now supports an option -a, which accepts "letters" or "words" as argument, with obvious meaning. For instance:

pprat -a letters 'abc' => a.b.c
pprat -a letters 'abc.abc' => a.b.c.a.b.c
pprat -w br -al '{aa}bb{c}dd{a}' => ({a.a}(b.b).{c}(d.d)){a}

pprat -a words 'abc' => abc
pprat -a words 'abc.abc' => abc.abc
pprat -w br -aw '{aa}bb{c}dd{a}' => ({aa}bb.{c}dd){a}

Of course, this also works with the "standard-of" option:

$ pprat -w br -al '{aa}({dd}\e+bb)*'
$ pprat -s -w br -al '{aa}({dd}\e+bb)*'
digraph A {
  node [shape=circle];
  F1 [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{a.a.((d.d)*)}"]
  3 -> 2 [label="{(d.d)*}b"]
  I1 [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1
  1 -> 2 [label="{a.a.((d.d)*)}b"]
  F3 [style=invis,shape=none,label="",width=0,height=0]
  3 -> F3 [label="{(d.d)*}"]
  2 -> 3 [label="b"]


$ pprat  -w br -aw '{aa}({dd}\e+bb)*'
$ pprat -s -w br -aw '{aa}({dd}\e+bb)*'
digraph A {
  node [shape=circle];
  F1 [style=invis,shape=none,label="",width=0,height=0]
  1 -> F1 [label="{aa.(dd*)}"]
  2 -> 2 [label="{dd*}bb"]
  F2 [style=invis,shape=none,label="",width=0,height=0]
  2 -> F2 [label="{dd*}"]
  1 -> 2 [label="{aa.(dd*)}bb"]
  I1 [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1


RatExp: in some case the weights could be lost

"Associativity" was applied too eagerly, which would result in loss of some weights. E.g. {a}bb{c}dd resulted in b.b.d.d, not it evaluates to {a}(b.b).{c}(d.d).


RatExp: improved pretty-printing

The outermost pair of parentheses is removed if useless. For instance:

 (a.b) => a.b
 (a+b+c) => a+b+c
 (a*) => a*

But in the following examples they are kept.


standard-of: star is fixed

Standard-of seems to be correct.


Expressions overhaul

They are immutable: we no longer make side effects on expressions. They are shared_ptr, no longer plain pointers. They can be used like the other values, by value.


standard-of is fully implemented

Support for star was implemented, and checked for B and Z. For implementation reasons, one cannot yet use rational expressions as weights.



RatExp::head and tail


In order to improve the readability of its output, it no longer "defines" the reachable states. See the following diff:

 digraph A {
   node [shape=circle];
-  1
-  2
-  3
-  4
   I1 [style=invis,shape=none,label="",width=0,height=0]
   I1 -> 1 [label="{6}"]
   F1 [style=invis,shape=none,label="",width=0,height=0]
   1 -> F1
   1 -> 2 [label="a"]
   1 -> 3 [label="a"]
   2 -> 4 [label="{3}b"]
   F4 [style=invis,shape=none,label="",width=0,height=0]
   4 -> F4
   4 -> 3 [label="a"]

standard-of: many fixes in the handling of the weights

An expression such as "{12}" used to leave the weight in the initial transition; it is now on the final transition. More generally the initial transition always has unit as weight.

The product and sum of expressions now handle the left and right weights.

Accepting initial states in expressions such as "+a" are no longer lost.


Many renamings

alphabet_t/alphabet() -> genset_t/genset(), etc.
factory -> abstract_kratexp.
factory_ -> kratexp.
initials() -> initial_transitions(), etc.
invalid_state -> null_state, etc.
nb_state() -> num_states(), etc.
polynomial -> polynomials, etc.



An implementation of the product of two automata is available.



An example of tropical semiring, to test show_unit().



This method return a special, reserved character, that is used to
label initial and final transitions.  This character is not part of
the alphabet and is never output.



Initial work on "+".

mutable_automata are implemented using a pre() and post() states

What is missing is the correct '$' letter on the initial and final
transitions.  The current value is the default value for label_t.

The previous interface has been preserved (but maybe we should
clean it) except for one change:

  initial() and final() have been changed to return a pseudo
  container of transitions.  These transitions give us both
  the weight and the initial/final state.

The following methods are new:

  pre(), post()         returns the pre-initial and post-final state.
  all_states()          returns all states, including pre() and post()
  all_transitions()     returns all transitions, including initial and
                          final transitions
  all_entries()         likewise for entries
  all_out(s), all_in(s) likewise for outgoing and ingoing transitions

The methods get_initial_weight(s) and get_final_weight(s) are slower
now, because they need to locate the corresponding initial/final
transition.  For the same reason, is_initial(), is_final() are also



Initial version of standard-of is implemented. Can be tested with pprat's new option -s:

$ pprat -wz -s '{123}a'
digraph A {
  node [shape=circle];
  I1 [style=invis,shape=none,label="",width=0,height=0]
  I1 -> 1
  F2 [style=invis,shape=none,label="",width=0,height=0]
  2 -> F2
  1 -> 2 [label="{123}a"]

mutable_automaton uses unsigned for state_t and transition_t.

This allows to store states in a std::vector<stored_state_t>. Likewise for transitions. Erased elements are marked (so they are skipped over during iteration), and added to a free store to be reused later.

mutable_automaton does not store any weight when WeightSet == b.

mutable_automaton has a read-only entry interface

entries() is a pseudo container that filters transitions()
  to see each (src,dst) pair at most once.

entry_at(src, dst) and entry_at(t) return a polynomial describing
  the entry between (src, dst) or (src_of(t), dst_of(t)).

entryset() returns the WeightSet that can be used to manipulate
  these polynomials.

make check-rat, make check-unit

There is a check target for each subdirectory of tests/.


Alphabets are checked

pprat is hard-coded to use a, b, c, d for all the alphabets (including for inner rational expressions):

$ pprat -w b -e 'y'
1.1: invalid word: y: invalid letter: y

Weights are checked

As follows:

$ pprat -w b -e '{12}a'
1.1-5: invalid Boolean: 12

Unfortunately the locations are bad currently for complex weights:

$ pprat -w zr -e '{x}a'
1.1-4: 1.1: invalid word: x: invalid letter: x

To be fixed.


G++ 4.7 is required

We use constructs that are not supported by 4.6 (e.g., constructor delegation).


As a demonstration that rational expressions can be weights of rational expressions, pprat supports '-w zrr' (Rat<Rat<Z>>):

 $ pprat -w zr -e '{{{2}{3}a}u}x{{{4}{5}\e}\e}'