{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automata\n",
"\n",
"## Editing Automata\n",
"Vcsn provides different means to enter automata. One, which also applies to plain Python, is using the `automaton` constructor:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, z>>"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import vcsn\n",
"vcsn.automaton('''\n",
"context = \"lal_char(ab), z\n",
"$ -> p <2>\n",
"p -> q <3>a,<4>b\n",
"q -> q a\n",
"q -> $\n",
"''')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the documentation of `vcsn.automaton` for more details about this function. The syntax used to define the automaton is, however, described here.\n",
"\n",
"In order to facilitate the definition of automata, Vcsn provides additional ''magic commands'' to the IPython Notebook. We will see through this guide how use this command."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `%%automaton`: Entering an Automaton\n",
"\n",
"IPython supports so-called \"cell magic-commands\", that start with `%%`. Vcsn provides the `%%automaton` magic command to enter the literal description of an automaton. For instance, the automaton above can entered as follows:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"name_automaton, z>>>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%automaton a\n",
"context = \"lal_char(ab), z\"\n",
"$ -> p <2>\n",
"p -> q <3>a, <4>b\n",
"q -> q a\n",
"q -> $"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first argument, here `a`, is the name of the variable in which this automaton is stored:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"name_automaton, z>>>"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may pass the option `-s` or `--strip` to strip the automaton from its layer that keeps the state name you have chosen. In that case, the internal numbers are used, unrelated to the user names (actually, the numbers are assigned to state names as they are encountered starting from 0)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, z>>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%automaton --strip a\n",
"context = \"lal_char(ab), z\"\n",
"$ -> p <2>\n",
"p -> q <3>a, <4>b\n",
"q -> q a\n",
"q -> $"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, z>>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second argument specifies the format in which the automaton is described, defaulting to `auto`, which means \"guess the format\":"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"name_automaton, z>>>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%automaton a dot\n",
"digraph\n",
"{\n",
" vcsn_context = \"lal_char(ab), z\"\n",
" I -> p [label = \"<2>\"]\n",
" p -> q [label = \"<3>a, <4>b\"]\n",
" q -> q [label = a]\n",
" q -> F\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"name_automaton, z>>>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%automaton a\n",
"digraph\n",
"{\n",
" vcsn_context = \"lal_char(ab), z\"\n",
" I -> p [label = \"<2>\"]\n",
" p -> q [label = \"<3>a, <4>b\"]\n",
" q -> q [label = a]\n",
" q -> F\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Automata entered this way are persistent: they are stored in the notebook and will be recovered when the page is reopened."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### `%automaton`: Text-Based Edition of an Automaton\n",
"\n",
"In IPython \"line magic commands\" begin with a single `%`. The line magic `%automaton` takes three arguments:\n",
"1. the name of the automaton\n",
"2. the format you want the textual description of the automaton. Defaults to `auto`.\n",
"3. the display mode: `h` for horizontal and `v` for vertical. Defaults to `h`.\n",
"\n",
"Contrary to the cell magic, the `%automaton` can be used to update an existing automaton:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%automaton a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The real added value is that now you can interactively edit this automaton: changes in the text are immediately propagated on the rendered automaton.\n",
"\n",
"When given a fresh variable name, `%automaton` creates a dummy automaton that you can use as a starting point:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%automaton b fado"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beware however that these automata are _not_ persistent: changes will be lost when the notebook is closed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Automata Formats\n",
"\n",
"Vcsn supports differents input and output formats. Some, such as `tikz`, are only export-only formats: they cannot be read by Vcsn.\n",
"\n",
"### daut (read/write)\n",
"This simple format is work in progress: its precise syntax is still subject to changes. It is roughly a simplification of the `dot` syntax. The following example should suffice to understand the syntax. If \"guessable\", the context can be left implicit."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"name_automaton, z>>>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%automaton a\n",
"context = \"lal_char(ab), z\"\n",
"$ -> p <2>\n",
"p -> q <3>a, <4>b\n",
"q -> q a\n",
"q -> $"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### dot (read/write)\n",
"This format relies on the \"dot\" language of the GraphViz toolkit (http://graphviz.org). This is the default format for I/O in Vcsn.\n",
"\n",
"An automaton looks as follows:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"name_automaton, b>>>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%automaton a dot\n",
"// The comments are introduced with //, or /* ... */\n",
"//\n",
"// The overall syntax is that of Dot for directed graph (\"digraph\").\n",
"digraph\n",
"{\n",
" // The following attribute defines the context of the automaton.\n",
" vcsn_context = \"lal_char, b\"\n",
" // Initial states are denoted by an edge between a node whose name starts\n",
" // with an \"I\". So \"0\" is a initial state.\n",
" I -> 0\n",
" // Transitions are edges whose label is that of the transition.\n",
" 0 -> 0 [label = \"a\"]\n",
" 0 -> 0 [label = \"b\"]\n",
" 0 -> 1 [label = \"c, d\"]\n",
" // Final states are denoted by an edge to a node whose name starts with \"F\".\n",
" 1 -> Finish\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### efsm (read/write)\n",
"This format is designed to support import/export with OpenFST (http://openfst.org): it wraps its multi-file format (one file describes the automaton with numbers as transition labels, and one or several others define these labels) into a single format. It is not designed to be used by humans, but rather to be handled by two tools:\n",
"- `efstcompile` to compile such a file into the OpenFST binary file format,\n",
"- `efstdecompile` to extract an `efsm` file from a binary OpenFST file.\n",
"\n",
"#### efsm for acceptors (single tape automata)\n",
"As an example, consider the following exchange between Vcsn and OpenFST."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, zmin>>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = vcsn.context('lal_char(ab), zmin').expression('[ab]*a(<2>[ab])').automaton()\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"#! /bin/sh\n",
"\n",
"me=$(basename \"$0\")\n",
"medir=$(mktemp -d \"/tmp/$me.XXXXXX\") || exit 1\n",
"\n",
"arc_type=standard\n",
"\n",
"cat >$medir/symbols.txt <<\\EOFSM\n",
"\\e\t0\n",
"a\t1\n",
"b\t2\n",
"EOFSM\n",
"\n",
"cat >$medir/transitions.fsm <<\\EOFSM\n",
"0\t1\ta\t2\n",
"0\t2\ta\t2\n",
"0\t2\tb\t2\n",
"1\t3\ta\t0\n",
"1\t3\tb\t0\n",
"2\t1\ta\t0\n",
"2\t2\ta\t0\n",
"2\t2\tb\t0\n",
"3\t0\n",
"EOFSM\n",
"\n",
"fstcompile --acceptor \\\n",
" --arc_type=$arc_type \\\n",
" --keep_isymbols --isymbols=$medir/symbols.txt \\\n",
" --keep_osymbols --osymbols=$medir/symbols.txt \\\n",
" $medir/transitions.fsm \"$@\"\n",
"sta=$?\n",
"\n",
"rm -rf $medir\n",
"exit $sta\n"
]
}
],
"source": [
"efsm = a.format('efsm')\n",
"print(efsm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sequence of operations uses OpenFST to determinize this automaton, and to load it back into Vcsn."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"#! /bin/sh\n",
"\n",
"me=$(basename \"$0\")\n",
"medir=$(mktemp -d \"/tmp/$me.XXXXXX\") || exit 1\n",
"\n",
"arc_type=standard\n",
"\n",
"cat >$medir/symbols.txt <<\\EOFSM\n",
"\\e\t0\n",
"a\t1\n",
"b\t2\n",
"EOFSM\n",
"\n",
"cat >$medir/transitions.fsm <<\\EOFSM\n",
"0\t1\ta\t2\n",
"0\t2\tb\t2\n",
"1\t3\ta\n",
"1\t4\tb\n",
"2\t1\ta\n",
"2\t2\tb\n",
"3\t3\ta\n",
"3\t4\tb\n",
"3\n",
"4\t1\ta\n",
"4\t2\tb\n",
"4\n",
"EOFSM\n",
"\n",
"fstcompile --acceptor \\\n",
" --arc_type=$arc_type \\\n",
" --keep_isymbols --isymbols=$medir/symbols.txt \\\n",
" --keep_osymbols --osymbols=$medir/symbols.txt \\\n",
" $medir/transitions.fsm \"$@\"\n",
"sta=$?\n",
"\n",
"rm -rf $medir\n",
"exit $sta\n",
"\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, zmin>>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import os\n",
"\n",
"# Save the EFSM description of the automaton in a file.\n",
"with open(\"a.efsm\", \"w\") as file:\n",
" print(efsm, file=file)\n",
"\n",
"# Compile the EFSM into an OpenFST file.\n",
"os.system(\"efstcompile a.efsm >a.fst\")\n",
"\n",
"# Call OpenFST's determinization.\n",
"os.system(\"fstdeterminize a.fst >d.fst\")\n",
"\n",
"# Convert from OpenFST format to EFSM.\n",
"os.system(\"efstdecompile d.fst >d.efsm\")\n",
"\n",
"# Load this file into Python.\n",
"with open(\"d.efsm\", \"r\") as file:\n",
" d = file.read()\n",
" \n",
"# Show the result.\n",
"print(d)\n",
"\n",
"# Now read it as an automaton.\n",
"d_ofst = vcsn.automaton(d, 'efsm')\n",
"d_ofst"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For what it's worth, the above sequence of actions is realized by `a.fstdeterminize()`.\n",
"\n",
"Vcsn and OpenFST compute the same automaton."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"determinized_automaton, zmin>>, vcsn::wet_kind_t::map, false>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.determinize()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### efsm for transducers (two-tape automata)\n",
"The following sequence shows the round-trip of a transducer between Vcsn and OpenFST."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, letterset>, zmin>>"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t = a.partial_identity()\n",
"t"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"#! /bin/sh\n",
"\n",
"me=$(basename \"$0\")\n",
"medir=$(mktemp -d \"/tmp/$me.XXXXXX\") || exit 1\n",
"\n",
"arc_type=standard\n",
"\n",
"cat >$medir/isymbols.txt <<\\EOFSM\n",
"\\e\t0\n",
"a\t1\n",
"b\t2\n",
"EOFSM\n",
"\n",
"cat >$medir/osymbols.txt <<\\EOFSM\n",
"\\e\t0\n",
"a\t1\n",
"b\t2\n",
"EOFSM\n",
"\n",
"cat >$medir/transitions.fsm <<\\EOFSM\n",
"0\t1\ta\ta\t2\n",
"0\t2\ta\ta\t2\n",
"0\t2\tb\tb\t2\n",
"1\t3\ta\ta\t0\n",
"1\t3\tb\tb\t0\n",
"2\t1\ta\ta\t0\n",
"2\t2\ta\ta\t0\n",
"2\t2\tb\tb\t0\n",
"3\t0\n",
"EOFSM\n",
"\n",
"fstcompile \\\n",
" --arc_type=$arc_type \\\n",
" --keep_isymbols --isymbols=$medir/isymbols.txt \\\n",
" --keep_osymbols --osymbols=$medir/osymbols.txt \\\n",
" $medir/transitions.fsm \"$@\"\n",
"sta=$?\n",
"\n",
"rm -rf $medir\n",
"exit $sta\n"
]
}
],
"source": [
"tefsm = t.format('efsm')\n",
"print(tefsm)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, letterset>, zmin>>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"vcsn.automaton(tefsm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Details about the EFSM format\n",
"The EFSM format is a simple format that puts together the various files that OpenFST uses to serialize and deserialize automata: one or two files to describe the labels (called \"symbol tables\"), and one to list the transitions. More details about these files can be found on [FSM Man Pages](https://web.archive.org/web/20141006203106/http://www2.research.att.com/~fsmtools/fsm/man4/fsm.5.html).\n",
"\n",
"When reading an EFSM file, Vcsn expects the following bits:\n",
"\n",
"- a line `arc_type=TYPE` which specifies the weightset. If `TYPE` is `log` or `log64`, this is mapped to the `log` weightset, if it is `standard`, then it is mapped to `zmin` or `rmin`, depending on whether floating points were used.\n",
"\n",
"- a \"here-document\" (the Unix name for embedded files, delimited by `<\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, b>>"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = vcsn.B.expression('a+b').standard()\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@DFA 1 3\n",
"0 a 1\n",
"0 b 3\n"
]
}
],
"source": [
"print(a.format('fado'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### grail (write)\n",
"This format is made to exchange automata with the Grail (http://grailplus.org). Weighted automata are not supported."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, b>>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = vcsn.B.expression('a+b').standard()\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(START) |- 0\n",
"0 a 1\n",
"0 b 3\n",
"1 -| (FINAL)\n",
"3 -| (FINAL)\n"
]
}
],
"source": [
"print(a.format('grail'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### tikz (write)\n",
"This format generates a LaTeX document that uses TikZ syntax to draw the automaton. Note that the layout is not computed: all the states are simply reported in a row. You will have to tune the positions of the states by hand. However, it remains a convenient way to start."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"mutable_automaton, q>>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = vcsn.Q.expression('<2>a+<3>b').standard()\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\\documentclass{standalone}\n",
" \\usepackage{tikz}\n",
" \\usetikzlibrary{arrows.meta, automata, bending, positioning, shapes.misc}\n",
" \\tikzstyle{automaton}=[shorten >=1pt, >={Stealth[bend,round]}, initial text=]\n",
" \\tikzstyle{accepting}=[accepting by arrow]\n",
"\n",
"\\begin{document}\n",
"\\begin{tikzpicture}[automaton, auto]\n",
" \\node[state,initial] (0) {$0$};\n",
" \\node[state,accepting] (1) [right=of 0] {$1$};\n",
" \\node[state,accepting] (3) [right=of 1] {$3$};\n",
" \\path[->] (0) edge node {$\\left\\langle 2\\right\\rangle a$} (1);\n",
" \\path[->] (0) edge node {$\\left\\langle 3\\right\\rangle b$} (3);\n",
"\\end{tikzpicture}\n",
"\\end{document}\n"
]
}
],
"source": [
"print(a.format('tikz'))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.4"
}
},
"nbformat": 4,
"nbformat_minor": 0
}