|Vcsn for Linguists|
M2 AD 2015 Vcsn for Linguists
5-6 months in 2015
|General presentation of the field||
The classical theory of automata, transducers and rational expressions admits a very elegant and extremely useful extension (e.g., in natural language processing) that takes weights into account. The weights are taken in a semiring, which can be classical (⟨𝔹, ∨, ∧⟩, ⟨ℤ, +, ×⟩, ⟨ℚ, +, ×⟩, etc.), tropical (⟨ℤ, min, +⟩, etc.), or of yet another type (e.g., rational expressions).
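To make the role of the semiring concrete, here is a minimal sketch in plain Python. It is not Vcsn's API: the names `Semiring`, `evaluate` and `SHAPE`, the states `p` and `q`, and the toy automaton itself are assumptions made up for this illustration. The same two-state automaton is evaluated in three semirings, with +∞ added to the tropical one so that it has a zero.

```python
from collections import namedtuple

# A semiring is described by its two operations and their neutral elements.
# (Hypothetical helper type, not part of Vcsn.)
Semiring = namedtuple("Semiring", "add mul zero one")

boolean = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
counting = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
# Tropical (min, +): the zero is +infinity, the one is 0.
tropical = Semiring(min, lambda a, b: a + b, float("inf"), 0)


def evaluate(semiring, initial, final, transitions, word):
    """Weight of `word`: the semiring sum, over all accepting paths,
    of the semiring product of the weights along each path."""
    # current[state] = total weight of the paths that read the prefix
    # seen so far and end in `state`.
    current = dict(initial)
    for letter in word:
        following = {}
        for (src, label, dst), weight in transitions.items():
            if label == letter and src in current:
                step = semiring.mul(current[src], weight)
                following[dst] = semiring.add(
                    following.get(dst, semiring.zero), step)
        current = following
    total = semiring.zero
    for state, weight in current.items():
        if state in final:
            total = semiring.add(total, semiring.mul(weight, final[state]))
    return total


# Toy automaton over {a, b}: stay in p, jump to q on some 'a', stay in q.
SHAPE = [("p", "a", "p"), ("p", "b", "p"), ("p", "a", "q"),
         ("q", "a", "q"), ("q", "b", "q")]

bool_transitions = {t: True for t in SHAPE}
count_transitions = {t: 1 for t in SHAPE}
# Tropical costs: each letter read while still in p costs 1, the rest is free.
trop_transitions = dict(zip(SHAPE, [1, 1, 0, 0, 0]))

# Boolean semiring: does the word contain an 'a'?
contains_a = evaluate(boolean, {"p": True}, {"q": True},
                      bool_transitions, "aba")
# Counting semiring: how many accepting paths, i.e. how many 'a's?
number_of_a = evaluate(counting, {"p": 1}, {"q": 1},
                       count_transitions, "aba")
# Tropical semiring: cheapest path, i.e. number of letters before the first 'a'.
first_a = evaluate(tropical, {"p": 0}, {"q": 0},
                   trop_transitions, "baba")
print(contains_a, number_of_a, first_a)
```

Only the semiring and the weights change between the three calls. The evaluation routine stays the same, which is the kind of genericity a weighted-automata platform can exploit.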
Automata are heavily used in computational linguistics, and conversely, automata used in computational linguistics are "heavy".
Vcsn is a project led by Alexandre Duret-Lutz and Akim Demaille (LRDE). It is a platform for the manipulation of automata, transducers and weighted rational expressions. It is written in C++11, avoiding classical object-oriented programming in favor of generic programming (templates) for better performance. Vcsn is an heir of the Vaucanson 2 project, which was developed in partnership with Jacques Sakarovitch (Telecom ParisTech) and Sylvain Lombardy (LaBRI).
Vcsn has a sound base of data structures and algorithms for automata and rational expressions. It can already handle many of the typical needs of linguists. However, some specific semirings have not been implemented yet, and some well-known algorithms are still missing.
The objective of this internship is to develop a complete computational linguistics toolchain on top of Vcsn, for instance the "SMS to French" project from François Yvon (sms.limsi.fr). To this end, Vcsn will have to be extended with the required data structures and algorithms, and the existing implementation may have to be overhauled to cope with the extremely demanding size of these real-life automata.
|Benefit for the candidate|
|Place||LRDE: How to get to us|
1000 € gross/month
|Future work opportunities||
If the internship is carried out satisfactorily, we would like it to be followed by a PhD thesis.
<akim at lrde . epita . fr>