# Courses/ThlLog2016

### From LRDE

This page contains the log of the topics that of the [[THL-Course][THL - Language Theory]] for class EPITA 2016 (i.e., from Oct 2016 to Nov 2016). A majority of the students have already followed [[THLR-Course][THLR - Rational Language Theory]]. This course is a preamble to

  * [[TYLA-Course][TYLA - Concepts of Programming Languages]]
* [[CMP1-Course][CMP1 - Compiler Construction 1]]
* [[CMP2-Course][CMP2 - Compiler Construction 2]].


Below, the star symbol (*) designates topics that were not covered for lack of time.

Courses: %TOC%

---+ 1&2: 2010-09-30: Introduction (30min), Languages and Automata (2h30)

  1 Presentation of the Lectures
(THL-Course, CMP1-Course, TYLA-Course, CMP2-Course)
1 Why studying languages
2 Computers
3 Compilers/Interpreters
3 Formal languages (XML, <nop>LaTeX, SQL, etc.)
3 Intrusion Detection Systems
2 Biotechs
3 Fuzzy matching
2 Natural Language Processing
1 Bases
2 Alphabet
2 Word
2 Language
1 Examples
2 a^n
2 a^n b^n
2 binary numbers
2 prime binary numbers
2 C programs
2 French
1 Infinite words
2 traces
2 Omega-words
1 Concatenation
2 Concatenation
2 Epsilon
2 Noncommutativity (*)
3 xy = yx => \exists u: x = u^i, y = u^j
2 Free Monoid
2 Power
2 Length
1 Relations (*)
2 Prefix, Suffix, Factor, Subword
1 Distance between words (*)
2 Longest prefix length
2 Suffix, Factor, Subword
2 Editing distance
1 Order on words (*)
2 Prefix, Suffix etc.
2 Total order: lexicographic
2 Total well founded order: alphabetic (aka radical, military)
1 Operations on Languages
2 Union
2 Intersection
2 Complementation
2 Concatenation
2 Pref, Suf, Fac, Sub  (1)
2 Quot (L_{/w} = w^-1 L = {v \in \sigma^\star : wv \in L})  (*)
2 Power
2 Kleene
1 Defining Languages thanks to Set Operations: Rat
1 Examples
2 a^n
2 binary numbers
2 floats (*)
2 a^n b^n ?
1 Rational Expressions
1 Real World Rational Expressions(*)
2 Syntactic Sugar
2 Syntaxes (*grep, *lex, AWK, Perl, Emacs etc.) (*)
2 Toying with Grep (*)
2 A <nop>RegExp for Pascal {} comments (*)
2 A <nop>RegExp for C strings (*)
2 A <nop>RegExp for C comments (*)
2 A <nop>RegExp for nested {} comments (*)
2 Changing Massively Identifiers thanks to Perl (*)
1 Matching <nop>RegExps: Naive Automata
2 Thompson
2 Epsilon-NFA
1 Epsilon transitions
2 Backward closure
2 Elimination
2 Nondeterministic Automata


---+ 3: 2008-10-18: Deterministic Automata, Rational Languages (1h30)

  1 Understanding Nondeterministic Computations
2 Perfect Fork
2 The Oracle
2 <nop>BackTracking: exponential time
1 Determinization
2 Eliminating nondeterminism
2 Deterministic Automata
2 Exponential Space vs Linear Time
1 Trimmed Automaton
2 Reachability
2 Usefulness
1 Interpreting the states of a DFA for (a|n)*a(a|b)(a|b)
2 Can an automaton understand a^nb^n?
2 The Pumping Lemma
2 !RegExps are not enough
1 Computing the language of an automaton (*)
1 Rational is Recognizable
1 Rational Languages Stability (*)
1 transposition (reversing a language) (*)
1 intersection (*)
1 complementation, importance of determinism and completion (*)
1 Prefix, Suffix, and Factor (*)
1 Subwords(*)
1 Minimization (*)
2 Distinguishable states (*)
One can reach an accepting state on some string, but not the
other.  Accepting states are not required to be equal.
2 Indistinguishable, equivalent states (*)
2 Minimizing a DFA. (*)
1 Rational Languages Tests (*)
2 Emptiness test
2 Membership test
2 Equivalence test (using the state equivalence table)


---+ 4: 2008-10-14: Grammars, Chomsky's Hierarchy (1h30)

  1 Is there a perfect system? (*)
2 Cardinality of languages
2 Cardinality of the set of languages
2 A Language Specification System is a language
2 Cardinality mismatch
1 Naive Rewriting Systems (Semi Thue Systems) (*)
1 Naive Grammars for Small Natural Language Subset
1 Terminal, Nonterminal
1 A Grammar to Generate "a^n b^n c^n"
2 A -> aABC
2 A -> abC
2 CB -> BC
2 bB -> bb
2 bC -> bc
2 cC -> cc
1 Type 0 Generative Grammars
1 Recursively Enumerable
2 If a language is type 0, then it is recursively enumerable (*)
2 If a language is recursively enumerable, then it is type 0
(not proved) (*)
1 Derivation, "Derivation Graph", Sentential Form, Protosentences
1 A Grammar Generating 'Dick, Ceriel, and Noam' Sentences
2 Sentence -> Name | List End
2 Name -> ceriel | dick | noam
2 List -> Name | Name ',' List
2 ',' Name End -> and Name
1 Type 1 Monotonic Grammars
2 Sentence -> Name | List
2 Name -> ceriel | dick | noam
2 List -> <nop>EndName | Name ',' List
2 ',' <nop>EndName -> and Name


---+ 5: 2012-10-22: Grammars, Chomsky's Hierarchy (2h)

  1 If a language is monotonous, it is recursive (*)
1 The converse does not hold
1 Type 1 Context Sensitive Grammars
2 Sentence -> Name | List
2 Name -> ceriel | dick | noam
2 List -> <nop>EndName | Name Comma List
2 Comma <nop>EndName -> and <nop>EndName
2 and <nop>EndName -> and Name
2 Comma -> ','
1 Back to "a^n b^n c^n"
2 S -> abc
2 S -> aSQ
2 bQc -> bbcc
2 cQ -> Qc
1 Reback to "a^n b^n c^n"
2 S -> abC
2 S -> aSQ
2 bQC -> bbCC
2 CQ -> CX
2 CX -> QX
2 QX -> QC
2 C -> c
1 A context sensitive grammar is Monotonous
1 A monotonous grammar can be transformed into context sensitive
1 Type 2 Context Free Grammars
2 Sentence -> Name | List and Name
2 Name -> ceriel | dick | noam
2 List -> Name ',' List | Name
1 Type 3 Rational Grammars
2 Sentence -> ceriel | dick | noam | List
2 List -> ceriel <nop>ListTail | dick <nop>ListTail | noam <nop>ListTail
2 <nop>ListTail -> ',' List | and ceriel | and dick | and noam
1 Linear is Rational
1 Type 4 Finite Choice Grammars
1 Chomsky Hierarchy
1 (Weakly) Equivalent Grammars


---+ 6: 2012-10-22: Grammars, Chomsky's Hierarchy (2h)

  1 Computation Models (FSM, PDA, LBTM, TM)
1 Computation Power (P, NP)
1 Interpreting the Various Types as Levels of Correctness
2 Lexically Correct
2 Syntactically Correct
2 "Semantically" Correct
2 Terminating Programs
2 Correct Programs (i.e., that solve the given problem)
1 Case of the Epsilon-productions
2 Type 1 + A -> epsilon = Type 0
2 Type 2 + A -> epsilon = Type 2
2 Type 3 + A -> epsilon = Type 3


---+ 5: 2008-11-02: Context Free Grammars (1h30)

  1 Context Free Grammars
2 S -> a S b | epsilon
1 Backus Naur Form (*)
1 Extended Backus Naur Form (*)
1 Tiger Grammar (*)
1 Arithmetics I
2 S -> S + S
2 S -> n
1 Rightmost Derivation
1 Derivation Trees
1 Ambiguities
1 How to fix ambiguities
1 Arithmetics II
2 S -> S + S
2 S -> S * S
2 s -> n
1 Disambiguation
1 Completion: unary minus, and parentheses
2 E -> E + T | T
2 T -> T * F | F
2 F -> n | (E) | -F
1 The dangling else (*)
1 Hygiene (*)
2 Undefined Nonterminals
2 Unused Nonterminals
2 Nonproductive Nonterminals
2 Useless Nonterminals
2 Loops


---+ 6: 2008-10-27: Parsing: LL (2h)

  1 Writing a simple parser for shell
2 inst -> 'if' expr 'then' inst 'fi'
2 inst -> 'print' 'foo'
2 expr -> true | false
1 Looking for a criterion, the first token of rules
1 Making it more complex
2 inst -> 'if' expr 'then' inst 'fi'
2 inst -> 'print' 'foo'
2 expr -> cond
2 cond -> true | false
1 Looking for a criterion, FIRST of an LHS
1 Error Recovery
1 Making it worse
2 inst -> 'if' expr 'then' inst 'fi'
2 inst -> 'print' 'foo'
2 expr -> neg cond
2 neg -> 'not' neg | epsilon
2 cond -> true | false
1 Looking for a criterion: FIRST, FOLLOW, and NULLABLE
1 Computing FIRST, FOLLOW and NULLABLE for the above grammar
1 Writing the (strong)-LL(1) table
2 For all A -> alpha
|     | *First (alpha)* |
| *A* | A -> alpha      |
2 For all A -> alpha st. NULLABLE(alpha)
| *A* | A -> alpha   |
1 Applying the theory to
2 grammar
3  Z -> X Y Z
3  Z -> d
3  Y -> c
3  Y -> epsilon
3  X -> Y
3  X -> a
2 Tables
Nullable = { X, Y }
| Symbol | Nullable | First | Follow |
|   Z    |  N       | acd   | $| | Y | Y | c | acd | | X | Y | ac | acd | 2 Parser | | a | c | d | | S | 0/ | 0/ | 0/ | | Z | 1/ | 1/ | 1&2/ | | Y | /4 | 3/4 | /4 | | X | 6/5 | 5/5 | /5 |  ---+ 7: 2012-11-15: Parsing: LL(1) (2h)  1 Applying the LL algorithms onto arithmetics 2 Left Recursion 2 Left Factorization 2 Write the table 2 Insist on the fact that there is no problem with unary/binary -/+. 1 Critique the parser 2 Associativities are wrong, repairing them is intricate. 2 Show the corresponding C code 2 Introducing Rational Expression in Grammars: EBNF 2 Writing the LL parser for the arithmetics 2 Picturing CFG as collections of automata (*) 2 Where is the hidden memory? (automata => rational, but there are (' )'). (*) 1 Tabular vs. Code (*) 2 Register Pressure (no longer very true) 2 Optimizations 2 Prefetching 2 Swich vs. Function Pointer table 2 Virtual Tables vs. Switch (Eiffel vs. C++) (*)   1 Understanding how LL works: decisions are taken at the *beginning* of the rule. 1 Trying to imagine a means to overcome this limitation 1 Recognizing balanced expressions 2 S -> '(' S ')' 2 S -> 'n' 1 Isolating two main actions: shifting/reducing. 1 Adding$, end of input
1 Characterizing the reducible stacks
1 Finding Rational Expressions Characterizing these Stacks
1 Building the automaton directly, without rational expressions


---+ 8: 2012-11-22: Parsing: LR(0), SLR(1), LR(1)

  1 A more difficult grammar
2 E -> E '-' 'n' | 'n'
2 Don't forget the S -> E $rule. 1 Another 2 E -> 'n' '-' E | 'n' 2 Don't forget the S -> E$ rule.
1 Comparing the two automata, computed directly
2 The first one is a tree
2 The second has a cycle
2 How should that be interpreted (finite stack for the first one).
2 Hence LR seems to prefer left assoc.
1 Writing the automata as tables
2 The second automaton has a conflict
2 Introducing the concept of Look-ahead: SLR(1)
2 Back to left rec. vs. right rec.: (S)LR does the two,
but *prefers* left.
2 (recursive) LL *cannot* do left assoc.
1 Going Even Further: The "R/L-value" grammar
2 S -> V = E
2 S -> E
2 E -> V
2 V -> id
2 V -> * E
1 SLR(1): the "R/L-value" grammar above does not work, why?
1 LR(1) grammars
2 LR(1) items, computing LR(1) automatons
2 Compute the LR(1) automaton for the "R/L-value" grammar
1 Some sizes of LR(1) automata
2 arithmetics.
A small grammar but unambiguous.
There are 5 terminals, 3 non terminals, and 6 rules.
2 medium.
A medium size grammar comprising 41 terminals,
38 non terminals, and 93 rules.
2 programming.
A real size grammar of a programming language,
composed of 82 terminals, 68 non terminals, and 234 rules.
|           |*LR(1)*| *LR(1)* |*LALR(1)*|*LALR(1)*|*DRLR(1)*|*DRLR(1)*|
|*Grammar*  |*St.*  |*Entries*| *St.*   |*Entries*| *St.*   |*Entries*|
|arithmetics|22     | 71      | 12      | 26      | 8       | 23      |
|medium     |773    | 6,874   | 173     | 520     | 118     | 781     |
|programming|1,000+ | 15,000+ | 439     | 3,155   | 270     | 3,145   |

  1 LR(1) automata too big: LALR(1)
2 Compute the LR(1) automaton for the "R/L-value"
2 Merging states
2 Reduce the LR(1) automaton for the "R/L-value" grammar to LALR(1)
2 In practice: do not compute the LR(1) automaton
1 Ambiguous arithmetics:
2 E -> E + E
2 E -> E * E
2 E -> n
1 Compute the automaton
1 Yes, of course it's ambiguous
1 Fill the table
1 Using the precedence
1 Using %left', %right', %nonassoc' for associativities
1 This automaton is better
2 Nicer grammar
2 Smaller automaton
2 Fewer transitions


---+ 11: 2012-11-26: Yacc/Bison, Lex/Flex

  1 Architecture of a Compiler
2 Scanner (char stream -> token stream)
2 Parser (token stream -> synthetic representation of the input)
2 The rest (checks, translations)
1 Introduction to scanner/parser generators
2 Lex/Flex
2 Yacc/Bison
1 Introduction to Flex
2 Scanning keywords
2 Scanning integers: yytext
2 Scanning identifiers: yytext + yyleng + yylval
2 Identifier vs. keyword: the "meta rules (longest match, first rule)
2 The case of white spaces
2 A catch-all rule to avoid the default rule (ECHO)
2 Comments are white spaces, so skip them similarly
2 Of course, *it is not possible*
2 Using start conditions to handle the nested comments
2 Using start conditions to handle escapes in strings
1 Introduction to Bison
2 Writing a simple grammar without actions
2 Writing calc: =%left= and so forth
2 Dealing with ambiguous grammars: learn to read Bison's report
2 Writing actions
1 Interface Flex & Bison
2 Write a flex scanner for arithmetics
2 Write a bison parser for arithmetics
2 Put pieces of calculator to work
1 Error recovery


---+ Week 7: 2012-11-26: In depth use of Flex/Bison

  1 =errok= in error recovery
1 Location Tracking (=%locations=)
1 Making a pure parser
1 =%pure-parser=
1 =%parse-param=
1 Adjusting the scanner: =YY_DECL= & pointers
1 Fixing the Dangling else Shift/Reduce:
<verbatim>
cat >1-if.y <<-EOF
%%
exp:
"if" exp "then" exp
| "if" exp "then" exp "else" exp
| "exp"
;
%%
EOF
</verbatim>
1 Fixing a split operator Shift/Reduce
<verbatim>
cat >2-split.y <<-EOF
%%
exp:
exp "(" ")" exp
| "exp"
;
%%
EOF
</verbatim>
1 Understanding the mysterious Reduce/Reduce
<verbatim>
cat >3-promac.y <<-EOF
%%
sentence:
"proc"  argument '.'  | "proc"  parameter ';'
| "macro" argument ';'  | "macro" parameter '.'   ;
argument: "id";
parameter: "id";
%%
EOF
</verbatim>


---+ 2012-11-29: First two hours of CCMP: Deterministic languages, GLR

     1 Fixing Reduce/Reduce problems: lvalues
<verbatim>
cat >5-lvalues.y <<-EOF
%%
exp:
typeid "[" exp "]" "of" exp
| lvalue
;
lvalue:
"id"
| lvalue "[" exp "]"
;
typeid:
"id"
;
%%
EOF
</verbatim>

  1 An Example of Grammar that is not LR(1)
2 G -> G R
2 G -> R
2 R -> id ':' D
2 D -> L id
2 D -> epsilon
1 Writing a few words, trying to "parse" the words by hand
1 Where can the decision be taken?
1 LR(2)
1 What is this grammar about?
hence an LR(k) language is LR(1)
`