# Courses/ThlLog2016

### From LRDE

This page contains the log of the topics that of the [[THL-Course][THL - Language Theory]] for class EPITA 2016 (i.e., from Oct 2016 to Nov 2016). A majority of the students have already followed [[THLR-Course][THLR - Rational Language Theory]]. This course is a preamble to

* [[TYLA-Course][TYLA - Concepts of Programming Languages]] * [[CMP1-Course][CMP1 - Compiler Construction 1]] * [[CMP2-Course][CMP2 - Compiler Construction 2]].

Below, the star symbol (*) designates topics that were not covered for lack of time.

Courses: %TOC%

---+ 1&2: 2010-09-30: Introduction (30min), Languages and Automata (2h30)

1 Presentation of the Lectures (THL-Course, CMP1-Course, TYLA-Course, CMP2-Course) 1 Why studying languages 2 Computers 3 Compilers/Interpreters 3 Formal languages (XML, <nop>LaTeX, SQL, etc.) 3 Intrusion Detection Systems 2 Biotechs 3 Fuzzy matching 2 Natural Language Processing 1 Bases 2 Alphabet 2 Word 2 Language 1 Examples 2 a^n 2 a^n b^n 2 binary numbers 2 prime binary numbers 2 C programs 2 French 1 Infinite words 2 traces 2 Omega-words 1 Concatenation 2 Concatenation 2 Epsilon 2 Noncommutativity (*) 3 xy = yx => \exists u: x = u^i, y = u^j 2 Free Monoid 2 Power 2 Length 1 Relations (*) 2 Prefix, Suffix, Factor, Subword 1 Distance between words (*) 2 Longest prefix length 2 Suffix, Factor, Subword 2 Editing distance 1 Order on words (*) 2 Prefix, Suffix etc. 2 Total order: lexicographic 2 Total well founded order: alphabetic (aka radical, military) 1 Operations on Languages 2 Union 2 Intersection 2 Complementation 2 Concatenation 2 Pref, Suf, Fac, Sub (1) 2 Quot (L_{/w} = w^-1 L = {v \in \sigma^\star : wv \in L}) (*) 2 Power 2 Kleene 1 Defining Languages thanks to Set Operations: Rat 1 Examples 2 a^n 2 binary numbers 2 floats (*) 2 a^n b^n ? 1 Rational Expressions 1 Real World Rational Expressions(*) 2 Syntactic Sugar 2 Syntaxes (*grep, *lex, AWK, Perl, Emacs etc.) (*) 2 Toying with Grep (*) 2 A <nop>RegExp for Pascal {} comments (*) 2 A <nop>RegExp for C strings (*) 2 A <nop>RegExp for C comments (*) 2 A <nop>RegExp for nested {} comments (*) 2 Changing Massively Identifiers thanks to Perl (*) 1 Matching <nop>RegExps: Naive Automata 2 Thompson 2 Epsilon-NFA 1 Epsilon transitions 2 Backward closure 2 Elimination 2 Nondeterministic Automata

---+ 3: 2008-10-18: Deterministic Automata, Rational Languages (1h30)

1 Understanding Nondeterministic Computations 2 Perfect Fork 2 The Oracle 2 <nop>BackTracking: exponential time 1 Determinization 2 Eliminating nondeterminism 2 Deterministic Automata 2 Exponential Space vs Linear Time 1 Trimmed Automaton 2 Reachability 2 Usefulness 1 Interpreting the states of a DFA for (a|n)*a(a|b)(a|b) 2 Can an automaton understand a^nb^n? 2 The Pumping Lemma 2 !RegExps are not enough 1 Computing the language of an automaton (*) 1 Rational is Recognizable 1 Rational Languages Stability (*) 1 transposition (reversing a language) (*) 1 intersection (*) 1 complementation, importance of determinism and completion (*) 1 Prefix, Suffix, and Factor (*) 1 Subwords(*) 1 Minimization (*) 2 Distinguishable states

(*) One can reach an accepting state on some string, but not the other. Accepting states are not required to be equal. 2 Indistinguishable, equivalent states (*) 2 Minimizing a DFA. (*) 1 Rational Languages Tests (*) 2 Emptiness test 2 Membership test 2 Equivalence test (using the state equivalence table)

---+ 4: 2008-10-14: Grammars, Chomsky's Hierarchy (1h30)

1 Is there a perfect system? (*) 2 Cardinality of languages 2 Cardinality of the set of languages 2 A Language Specification System is a language 2 Cardinality mismatch 1 Naive Rewriting Systems (Semi Thue Systems) (*) 1 Naive Grammars for Small Natural Language Subset 1 Terminal, Nonterminal 1 A Grammar to Generate "a^n b^n c^n" 2 A -> aABC 2 A -> abC 2 CB -> BC 2 bB -> bb 2 bC -> bc 2 cC -> cc 1 Type 0 Generative Grammars 1 Recursively Enumerable 2 If a language is type 0, then it is recursively enumerable (*) 2 If a language is recursively enumerable, then it is type 0 (not proved) (*) 1 Derivation, "Derivation Graph", Sentential Form, Protosentences 1 A Grammar Generating 'Dick, Ceriel, and Noam' Sentences 2 Sentence -> Name | List End 2 Name -> ceriel | dick | noam 2 List -> Name | Name ',' List 2 ',' Name End -> and Name 1 Type 1 Monotonic Grammars 2 Sentence -> Name | List 2 Name -> ceriel | dick | noam 2 List -> <nop>EndName | Name ',' List 2 ',' <nop>EndName -> and Name

---+ 5: 2012-10-22: Grammars, Chomsky's Hierarchy (2h)

1 If a language is monotonous, it is recursive (*) 1 The converse does not hold 1 Type 1 Context Sensitive Grammars 2 Sentence -> Name | List 2 Name -> ceriel | dick | noam 2 List -> <nop>EndName | Name Comma List 2 Comma <nop>EndName -> and <nop>EndName 2 and <nop>EndName -> and Name 2 Comma -> ',' 1 Back to "a^n b^n c^n" 2 S -> abc 2 S -> aSQ 2 bQc -> bbcc 2 cQ -> Qc 1 Reback to "a^n b^n c^n" 2 S -> abC 2 S -> aSQ 2 bQC -> bbCC 2 CQ -> CX 2 CX -> QX 2 QX -> QC 2 C -> c 1 A context sensitive grammar is Monotonous 1 A monotonous grammar can be transformed into context sensitive 1 Type 2 Context Free Grammars 2 Sentence -> Name | List and Name 2 Name -> ceriel | dick | noam 2 List -> Name ',' List | Name 1 Type 3 Rational Grammars 2 Sentence -> ceriel | dick | noam | List 2 List -> ceriel <nop>ListTail | dick <nop>ListTail | noam <nop>ListTail 2 <nop>ListTail -> ',' List | and ceriel | and dick | and noam 1 Linear is Rational 1 Type 4 Finite Choice Grammars 1 Chomsky Hierarchy 1 (Weakly) Equivalent Grammars

---+ 6: 2012-10-22: Grammars, Chomsky's Hierarchy (2h)

1 Computation Models (FSM, PDA, LBTM, TM) 1 Computation Power (P, NP) 1 Interpreting the Various Types as Levels of Correctness 2 Lexically Correct 2 Syntactically Correct 2 "Semantically" Correct 2 Terminating Programs 2 Correct Programs (i.e., that solve the given problem) 1 Case of the Epsilon-productions 2 Type 1 + A -> epsilon = Type 0 2 Type 2 + A -> epsilon = Type 2 2 Type 3 + A -> epsilon = Type 3

---+ 5: 2008-11-02: Context Free Grammars (1h30)

1 Context Free Grammars 2 S -> a S b | epsilon 1 Backus Naur Form (*) 1 Extended Backus Naur Form (*) 1 Tiger Grammar (*) 1 Arithmetics I 2 S -> S + S 2 S -> n 1 Rightmost Derivation 1 Derivation Trees 1 Ambiguities 1 How to fix ambiguities 1 Arithmetics II 2 S -> S + S 2 S -> S * S 2 s -> n 1 Disambiguation 1 Completion: unary minus, and parentheses 2 E -> E + T | T 2 T -> T * F | F 2 F -> n | (E) | -F 1 The dangling else (*) 1 Hygiene (*) 2 Undefined Nonterminals 2 Unused Nonterminals 2 Nonproductive Nonterminals 2 Useless Nonterminals 2 Loops

---+ 6: 2008-10-27: Parsing: LL (2h)

1 Writing a simple parser for shell 2 inst -> 'if' expr 'then' inst 'fi' 2 inst -> 'print' 'foo' 2 expr -> true | false 1 Looking for a criterion, the first token of rules 1 Making it more complex 2 inst -> 'if' expr 'then' inst 'fi' 2 inst -> 'print' 'foo' 2 expr -> cond 2 cond -> true | false 1 Looking for a criterion, FIRST of an LHS 1 Error Recovery 1 Making it worse 2 inst -> 'if' expr 'then' inst 'fi' 2 inst -> 'print' 'foo' 2 expr -> neg cond 2 neg -> 'not' neg | epsilon 2 cond -> true | false 1 Looking for a criterion: FIRST, FOLLOW, and NULLABLE 1 Computing FIRST, FOLLOW and NULLABLE for the above grammar 1 Writing the (strong)-LL(1) table 2 For all A -> alpha | | *First (alpha)* | | *A* | A -> alpha | 2 For all A -> alpha st. NULLABLE(alpha) | | *Follow (A)* | | *A* | A -> alpha | 1 Applying the theory to 2 grammar 3 Z -> X Y Z 3 Z -> d 3 Y -> c 3 Y -> epsilon 3 X -> Y 3 X -> a 2 Tables Nullable = { X, Y } | Symbol | Nullable | First | Follow | | Z | N | acd | $ | | Y | Y | c | acd | | X | Y | ac | acd | 2 Parser | | a | c | d | | S | 0/ | 0/ | 0/ | | Z | 1/ | 1/ | 1&2/ | | Y | /4 | 3/4 | /4 | | X | 6/5 | 5/5 | /5 |

---+ 7: 2012-11-15: Parsing: LL(1) (2h)

1 Applying the LL algorithms onto arithmetics 2 Left Recursion 2 Left Factorization 2 Write the table 2 Insist on the fact that there is no problem with unary/binary -/+. 1 Critique the parser 2 Associativities are wrong, repairing them is intricate. 2 Show the corresponding C code 2 Introducing Rational Expression in Grammars: EBNF 2 Writing the LL parser for the arithmetics 2 Picturing CFG as collections of automata (*) 2 Where is the hidden memory? (automata => rational, but there are `(' `)'). (*) 1 Tabular vs. Code (*) 2 Register Pressure (no longer very true) 2 Optimizations 2 Prefetching 2 Swich vs. Function Pointer table 2 Virtual Tables vs. Switch (Eiffel vs. C++) (*)

1 Understanding how LL works: decisions are taken at the *beginning* of the rule. 1 Trying to imagine a means to overcome this limitation 1 Recognizing balanced expressions 2 S -> '(' S ')' 2 S -> 'n' 1 Isolating two main actions: shifting/reducing. 1 Adding $, end of input 1 Characterizing the reducible stacks 1 Finding Rational Expressions Characterizing these Stacks 1 Building the automaton directly, without rational expressions

---+ 8: 2012-11-22: Parsing: LR(0), SLR(1), LR(1)

1 A more difficult grammar 2 E -> E '-' 'n' | 'n' 2 Don't forget the S -> E $ rule. 1 Another 2 E -> 'n' '-' E | 'n' 2 Don't forget the S -> E $ rule. 1 Comparing the two automata, computed directly 2 The first one is a tree 2 The second has a cycle 2 How should that be interpreted (finite stack for the first one). 2 Hence LR seems to prefer left assoc. 1 Writing the automata as tables 2 The second automaton has a conflict 2 Introducing the concept of Look-ahead: SLR(1) 2 Back to left rec. vs. right rec.: (S)LR does the two, but *prefers* left. 2 (recursive) LL *cannot* do left assoc. 1 Going Even Further: The "R/L-value" grammar 2 S -> V = E 2 S -> E 2 E -> V 2 V -> id 2 V -> * E 1 SLR(1): the "R/L-value" grammar above does not work, why? 1 LR(1) grammars 2 Using look-ahead for real 2 LR(1) items, computing LR(1) automatons 2 Compute the LR(1) automaton for the "R/L-value" grammar 1 Some sizes of LR(1) automata 2 arithmetics. A small grammar but unambiguous. There are 5 terminals, 3 non terminals, and 6 rules. 2 medium. A medium size grammar comprising 41 terminals, 38 non terminals, and 93 rules. 2 programming. A real size grammar of a programming language, composed of 82 terminals, 68 non terminals, and 234 rules. | |*LR(1)*| *LR(1)* |*LALR(1)*|*LALR(1)*|*DRLR(1)*|*DRLR(1)*| |*Grammar* |*St.* |*Entries*| *St.* |*Entries*| *St.* |*Entries*| |arithmetics|22 | 71 | 12 | 26 | 8 | 23 | |medium |773 | 6,874 | 173 | 520 | 118 | 781 | |programming|1,000+ | 15,000+ | 439 | 3,155 | 270 | 3,145 |

1 LR(1) automata too big: LALR(1) 2 Compute the LR(1) automaton for the "R/L-value" 2 Merging states 2 Reduce the LR(1) automaton for the "R/L-value" grammar to LALR(1) 2 In practice: do not compute the LR(1) automaton 1 Ambiguous arithmetics: 2 E -> E + E 2 E -> E * E 2 E -> n 1 Compute the automaton 1 Yes, of course it's ambiguous 1 Fill the table 1 Using the precedence 1 Using `%left', `%right', `%nonassoc' for associativities 1 This automaton is better 2 Nicer grammar 2 Smaller automaton 2 Fewer transitions

---+ 11: 2012-11-26: Yacc/Bison, Lex/Flex

1 Architecture of a Compiler 2 Scanner (char stream -> token stream) 2 Parser (token stream -> synthetic representation of the input) 2 The rest (checks, translations) 1 Introduction to scanner/parser generators 2 Lex/Flex 2 Yacc/Bison 1 Introduction to Flex 2 Scanning keywords 2 Scanning integers: yytext 2 Scanning identifiers: yytext + yyleng + yylval 2 Identifier vs. keyword: the "metarules (longest match, first rule)2 The case of white spaces 2 A catch-all rule to avoid the default rule (ECHO) 2 Comments are white spaces, so skip them similarly 2 What about nested comments: write a more powerful regexp. 2 Of course, *it is not possible* 2 Using start conditions to handle the nested comments 2 Using start conditions to handle escapes in strings 1 Introduction to Bison 2 Writing a simple grammar without actions 2 Writing calc: =%left= and so forth 2 Dealing with ambiguous grammars: learn to read Bison's report 2 Writing actions 1 Interface Flex & Bison 2 Write a flex scanner for arithmetics 2 Write a bison parser for arithmetics 2 Put pieces of calculator to work 1 Error recovery

---+ Week 7: 2012-11-26: In depth use of Flex/Bison

1 =errok= in error recovery 1 Location Tracking (=%locations=) 1 Making a pure parser 1 =%pure-parser= 1 =%parse-param= 1 Adjusting the scanner: =YY_DECL= & pointers 1 Addressing Grammar Conflicts 1 Fixing the Dangling else Shift/Reduce: <verbatim> cat >1-if.y <<-EOF %% exp: "if" exp "then" exp | "if" exp "then" exp "else" exp | "exp" ; %% EOF </verbatim> 1 Fixing a split operator Shift/Reduce <verbatim> cat >2-split.y <<-EOF %% exp: exp "(" ")" exp | "exp" ; %% EOF </verbatim> 1 Understanding the mysterious Reduce/Reduce <verbatim> cat >3-promac.y <<-EOF %% sentence: "proc" argument '.' | "proc" parameter ';' | "macro" argument ';' | "macro" parameter '.' ; argument: "id"; parameter: "id"; %% EOF </verbatim>

---+ 2012-11-29: First two hours of CCMP: Deterministic languages, GLR

1 Fixing Reduce/Reduce problems: lvalues <verbatim> cat >5-lvalues.y <<-EOF %% exp: typeid "[" exp "]" "of" exp | lvalue ; lvalue: "id" | lvalue "[" exp "]" ; typeid: "id" ; %% EOF </verbatim>

1 An Example of Grammar that is not LR(1) 2 G -> G R 2 G -> R 2 R -> id ':' D 2 D -> L id 2 D -> epsilon 1 Writing a few words, trying to "parse" the words by hand 1 Where can the decision be taken? 1 LR(2) 1 What is this grammar about? 1 The Yacc paradox 1 How to solve it? 1 This technique (bi-word) scales up (tri-words, k-words) ``hencean LR(k) language is LR(1)1 Of course, not true for grammars 1 Hierarchy of parsing techniques 1 Hierarchy of language difficulties 1 GLR 1 Presentation of Tiger Project 1 Making groups ([[1][Groups]]) 1 Choosing the right book ([[2][Modern-Compiler-Implementation]]) 1 Assignments and the web site (http://tiger.lrde.epita.fr)