Courses/ThlLog2016

From LRDE

This page contains the log of the topics that of the [[THL-Course][THL - Language Theory]] for class EPITA 2016 (i.e., from Oct 2016 to Nov 2016). A majority of the students have already followed [[THLR-Course][THLR - Rational Language Theory]]. This course is a preamble to

  * [[TYLA-Course][TYLA - Concepts of Programming Languages]]
  * [[CMP1-Course][CMP1 - Compiler Construction 1]]
  * [[CMP2-Course][CMP2 - Compiler Construction 2]].

Below, the star symbol (*) designates topics that were not covered for lack of time.

Courses: %TOC%

---+ 1&2: 2010-09-30: Introduction (30min), Languages and Automata (2h30)

  1 Presentation of the Lectures
    (THL-Course, CMP1-Course, TYLA-Course, CMP2-Course)
  1 Why studying languages
     2 Computers
        3 Compilers/Interpreters
        3 Formal languages (XML, <nop>LaTeX, SQL, etc.)
        3 Intrusion Detection Systems
     2 Biotechs
        3 Fuzzy matching
     2 Natural Language Processing
  1 Bases
     2 Alphabet
     2 Word
     2 Language
  1 Examples
     2 a^n
     2 a^n b^n
     2 binary numbers
     2 prime binary numbers
     2 C programs
     2 French
  1 Infinite words
     2 traces
     2 Omega-words
  1 Concatenation
     2 Concatenation
     2 Epsilon
     2 Noncommutativity (*)
        3 xy = yx => \exists u: x = u^i, y = u^j
     2 Free Monoid
     2 Power
     2 Length
  1 Relations (*)
     2 Prefix, Suffix, Factor, Subword
  1 Distance between words (*)
     2 Longest prefix length
     2 Suffix, Factor, Subword
     2 Editing distance
  1 Order on words (*)
     2 Prefix, Suffix etc.
     2 Total order: lexicographic
     2 Total well founded order: alphabetic (aka radical, military)
  1 Operations on Languages
     2 Union
     2 Intersection
     2 Complementation
     2 Concatenation
     2 Pref, Suf, Fac, Sub  (1)
     2 Quot (L_{/w} = w^-1 L = {v \in \sigma^\star : wv \in L})  (*)
     2 Power
     2 Kleene
  1 Defining Languages thanks to Set Operations: Rat
  1 Examples
     2 a^n
     2 binary numbers
     2 floats (*)
     2 a^n b^n ?
  1 Rational Expressions
  1 Real World Rational Expressions(*)
     2 Syntactic Sugar
     2 Syntaxes (*grep, *lex, AWK, Perl, Emacs etc.) (*)
     2 Toying with Grep (*)
     2 A <nop>RegExp for Pascal {} comments (*)
     2 A <nop>RegExp for C strings (*)
     2 A <nop>RegExp for C comments (*)
     2 A <nop>RegExp for nested {} comments (*)
     2 Changing Massively Identifiers thanks to Perl (*)
  1 Matching <nop>RegExps: Naive Automata
     2 Thompson
     2 Epsilon-NFA
  1 Epsilon transitions
     2 Backward closure
     2 Elimination
     2 Nondeterministic Automata

---+ 3: 2008-10-18: Deterministic Automata, Rational Languages (1h30)

  1 Understanding Nondeterministic Computations
     2 Perfect Fork
     2 The Oracle
     2 <nop>BackTracking: exponential time
  1 Determinization
     2 Eliminating nondeterminism
     2 Deterministic Automata
     2 Exponential Space vs Linear Time
  1 Trimmed Automaton
     2 Reachability
     2 Usefulness
  1 Interpreting the states of a DFA for (a|n)*a(a|b)(a|b)
     2 Can an automaton understand a^nb^n?
     2 The Pumping Lemma
     2 !RegExps are not enough
  1 Computing the language of an automaton (*)
  1 Rational is Recognizable
  1 Rational Languages Stability (*)
     1 transposition (reversing a language) (*)
     1 intersection (*)
     1 complementation, importance of determinism and completion (*)
     1 Prefix, Suffix, and Factor (*)
     1 Subwords(*)
  1 Minimization (*)
     2 Distinguishable states
(*) One can reach an accepting state on some string, but not the other. Accepting states are not required to be equal. 2 Indistinguishable, equivalent states (*) 2 Minimizing a DFA. (*) 1 Rational Languages Tests (*) 2 Emptiness test 2 Membership test 2 Equivalence test (using the state equivalence table)

---+ 4: 2008-10-14: Grammars, Chomsky's Hierarchy (1h30)

  1 Is there a perfect system? (*)
     2 Cardinality of languages
     2 Cardinality of the set of languages
     2 A Language Specification System is a language
     2 Cardinality mismatch
  1 Naive Rewriting Systems (Semi Thue Systems) (*)
  1 Naive Grammars for Small Natural Language Subset
  1 Terminal, Nonterminal
  1 A Grammar to Generate "a^n b^n c^n"
     2 A -> aABC
     2 A -> abC
     2 CB -> BC
     2 bB -> bb
     2 bC -> bc
     2 cC -> cc
  1 Type 0 Generative Grammars
  1 Recursively Enumerable
     2 If a language is type 0, then it is recursively enumerable (*)
     2 If a language is recursively enumerable, then it is type 0
       (not proved) (*)
  1 Derivation, "Derivation Graph", Sentential Form, Protosentences
  1 A Grammar Generating 'Dick, Ceriel, and Noam' Sentences
     2 Sentence -> Name | List End
     2 Name -> ceriel | dick | noam
     2 List -> Name | Name ',' List
     2 ',' Name End -> and Name
  1 Type 1 Monotonic Grammars
     2 Sentence -> Name | List
     2 Name -> ceriel | dick | noam
     2 List -> <nop>EndName | Name ',' List
     2 ',' <nop>EndName -> and Name

---+ 5: 2012-10-22: Grammars, Chomsky's Hierarchy (2h)

  1 If a language is monotonous, it is recursive (*)
  1 The converse does not hold
  1 Type 1 Context Sensitive Grammars
     2 Sentence -> Name | List
     2 Name -> ceriel | dick | noam
     2 List -> <nop>EndName | Name Comma List
     2 Comma <nop>EndName -> and <nop>EndName
     2 and <nop>EndName -> and Name
     2 Comma -> ','
  1 Back to "a^n b^n c^n"
     2 S -> abc
     2 S -> aSQ
     2 bQc -> bbcc
     2 cQ -> Qc
  1 Reback to "a^n b^n c^n"
     2 S -> abC
     2 S -> aSQ
     2 bQC -> bbCC
     2 CQ -> CX
     2 CX -> QX
     2 QX -> QC
     2 C -> c
  1 A context sensitive grammar is Monotonous
  1 A monotonous grammar can be transformed into context sensitive
  1 Type 2 Context Free Grammars
     2 Sentence -> Name | List and Name
     2 Name -> ceriel | dick | noam
     2 List -> Name ',' List | Name
  1 Type 3 Rational Grammars
     2 Sentence -> ceriel | dick | noam | List
     2 List -> ceriel <nop>ListTail | dick <nop>ListTail | noam <nop>ListTail
     2 <nop>ListTail -> ',' List | and ceriel | and dick | and noam
  1 Linear is Rational
  1 Type 4 Finite Choice Grammars
  1 Chomsky Hierarchy
  1 (Weakly) Equivalent Grammars

---+ 6: 2012-10-22: Grammars, Chomsky's Hierarchy (2h)

  1 Computation Models (FSM, PDA, LBTM, TM)
  1 Computation Power (P, NP)
  1 Interpreting the Various Types as Levels of Correctness
     2 Lexically Correct
     2 Syntactically Correct
     2 "Semantically" Correct
     2 Terminating Programs
     2 Correct Programs (i.e., that solve the given problem)
  1 Case of the Epsilon-productions
     2 Type 1 + A -> epsilon = Type 0
     2 Type 2 + A -> epsilon = Type 2
     2 Type 3 + A -> epsilon = Type 3

---+ 5: 2008-11-02: Context Free Grammars (1h30)

  1 Context Free Grammars
     2 S -> a S b | epsilon
  1 Backus Naur Form (*)
  1 Extended Backus Naur Form (*)
  1 Tiger Grammar (*)
  1 Arithmetics I
     2 S -> S + S
     2 S -> n
  1 Rightmost Derivation
  1 Derivation Trees
  1 Ambiguities
  1 How to fix ambiguities
  1 Arithmetics II
     2 S -> S + S
     2 S -> S * S
     2 s -> n
  1 Disambiguation
  1 Completion: unary minus, and parentheses
     2 E -> E + T | T
     2 T -> T * F | F
     2 F -> n | (E) | -F
  1 The dangling else (*)
  1 Hygiene (*)
     2 Undefined Nonterminals
     2 Unused Nonterminals
     2 Nonproductive Nonterminals
     2 Useless Nonterminals
     2 Loops

---+ 6: 2008-10-27: Parsing: LL (2h)

  1 Writing a simple parser for shell
     2 inst -> 'if' expr 'then' inst 'fi'
     2 inst -> 'print' 'foo'
     2 expr -> true | false
  1 Looking for a criterion, the first token of rules
  1 Making it more complex
     2 inst -> 'if' expr 'then' inst 'fi'
     2 inst -> 'print' 'foo'
     2 expr -> cond
     2 cond -> true | false
  1 Looking for a criterion, FIRST of an LHS
  1 Error Recovery
  1 Making it worse
     2 inst -> 'if' expr 'then' inst 'fi'
     2 inst -> 'print' 'foo'
     2 expr -> neg cond
     2 neg -> 'not' neg | epsilon
     2 cond -> true | false
  1 Looking for a criterion: FIRST, FOLLOW, and NULLABLE
  1 Computing FIRST, FOLLOW and NULLABLE for the above grammar
  1 Writing the (strong)-LL(1) table
     2 For all A -> alpha
       |     | *First (alpha)* |
       | *A* | A -> alpha      |
     2 For all A -> alpha st. NULLABLE(alpha)
       |     | *Follow (A)* |
       | *A* | A -> alpha   |
  1 Applying the theory to
     2 grammar
       3  Z -> X Y Z
       3  Z -> d
       3  Y -> c
       3  Y -> epsilon
       3  X -> Y
       3  X -> a
     2 Tables
       Nullable = { X, Y }
       | Symbol | Nullable | First | Follow |
       |   Z    |  N       | acd   | $      |
       |   Y    |  Y       | c     | acd    |
       |   X    |  Y       | ac    | acd    |
     2 Parser
       |     |  a    |  c    | d     |
       | S   |  0/   |  0/   | 0/    |
       | Z   |  1/   |  1/   | 1&2/  |
       | Y   |   /4  |  3/4  |    /4 |
       | X   |  6/5  |  5/5  |    /5 |

---+ 7: 2012-11-15: Parsing: LL(1) (2h)

  1 Applying the LL algorithms onto arithmetics
     2 Left Recursion
     2 Left Factorization
     2 Write the table
     2 Insist on the fact that there is no problem with unary/binary -/+.
  1 Critique the parser
     2 Associativities are wrong, repairing them is intricate.
     2 Show the corresponding C code
     2 Introducing Rational Expression in Grammars: EBNF
     2 Writing the LL parser for the arithmetics
     2 Picturing CFG as collections of automata (*)
     2 Where is the hidden memory? (automata => rational,
       but there are `(' `)'). (*)
  1 Tabular vs. Code (*)
     2 Register Pressure (no longer very true)
     2 Optimizations
     2 Prefetching
     2 Swich vs. Function Pointer table
     2 Virtual Tables vs. Switch (Eiffel vs. C++) (*)
  1 Understanding how LL works: decisions are taken at the
    *beginning* of the rule.
  1 Trying to imagine a means to overcome this limitation
  1 Recognizing balanced expressions
     2 S -> '(' S ')'
     2 S -> 'n'
  1 Isolating two main actions: shifting/reducing.
  1 Adding $, end of input
  1 Characterizing the reducible stacks
  1 Finding Rational Expressions Characterizing these Stacks
  1 Building the automaton directly, without rational expressions

---+ 8: 2012-11-22: Parsing: LR(0), SLR(1), LR(1)

  1 A more difficult grammar
     2 E -> E '-' 'n' | 'n'
     2 Don't forget the S -> E $ rule.
  1 Another
     2 E -> 'n' '-' E | 'n'
     2 Don't forget the S -> E $ rule.
  1 Comparing the two automata, computed directly
     2 The first one is a tree
     2 The second has a cycle
     2 How should that be interpreted (finite stack for the first one).
     2 Hence LR seems to prefer left assoc.
  1 Writing the automata as tables
     2 The second automaton has a conflict
     2 Introducing the concept of Look-ahead: SLR(1)
     2 Back to left rec. vs. right rec.: (S)LR does the two,
       but *prefers* left.
     2 (recursive) LL *cannot* do left assoc.
  1 Going Even Further: The "R/L-value" grammar
     2 S -> V = E
     2 S -> E
     2 E -> V
     2 V -> id
     2 V -> * E
  1 SLR(1): the "R/L-value" grammar above does not work, why?
  1 LR(1) grammars
     2 Using look-ahead for real
     2 LR(1) items, computing LR(1) automatons
     2 Compute the LR(1) automaton for the "R/L-value" grammar
  1 Some sizes of LR(1) automata
     2 arithmetics.
       A small grammar but unambiguous.
       There are 5 terminals, 3 non terminals, and 6 rules.
     2 medium.
       A medium size grammar comprising 41 terminals,
       38 non terminals, and 93 rules.
     2 programming.
       A real size grammar of a programming language,
       composed of 82 terminals, 68 non terminals, and 234 rules.
       |           |*LR(1)*| *LR(1)* |*LALR(1)*|*LALR(1)*|*DRLR(1)*|*DRLR(1)*|
       |*Grammar*  |*St.*  |*Entries*| *St.*   |*Entries*| *St.*   |*Entries*|
       |arithmetics|22     | 71      | 12      | 26      | 8       | 23      |
       |medium     |773    | 6,874   | 173     | 520     | 118     | 781     |
       |programming|1,000+ | 15,000+ | 439     | 3,155   | 270     | 3,145   |
  1 LR(1) automata too big: LALR(1)
     2 Compute the LR(1) automaton for the "R/L-value"
     2 Merging states
     2 Reduce the LR(1) automaton for the "R/L-value" grammar to LALR(1)
     2 In practice: do not compute the LR(1) automaton
  1 Ambiguous arithmetics:
     2 E -> E + E
     2 E -> E * E
     2 E -> n
  1 Compute the automaton
  1 Yes, of course it's ambiguous
  1 Fill the table
  1 Using the precedence
  1 Using `%left', `%right', `%nonassoc' for associativities
  1 This automaton is better
     2 Nicer grammar
     2 Smaller automaton
     2 Fewer transitions

---+ 11: 2012-11-26: Yacc/Bison, Lex/Flex

  1 Architecture of a Compiler
     2 Scanner (char stream -> token stream)
     2 Parser (token stream -> synthetic representation of the input)
     2 The rest (checks, translations)
  1 Introduction to scanner/parser generators
     2 Lex/Flex
     2 Yacc/Bison
  1 Introduction to Flex
     2 Scanning keywords
     2 Scanning integers: yytext
     2 Scanning identifiers: yytext + yyleng + yylval
     2 Identifier vs. keyword: the "meta rules (longest match, first rule)
     2 The case of white spaces
     2 A catch-all rule to avoid the default rule (ECHO)
     2 Comments are white spaces, so skip them similarly
     2 What about nested comments: write a more powerful regexp.
     2 Of course, *it is not possible*
     2 Using start conditions to handle the nested comments
     2 Using start conditions to handle escapes in strings
  1 Introduction to Bison
     2 Writing a simple grammar without actions
     2 Writing calc: =%left= and so forth
     2 Dealing with ambiguous grammars: learn to read Bison's report
     2 Writing actions
  1 Interface Flex & Bison
     2 Write a flex scanner for arithmetics
     2 Write a bison parser for arithmetics
     2 Put pieces of calculator to work
  1 Error recovery

---+ Week 7: 2012-11-26: In depth use of Flex/Bison

  1 =errok= in error recovery
  1 Location Tracking (=%locations=)
  1 Making a pure parser
     1 =%pure-parser=
     1 =%parse-param=
     1 Adjusting the scanner: =YY_DECL= & pointers
  1 Addressing Grammar Conflicts
     1 Fixing the Dangling else Shift/Reduce:
     <verbatim>
     cat >1-if.y <<-EOF
     %%
     exp:
       "if" exp "then" exp
     | "if" exp "then" exp "else" exp
     | "exp"
     ;
     %%
     EOF
     </verbatim>
     1 Fixing a split operator Shift/Reduce
     <verbatim>
     cat >2-split.y <<-EOF
     %%
     exp:
       exp "(" ")" exp
     | "exp"
     ;
     %%
     EOF
     </verbatim>
     1 Understanding the mysterious Reduce/Reduce
     <verbatim>
     cat >3-promac.y <<-EOF
     %%
     sentence:
       "proc"  argument '.'  | "proc"  parameter ';'
     | "macro" argument ';'  | "macro" parameter '.'   ;
     argument: "id";
     parameter: "id";
     %%
     EOF
     </verbatim>

---+ 2012-11-29: First two hours of CCMP: Deterministic languages, GLR

     1 Fixing Reduce/Reduce problems: lvalues
     <verbatim>
     cat >5-lvalues.y <<-EOF
     %%
     exp:
       typeid "[" exp "]" "of" exp
     | lvalue
     ;
     lvalue:
       "id"
     | lvalue "[" exp "]"
     ;
     typeid:
       "id"
     ;
     %%
     EOF
     </verbatim>
  1 An Example of Grammar that is not LR(1)
     2 G -> G R
     2 G -> R
     2 R -> id ':' D
     2 D -> L id
     2 D -> epsilon
  1 Writing a few words, trying to "parse" the words by hand
  1 Where can the decision be taken?
  1 LR(2)
  1 What is this grammar about?
  1 The Yacc paradox
  1 How to solve it?
  1 This technique (bi-word) scales up (tri-words, k-words)
    ``hence an LR(k) language is LR(1)
  1 Of course, not true for grammars
  1 Hierarchy of parsing techniques
  1 Hierarchy of language difficulties
  1 GLR
  1 Presentation of Tiger Project
     1 Making groups ([[1][Groups]])
     1 Choosing the right book ([[2][Modern-Compiler-Implementation]])
     1 Assignments and the web site (http://tiger.lrde.epita.fr)