Courses/ThlLog2016

The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

This page contains the log of the topics that of the [[THL-Course][THL - Language Theory]] for class EPITA 2016 (i.e., from Oct 2016 to Nov 2016). A majority of the students have already followed [[THLR-Course][THLR - Rational Language Theory]]. This course is a preamble to

  * [[TYLA-Course][TYLA - Concepts of Programming Languages]]
  * [[CMP1-Course][CMP1 - Compiler Construction 1]]
  * [[CMP2-Course][CMP2 - Compiler Construction 2]].

Below, the star symbol (*) designates topics that were not covered for lack of time.

Courses: %TOC%

---+ 1&2: 2010-09-30: Introduction (30min), Languages and Automata (2h30)

  1 Presentation of the Lectures
    (THL-Course, CMP1-Course, TYLA-Course, CMP2-Course)
  1 Why studying languages
     2 Computers
        3 Compilers/Interpreters
        3 Formal languages (XML, <nop>LaTeX, SQL, etc.)
        3 Intrusion Detection Systems
     2 Biotechs
        3 Fuzzy matching
     2 Natural Language Processing
  1 Bases
     2 Alphabet
     2 Word
     2 Language
  1 Examples
     2 a^n
     2 a^n b^n
     2 binary numbers
     2 prime binary numbers
     2 C programs
     2 French
  1 Infinite words
     2 traces
     2 Omega-words
  1 Concatenation
     2 Concatenation
     2 Epsilon
     2 Noncommutativity (*)
        3 xy = yx => \exists u: x = u^i, y = u^j
     2 Free Monoid
     2 Power
     2 Length
  1 Relations (*)
     2 Prefix, Suffix, Factor, Subword
  1 Distance between words (*)
     2 Longest prefix length
     2 Suffix, Factor, Subword
     2 Editing distance
  1 Order on words (*)
     2 Prefix, Suffix etc.
     2 Total order: lexicographic
     2 Total well founded order: alphabetic (aka radical, military)
  1 Operations on Languages
     2 Union
     2 Intersection
     2 Complementation
     2 Concatenation
     2 Pref, Suf, Fac, Sub  (1)
     2 Quot (L_{/w} = w^-1 L = {v \in \sigma^\star : wv \in L})  (*)
     2 Power
     2 Kleene
  1 Defining Languages thanks to Set Operations: Rat
  1 Examples
     2 a^n
     2 binary numbers
     2 floats (*)
     2 a^n b^n ?
  1 Rational Expressions
  1 Real World Rational Expressions(*)
     2 Syntactic Sugar
     2 Syntaxes (*grep, *lex, AWK, Perl, Emacs etc.) (*)
     2 Toying with Grep (*)
     2 A <nop>RegExp for Pascal {} comments (*)
     2 A <nop>RegExp for C strings (*)
     2 A <nop>RegExp for C comments (*)
     2 A <nop>RegExp for nested {} comments (*)
     2 Changing Massively Identifiers thanks to Perl (*)
  1 Matching <nop>RegExps: Naive Automata
     2 Thompson
     2 Epsilon-NFA
  1 Epsilon transitions
     2 Backward closure
     2 Elimination
     2 Nondeterministic Automata

---+ 3: 2008-10-18: Deterministic Automata, Rational Languages (1h30)

  1 Understanding Nondeterministic Computations
     2 Perfect Fork
     2 The Oracle
     2 <nop>BackTracking: exponential time
  1 Determinization
     2 Eliminating nondeterminism
     2 Deterministic Automata
     2 Exponential Space vs Linear Time
  1 Trimmed Automaton
     2 Reachability
     2 Usefulness
  1 Interpreting the states of a DFA for (a|n)*a(a|b)(a|b)
     2 Can an automaton understand a^nb^n?
     2 The Pumping Lemma
     2 !RegExps are not enough
  1 Computing the language of an automaton (*)
  1 Rational is Recognizable
  1 Rational Languages Stability (*)
     1 transposition (reversing a language) (*)
     1 intersection (*)
     1 complementation, importance of determinism and completion (*)
     1 Prefix, Suffix, and Factor (*)
     1 Subwords(*)
  1 Minimization (*)
     2 Distinguishable states
 (*)
  One can reach an accepting state on some string, but not the
  other.  Accepting states are not required to be equal.
     2 Indistinguishable, equivalent states (*)
     2 Minimizing a DFA. (*)
  1 Rational Languages Tests (*)
     2 Emptiness test
     2 Membership test
     2 Equivalence test (using the state equivalence table)

---+ 4: 2008-10-14: Grammars, Chomsky's Hierarchy (1h30)

  1 Is there a perfect system? (*)
     2 Cardinality of languages
     2 Cardinality of the set of languages
     2 A Language Specification System is a language
     2 Cardinality mismatch
  1 Naive Rewriting Systems (Semi Thue Systems) (*)
  1 Naive Grammars for Small Natural Language Subset
  1 Terminal, Nonterminal
  1 A Grammar to Generate "a^n b^n c^n"
     2 A -> aABC
     2 A -> abC
     2 CB -> BC
     2 bB -> bb
     2 bC -> bc
     2 cC -> cc
  1 Type 0 Generative Grammars
  1 Recursively Enumerable
     2 If a language is type 0, then it is recursively enumerable (*)
     2 If a language is recursively enumerable, then it is type 0
       (not proved) (*)
  1 Derivation, "Derivation Graph", Sentential Form, Protosentences
  1 A Grammar Generating 'Dick, Ceriel, and Noam' Sentences
     2 Sentence -> Name | List End
     2 Name -> ceriel | dick | noam
     2 List -> Name | Name ',' List
     2 ',' Name End -> and Name
  1 Type 1 Monotonic Grammars
     2 Sentence -> Name | List
     2 Name -> ceriel | dick | noam
     2 List -> <nop>EndName | Name ',' List
     2 ',' <nop>EndName -> and Name

---+ 5: 2012-10-22: Grammars, Chomsky's Hierarchy (2h)

  1 If a language is monotonous, it is recursive (*)
  1 The converse does not hold
  1 Type 1 Context Sensitive Grammars
     2 Sentence -> Name | List
     2 Name -> ceriel | dick | noam
     2 List -> <nop>EndName | Name Comma List
     2 Comma <nop>EndName -> and <nop>EndName
     2 and <nop>EndName -> and Name
     2 Comma -> ','
  1 Back to "a^n b^n c^n"
     2 S -> abc
     2 S -> aSQ
     2 bQc -> bbcc
     2 cQ -> Qc
  1 Reback to "a^n b^n c^n"
     2 S -> abC
     2 S -> aSQ
     2 bQC -> bbCC
     2 CQ -> CX
     2 CX -> QX
     2 QX -> QC
     2 C -> c
  1 A context sensitive grammar is Monotonous
  1 A monotonous grammar can be transformed into context sensitive
  1 Type 2 Context Free Grammars
     2 Sentence -> Name | List and Name
     2 Name -> ceriel | dick | noam
     2 List -> Name ',' List | Name
  1 Type 3 Rational Grammars
     2 Sentence -> ceriel | dick | noam | List
     2 List -> ceriel <nop>ListTail | dick <nop>ListTail | noam <nop>ListTail
     2 <nop>ListTail -> ',' List | and ceriel | and dick | and noam
  1 Linear is Rational
  1 Type 4 Finite Choice Grammars
  1 Chomsky Hierarchy
  1 (Weakly) Equivalent Grammars

---+ 6: 2012-10-22: Grammars, Chomsky's Hierarchy (2h)

  1 Computation Models (FSM, PDA, LBTM, TM)
  1 Computation Power (P, NP)
  1 Interpreting the Various Types as Levels of Correctness
     2 Lexically Correct
     2 Syntactically Correct
     2 "Semantically" Correct
     2 Terminating Programs
     2 Correct Programs (i.e., that solve the given problem)
  1 Case of the Epsilon-productions
     2 Type 1 + A -> epsilon = Type 0
     2 Type 2 + A -> epsilon = Type 2
     2 Type 3 + A -> epsilon = Type 3

---+ 5: 2008-11-02: Context Free Grammars (1h30)

  1 Context Free Grammars
     2 S -> a S b | epsilon
  1 Backus Naur Form (*)
  1 Extended Backus Naur Form (*)
  1 Tiger Grammar (*)
  1 Arithmetics I
     2 S -> S + S
     2 S -> n
  1 Rightmost Derivation
  1 Derivation Trees
  1 Ambiguities
  1 How to fix ambiguities
  1 Arithmetics II
     2 S -> S + S
     2 S -> S * S
     2 s -> n
  1 Disambiguation
  1 Completion: unary minus, and parentheses
     2 E -> E + T | T
     2 T -> T * F | F
     2 F -> n | (E) | -F
  1 The dangling else (*)
  1 Hygiene (*)
     2 Undefined Nonterminals
     2 Unused Nonterminals
     2 Nonproductive Nonterminals
     2 Useless Nonterminals
     2 Loops

---+ 6: 2008-10-27: Parsing: LL (2h)

  1 Writing a simple parser for shell
     2 inst -> 'if' expr 'then' inst 'fi'
     2 inst -> 'print' 'foo'
     2 expr -> true | false
  1 Looking for a criterion, the first token of rules
  1 Making it more complex
     2 inst -> 'if' expr 'then' inst 'fi'
     2 inst -> 'print' 'foo'
     2 expr -> cond
     2 cond -> true | false
  1 Looking for a criterion, FIRST of an LHS
  1 Error Recovery
  1 Making it worse
     2 inst -> 'if' expr 'then' inst 'fi'
     2 inst -> 'print' 'foo'
     2 expr -> neg cond
     2 neg -> 'not' neg | epsilon
     2 cond -> true | false
  1 Looking for a criterion: FIRST, FOLLOW, and NULLABLE
  1 Computing FIRST, FOLLOW and NULLABLE for the above grammar
  1 Writing the (strong)-LL(1) table
     2 For all A -> alpha
       |     | *First (alpha)* |
       | *A* | A -> alpha      |
     2 For all A -> alpha st. NULLABLE(alpha)
       |     | *Follow (A)* |
       | *A* | A -> alpha   |
  1 Applying the theory to
     2 grammar
       3  Z -> X Y Z
       3  Z -> d
       3  Y -> c
       3  Y -> epsilon
       3  X -> Y
       3  X -> a
     2 Tables
       Nullable = { X, Y }
       | Symbol | Nullable | First | Follow |
       |   Z    |  N       | acd   | $      |
       |   Y    |  Y       | c     | acd    |
       |   X    |  Y       | ac    | acd    |
     2 Parser
       |     |  a    |  c    | d     |
       | S   |  0/   |  0/   | 0/    |
       | Z   |  1/   |  1/   | 1&2/  |
       | Y   |   /4  |  3/4  |    /4 |
       | X   |  6/5  |  5/5  |    /5 |

---+ 7: 2012-11-15: Parsing: LL(1) (2h)

  1 Applying the LL algorithms onto arithmetics
     2 Left Recursion
     2 Left Factorization
     2 Write the table
     2 Insist on the fact that there is no problem with unary/binary -/+.
  1 Critique the parser
     2 Associativities are wrong, repairing them is intricate.
     2 Show the corresponding C code
     2 Introducing Rational Expression in Grammars: EBNF
     2 Writing the LL parser for the arithmetics
     2 Picturing CFG as collections of automata (*)
     2 Where is the hidden memory? (automata => rational,
       but there are `(' `)'). (*)
  1 Tabular vs. Code (*)
     2 Register Pressure (no longer very true)
     2 Optimizations
     2 Prefetching
     2 Swich vs. Function Pointer table
     2 Virtual Tables vs. Switch (Eiffel vs. C++) (*)

  1 Understanding how LL works: decisions are taken at the
    *beginning* of the rule.
  1 Trying to imagine a means to overcome this limitation
  1 Recognizing balanced expressions
     2 S -> '(' S ')'
     2 S -> 'n'
  1 Isolating two main actions: shifting/reducing.
  1 Adding $, end of input
  1 Characterizing the reducible stacks
  1 Finding Rational Expressions Characterizing these Stacks
  1 Building the automaton directly, without rational expressions

---+ 8: 2012-11-22: Parsing: LR(0), SLR(1), LR(1)

  1 A more difficult grammar
     2 E -> E '-' 'n' | 'n'
     2 Don't forget the S -> E $ rule.
  1 Another
     2 E -> 'n' '-' E | 'n'
     2 Don't forget the S -> E $ rule.
  1 Comparing the two automata, computed directly
     2 The first one is a tree
     2 The second has a cycle
     2 How should that be interpreted (finite stack for the first one).
     2 Hence LR seems to prefer left assoc.
  1 Writing the automata as tables
     2 The second automaton has a conflict
     2 Introducing the concept of Look-ahead: SLR(1)
     2 Back to left rec. vs. right rec.: (S)LR does the two,
       but *prefers* left.
     2 (recursive) LL *cannot* do left assoc.
  1 Going Even Further: The "R/L-value" grammar
     2 S -> V = E
     2 S -> E
     2 E -> V
     2 V -> id
     2 V -> * E
  1 SLR(1): the "R/L-value" grammar above does not work, why?
  1 LR(1) grammars
     2 Using look-ahead for real
     2 LR(1) items, computing LR(1) automatons
     2 Compute the LR(1) automaton for the "R/L-value" grammar
  1 Some sizes of LR(1) automata
     2 arithmetics.
       A small grammar but unambiguous.
       There are 5 terminals, 3 non terminals, and 6 rules.
     2 medium.
       A medium size grammar comprising 41 terminals,
       38 non terminals, and 93 rules.
     2 programming.
       A real size grammar of a programming language,
       composed of 82 terminals, 68 non terminals, and 234 rules.
       |           |*LR(1)*| *LR(1)* |*LALR(1)*|*LALR(1)*|*DRLR(1)*|*DRLR(1)*|
       |*Grammar*  |*St.*  |*Entries*| *St.*   |*Entries*| *St.*   |*Entries*|
       |arithmetics|22     | 71      | 12      | 26      | 8       | 23      |
       |medium     |773    | 6,874   | 173     | 520     | 118     | 781     |
       |programming|1,000+ | 15,000+ | 439     | 3,155   | 270     | 3,145   |

  1 LR(1) automata too big: LALR(1)
     2 Compute the LR(1) automaton for the "R/L-value"
     2 Merging states
     2 Reduce the LR(1) automaton for the "R/L-value" grammar to LALR(1)
     2 In practice: do not compute the LR(1) automaton
  1 Ambiguous arithmetics:
     2 E -> E + E
     2 E -> E * E
     2 E -> n
  1 Compute the automaton
  1 Yes, of course it's ambiguous
  1 Fill the table
  1 Using the precedence
  1 Using `%left', `%right', `%nonassoc' for associativities
  1 This automaton is better
     2 Nicer grammar
     2 Smaller automaton
     2 Fewer transitions

---+ 11: 2012-11-26: Yacc/Bison, Lex/Flex

  1 Architecture of a Compiler
     2 Scanner (char stream -> token stream)
     2 Parser (token stream -> synthetic representation of the input)
     2 The rest (checks, translations)
  1 Introduction to scanner/parser generators
     2 Lex/Flex
     2 Yacc/Bison
  1 Introduction to Flex
     2 Scanning keywords
     2 Scanning integers: yytext
     2 Scanning identifiers: yytext + yyleng + yylval
     2 Identifier vs. keyword: the "meta rules (longest match, first rule)
     2 The case of white spaces
     2 A catch-all rule to avoid the default rule (ECHO)
     2 Comments are white spaces, so skip them similarly
     2 What about nested comments: write a more powerful regexp.
     2 Of course, *it is not possible*
     2 Using start conditions to handle the nested comments
     2 Using start conditions to handle escapes in strings
  1 Introduction to Bison
     2 Writing a simple grammar without actions
     2 Writing calc: =%left= and so forth
     2 Dealing with ambiguous grammars: learn to read Bison's report
     2 Writing actions
  1 Interface Flex & Bison
     2 Write a flex scanner for arithmetics
     2 Write a bison parser for arithmetics
     2 Put pieces of calculator to work
  1 Error recovery

---+ Week 7: 2012-11-26: In depth use of Flex/Bison

  1 =errok= in error recovery
  1 Location Tracking (=%locations=)
  1 Making a pure parser
     1 =%pure-parser=
     1 =%parse-param=
     1 Adjusting the scanner: =YY_DECL= & pointers
  1 Addressing Grammar Conflicts
     1 Fixing the Dangling else Shift/Reduce:
     <verbatim>
     cat >1-if.y <<-EOF
     %%
     exp:
       "if" exp "then" exp
     | "if" exp "then" exp "else" exp
     | "exp"
     ;
     %%
     EOF
     </verbatim>
     1 Fixing a split operator Shift/Reduce
     <verbatim>
     cat >2-split.y <<-EOF
     %%
     exp:
       exp "(" ")" exp
     | "exp"
     ;
     %%
     EOF
     </verbatim>
     1 Understanding the mysterious Reduce/Reduce
     <verbatim>
     cat >3-promac.y <<-EOF
     %%
     sentence:
       "proc"  argument '.'  | "proc"  parameter ';'
     | "macro" argument ';'  | "macro" parameter '.'   ;
     argument: "id";
     parameter: "id";
     %%
     EOF
     </verbatim>

---+ 2012-11-29: First two hours of CCMP: Deterministic languages, GLR

     1 Fixing Reduce/Reduce problems: lvalues
     <verbatim>
     cat >5-lvalues.y <<-EOF
     %%
     exp:
       typeid "[" exp "]" "of" exp
     | lvalue
     ;
     lvalue:
       "id"
     | lvalue "[" exp "]"
     ;
     typeid:
       "id"
     ;
     %%
     EOF
     </verbatim>

  1 An Example of Grammar that is not LR(1)
     2 G -> G R
     2 G -> R
     2 R -> id ':' D
     2 D -> L id
     2 D -> epsilon
  1 Writing a few words, trying to "parse" the words by hand
  1 Where can the decision be taken?
  1 LR(2)
  1 What is this grammar about?
  1 The Yacc paradox
  1 How to solve it?
  1 This technique (bi-word) scales up (tri-words, k-words)
    ``hence an LR(k) language is LR(1)
  1 Of course, not true for grammars
  1 Hierarchy of parsing techniques
  1 Hierarchy of language difficulties
  1 GLR
  1 Presentation of Tiger Project
     1 Making groups ([[1][Groups]])
     1 Choosing the right book ([[2][Modern-Compiler-Implementation]])
     1 Assignments and the web site (http://tiger.lrde.epita.fr)

Courses/ThlLog2016

From LRDE