*Nul n’est censé ignorer la loi.* Everything exposed in this document is expected to be known. The Tiger Compiler Project ************************** This document, revision of April 16, 2018, details the various tasks EPITA students must complete. It is available under various forms: − Assignments in a single HTML file(1). − Assignments in several HTML files(2). − Assignments in PDF(3). − Assignments in text(4). − Assignments in Info(5). The Tiger Compiler Project 1 Introduction 1.1 How to Read this Document 1.2 Why the Tiger Project 1.3 What the Tiger Project is not 1.4 History 1.4.1 Fair Criticism 1.4.2 Tiger 2002 1.4.3 Tiger 2003 1.4.4 Tiger 2004 1.4.5 Tiger 2005 1.4.6 Tiger 2006 1.4.7 Tiger 2005b 1.4.8 Tiger 2007 1.4.9 Tiger 2008 1.4.10 Leopard 2009 1.4.11 Tiger 2010 1.4.12 Tiger 2011 1.4.13 Tiger 2012 1.4.14 Tiger 2013 1.4.15 Tiger 2014 1.4.16 Tiger 2015 1.4.17 Tiger 2016 1.4.18 Tiger 2017 1.4.19 Tiger 2018 1.4.20 Tiger 2019 1.4.21 Tiger 2020 2 Instructions 2.1 Interactions 2.2 Rules of the Game 2.3 Groups 2.4 Coding Style 2.4.1 No Draft Allowed 2.4.2 Use of Foreign Features 2.4.3 File Conventions 2.4.4 Name Conventions 2.4.5 Use of C++ Features 2.4.6 Use of STL 2.4.7 Matters of Style 2.4.8 Documentation Style 2.5 Tests 2.5.1 Writing Tests 2.5.2 Generating the Test Driver 2.6 Submission 2.7 Evaluation 2.7.1 Automated Evaluation 2.7.2 During the Examination 2.7.3 Human Evaluation 2.7.4 Marks Computation 3 Source Code 3.1 Given Code 3.2 Project Layout 3.2.1 The Top Level 3.2.2 The ‘build-aux’ Directory 3.2.3 The ‘lib’ Directory 3.2.4 The ‘lib/misc’ Directory 3.2.5 The ‘src’ Directory 3.2.6 The ‘src/task’ Directory 3.2.7 The ‘src/parse’ Directory 3.2.8 The ‘src/ast’ Directory 3.2.9 The ‘src/bind’ Directory 3.2.10 The ‘src/escapes’ Directory 3.2.11 The ‘src/type’ Directory 3.2.12 The ‘src/object’ Directory 3.2.13 The ‘src/overload’ Directory 3.2.14 The ‘src/astclone’ Directory 3.2.15 The ‘src/desugar’ Directory 3.2.16 The ‘src/inlining’ Directory 3.2.17 The ‘src/temp’ Directory 3.2.18 The ‘src/tree’ Directory 3.2.19 The ‘src/frame’ Directory 3.2.20 The ‘src/translate’ Directory 3.2.21 The ‘src/canon’ Directory 3.2.22 The ‘src/assem’ Directory 3.2.23 The ‘src/target’ Directory 3.2.24 The ‘src/target/mips’ Directory 3.2.25 The ‘src/target/ia32’ Directory 3.2.26 The ‘src/target/arm’ Directory 3.2.27 The ‘src/liveness’ Directory 3.2.28 The ‘src/llvmtranslate’ Directory 3.2.29 The ‘src/regalloc’ Directory 3.3 Given Test Cases 4 Compiler Stages 4.1 Stage Presentation 4.2 PTHL (TC-0), Naive Scanner and Parser 4.2.1 PTHL Goals 4.2.2 PTHL Samples 4.2.3 PTHL Code to Write 4.2.4 PTHL FAQ 4.2.5 PTHL Improvements 4.3 TC-1, Scanner and Parser 4.3.1 TC-1 Goals 4.3.2 TC-1 Samples 4.3.3 TC-1 Given Code 4.3.4 TC-1 Code to Write 4.3.5 TC-1 FAQ 4.3.6 TC-1 Improvements 4.4 TC-2, Building the Abstract Syntax Tree 4.4.1 TC-2 Goals 4.4.2 TC-2 Samples 4.4.2.1 TC-2 Pretty-Printing Samples 4.4.2.2 TC-2 Chunks 4.4.2.3 TC-2 Error Recovery 4.4.3 TC-2 Given Code 4.4.4 TC-2 Code to Write 4.4.5 TC-2 FAQ 4.4.6 TC-2 Improvements 4.5 TC-3, Bindings 4.5.1 TC-3 Goals 4.5.2 TC-3 Samples 4.5.3 TC-3 Given Code 4.5.4 TC-3 Code to Write 4.5.5 TC-3 FAQ 4.5.6 TC-3 Improvements 4.6 TC-R, Unique Identifiers 4.6.1 TC-R Samples 4.6.2 TC-R Given Code 4.6.3 TC-R Code to Write 4.6.4 TC-R FAQ 4.7 TC-E, Computing the Escaping Variables 4.7.1 TC-E Goals 4.7.2 TC-E Samples 4.7.3 TC-E Given Code 4.7.4 TC-E Code to Write 4.7.5 TC-E FAQ 4.7.6 TC-E Improvements 4.8 TC-4, Type Checking 4.8.1 TC-4 Goals 4.8.2 TC-4 Samples 4.8.3 TC-4 Given Code 4.8.4 TC-4 Code to Write 4.8.5 TC-4 Options 4.8.6 TC-4 FAQ 4.8.7 TC-4 Improvements 4.9 TC-D, Removing the syntactic sugar from the Abstract Syntax Tree 4.9.1 TC-D Samples 4.10 TC-I, Function inlining 4.10.1 TC-I Samples 4.11 TC-B, Array bounds checking 4.11.1 TC-B Samples 4.11.2 TC-B FAQ 4.12 TC-A, Ad Hoc Polymorphism (Function Overloading) 4.12.1 TC-A Samples 4.12.2 TC-A Given Code 4.12.3 TC-A Code to Write 4.13 TC-O, Desugaring object constructs 4.13.1 TC-O Samples 4.14 TC-5, Translating to the High Level Intermediate Representation 4.14.1 TC-5 Goals 4.14.2 TC-5 Samples 4.14.2.1 TC-5 Primitive Samples 4.14.2.2 TC-5 Optimizing Cascading If 4.14.2.3 TC-5 Builtin Calls Samples 4.14.2.4 TC-5 Samples with Variables 4.14.3 TC-5 Given Code 4.14.4 TC-5 Code to Write 4.14.5 TC-5 Options 4.14.5.1 TC-5 Bounds Checking 4.14.5.2 TC-5 Optimizing Static Links 4.14.6 TC-5 FAQ 4.14.7 TC-5 Improvements 4.15 TC-6, Translating to the Low Level Intermediate Representation 4.15.1 TC-6 Goals 4.15.2 TC-6 Samples 4.15.2.1 TC-6 Canonicalization Samples 4.15.2.2 TC-6 Scheduling Samples 4.15.3 TC-6 Given Code 4.15.4 TC-6 Code to Write 4.15.5 TC-6 Improvements 4.16 TC-7, Instruction Selection 4.16.1 TC-7 Goals 4.16.2 TC-7 Samples 4.16.3 TC-7 Given Code 4.16.4 TC-7 Code to Write 4.16.5 TC-7 FAQ 4.16.6 TC-7 Improvements 4.17 TC-8, Liveness Analysis 4.17.1 TC-8 Goals 4.17.2 TC-8 Samples 4.17.3 TC-8 Given Code 4.17.4 TC-8 Code to Write 4.17.5 TC-8 FAQ 4.17.6 TC-8 Improvements 4.18 TC-9, Register Allocation 4.18.1 TC-9 Goals 4.18.2 TC-9 Samples 4.18.3 TC-9 Given Code 4.18.4 TC-9 Code to Write 4.18.5 TC-9 FAQ 4.18.6 TC-9 Improvements 4.19 TC-X, IA-32 Back End 4.19.1 TC-X Goals 4.19.2 TC-X Samples 4.19.3 TC-X Given Code 4.19.4 TC-X Code to Write 4.19.5 TC-X FAQ 4.19.6 TC-X Improvements 4.20 TC-Y, ARM Back End 4.20.1 TC-Y Goals 4.20.2 TC-Y Samples 4.20.3 TC-Y Given Code 4.20.4 TC-Y Code to Write 4.20.5 TC-Y FAQ 4.20.6 TC-Y Improvements 4.21 TC-L, LLVM IR 4.21.1 TC-L Goals 4.21.2 TC-L Samples 4.21.3 TC-L Given Code 4.21.4 TC-L Code to Write 4.21.5 TC-L FAQ 4.21.6 TC-L Improvements 5 Tools 5.1 Programming Environment 5.2 Modern Compiler Implementation 5.2.1 First Editions 5.2.2 In Java - Second Edition 5.3 Bibliography 5.4 The GNU Build System 5.4.1 Package Name and Version 5.4.2 Bootstrapping the Package 5.4.3 Making a Tarball 5.4.4 Setting site defaults using ‘CONFIG_SITE’ 5.5 GCC, The GNU Compiler Collection 5.6 Clang, A C language family front end for LLVM 5.7 GDB, The GNU Project Debugger 5.8 Valgrind, The Ultimate Memory Debugger 5.9 Flex & Bison 5.10 HAVM 5.11 MonoBURG 5.12 Nolimips 5.13 SPIM 5.14 SWIG 5.15 Python 5.16 Doxygen Appendix A Appendices A.1 Glossary A.2 GNU Free Documentation License A.2.1 ADDENDUM: How to use this License for your documents A.3 Colophon A.4 List of Files A.5 List of Examples A.6 Index ---------- Footnotes ---------- (1) . (2) . (3) . (4) . (5) . 1 Introduction ************** This document presents the Tiger Project as part of the EPITA(1) curriculum. It aims at the implementation of a Tiger compiler (*note Modern Compiler Implementation::) in C++. ---------- Footnotes ---------- (1) . 1.1 How to Read this Document ============================= If you are a newcomer, you might be afraid by its sheer size. Don’t worry, but in any case, do not give up: as stated in the very beginning of this document, *Nul n’est censé ignorer la loi.* That is to say everything exposed in this document is considered to be known. If it is written but you didn’t know, you are wrong. If it is not written _and_ was not clearly reported in the news, we are wrong. Basically this document contains three kinds of information: _Initial and Permanent_ What you must read and know since the very beginning of the project. This includes most the following chapters: *note Introduction:: (except the *note History:: section), *note Instructions::, and *note Evaluation::. _Incremental_ You should read these parts as and when needed. This includes mostly *note Compiler Stages::. _Auxiliary_ This information is provided to help you: just go there when you feel the need, *note Tools::, and *note Source Code::. If you want to have a better understanding of the project, if you are about to criticize something, be sure to read *note History:: beforehand. There is additional material on the Internet: − The Wiki page for the Tiger Compiler Project(1) is the official home page of the project. It holds related material (e.g., links). − The packages of the tools that we use (Bison, Autoconf etc.) can be found in the Tiger download area(2). − The developer documentation of the Tiger Compiler(3). − Most of the provided material (lecture notes, older exams, current tarballs etc.) is in the Tiger area(4). ---------- Footnotes ---------- (1) . (2) . (3) . (4) . 1.2 Why the Tiger Project ========================= This project is quite different from most other EPITA projects, and has aims at several different goals, in different areas: _Several iterations_ This project is about the only one with which you will live for 4 months (6 months for the brave ones), with the constant needs to fix errors found in earlier stages. _Complete Project_ While the evaluation of most student projects is based on the code, this project restores the deserved emphasis on _documentation_ and _testing_. Because of the duration of the project, you will value the importance of a good (developer’s) documentation (why did we write this 4 months ago?), and of a good test suite (why does TC-2 fails now that we implemented TC-4? When did we break it?). This also means that you have to design a test suite, and maintain it through out the project. _The test suite is an integral part of the project._ _Team Management_ The Tiger Compiler is a long project, running from February to May (and optionally further). Each three person team is likely to experience nasty “human problems”. This is explicitly a part of the project: the team management is a task you have to address. That may well include exclusion of lazy members. _C++_ C++ is by no means an adequate language to _study_ compilers (C would be even worse). Languages such as Haskell(1), Ocaml(2), Stratego(3) are much better suited (actually the latter is even designed to this end). But, as already said, the primary goal is not to learn how to write a compiler: for an EPITA student, learning C++, Design Patterns, and Object Oriented Design is much more important. Note, however, that implementing an industrial strength compiler in C++ makes a lot of sense(4). Bjarne Stroustrup’s list of C++ Applications(5) mentions *note GCC::, *note Clang:: and LLVM, Metrowerks (CodeWarrior), HP, Sun, Intel, M$ as examples. _Understanding Computers_ Too many students still have a very fuzzy mental picture of what a computer is, and how a program runs. Studying compilers helps understanding how it works, and therefore _how to perform a good job_. Although most students will never be asked to write a single line of assembly during their whole lives, knowing assembly is also of help. *Note Bjarne Stroustrup::, for instance, says: Q: What is your opinion, is knowing assembly language useful for programmers nowadays? BS: It is useful to understand how machines work and knowing assembler is almost essential for that. _English_ English is _the_ language for this project, starting with this very document, written by a French person, for French students. You cannot be a good computer scientist with absolutely no fluency in English. The following quote is from Bjarne Stroustrup, who is danish (*note The Design and Evolution of C++::, 6.5.3.2 Extended Character Sets): English has an important role as a common language for programmers, and I suspect that it would be unwise to abandon that without serious consideration. Any attempt to break the importance of English is wrong. For instance, _do not translate this document nor any other_. Ask support to the Yakas, or to the English team. By the past, some oral and written examinations were made in English. It may well be back some day. Some books will help you to improve your English, see *note The Elements of Style::. _Compiler_ The project aims at the implementation of a compiler, but _this is a minor issue_. The field of compilers is a wonderful place where most of computer science is concentrated, that’s why this topic is extremely convenient as long term project. But _it is not the major goal_, the full list of all these items is. The Tiger project is not unique in these regards, see *note Cool - The Classroom Object-Oriented Compiler::, for instance, with many strikingly similar goals, and some profound differences. See also *note Making Compiler Design Relevant for Students who will (Most Likely) Never Design a Compiler::, for an explanation of why compilation techniques have a broader influence than they seem. ---------- Footnotes ---------- (1) . (2) . (3) . (4) The fact that the compiler compiles C++ is virtually irrelevant. (5) . 1.3 What the Tiger Project is not ================================= This section could have been named “What Akim did not say”, or “Common misinterpretations”. The first and foremost misinterpretation would be “Akim says C sucks and is useless”. Wrong. C sucks, definitely, but let’s face it: C is *mandatory* in your education. The fact that C++ is studied afterward does not mean that learning C is a loss of time, it means that since C is basically a subset of C++ it makes sense to learn it first, it also means that (let it be only because it is a superset) C++ provides additional services so it is often a better choice, but even more often _you don’t have the choice_. C++ is becoming a common requirement for programmers, so you also have to learn it, although it “features” many defects (but heredity was not in its favor...). It’s an industrial standard, so learn it, and learn it well: know its strengths and weaknesses. And by the way, of course C++ sucks++. Another common rumor in EPITA has it that “C/Unix programming does not deserve attention after the first period”. Wrong again. First of all its words are wrong: it is a legacy belief that C and Unix require each other: _you can implement advanced system features using other languages than C_ (starting with C++, of course), and of course _C can be used for other tasks than just system programming_. For instance Bjarne Stroustrup’s list of C++ Applications(1) includes: Apple OS X is written in a mix of language, but a few important parts are C++. The two most interesting are: − Finder − IOKit device drivers. (IOKit is the only place where we use C++ in the kernel, though.)[...] Ericsson − TelORB - Distributed operating system with object oriented Microsoft Literally everything at Microsoft is built using recent flavors of Visual C++. The list would include major products like: − Windows XP − Windows NT (NT4 and 2000) − Windows 9x (95, 98, Me) − Microsoft Office (Word, Excel, Access, PowerPoint, Outlook)[...] − Visual Studio CDE The CDE desktop (the standard desktop on many UNIX systems) is written in C++. Mozilla − Firefox − Thunderbird Adobe Systems All major applications are developed in C++: − Photoshop − Illustrator − Acrobat Know C. Learn when it is adequate, and why you need it. Know C++. Learn when it is adequate, and why you need it. Know other languages. Learn when they are adequate, and why you need them. And then, if you are asked to choose, make an educated choice. If there is no choice to be made, just deal with Real Life. ---------- Footnotes ---------- (1) . 1.4 History =========== The Tiger Compiler Project evolves every year, so as to improve its infrastructure, to demonstrate more instructional material and so forth. This section tries to keep a list of these changes, together with the most constructive criticisms from students (or ourselves). If you have information, including criticisms, that should be mentioned here, please send it to us. The years correspond to the class, e.g., Tiger 2005 refers to EPITA class 2005, i.e., the project ran from October 2002 to July (previously September) 2003. 1.4.1 Fair Criticism -------------------- Before diving into the history of the Tiger Compiler Project in EPITA, a whole project in itself for ourselves, with experimental tries and failures, it might be good to review some constraints that can explain why things are the way they are. Understanding these constraints will make it easier to criticize actual flaws, instead of focusing on issues that are mandated by other factors. Bear in mind that Tiger is an instructional project, the purpose of which is detailed above, see *note Why the Tiger Project::. Because the input is a stream of students with virtually no knowledge whatsoever in C++, and our target is a stream of students with good fluency in many constructs and understanding of complex matters, we have to gradually transform them via intermediate forms with increasing skills. In particular this means that by the end of the project, evolved techniques can and should be used, but at the beginning only introductory knowledge should be needed. As an example of a consequence, we cannot have a nice and high-tech AST. Because the insight of compilers is not the primary goal, when a choice is to be made between (i) more interesting work on compiler internals with little C++ novelty, and (ii) providing most of this work and focusing on something else, then we are most likely to select the second option. This means that the Tiger Project is doomed to be a low-tech featureless compiler, with no call graph, no default optimization, no debugging support, no bells, no whistles, and even no etc. Hence, most interested students will sometimes feel we “stole” the pleasure to write nice pieces of code from them; understand that we actually provided code to the _other_ students: you are free to rewrite everything if you wish. 1.4.2 Tiger 2002 ---------------- This is not standard C++ We used to run the standard compiler from NetBSD: ‘egcs’ 1.1.2. This was not standard C++ (e.g., we used to include ‘’, we could use members of the ‘std’ name space unqualified etc.). In addition, we were using ‘hash_map’ which is an SGI extension that is not available in standard C++. It was therefore decided to upgrade the compiler in 2003, and to upgrade the programming style. Wrapping a tarball is impossible During the first edition of the Tiger Compiler project, students had to write their own Makefiles — after all, knowing Make is considered mandatory for an Epitean. This had the most dramatic effects, with a wide range of creative and imaginative ways to have your project fail; for instance: − Forget to ship some files − Ship object files, or even the executable itself. Needless to say that NetBSD executables did not run properly on Akim’s GNU/Linux box. − Ship temporary files (‘*~’, ‘#*#’, etc.). − Ship core dumps (“Wow! This _is_ the heck of an heavy tarball...”). − Ship tarballs in the tarball. − Ship _tarballs of other groups_ in the tarball. It was then hard to demonstrate they were not cheating :) − Have incorrect dependencies that cause magic failures. − Have completely lost confidence in dependencies and Make, and therefore define the ‘all’ target as first running ‘clean’ and then the actual build. As a result Akim grew tired of fixing the tarballs, and in order to have a robust, efficient (albeit some piece of pain in the neck sometimes) distribution (1) we moved to using Automake, and hence Autoconf. There are reasons not to be happy with it, agreed. But there are many more reasons to be sad without it. So Autoconf and Automake are here to stay. Note, however, that you are free to use another system if you wish. Just obey the standard package interface (*note Submission::). The ‘SemantVisitor’ is a nightmare to maintain The ‘SemantVisitor’, which performs both the type checking and the translation to intermediate code, was near to impossible to deliver in pieces to the students: because type checking and translation were so much intertwined, it was not possible to deliver as a first step the type checking machinery template, and then the translation pieces. Students had to fight with non applicable patches. This was fixed in Tiger 2003 by splitting the ‘SemantVisitor’ into ‘TypeVisitor’ and ‘TranslationVisitor’. The negative impact, of course, is a performance loss. Akim is tired during the student defenses Seeing every single group for each compiler stage is a nightmare. Sometimes Akim was not enough aware. ---------- Footnotes ---------- (1) See the shift of language? From tarball to distribution. 1.4.3 Tiger 2003 ---------------- During this year, Akim was helped by: Comaintainers Alexandre Duret-Lutz, Thierry Géraud. Submission dates were: Stage Submission --------------------------------------------- TC-1 Monday, December 18th 2000 at noon TC-2 Friday, February 23th 2001 at noon TC-3 Friday, March 30th 2001 at noon TC-4 Tuesday, June 12th 2001 at noon TC-5 Monday, September 17th 2001 at noon Some groups have reached TC-6. Criticisms include: The C++ compiler is broken Akim had to install an updated version of the C++ compiler since the system team did not want non standard software. Unfortunately, NetBSD turned out to be seriously incompatible with this version of the C++ compiler (its ‘crt1.o’ dumped core on the standard stream constructors, way before calling ‘main’). We had to revert to using the bad native C++ compiler. It is to be noted that some funny guy once replaced the ‘g++’ executable from Akim’s account into ‘rm -rf ~’. Some students and Akim himself have been bitten. The funny thing is that this is when the system administration realized the teacher accounts were not backed up. Fortunately, since that time, decent compilers have been made available, and the Tiger Compiler is now written in strictly standard C++. The AST is rigid Because the members of the AST objects were references, it was impossible to implement any change on it: simplifications, optimization etc. This is fixed in Tiger 2004 where all the members are now pointers, but the interface to these classes still uses references. Akim is even more tired during the student defenses Just as the previous year, see *note Tiger 2002::, but with more groups and more stages. But now there are enough competent students to create a group of assistants, the Yakas, to help the students, and to share the load of defenses. Upgrading is not easy Only tarballs were submitted, making upgrades delicate, error prone, and time consuming. The systematic use of patches between tarballs since the 2004 edition solves this issue. Upgraded tarballs don’t compile Students would like at least to be able to compile a tarball with its holes. To this end, much of the removed code is now inside functions, leaving just what it needed to satisfy the prototype. Unfortunately this is not very easy to do, and conflicts with the next complaint: Filling holes is not interesting In order to scale down the amount of code students have to write, in order to have them focus on instructional material, more parts are submitted almost complete except for a few interesting places. Unfortunately, some students decided to answer the question completely mechanically (copy, paste, tweak until it compiles), instead of focusing of completing their own education. There is not much we can do about this. Some parts will therefore grow; typically some files will be left empty instead of having most of the skeleton ready (prototypes and so forth). This means more work, but more interesting I (Akim) guess. But it conflicts with the previous item... 1.4.4 Tiger 2004 ---------------- During this year, Akim was helped by: Comaintainers Alexandre Duret-Lutz, Raphaël Poss, Robert Anisko, Yann Régis-Gianas, Assistants Arnaud Dumont, Pascal Guedon, Samuel Plessis-Fraissard, Students Cédric Bail, Sébastien Broussaud (Darks Bob), Stéphane Molina (Kain), William Fink. Submission dates were: Stage Submission --------------------------------------------- TC-2 Tuesday, March 4th 2002 at noon TC-3 Friday, March 15th 2002 at noon TC-4 Friday, April 12th 2002 at noon TC-5 Friday, June 14th 2002, at noon TC-6 Monday, July 15th 2002 at noon Criticisms include: The driver is not maintainable The compiler driver was a nightmare to maintain, extend etc. when delivering additional modules etc. This was fixed in 2005 by the introduction of the ‘Task’ model. No sane documentation This was addressed by the use of Doxygen in 2005. No UML documentation The solution is yet to be found. Too many visitors It seems that some students think there were too many visitors to implement. I (Akim) do not subscribe to this view (after all, why not complain that “there are too many programs to implement”, or, in a more C++ vocabulary “there are too many classes to implement”), nevertheless in Tiger 2005 this was addressed by making the ‘EscapeVisitor’ “optional” (actually it became a rush). Too many memory leaks The only memory properly reclaimed is that of the AST. No better answer for the rest of the compiler. This is the most severe flaw in this project, and definitely the worst thing to remember of: what we showed is not what student should learn to do. Though a garbage collector is tempting and well suited for our tasks, its pedagogical content is less interesting: students should be taught how to properly manage the memory. Upgraded tarballs don’t compile Filling holes is not interesting Cannot be solved, see *note Tiger 2003::. Ending on TC-6 is frustrating Several students were frustrated by the fact we had to stop at TC-6: the reference compiler did not have any back-end. Continuing onto TC-7 was offered to several groups, and some of them actually finished the compiler. We took their work, adjusted it, and it became the base of the reference compiler of 2005. The most significant effort was made by Daniel Gazard. Double submission is intractable Students were allowed to deliver twice their project — with a small penalty — if they failed to meet the so-called “first submission deadline”, or if they wanted to improve their score. But it was impossible to organize, and led to too much sloppiness from some students. These problems were addressed with the introduction of “uploads” in Tiger 2005. 1.4.5 Tiger 2005 ---------------- A lot of the following material is the result of discussion with several people, including, but not limited to(1): Comaintainers Benoît Perrot, Raphaël Poss, Assistants Alexis Brouard, Sébastien Broussaud (Darks Bob), Stéphane Molina (Kain), William Fink, Students Claire Calméjane, David Mancel, Fabrice Hesling, Michel Loiseleur. I (Akim) here thank all the people who participated to this edition of this project. It has been a wonderful vintage, thanks to the students, the assistants, and the members of the LRDE. Deliveries were: Stage Submission ------------------------------------------------------------------ TC-0 Friday, January 24th 2003 12:00 TC-1 Friday, February 14th 2003 12:00 TC-2 Friday, March 14th 2003 12:00 TC-4 Friday, April 25th 2003 12:00 TC-3 Rush from Saturday, May 24th at 18:00 to Sunday 12:00 TC-56 Friday, June 20th 2003, 12:00 TC-7 Friday, July 4th 2003 12:00 TC-78 Friday, July 18th 2003 12:00 TC-9 Monday, September 8th 2003 12:00 Criticisms about Tiger 2005 include: Too many memory leaks *Note Tiger 2004::. This is the most significant failure of Tiger as an instructional project: we ought to demonstrate the proper memory management in big project, and instead we demonstrate laziness. Please, criticize us, denounce us, but do not reproduce the same errors. The factors that had pushed to a weak memory management is mainly a lack of coordination between developers: we should have written more things. So don’t do as we did: define the memory management policy for each module, and write it. The 2006 edition pays strict attention to memory allocation. Too long to compile Too much code was in ‘*.hh’ files. Since then the policy wrt file contents was defined (*note File Conventions::), and in Tiger 2006 was adjusted to obey these conventions. Unfortunately, although the improvement was significant, it was not measured precisely. The interfaces between modules have also been cleaned to avoid excessive inter dependencies. Also, when possible, opaque types are used to avoid additional includes. Each module exports forward declarations in a ‘fwd.hh’ file to promote this. For instance, ‘ast/tasks.hh’ today includes: // Forward declarations of ast:: items. #include "ast/fwd.hh" // ... /// Global root node of abstract syntax tree. extern ast::Exp* the_program; // ... where it used to include all the AST headers to define exactly the type ‘ast::Exp’. Upgraded tarballs don’t compile Filling holes is not interesting Cannot be solved, see *note Tiger 2003::. No written conventions Since its inception, the Tiger Compiler Project lacked this very section (*note History::) and that dedicated to coding style (*note Coding Style::) until the debriefing of 2005. As a result, some students or even so co-developers of our own ‘tc’ reproduced errors of the past, changed something for lack of understanding, slightly broke the homogeneity of the coding style etc. Do not make the same mistake: write down your policy. The AST is too poor One would like to insert annotations in the AST, say whether a variable is escaping (to know whether it cannot be in a register, see *note TC-3: TC-3, and *note TC-5: TC-5.), or whether the left hand side of an assignment in ‘Void’ (in which case the translation must not issue an actual assignment), or whether ‘a < b’ is about strings (in which case the translation will issue a hidden call to ‘strcmp’), or the type of a variable (needed when implementing object oriented Tiger), etc., etc. As you can see, the list is virtually infinite. So we would need an extensible system of annotation of the AST. As of September 2003 no solution has been chosen. But we must be cautious not to complicate TC-2 too much (it is already a very steep step). People don’t learn enough C++ It seems that the goal of learning object oriented programming and C++ is sometimes hidden behind the difficult understanding of the Tiger compiler itself. Sometimes students just fill the holes. To avoid this: − The holes will be bigger (conflicting with the ease to compile something, of course) to avoid any mechanical answering. − Each stage is now labeled with its "goals" (e.g., *note TC-2 Goals: TC-2 Goals.) that should help students to understand what is expected from them, and examiners to ask the appropriate questions. The computation of the escapes is too hard The computation of the escapes is too easy If you understood what it means that a variable escapes, then the implementation is so straightforward that it’s almost boring. If you didn’t understand it, you’re dead. Because the understanding of escapes needs a good understanding of the stack management (explained more in details way afterward, during TC-5), many students are deadly lost. We are considering splitting TC-5 into two: TC-5- which would be limited to programs without escaping variables, and TC-5+ with escaping variables _and_ the computation of the escapes. The static-link optimization pass is improperly documented Todo. The use of references is confusing We used to utilize references instead of pointers when the arity of the relation is one; in other words, we used pointers iff 0 was a valid value, and references otherwise. This is nice and clean, but unfortunately it caused great confusion amongst students (who were puzzled before ‘*new’, and, worse yet, ended believing that’s the only way to instantiate objects, even automatic!), and also confused some of the maintainers (for whom a reference does not propagate the responsibility wrt memory allocation/deallocation). Since Tiger 2006, the coding style enforces a more conventional style. Not enough freedom The fact that the modelisation is already settled, together with the extensive skeletons, results in too tight a space for a programmer to experiment alternatives. We try to break these bounds for those who want by providing a generic interface: if you comply with it, you may interchange with your full re-implementation. We also (now explicitly) allow the use of a different tool set. Hints at possible extensions are provided, and finally, alternative implementation are suggested for each stage, for instance see *note TC-2 Improvements: TC-2 Improvements. ---------- Footnotes ---------- (1) Please, let us know whom we forgot! 1.4.6 Tiger 2006 ---------------- Akim has been helped by: Assistants Claire Calméjane, Fabrice Hesling, Marco Tessari, Tristan Lanfrey Deliveries: Stage Kind Submission Supervisor -------------------------------------------------------------------------- TC-0 Wednesday, 2004-02-04 12:00 Anne-Lise Brourhant TC-1 Sunday, 2004-02-08 12:00 Tristan Lanfrey TC-2 Sunday, 2004-03-07 12:00 Anne-Lise Brourhant, Tristan Lanfrey TC-3 Rush Fr., 2004-03-19 18:30 to Sun., Fabrice Hesling 2004-03-21 19:00 TC-4 Sunday, 2004-04-11 19:00 Tristan Lanfrey TC-5 Sunday, 2004-06-06 12:00 Fabrice Hesling TC-6 Sunday, 2004-06-27 12:00 Marco Tessari TC-7 Opt Sunday, 2004-07-11 12:00 Marco Tessari or Fabrice Hesling TC-89 Opt Thursday, 2004-07-29 12:00 Marco Tessari Criticisms about Tiger 2006 include: The interface of ‘symbol::Table’ should be provided On the one hand side, we meant to have students implement it from scratch so we shouldn’t provide the header, and on the other hand, the rest of the (provided) code expects a well defined interface, so we should publish it! The result was confusion and loss of time. The problem actually disappeared: Tiger 2007 no longer depends so heavily on scoped symbol tables. Some examples are incorrectly rejected by the reference compiler The Tiger reference manual does not exclude sick examples such as: let type rec = {} in rec {} end where the type ‘rec’ escapes its scope since the type checker will assign the type ‘rec’ to the ‘let’ construct. Given the suggested implementation, which reclaims memory allocated by the declarations when closing the scope, the compiler dumps core. The new implementation, tested with 2005b, copes with this gracefully: types are destroyed when the AST is. This does not cure the example, which should be invalid IMHO. The following example, from Arnaud Fabre, amplifies the problem. let var box := let type box = {val: string} var box := box {val = "42\n"} in box end in print(box.val) end TC-5 is too hard a stage This is a recurrent complaint. We tried to make it easier by moving more material into earlier stages (e.g., scopes are no longer dealt with by the ‘TranslateVisitor’: the ‘Binder’ did it all). Multiple inheritance is not demonstrated There are several nice opportunities of factoring the AST using multiple inheritance. Tiger 2007 uses them (e.g., ‘Escapable’, ‘Bindable’ etc.). The coding style for types is inconsistent The sources are ambivalent wrt to pointer and reference types. Sometimes ‘type *var’, sometimes ‘type* var’. Obviously the latter is the more “logical”: the space separates the type from the variable name. Unfortunately the declaration semantics in C/C++ introduces pitfalls: ‘int* ip, i’ is equivalent to ‘int* ip; int i;’. That is why I, Akim, was using the ‘type *var’ style, and resisted to expressing _the_ coding style on this regard. The resulting mix of styles was becoming chronic: defining a rule was needed... In favor of ‘type* var’, with the provision that multiple variable declarations are forbidden. More enthusiasm from the assistants It has been suggested that assistants should show more motivation for the Tiger Project. It was suggested that they were not enough involved in the process. For Tiger 2007, there are no less than 10 Tiger assistants (as opposed to 4), and two of them are co-maintaining the reference compiler. Assistants will also be kept more informed of code changes than before. More technical lectures Some regret when programming techniques (e.g., object functions, ‘#include ’) are not taught. My (Akim’s) personal opinion is that students should learn to learn by themselves. It was decided to more emphasize these goals. Also, oral examinations should be _ahead_ the code submission, and that should ensure that students have understood what is expected from them. Formal definition of Booleans The Tiger language enjoys well defined semantics: a given program has a single defined behavior... except if the value of ‘a & b’ or ‘a | b’ is used. To fix this issue, in Tiger 2007 they return either 0 or 1. Amongst other noteworthy changes, after five years of peaceful existence, the stages of the compiler were renamed from T1, T4 etc. to TC-1, TC-4... EPITA moved from “periods” (P1, P2...) to “trimesters” and they stole T1 and so forth from Tiger. 1.4.7 Tiger 2005b ----------------- Akim has been helped by: Comaintainers Arnaud Fabre, Gilles Walbrou, Roland Levillain Assistants Charles Rathouis, Claire Calméjane, Fabrice Hesling, Marco Tessari, Tristan Carel, Tristan Lanfrey, Deliveries: Stage Kind Submission ------------------------------------------------ TC-1 Sun 2004-10-10 12:00 TC-2 Sun 2004-10-24 12:00 TC-3 Sun 2004-11-7 12:00 TC-4 Sun 2004-11-28 12:00 Criticisms about Tiger 2006 include: Use of ‘misc::ident’ Some examples would be most welcome. Well, there is ‘misc/test-indent.cc’, and now the ‘PrintVisitor’ code includes a few examples. ‘test-ref.cc’ This file is used only in TC-5, yet it is submitted at TC-1, so students want to fix it, which is too soon. Tarballs will be adjusted to avoid this. 1.4.8 Tiger 2007 ---------------- Akim has been helped by: Comaintainers Arnaud Fabre, Roland Levillain, Gilles Walbrou Assistants Arnaud Fabre, Bastien Gueguen, Benoît Monin, Chloé Boivin, Fanny Ricour, Gilles Walbrou, Julien Nesme, Philippe Kajmar, Tristan Carel Deliveries: Stage Kind Launch Submission Supervisor --------------------------------------------------------------------------------------------------- TC-0 Wed 2005-03-09 Tue 2005-03-15 23:42 Bastien Gueguen TC-1 Rush Fri 2005-03-18 Sun 2005-03-19 9:00 Guillaume Bousquet TC-2 Mon 2005-03-21 Sun 2005-04-03 Nicolas Rateau TC-3 Rush Fri 2005-04-08 20:00 Sun 2005-04-10 12:00 Fanny Ricour TC-4 Mon 2005-04-18 Sun 2005-05-01 Julien Nesme TC-5 Mon 2005-05-09 Sun 2005-06-05 Benoît Monin TC-6 Mon 2005-06-06 Sun 2005-06-12 Philippe Kajmar TC-7 Mon 2005-06-13 Sun 2005-06-19 Gilles Walbrou TC-8 Mon 2005-06-20 Mon 2005-06-27 Arnaud Fabre TC-9 Mon 2005-06-20 Sun 2005-07-03 Arnaud Fabre Final Wed 2005-07-06 submission Criticisms about Tiger 2007 include: Cheating Too much cheating during TC-5. Some would like more repression; that’s fair enough. We will also be stricter during the exams. Debriefing After a submission, there should be longer debriefings, including details about common errors. Some of the mysterious test cases should be explained (but not given in full). Maybe some bits of C++ code too. Design of the compiler More justification of the overall design is demanded. Some selected parts, typically TC-5, should have a UML presentation. Tarball Keep the tarball simple to use. We have to improve the case of tcsh. Also: give the tarball before the presentation by the assistants. Oral examinations Assistants should be given a map of where to look at. The test suite should be evaluated at each submission. The use of version control too. Optional parts They want more of them! We have more: see *note TC-R: TC-R, *note TC-D: TC-D, and *note TC-I: TC-I. ‘misc::’ tools There should be a presentation of them. TC-3 is too long TC-3, a rush, took several groups by surprise. Some groups would have liked to have the files earlier: in the future we will publish them on the Wednesday, instead of the last minute. Some groups have found it very difficult to be several working together on the same file (‘binder.cc’ of course). This is also a problem in the group management, and use of version control: when tasks are properly assigned, and using a tool such as Subversion, such problems should be minimal. In particular, merges resulting from updates should not be troublesome! Difficult updates result from disordered edition of the files. Dropping the use of a version control manager is not an answer: you will be bitten one day if two people edit concurrently the same file. One option is to split the file, say ‘binder-exp.cc’ and ‘binder-dec.cc’ for instance. I (Akim) leave this to students. The template method template is too hard Some students would have preferred not to have the declaration of ‘Binder::decs_visit’, but the majority prefers: we will stay on this version, but we will emphasize that students are free not to follow our suggestions. TC-5 Several people would like more time to do it. But let’s face it: the time most student spend on the project is independent of the amount of available time. Rather, early oral exams about TC-5 should suffice to prompt students to start earlier. People agree it is harder, and mainly because of compiler construction issues, not C++ issues. But many students prefer to keep it this way, rather than completely giving away the answers to compiler construction related problems. 1.4.9 Tiger 2008 ---------------- We have been helped by: Comaintainers Christophe Duong, Fabien Ouy Assistants Deliveries: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------- TC-0 Tue 01-03 Fri 01-13 23:42 Christophe Duong TC-1 Rush Fri 03-17 Sun 03-19 12:12 Renaud Lienhart TC-2 Mon 03-20 Thu 03-30 23:42 David Doukhan TC-3 Rush Fri 03-31 Sun 04-02 12:12 Frederick Mousnier-Lompre TC-4 Tue 04-04 Mon 04-24 23:42 Guillaume Deslandes TC-5 Mon 05-01 Sun 05-28 23:42 Alexis Sebbane TC-6 Mon 05-29 Sun 06-11 23:42 Christophe Duong TC-7 Wed 06-14 Wed 06-21 12:00 TC-8 Wed 06-21 Sun 07-2 12:00 TC-9 Mon 07-03 Sun 07-16 12:00 Final Some of the noteworthy changes compared to *note Tiger 2007::: Simplification of the parser The parser is simplified in a number of ways. First the old syntax for imported files, ‘let end’, is simplified into ‘’. We also use GLR starting at TC-2. ‘&’, ‘|’ and the unary minus operator are desugared using concrete syntax transformations. *Note TC-R: TC-R, Unique identifiers This new optional part should be done during TC-3. Leave TC-E for later (with TC-5 or maybe TC-4). Concrete syntax Transformations can now be written using Tiger concrete syntax rather than explicit AST construction in C++. This applies to the ‘DesugarVisitor’, ‘BoundsCheckingVisitor’ and ‘InlineVisitor’. 1.4.10 Leopard 2009 ------------------- We have been helped by: Comaintainers Benoît Tailhades, Alain Vongsouvanh, Razik Yousfi, Benoît Perrot, Benoît Sigoure Assistants Deliveries: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------------- LC-0 Mon 03-05 Fri 03-16 12:00 LC-1 Rush Fri 03-23 Sun 03-25 12:00 LC-2 Mon 03-26 Fri 04-06 12:00 LC-3 & LC-R Rush Fri 04-06 Sun 04-08 12:00 LC-4 Mon 04-23 Sun 05-06 12:00 LC-5 Mon 05-15 Sun 06-03 12:00 LC-6 Mon 06-04 Sun 06-10 12:00 LC-7 Mon 06-11 Wed 06-20 12:00 LC-8 Thu 06-21 Sun 07-01 12:00 LC-9 Mon 07-02 Sun 07-15 12:00 Some of the noteworthy changes compared to *note Tiger 2008::: Object-Oriented Programming The language is extended with object-oriented features, as described by Andrew Appel in chapter 14 of *note Modern Compiler Implementation::. The syntax is close to Appel’s, with small modifications, see *Note (tiger)Syntactic Specifications::. Leopard To reflect this major addition, the language (and thus the project) is given a new name, _Leopard_. These changes was announced at TC-2, (renamed LC-2). LC-R LC-R is a mandatory part of the LC-3 assignment. 1.4.11 Tiger 2010 ----------------- We have been helped by: Comaintainers Benoît Perrot, Benoît Sigoure, Guillaume Duhamel, Yann Grandmaître, Nicolas Teck Assistants Deliveries: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------------- TC-0 Mon Nov 05, 2007 Sun Nov 25, 2007 12:00 TC-1 Mon Dec 10, 2007 Sun Dec 16, 2007 12:00 TC-2 Mon Feb 25, 2008 Wed Mar 05, 2008 12:00 TC-3 & TC-R Rush Fri Mar 07, 2008 Sun Mar 09, 2008 12:00 TC-4 Mon Mar 10, 2008 Sun Mar 23, 2008 12:00 TC-5 Mon Mar 24, 2008 Sun Apr 06, 2008 12:00 TC-6 Mon Apr 14, 2008 Sun Apr 20, 2008 12:00 TC-7 Mon Apr 21, 2008 Sun May 04, 2008 12:00 TC-8 Mon May 05, 2008 Sun May 18, 2008 12:00 TC-9 Mon May 19, 2008 Sun Jun 01, 2008 12:00 Some of the noteworthy changes compared to *note Leopard 2009::: The Tiger is back The project is renamed back to its original name. 1.4.12 Tiger 2011 ----------------- This is the tenth year of the Tiger Project. We have been helped by: Assistants Adrien Biarnes, Medhi Ellaffet, Vincent Nguyen-Huu, Yann Grandmaître, Nicolas Teck Deliveries: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------------- .tig Rush Dec 20, 2008 Dec 21, 2008 TC-0 Jan 05, 2009 Jan 16, 2009 at 12:00 TC-1 Rush Jan 16, 2009 Jan 18, 2009 at 12:00 TC-2 Feb 16, 2009 Feb 25, 2009 at 23:42 TC-3 & TC-R Rush Feb 27, 2009 Mar 01, 2009 at 11:42 TC-4 & TC-E Mar 02, 2009 Mar 15, 2009 at 11:42 TC-5 Mar 16, 2009 Mar 25, 2009 at 23:42 TC-6 Apr 23, 2009 May 03, 2009 at 12:00 TC-7 May 04, 2009 May 17, 2009 TC-8 May 18, 2009 May 31, 2009 TC-9 Jun 29, 2009 Jul 12, 2009 Some of the noteworthy changes compared to *note Tiger 2010::: The Bistromatig A new assignment is given for the ‘.tig’ project: The Bistromatig. It consists in implementing an arbitrary-radix infinite-precision calculator. The project is an adaptation of the famous Bistromathic project, that used to be one of the first C assignments at EPITA in the Old Days. The name was borrowed from Douglas Adams(1)’s invention(2) from _Life, the Universe and Everything_(3). TC-E TC-E is a mandatory part of the TC-4 assignment. ---------- Footnotes ---------- (1) . (2) . (3) . 1.4.13 Tiger 2012 ----------------- This is the eleventh year of the Tiger Project. We have been helped by: Assistants Adrien Biarnes, Rémi Chaintron, Julien Delhommeau, Thomas Joly, Alexandre Laurent, Vincent Lechemin, Matthieu Martin Deliveries: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------------- .tig Rush Dec 02, 2009 Dec 04, 2009 TC-0 Dec 11, 2009 Dec 20, 2009 TC-1 Jan 11, 2010 Jan 17, 2010 TC-2 Feb 01, 2010 Feb 17, 2010 TC-3 & TC-R Rush Feb 19, 2010 Feb 26, 2010 TC-4 & TC-E Feb 22, 2010 Mar 07, 2010 TC-5 Mar 11, 2010 Mar 22, 2010 TC-6 Apr 19, 2010 May 02, 2010 TC-7 May 12, 2010 May 25, 2010 TC-8 May 25, 2010 Jun 06, 2010 TC-9 Jun 07, 2010 Jun 12, 2010 Some of the noteworthy changes compared to *note Tiger 2011::: Shorter mandatory assignment By decision of the department of studies, the mandatory assignment ends after TC-3. 1.4.14 Tiger 2013 ----------------- This is the twelfth year of the Tiger Project. We have been helped by: Assistants Rémi Chaintron, Julien Grall Deliveries: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------------- .tig Rush TC-0 TC-1 TC-2 TC-3 & TC-R Rush TC-4 & TC-E TC-5 TC-6 TC-7 TC-8 TC-9 Some of the noteworthy changes compared to *note Tiger 2012::: Build overhaul Silent rules, fewer Makefiles. Bison Variant The parser is storing objects on its stacks, not only pointers. Other recent Bison features are also used. 1.4.15 Tiger 2014 ----------------- This is the thirteenth year of the Tiger Project. We have been helped by: Assistants Jonathan Aigrain, Jules Bovet, Hugo Damme, Michael Denoun, Julien Grall, Christophe Pierre, Paul Similowski CSI students Félix Abecassis Deliveries for Ing1 students: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------------- .tig Lab Nov 16, 2011 Nov 16, 2011 TC-0 Dec 05, 2011 Dec 18, 2011 at 23:42 TC-1 Rush Jan 30, 2012 at 19:00 Feb 02, 2012 at 18:42 TC-2 Feb 02, 2012 at 19:00 Feb 10, 2012 at 18:42 TC-3 & TC-R Rush Feb 10, 2012 at 19:00 Feb 12, 2012 at 11:42 TC-4 & TC-E Feb 20, 2012 at 19:00 Mar 04, 2012 at 11:42 TC-5 Mar 05, 2012 at 19:00 Mar 18, 2012 at 11:42 TC-6 Apr 23, 2012 at 19:00 May 06, 2012 at 11:42 TC-7 May 21, 2012 at 19:00 Jun 03, 2012 at 11:42 TC-8 Jun 04, 2012 at 19:00 Jun 17, 2012 at 11:42 TC-9 Jul 02, 2012 at 19:00 Jul 15, 2012 at 11:42 Deliveries for AppIng1 students: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------------- .tig Lab Nov 19, 2011 Nov 19, 2011 TC-0 Dec 05, 2011 Dec 18, 2011 at 23:42 TC-1 Jan 28, 2012 at 10:00 Feb 05, 2012 at 11:42 TC-2 Feb 08, 2012 at 19:00 Feb 17, 2012 at 18:42 TC-3 & TC-R Rush Feb 17, 2012 at 19:00 Feb 19, 2012 at 11:42 Some of the noteworthy changes compared to *note Tiger 2013::: The Logomatig Due to time constraints, the Bistromatig assignment that has been previously used in the past three years for the ‘.tig’ rush has been replaced by a 4-hour lab assignment: The Logomatig. This assignment is about implementing a small interpreter in Tiger for a subset of the Logo language(1). The name of this project is a tribute to Logo, Tiger and the Bistromathic (though there are very few calculations in it). Introduction of C++ 2011 features Since a new C++ standard has been released this year (September 11, 2011), we are introducing some of its features in the Tiger project, namely range-based ‘for’-loops, ‘auto’-typed variables, use of the ‘nullptr’ literal constant, use of explicitly defaulted and deleted functions, template metaprogramming traits provided by the standard library, and use of consecutive right angle brackets in templates. This set of features has been chosen for it is supported both by GCC 4.6 and Clang 3.0. Git Git has replaced Subversion as version control system at EPITA. As of this year, we also provide the code with gaps through a public Git repository(2). This method makes the integration of the code provided at the beginning of each stage easier (with the exception of TC-0, which is still to be done from scratch). ---------- Footnotes ---------- (1) . (2) . 1.4.16 Tiger 2015 ----------------- This is the fourteenth year of the Tiger Project. We have been helped by: Assistants Laurent Gourvénec, Xavier Grand, Frédéric Lefort, Théophile Ranquet, Robin Wils Deliveries for Ing1 students: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------------- .tig Rush Nov 23, 2012 at 18:42 Nov 25, 2012 at 11:42 PTHL (TC-0) Dec 10, 2012 at 18:42 Dec 23, 2012 at 11:42 TC-1 Rush Feb 11, 2013 at 18:42 Feb 13, 2013 at 23:42 TC-2 Feb 14, 2013 at 18:42 Feb 24, 2013 at 11:42 TC-3 & TC-R Mar 4, 2013 at 18:42 Mar 10, 2013 at 11:42 TC-4 & TC-E Mar 11, 2013 at 18:42 Mar 24, 2013 at 11:42 TC-5 Apr 22, 2013 at 18:42 May 5, 2013 at 11:42 TC-6 May 20, 2013 at 19:00 Jun 2, 2013 at 11:42 TC-7 Jun 2, 2013 at 19:00 Jun 16, 2013 at 11:42 TC-8 Jun 28, 2013 at 19:00 Jul 11, 2013 at 11:42 TC-9 Jul 12, 2013 at 19:00 Jul 21, 2013 at 11:42 Deliveries for AppIng1 students: Stage Kind Launch Submission Supervisor ------------------------------------------------------------------------------------------------- .tig Rush Nov 23, 2012 at 18:42 Nov 25, 2012 at 11:42 PTHL (TC-0) Dec 10, 2012 at 18:42 Dec 23, 2012 at 11:42 TC-1 Feb 11, 2013 at 18:42 Feb 17, 2013 at 11:42 TC-2 Feb 18, 2013 at 18:42 Feb 28, 2013 at 11:42 TC-3 & TC-R Mar 11, 2013 at 18:42 Mar 20, 2013 at 23:42 Some of the noteworthy changes compared to *note Tiger 2014::: TC-0 renamed as PTHL In an effort to emphasize the link between the THL (Formal Languages) lecture and the first stage of the Tiger project, the latter has been renamed as PTHL (“THL Project”). TC-3 is no longer a rush TC-3 has not been a successful step among many students for several years now. It has been deemed by many of them as too complex to be understood and implemented in a couple of days. Therefore we decided to extend the time allotted to this stage so as to give students more chance to pass TC-3. Extension of the mandatory assignment to TC-5 By decision of the department of studies, all Ing1 are required to work on the Tiger project up to TC-5. Subsequent steps remain optional. Use of more C++ 2011 features This year, explicit template instantiation declarations (‘extern template’ clauses) are introduced in the project to control template instantiations in lieu of ‘*.hcc’ files. The set of C++ features used in the Tiger compiler is still supported by both GCC 4.6 and Clang 3.0. 1.4.17 Tiger 2016 ----------------- This is the fifteenth year of the Tiger Project. We have been helped by: Beta testers Anthony Seure, Rémi Weng Assistants Aurélien Baud, Alexis Chotard, Baptiste Covolato, Arnaud Farbos, Laurent Gourvénec, Frédéric Lefort, Vincent Mirzaian-Dehkordi Deliveries for Ing1 students: Stage Kind Launch Submission ----------------------------------------------------------------------- .tig Rush Nov 22, 2013 at 21:00 Nov 24, 2013 at 11:42 PTHL (TC-0) Dec 9, 2013 at 18:42 Dec 22, 2013 at 11:42 TC-1 Rush Feb 17, 2014 at 14:00 Feb 19, 2014 at 23:42 TC-2 Feb 20, 2014 at 09:00 Mar 2, 2014 at 11:42 TC-3 & TC-R Mar 3, 2014 at 19:00 Mar 16, 2014 at 11:42 TC-4 & TC-E Mar 14, 2014 at 19:00 May 4, 2014 at 11:42 TC-5 May 5, 2014 at 19:00 May 24, 2014 at 23:42 TC-6 May 23, 2014 at 19:00 Jun 8, 2014 at 11:42 TC-7 Jun 9, 2014 at 19:00 Jun 22, 2014 at 11:42 TC-8 Jul 7, 2014 at 19:00 Jul 13, 2014 at 11:42 TC-9 Jul 15, 2014 at 10:00 Jul 20, 2014 at 11:42 Deliveries for AppIng1 students: Stage Kind Launch Submission ----------------------------------------------------------------------- .tig Rush Nov 22, 2013 at 21:00 Nov 24, 2013 at 11:42 PTHL (TC-0) Dec 9, 2013 at 18:42 Dec 22, 2013 at 11:42 Some of the noteworthy changes compared to *note Tiger 2015::: Use of even more C++ 2011 features The compiler introduces the following C++ 2011 features: − (standard) smart pointers (‘std::unique_ptr’, ‘std::shared_ptr’); − general-purpose initializer lists; − lambda expressions; − explicit ‘override’s; − template aliases; − new function declarator syntax; − delegating constructors; − non-static data member initializers; − inherited constructors. The whole set of C++ features used in the Tiger compiler is supported by both GCC 4.8 and Clang 3.3. C++ scanner We introduce a C++ scanner this year, still generated by Flex, but implemented as classes. The management of the scanner’s inputs has been improved and responsibilities shared between the scanner and the driver (‘parse::TigerParser’). More Git Usage Starting this year, we deliver code with gaps exclusively through the tc-base public Git repository(1). We no longer provide tarballs nor patches as a means to update students’ code bases. Changes in the language regarding object-oriented constructs The ‘nil’ keyword has been made compatible with objects. Style Many stylistics changes have been performed, mainly to match the EPITA Coding Style. ---------- Footnotes ---------- (1) . 1.4.18 Tiger 2017 ----------------- This is the sixteenth year of the Tiger Project. We have been helped by: Assistants Aurélien Baud, Baptiste Covolato, Pierre De Abreu, Léo Ercolanelli, Arnaud Farbos, Axel Manuel, Vincent Mirzaian-Dehkordi, Matthieu Simon, Jérémie Simon Deliveries for Ing1 students: Stage Kind Launch Submission ----------------------------------------------------------------------- .tig Rush Nov 21, 2014 at 21:00 Nov 23, 2014 at 11:42 PTHL (TC-0) Dec 8, 2014 at 18:42 Dec 21, 2014 at 11:42 TC-1 Rush Feb 4, 2015 at 22:00 Feb 8, 2015 at 11:42 TC-2 Feb 13, 2015 at 22:00 Feb 22, 2015 at 11:42 TC-3 & TC-R Feb 23, 2015 at 22:00 Mar 1, 2015 at 11:42 TC-4 & TC-E Mar 9, 2015 at 19:00 Mar 22, 2015 at 11:42 TC-5 Avr 20, 2015 at 19:00 May 3, 2015 at 11:42 TC-6 May 25, 2015 at 19:00 May 31, 2015 at 11:42 TC-7 Jun 1, 2015 at 19:00 Jun 7, 2015 at 11:42 TC-8 Jun 8, 2015 at 19:00 Jun 14, 2015 at 11:42 TC-9 Jul 6, 2015 at 10:00 Jul 19, 2015 at 11:42 Deliveries for AppIng1 students: Stage Kind Launch Submission ----------------------------------------------------------------------- .tig Rush Nov 21, 2014 at 21:00 Nov 23, 2014 at 11:42 PTHL (TC-0) Dec 8, 2014 at 18:42 Dec 21, 2014 at 11:42 Some of the noteworthy changes compared to *note Tiger 2016::: Use of even more C++ 2011 features The compiler introduces the following C++ 2011 features: − use ‘using’ instead of ‘typedef’; − variadic templates (‘misc::variant’). The C++ features used in the Tiger compiler are supported by both GCC 4.8 and Clang 3.3. Style Many stylistics changes have been performed. TC-Y An ARM back end has been added. All the given code compiles Code given to students compiles even with the ‘// FIXME’ chunks. 1.4.19 Tiger 2018 ----------------- This is the seventeenth year of the Tiger Project. We have been helped by: Assistants Rémi Billon, Pierre-Louis Dagues, Pierre De Abreu, Léo Ercolanelli, Arnaud Gaillard, Axel Manuel, Sébastien Piat, Matthieu Simon, Jérémie Simon, Francis Visoiu Mistrih Deliveries for Ing1 students: Stage Kind Launch Submission ----------------------------------------------------------------------- .tig Rush Nov 20, 2015 at 20:00 Nov 22, 2015 at 11:42 PTHL (TC-0) Dec 7, 2015 at 20:00 Dec 20, 2015 at 11:42 TC-1 Rush Feb 15, 2016 at 20:00 Feb 19, 2016 at 11:42 TC-2 Feb 19, 2016 at 20:00 Feb 28, 2016 at 11:42 TC-3 & TC-R Mar 7, 2016 at 20:00 Mar 20, 2016 at 11:42 TC-4 & TC-E Apr 18, 2016 at 20:00 May 1, 2016 at 11:42 TC-5 May 2, 2016 at 20:00 May 15, 2016 at 11:42 TC-6 May 23, 2016 at 20:00 May 29, 2016 at 11:42 TC-7 May 30, 2016 at 20:00 Jun 5, 2016 at 11:42 TC-8 Jun 6, 2016 at 20:00 Jun 12, 2016 at 11:42 TC-9 Jun 27, 2016 at 20:00 Jul 10, 2016 at 11:42 Some of the noteworthy changes compared to *note Tiger 2017::: ‘type::Type’ visitor Make the ‘type::Type’ class visitable. ‘#pragma once’ Remove the ‘cpp’ guards and replace them with ‘#pragma once’ directives. Use of C++14 Move the standard from C++11 to C++14 since it is fully supported by both GCC 5.0 and Clang 3.4. LLVM translator Add TC-L, a stage for LLVM IR generation. After TC-4, students have two choices: − Continue with *note TC-L::, and stop. − Continue with *note TC-5::, and choose to do TC-backend. Dementors Allow students to fix and push previous stages of TC more often after the final submission. Overfun-object Add support for programs with overload and object. Usable through the new options: • ‘--overfun-object-bindings-compute’ • ‘--overfun-object-types-compute’ • ‘--overfun-object-object-desugar’ 1.4.20 Tiger 2019 ----------------- This is the eighteenth year of the Tiger Project. We have been helped by: Assistants Loïc Banet, Moray Baruh, Rémi Billon, Pierre-Louis Dagues, Arnaud Gaillard, Ashkan Kiaie-Sandjie, Guillaume Marques, Sarasvati Moutoucomarapoule, Cyprien Orfila, Sébastien Piat, Francis Visoiu Mistrih Deliveries for Ing1 students: Stage Kind Launch Submission ----------------------------------------------------------------------- .tig Rush Nov 4, 2016 at 19:00 Nov 6, 2016 at 11:42 PTHL (TC-0) Dec 5, 2016 at 20:00 Dec 18, 2016 at 11:42 TC-1 Rush Jan 30, 2017 at 20:00 Feb 3, 2017 at 11:42 TC-2 Feb 3, 2017 at 20:00 Feb 12, 2017 at 11:42 TC-3 & TC-R Feb 13, 2017 at 20:00 Feb 26, 2017 at 11:42 TC-4 & TC-E Mar 13, 2017 at 20:00 Mar 26, 2017 at 11:42 TC-5 Apr 17, 2017 at 20:00 Apr 30, 2017 at 11:42 TC-6 May 15, 2017 at 20:00 May 21, 2017 at 11:42 TC-7 May 29, 2017 at 20:00 Jun 4, 2017 at 11:42 TC-8 Jun 5, 2017 at 20:00 Jun 11, 2017 at 11:42 TC-9 Jun 26, 2017 at 20:00 Jul 9, 2017 at 11:42 Deliveries for AppIng1 students: Stage Kind Launch Submission ----------------------------------------------------------------------- .tig Rush Nov 4, 2016 at 19:00 Nov 6, 2016 at 11:42 PTHL (TC-0) Dec 5, 2016 at 20:00 Dec 18, 2016 at 11:42 Some of the noteworthy changes compared to *note Tiger 2018::: ast::tasks::the_program Make the_program a smart pointer, removing the ‘--ast-delete’ option. Use of even more C++14 features • use helper type for type_traits • use smart pointers’ make function Style Many stylistics changes have been performed. Debug info Adding support debug information for the Tiger language using LLVM. C++17 Notify future C++17’s changes in comments. Continuous integration Provide a CI for the students. 1.4.21 Tiger 2020 ----------------- This is the nineteenth year of the Tiger Project. We have been helped by: Assistants Loïc Banet, Moray Baruh, Meven Courouble, Maxime Joubert, Ashkan Kiaie-Sandjie, Steven Lariau, Guillaume Marques, Sarasvati Moutoucomarapoule, Cyprien Orfila, Nicolas Poitoux, Loic Reyreaud, Andreas Touly Deliveries for Ing1 students: Stage Kind Launch Submission ----------------------------------------------------------------------- .tig Rush Jan 29, 2018 at 09:00 Jan 31, 2018 at 11:42 TC-0 (PTHL) Jan 29, 2018 at 14:00 Feb 1, 2018 at 19:42 TC-1 Rush Feb 1, 2018 at 20:00 Feb 4, 2018 at 11:42 TC-2 Feb 5, 2018 at 20:00 Feb 25, 2018 at 11:42 TC-3 & TC-R Feb 12, 2018 at 20:00 Mar 11, 2018 at 11:42 TC-4 & TC-E Mar 12, 2018 at 20:00 Mar 25, 2018 at 11:42 TC-5 Apr 16, 2018 at 20:00 Apr 29, 2018 at 11:42 TC-6 May 14, 2018 at 20:00 May 20, 2018 at 11:42 TC-7 May 21, 2018 at 20:00 May 27, 2018 at 11:42 TC-8 Jun 4, 2018 at 20:00 Jun 10, 2018 at 11:42 TC-9 Jun 25, 2018 at 20:00 Jul 8, 2018 at 11:42 Deliveries for AppIng1 students: Stage Kind Launch Submission ----------------------------------------------------------------------- .tig Rush Jan 29, 2018 at 09:00 Jan 31, 2018 at 11:42 TC-0 (PTHL) Jan 29, 2018 at 14:00 Feb 1, 2018 at 19:42 Some of the noteworthy changes compared to *note Tiger 2019::: C++17 • use structured bindings • use std::variant instead of boost::variant • use class template argument deduction • use if(init; condition) callee/caller-save Swap callee-save and caller-save order Desugar Add desugar implementation for ArrayExp during TC-O enum class Replace enums with enum classes _main Ensure _main existence and correct prototype in the AST Metavar Remove MetavarExp and Metavariable AST nodes Namespaces Use nested namespaces Smart pointers Replace some raw pointers with unique_ptr or shared_ptr target::*::rewrite_program Add alternative rewrite_program implementation Setup a debian-sid Dockerfile Provide a docker with requirements to build tc 2 Instructions ************** 2.1 Interactions ================ Bear in mind that if you are writing, it is to be read, so pay attention to your reader. The right place Using mails is almost always wrong: first ask around you, then try to find the assistants in their lab, and finally post into ‘assistants.tiger’. You need to have a very good reason to send a message to the assistants or to Akim and Etienne, as it usually annoys us, which is not in your interest. The newsgroup ‘assistants.tiger’ is dedicated to the Compiler Construction lecture, the Tiger project, and related matters (e.g. assignments in Tiger itself). Any other material is off topic. A meaningful title Find a meaningful subject. Don’t do that Do this -------------------------------------------------------------------------- Problem in TC-1 Cannot generate location.hh make check make check fails on test-ref A legal content Pieces of critical code (e.g., precedence section in the parser, the string handling in the scanner, or whatever _you_ are supposed to find by yourself) are not to be published. This includes the test cases. While posting a simple test case is tolerated, sending many of them, or simply one that addresses a specific common failure (e.g., some obscure cases for escapes) is strictly forbidden. A complete content If you experience a problem that you fail to solve, make a report as complete as possible: include pieces of code (*unless the code is critical and shall not be published*) and the full error message from the compiler/tool. The following text by Simon Tatham is enlightening; its scope goes way beyond the Tiger Project: How to Report Bugs Effectively(1). See also *note How not to go about a programming assignment::, item “Be clever when using electronic mail”. A legible content Use French or English. Epitean is definitely not a language. A pertinent content Trolls are not welcome. ---------- Footnotes ---------- (1) . 2.2 Rules of the Game ===================== As any other assignment, the Tiger Project comes with its rules to follow. -- Rule: Thou Shalt Not Copy Code -- Rule: Thou Shalt Not Possess Thy Neighbor's Code It is _strictly_ forbidden to possess code that is not yours. You are encouraged to work with others, but don’t get a copy of their code. *Note How not to go about a programming assignment::, for more hints on what will _not_ be accepted. -- Rule: Tests are part of the project -- Rule: Do not copy tests or test frame works Test cases and test engines development are parts of the Tiger Project. As such the same rules apply as for code. -- Rule: If something is fishy, say it If something illegal happened in the course of a stage, let us know, arrangements _might_ be possible. If _we_ find out, the rules will be strictly applied. It already happened that third year students have had to redo the Tiger Project because their code was found in another group: -42/20 is seldom benign. -- Rule: Don't hesitate working with other groups Don’t bother everybody instead of trying first. Conversely, once you did your best, don’t hesitate working with others. 2.3 Groups ========== Starting with TC-1, assignments are to be done by groups of three. The first cause of failures to the Tiger project is human problems within the groups. We cannot stress too much the importance of constituting a good group of four people. The Tiger project starts way before your first line of code: it begins with the selection of your partners. Here are a few tips, collected wisdom from the previous failures. _You work for yourself, not for grades_ Yes, we know, when you’re a student grades are what matters. But close your eyes, make a step backwards, and look at yourself for a minute, from behind. You see a student, some sort of a larva, which will turn into a grownup. The larva stage lasts 3 to 4 years, while the hard working social insect is there for 40+ years: a 5% ratio without the internships. Three minutes out of an hour. These years are made to prepare you to the rest of your life, to provide you with what it takes to enjoy a lifelong success in jobs. So don’t waste these three minutes by just cheating, paying little attention to what you are given, or by just waiting for this to end. The opportunity to learn is a unique moment in life: treasure it, even if it hurts, if it’s hard, because you may well regret these three minutes for much of your life. _Start recruiting early_ Making a team is not easy. Take the time to know the people, talk with them, and prepare your group way before beginning the project. The whole TC-0 is a test bed for you to find good partners. _Don’t recruit good lazy friends_ If s/he’s lazy, you’ll have to scold her/him. If s/he’s a friend, that will be hard. Plus it will be even harder to report your problems to us. _Recruit people you can depend on_ Trust should be your first criterion. _Members should have similar programming skills_ _Weak programmers should run away from skilled programmers_ The worst “good idea” is “I’m a poor programmer, I should be in a group of skilled programmers: I will learn a lot from them”. Experience shows this is wrong. What actually happens is as follows. At the first stage, the leader assigns you a task. You try and fail, for weeks. In the meanwhile, the other members teach you lots of facts, but (i) you can’t memorize everything and end up saying “hum hum” without having understood, and (ii) because they don’t understand what you don’t understand, they are often poor teachers. The day before the submission, the leader does your assignments to save the group. You learned nothing, or quite. Second stage: same beginning, you are left with your assignment, but the other members are now bothered by your asking questions: why should they answer, since you don’t understand what they say (remember: they are poor teachers because they don’t understand your problems), and you don’t seem to remember anything! The day before the submission, they do your work. From now on, they won’t even ask you for anything: “fixing” you is much more time consuming than just doing it by themselves. Oral examinations reveal you neither understand nor do anything, hence your grades are bad, and you win another round of first year... Take our advice: if you have difficulties with programming, be with other people like you. Your chances are better together, and anyway you are allowed to ask for assistance from other groups. _Don’t mix repeaters with first year students_ Repeaters have a much better understanding of the project than they think: they know its history, some parts of the code, etc. This will introduce a difference of skills from the beginning, which will remain till the end. It will result in the first year students having not participated enough to learn what was to be learned. _Don’t pick up old code_ This item is especially intended to repeaters: you might be tempted to keep the code from last year, believing this will spare you some work. It may not be so. Indeed, every year the specifications and the provided code change, sometimes with dramatic impact on the whole project. Struggling with an old code base to meet the new standard is a long, error prone, and uninteresting work. You might spend more time trying to preserve your old code than what is actually needed to implement the project from scratch. Not to mention that of course the latter has a much stronger educational impact. _Diagnose and cure drifts_ When a dysfunction appears, fix it, don’t let it grow. For instance, if a member never works in spite of the warnings, don’t cover him: he will have the whole group drown. It usually starts with one member making more work on Tiger, less on the rest of the curriculum, and then he gets tired all the time, with bad mood etc. Don’t walk that way: denounce the problems, send ultimatums to this person, and finally, warn the assistants you need to reconfigure your group. _Reconfigure groups when needed_ Members can leave a group for many reasons: dropped EPITA, dropped Tiger, joined one of the schools’ laboratories, etc. If your group is seriously unbalanced (two skilled people is OK, otherwise be three), ask for a reconfiguration in the news. _Tiger is a part of your curriculum_ Tiger should neither be 0 nor 100% of your curriculum: find the balance. It is not easy to find it, but that’s precisely one thing EPITA teaches: balancing overloads. 2.4 Coding Style ================ This section could have been named “Strong and Weak Requirements”, as it includes not only mandatory features from your compiler (memory management), but also tips and advice. As the captain Barbossa would put it, “actually, it’s more of a guideline than a rule.” 2.4.1 No Draft Allowed ---------------------- The code you deliver _must_ be clean. In particular, when some code is provided, and you have to fill in the blanks denoted by ‘FIXME: Some code has been deleted.’. Sometimes you will have to write the code from scratch. In any case, _dead code and dead comments must be removed_. You are free to leave comments spotting places where you fixed a ‘FIXME:’, but never leave a fixed ‘FIXME:’ in your code. Nor any irrelevant comment. The official compiler for this project, is GNU C++ Compiler, 5.0 or higher (*note GCC::). 2.4.2 Use of Foreign Features ----------------------------- If, and only if, you already have enough fluency in C++ to be willing to try something wilder, then the following exception is made for you. Be warned: along the years the Tiger project was polished to best fit the typical epitean learning curve, trying to escape this curve is also taking a major risk. By the past, some students tried different approaches, and ended with unmaintainable pieces of code. If you _and your group_ are sure you can afford some additional difficulty (for additional benefits), then you may use the following extra tools. _You have to warn the examiners_ that you use these tools. You also have to take care of harnessing ‘configure.ac’ to make sure that what you need is available on the testing environment. Be also aware that you are likely to obtain less help from us if you use tools that we don’t master: You are on your own, but, hey!, that’s what you’re looking for, ain’t it? The Loki Library *Note Modern C++ Design::, for more information about Loki. The Boost Library As provided by the unstable Debian packages ‘libboost-*’. *Note Boost.org::. Any Other Parser or Scanner Generator If you dislike Flex and/or Bison _but you already know how to use them_, then you are welcome to use other technologies. If you think about something not listed here, please send us your proposal; acceptance is required to use them. 2.4.3 File Conventions ---------------------- There are some strict conventions to obey wrt the files and their contents. -- Rule: One class ‘LikeThis’ per files ‘like-this.*’ Each class ‘LikeThis’ is implemented in a single set of file named ‘like-this.*’. Note that the mixed case class names are mapped onto lower case words separated by dashes. There can be exceptions, for instance auxiliary classes used in a single place do not need a dedicated set of files. -- Rule: ‘*.hh’: Declarations The ‘*.hh’ should contain only declarations, i.e., prototypes, ‘extern’ for variables etc. Inlined short methods are accepted when there are few of them, otherwise, create an ‘*.hxx’ file. The documentation should be here too. There is no good reason for huge objects to be defined here. As much as possible, avoid including useless headers (GotW007(1), GotW034(2)): − when detailed knowledge of a class is not needed, instead of #include write // Fwd decl. class Foo; or better yet: use the appropriate ‘fwd.hh’ file (read below). − if you need output streams, then include ‘ostream’, not ‘iostream’. Actually, if you merely need to declare the existence of streams, you might want to include ‘iosfwd’. -- Rule: ‘*.hxx’: Inlined definitions Some definitions should be loaded in different places: templates, inline functions etc. Declare and document them in the ‘*.hh’ file, and implement them in the ‘*.hxx’ file. The ‘*.hh’ file _last_ includes the ‘*.hxx’ file, conversely ‘*.hxx’ _first_ includes ‘*.hh’. Read below. -- Rule: ‘*.cc’: Definitions of functions and variables Big objects should be defined in the ‘*.cc’ file corresponding to the declaration/documentation file ‘*.hh’. There are less clear cut cases between ‘*.hxx’ and ‘*.cc’. For instance short but time consuming functions should stay in the ‘*.cc’ files, since inlining is not expected to speed up significantly. As another example features that require massive header inclusions are better defined in the ‘*.cc’ file. As a concrete example, consider the ‘accept’ methods of the AST classes. They are short enough to be eligible for an ‘*.hxx’ file: void LetExp::accept(Visitor& v) { v(*this); } We will leave them in the ‘*.cc’ file though, since this way only the ‘*.cc’ file needs to load ‘ast/visitor.hh’; the ‘*.hh’ is kept short, both directly (its contents) and indirectly (its includes). -- Rule: Explicit template instantiation There are several strategies to compile templates. The most common strategy consists in leaving the code in a ‘*.hxx’ file, and letting every user of the class template instantiate the code. While correct, this approach has several drawbacks: − Because the ‘*.hh’ file includes the ‘*.hxx’ file, each time a simple declaration of a template is needed, the full implementation comes with it. And if the implementation requires other declarations such as ‘std::iostream’, you force all the client code to parse the ‘iostream’ header! − The instantiation is performed several times, which is time and space consuming. − The dependencies are tight: the clients of the template depend upon its implementation. To circumvent these problems, we may control template instantiations using _explicit template instantiation definitions_ (available since C++ 1998) and _declarations_ (introduced by C++ 2011). This mechanism is compatible with the way templates are usually handled in the Tiger compiler, i.e., where both template declarations _and_ definitions are accessible from the included header, though often indirectly (see above). We use the following two-fold strategy: − First, we add an explicit template _definition_ in the implementation file of the template’s client (for instance ‘temp/temp.cc’) to instruct the compiler that we want to instantiate a template (e.g. ‘misc::endo_map’) for a given (set of) parameter(s) (e.g. ‘temp::Temp’) in this compilation unit (‘temp/temp.o’). This explicit template definition is performed using a ‘template’ clause. /** ** \file temp/temp.cc ** \brief temp::Temp. */ #include // ... namespace misc { // Explicit template instantiation definition to generate the code. template class endo_map; } File 2.1: ‘temp.cc’ − Then, we _block_ the automatic (implicit) instantiation of the template for this (set of) parameter(s), which would otherwise be triggered by default by the compiler when the implementation of the template is made available to it—which is the case in our example, since the header of the template (‘misc/endomap.hh’) also includes its implementation (‘misc/endomap.hxx’). To do so, we add an explicit template instantiation _declaration_ matching the previous explicit template definition, using an ‘extern template’ clause. /** ** \file temp/temp.hh ** \brief Fresh temps. */ #pragma once #include namespace temp { struct Temp { /* ... */ }; } // ... namespace misc { // Explicit template instantiation declaration. extern template class endo_map; } File 2.2: ‘temp.hh’ Any translation unit containing this explicit declaration will not generate this very template instantiation, unless an explicit definition is seen (in our case, this will happen within ‘temp/temp.cc’ only). You will notice that both the approach and the syntax used here recall the ones used to declare and define global variables in C and C++. We can further improve the previous design by factoring explicit instantiation code using the preprocessor. /** ** \file temp/temp.hh ** \brief Fresh temps. */ #pragma once #include #ifndef MAYBE_EXTERN # define MAYBE_EXTERN extern #endif namespace temp { struct Temp { /* ... */ }; } // ... namespace misc { // Explicit template instantiation declaration. MAYBE_EXTERN template class endo_map; } File 2.3: ‘temp-factored.hh’ /** ** \file temp/temp.cc ** \brief temp::Temp. */ #define MAYBE_EXTERN #include #undef MAYBE_EXTERN // ... File 2.4: ‘temp-factored.cc’ Explicit template instantiation declarations (not definitions) are only available since C++ 2011. Before that, we used to introduce a fourth type of file, ‘*.hcc’: files that had to be compiled once for each concrete template parameter. -- Rule: Guard included files (‘*.hh’ & ‘*.hxx’) Use the ‘#pragma once’ directive to ensure the contents of a file is read only once. This is critical for ‘*.hh’ and ‘*.hxx’ files that include one another. One typically has: /** ** \file sample/sample.hh ** \brief Declaration of sample::Sample. **/ #pragma once // ... #include File 2.5: ‘sample/sample.hh’ /** ** \file sample/sample.hxx ** \brief Inlined definition of sample::Sample. **/ #pragma once #include // ... File 2.6: ‘sample/sample.hxx’ -- Rule: ‘fwd.hh’: forward declarations Dependencies can be a major problem during big project developments. It is not acceptable to “recompile the world” when a single file changes. To fight this problem, you are encouraged to use ‘fwd.hh’ files that contain simple forward declarations. Everything that defeat the interest of ‘fwd.hh’ file must be avoided, e.g., including actual header files. These forward files should be included by the ‘*.hh’ instead of more complete headers. The expected benefit is manifold: − A forward declaration is much shorter. − Usually actual definitions rely on other classes, so other ‘#include’s etc. Forward declarations need nothing. − While it is not uncommon to change the interface of a class, changing its name is infrequent. Consider for example ‘ast/visitor.hh’, which is included directly or indirectly by many other files. Since it needs a declaration of each AST node one could be tempted to use ‘ast/all.hh’ which includes virtually all the headers of the ‘ast’ module. Hence all the files including ‘ast/visitor.hh’ will bring in the whole ‘ast’ module, where the much shorter and much simpler ‘ast/fwd.hh’ would suffice. Of course, usually the ‘*.cc’ files need actual definitions. -- Rule: Module, namespace, and directory ‘likethis’ The compiler is composed of several modules that are dedicated to a set of coherent specific tasks (e.g., parsing, AST handling, register allocation etc.). A module name is composed of lower case letters exclusively, ‘likethis’, not ‘like_this’ nor ‘like-this’. This module’s files are stored in the directory with the same name, which is also that of the namespace in which all the symbols are defined. Contrary to file names, we do not use dashes to avoid clashes with Swig and ‘namespace’. -- Rule: ‘libMODULE.*’: Pure interface The interface of the MODULE module contains only _pure_ functions: these functions should not depend upon globals, nor have side effects of global objects. Global variables are forbidden here. -- Rule: ‘tasks.*’: Impure interface Tasks are _the_ place for side effects. That’s where globals such as the current AST, the current assembly program, etc., are defined and modified. ---------- Footnotes ---------- (1) . (2) . 2.4.4 Name Conventions ---------------------- -- Rule: Stay out of reserved names The standard reserves a number of identifier classes, most notably ‘_*’ [17.4.3.1.2]: Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace. Using ‘_*’ is commonly used for CPP guards (‘_FOO_HH_’), private members (‘_foo’), and internal functions (‘_foo ()’): don’t. -- Rule: Name your classes ‘LikeThis’ Class should be named in mixed case; for instance ‘Exp’, ‘StringExp’, ‘TempMap’, ‘InterferenceGraph’ etc. This applies to class templates. *Note CStupidClassName::. -- Rule: Name public members ‘like_this’ No upper case letters, and words are separated by an underscore. -- Rule: Name private/protected members ‘like_this_’ It is extremely convenient to have a special convention for private and protected members: you make it clear to the reader, you avoid gratuitous warnings about conflicts in constructors, you leave the “beautiful” name available for public members etc. We used to write ‘_like_this’, but this goes against the standard, see *note Stay out of reserved names::. For instance, write: class IntPair { public: IntPair(int first, int second) : first_(first) , second_(second) { } protected: int first_, second_; } *Note CStupidClassName::. -- Rule: Name your ‘using’ type alias ‘FOO_type’ When declaring a ‘using’ type alias, name the type ‘FOO_type’ (where FOO is obviously the part that changes). For instance: using map_type = std::map; using symtab_type = std::list; We used to use ‘FOO_t’, unfortunately this (pseudo) name space is reserved by POSIX. -- Rule: Name the parent class ‘super_type’ It is often handy to define the type of “the” super class (when there is a single one); use the name ‘super_type’ in that case. For instance most Visitors of the AST start with: class TypeChecker: public ast::DefaultVisitor { public: using super_type = ast::DefaultVisitor; using super_type::operator(); // ... (Such ‘using’ clauses are subject to the current visibility modifier, hence the ‘public’ beforehand.) -- Rule: Hide auxiliary classes Hide auxiliary/helper classes (i.e., classes private to a single compilation unit, not declared in a header) in functions, or in an anonymous namespace. Instead of: struct Helper { ... }; void doit() { Helper h; ... } write: namespace { struct Helper { ... }; } void doit() { Helper h; ... } or void doit() { struct Helper { ... } h; ... } The risk otherwise is to declare two classes with the same name: the linker will ignore one of the two silently. The resulting bugs are often difficult to understand. 2.4.5 Use of C++ Features ------------------------- -- Rule: Hunt Leaks Use every possible means to release the resources you consume, especially memory. Valgrind can be a nice assistant to track memory leaks (*note Valgrind::). To demonstrate different memory management styles, you are invited to use different features in the course of your development: proper use of destructors for the AST, use of a factory for ‘symbol’, ‘Temp’ etc., use of ‘std::unique_ptr’ starting with the ‘Translate’ module, and finally use of reference counting via smart pointers for the intermediate representation. -- Rule: Hunt code duplication Code duplication is your enemy: the code is less exercised (if there are two routines instead of one, then the code is run half of the time only), and whenever an update is required, you are likely to forget to update all the other places. Strive to prevent code duplication from sneaking into your code. Every C++ feature is good to prevent code duplication: inheritance, templates etc. -- Rule: Prefer ‘dynamic_cast’ of references Of the following two snippets, the first is preferred: const IntExp& ie = dynamic_cast(exp); int val = ie.value_get(); const IntExp* iep = dynamic_cast(&exp); assert(iep); int val = iep->value_get(); While upon type mismatch the second ‘abort’s, the first throws a ‘std::bad_cast’: they are equally safe. -- Rule: Use virtual methods, not type cases Do not use type cases: if you want to dispatch by hand to different routines depending upon the actual class of objects, you probably have missed some use of virtual functions. For instance, instead of bool compatible_with(const Type& lhs, const Type& rhs) { if (&lhs == &rhs) return true; if (dynamic_cast(&lhs)) if (dynamic_cast(&rhs)) return true; if (dynamic_cast(&rhs)) if (dynamic_cast(&lhs)) return true; return false; } write bool Record::compatible_with(const Type& rhs) { return &rhs == this || dynamic_cast(&rhs); } bool Nil::compatible_with(const Type& rhs) { return dynamic_cast(&rhs); } -- Rule: Use ‘dynamic_cast’ for type cases Did you read the previous item, “Use virtual methods, not type cases”? If not, do it now. If you really _need_ to write type dispatching, carefully chose between ‘typeid’ and ‘dynamic_cast’. In the case of ‘tc’, where we sometimes need to down cast an object or to check its membership to a specific subclass, we don’t need ‘typeid’, so use ‘dynamic_cast’ only. They address different needs: ‘dynamic_cast’ for (sub-)membership, ‘typeid’ for exact type The semantics of testing a ‘dynamic_cast’ vs. a comparison of a ‘typeid’ are not the same. For instance, think of a class ‘A’ with subclass ‘B’ with subclass ‘C’; then compare the meaning of the following two snippets: // Is `a' containing an object of exactly the type B? bool test1 = typeid(a) == typeid(B); // Is `a' containing an object of type B, or a subclass of B? bool test2 = dynamic_cast(&a); Non polymorphic entities ‘typeid’ works on hierarchies without ‘vtable’, or even builtin types (‘int’ etc.). ‘dynamic_cast’ requires a dynamic hierarchy. Beware of ‘typeid’ on static hierarchies; for instance consider the following code, courtesy from Alexandre Duret-Lutz: #include struct A { // virtual ~A() {}; }; struct B: A { }; int main() { A* a = new B; std::cout << typeid(*a).name() << std::endl; } it will “answer” that the ‘typeid’ of ‘*a’ is ‘A’(!). Using ‘dynamic_cast’ here will simply not compile(1). If you provide ‘A’ with a virtual function table (e.g., uncomment the destructor), then the ‘typeid’ of ‘*a’ is ‘B’. Compromising the future for the sake of speed Because the job performed by ‘dynamic_cast’ is more complex, it is also significantly slower that ‘typeid’, but hey! better slow and safe than fast and furious. You might consider that today, a strict equality test of the object’s class is enough and faster, but can you guarantee there will never be new subclasses in the future? If there will be, code based ‘dynamic_cast’ will probably behave as expected, while code based ‘typeid’ will probably not. More material can be found the chapter 8 of *note Thinking in C++ Volume 2::: Run-time type identification(2). -- Rule: Use const references in arguments to save copies (EC22) We use const references in arguments (and return value) where otherwise a passing by value would have been adequate, but expensive because of the copy. As a typical example, accessors ought to return members by const reference: const Exp& OpExp::lhs_get() const { return lhs_; } Small entities can be passed/returned by value. -- Rule: Use references for aliasing When you need to have several names for a single entity (this is the definition of “aliasing”), use references to create aliases. Note that passing an argument to a function for side effects is a form of aliasing. For instance: template void swap(T& a, T& b) { T c = a; a = b; b = c; } -- Rule: Use pointers when passing an object together with its management When an object is created, or when an object is _given_ (i.e., when its owner leaves the management of the object’s memory to another entity), use pointers. This is consistent with C++: ‘new’ creates an object, returns it together with the responsibility to call ‘delete’: it uses pointers. For instance, note the three pointers below, one for the return value, and two for the arguments: OpExp* opexp_builder(OpExp::Oper oper, Exp* lhs, Exp* rhs) { return new OpExp(oper, lhs, rhs); } -- Rule: Avoid static class members (EC47) More generally, “Ensure that non-local static objects are initialized before they’re used”, as reads the title of EC47. Non local static objects (such as ‘std::cout’ etc.) are initialized by the C++ system even before ‘main’ is called. Unfortunately there is no guarantee on the order of their initialization, so if you happen to have a static object which initialization depends on that of another object, expect the worst. Fortunately this limitation is easy to circumvent: just use a simple Singleton implementation, that relies on a _local_ static variable. This is covered extensively in EC47. -- Rule: Use ‘foo_get’, not ‘get_foo’ Accessors have standardized names: ‘foo_get’ and ‘foo_set’. There is an alternative attractive standard, which we don’t follow: class Class { public: int foo(); void foo(int foo); private: int foo_; } or even class Class { public: int foo(); Class& foo(int foo); // Return *this. private: int foo_; } which enables idioms such as: { Class obj; obj.foo(12) .bar(34) .baz(56) .qux(78) .quux(90); } -- Rule: Use ‘dump’ as a member function returning a stream You should always have a means to print a class instance, at least to ease debugging. Use the regular ‘operator<<’ for standalone printing functions, but ‘dump’ as a member function. Use this kind of prototype: std::ostream& TREE::dump(std::ostream& ostr [, ...]) const where the ellipsis denote optional additional arguments. ‘dump’ returns the stream. ---------- Footnotes ---------- (1) For instance, ‘g++’ reports an ‘error: cannot dynamic_cast `a' (of type `struct A*') to type `struct B*' (source type is not polymorphic)’. (2) . 2.4.6 Use of STL ---------------- -- Rule: Specify comparison types for associative containers of pointers (ES20) For instance, instead of declaring using temp_set_type = std::set; declare /// Object function to compare two Temp*. struct temp_compare { bool operator()(const Temp* s1, const Temp* s2) const { return *s1 < *s2; } }; using temp_set_type = std::set; temp_set_type my_set; Or, using C++11 lambdas: /// Lambda to compare two Temp*. auto temp_compare = [](const Temp* s1, const Temp* s2) { return *s1 < *s2; }; using temp_set_type = std::set; temp_set_type my_set{temp_compare}; Scott Meyers mentions several good reasons, but leaves implicit a very important one: if you don’t, since the outputs will be based on the order of the pointers in memory, and since (i) this order may change if your allocation pattern changes and (ii) this order depends of the environment you run, then _you cannot compare outputs (including traces)_. Needless to say that, at least during development, this is a serious misfeature. -- Rule: Prefer standard algorithms to hand-written loops (ES43) Using ‘for_each’, ‘find’, ‘find_if’, ‘transform’ etc. is preferred over explicit loops. This is for (i) efficiency, (ii) correctness, and (iii) maintainability. Knowing these algorithms is mandatory for who claims to be a C++ programmer. -- Rule: Prefer member functions to algorithms with the same names (ES44) For instance, prefer ‘my_set.find(my_item)’ to ‘find (my_item, my_set.begin(), my_set.end())’. This is for efficiency: the former has a logarithmic complexity, versus... linear for the latter! You may find the Item 44 of Effective STL(1) on the Internet. ---------- Footnotes ---------- (1) . 2.4.7 Matters of Style ---------------------- The following items are more a matter of style than the others. Nevertheless, you are asked to follow this style. -- Rule: 80 columns maximum Stick to 80 column programming. As a matter of fact, stick to 76 or 78 columns most of the time, as it makes it easier to keep the diffs within the limits. And if you post/mail these diffs, people are likely to reply to the message, hence the suggestion of 76 columns, as for emails. -- Rule: Order class members by visibility first When declaring a class, start with public members, then protected, and last private members. Inside these groups, you are invited to group by category, i.e., methods, types, and members that are related should be grouped together. The motivation is that private members should not even be visible in the class declaration (but of course, it is mandatory that they be there for the compiler), and therefore they should be “hidden” from the reader. This is an example of what should *not* be done: class Foo { public: Foo(std::string, int); virtual ~Foo(); private: using string_type = std::string; public: std::string bar_get() const; void bar_set(std::string); private: string_type bar_; public: int baz_get() const; void baz_set(int); private: int baz_; } rather, write: class Foo { public: Foo(std::string, int); virtual ~Foo(); std::string bar_get() const; void bar_set(std::string); int baz_get() const; void baz_set(int); private: using string_type; = std::string string_type bar_; int baz_; } and add useful Doxygen comments. -- Rule: Keep superclasses on the class declaration line When declaring a derived class, try to keep its list of superclasses on the same line. Leave a space at least on the right hand side of the colon. If there is not enough room to do so, leave the colon on the class declaration line (the opposite applies for constructor, see *note Put initializations below the constructor declaration::). class Derived: public Base { // ... }; /// Object function to compare two Temp*. struct temp_ptr_less { bool operator()(const Temp* s1, const Temp* s2) const; }; -- Rule: Don't use ‘inline’ in declarations Use ‘inline’ in implementations (i.e., ‘*.hxx’, possibly ‘*.cc’)), not during declarations (‘*.hh’ files). -- Rule: Use ‘override’ If a method was once declared ‘virtual’, it remains virtual, there is no need to repeat it. However, be sure to explicitly mark it as ‘override’ so that your compiler can verify it. class Base { public: // ... virtual void foo() = 0; }; class Derived: public Base { public: // ... void foo() override; }; -- Rule: Pointers and references are part of the type Pointers and references are part of the type, and should be put near the type, not near the variable. int* p; // not `int *p;' list& l; // not `list &l;' void* magic(); // not `void *magic();' -- Rule: Do not declare many variables on one line Use int* p; int* q; instead of int *p, *q; The former declarations also allow you to describe each variable. -- Rule: Leave no space between template name and effective parameters Write std::list l; std::pair, int> p; with a space after the comma. There is no need for a space between two closing ‘>’ (since C++ 2011): std::list> ls; These rules apply for casts: // Come on baby, light my fire. int* p = static_cast(42); -- Rule: Leave one space between TEMPLATE and formal parameters Write template struct pair; with one space separating the keyword ‘template’ from the list of formal parameters. -- Rule: Leave no space between a function name and its argument(s), either formal or actual int foo(int n) { return bar(n); } The ‘()’ operator is not a list of arguments. class Foo { public: Foo(); virtual ~Foo(); bool operator()(int n); }; -- Rule: Put initializations below the constructor declaration Don’t put or initializations or constructor invocations on the same line as you declare the constructor. As a matter of fact, don’t even leave the colon on that line. Instead of ‘A::A(): B(), C()’, write either: A::A() : B() , C() { } or A::A() : B(), C() { } The rationale is that the initialization belongs more to the body of the constructor than its signature. And when dealing with exceptions leaving the colon above would yield a result even worse than the following. A::A() try : B() , C() { } catch (...) { } 2.4.8 Documentation Style ------------------------- -- Rule: Write correct English Nowadays most editors provide interactive spell checking, including for sources (strings and comments). For instance, see ‘flyspell-mode’ in Emacs, and in particular the ‘flyspell-prog-mode’. To trigger this automatically, install the following in your ‘~/.emacs.el’: (add-hook 'c-mode-hook 'flyspell-prog-mode 1) (add-hook 'c++-mode-hook 'flyspell-prog-mode 1) (add-hook 'cperl-mode-hook 'flyspell-prog-mode 1) (add-hook 'makefile-mode-hook 'flyspell-prog-mode 1) (add-hook 'python-mode-hook 'flyspell-prog-mode 1) (add-hook 'sh-mode-hook 'flyspell-prog-mode 1) and so forth. End comments with a period. -- Rule: Be concise For documentation as for any other kind of writing, the shorter, the better: hunt useless words. *Note The Elements of Style::, for an excellent set of writing guidelines. Here are a few samples of things to avoid: Don’t document the definition instead of its object Don’t write: /// Declaration of the Foo class. class Foo { ... }; Of course you’re documenting the definition of the entities! “Declaration of the” is totally useless, just use ‘/// Foo class’. But read bellow. Don’t qualify obvious entity kinds Don’t write: /// Foo class. class Foo { public: /// Construct a Foo object. Foo(Bar& bar) ... }; It is so obvious that you’re documenting the class and the constructor that you should not write it down. Instead of documenting the _kind_ of an entity (class, function, namespace, destructor...), document its _goal_. /// Wrapper around Bar objects. class Foo { public: /// Bind to \a bar. Foo(Bar& bar) ... }; -- Rule: Use the Imperative Use the imperative when documenting, as if you were giving order to the function or entity you are describing. When describing a function, there is no need to repeat “function” in the documentation; the same applies obviously to any syntactic category. For instance, instead of: /// \brief Swap the reference with another. /// The method swaps the two references and returns the first. ref& swap(ref& other); write: /// \brief Swap the reference with another. /// Swap the two references and return the first. ref& swap(ref& other); The same rules apply to ChangeLogs. -- Rule: Use ‘rebox.el’ to mark up paragraphs Often one wants to leave a clear markup to separate different matters. For declarations, this is typically done using the Doxygen ‘\name ... \{ ... \}’ sequence; for implementation files use ‘rebox.el’ (*note rebox.el::). -- Rule: Write Documentation in Doxygen Documentation is a genuine part of programming, just as testing. We use Doxygen (*note Doxygen::) to maintain the developer documentation of the Tiger Compiler. The quality of this documentation can change the grade. Beware that Doxygen puts the first letter of documentation in upper case. As a result, /// \file ast/arrayexp.hh /// \brief ast::ArrayExp declaration. will not work properly, since Doxygen will transform ‘ast::ArrayExp’ into ‘Ast::ArrayExp’, which will not be recognized as an entity name. As a workaround, write the slightly longer: /// \file ast/arrayexp.hh /// \brief Declaration of ast::ArrayExp. Of course, Doxygen documentation is not appropriate everywhere. -- Rule: Document namespaces in ‘lib*.hh’ files -- Rule: Document classes in their ‘*.hh’ file There must be a single location, that’s our standard. -- Rule: Use ‘\directive’ Prefer backslash (‘\’) to the commercial at (‘@’) to specify directives. -- Rule: Prefer C Comments for Long Comments Prefer C comments (‘/** ... */’) to C++ comments (‘/// ...’). This is to ensure consistency with the style we use. -- Rule: Prefer C++ Comments for One Line Comments Because it is lighter, instead of /** \brief Name of this program. */ extern const char* program_name; prefer /// Name of this program. extern const char* program_name; For instance, instead of /* Construct an InterferenceGraph. */ InterferenceGraph(const std::string& name, const assem::instrs_t& instrs, bool trace = false); or /** @brief Construct an InterferenceGraph. ** @param name its name, hopefully based on the function name ** @param instrs the code snippet to study ** @param trace trace flag **/ InterferenceGraph(const std::string& name, const assem::instrs_t& instrs, bool trace = false); or /// \brief Construct an InterferenceGraph. /// \param name its name, hopefully based on the function name /// \param instrs the code snippet to study /// \param trace trace flag InterferenceGraph(const std::string& name, const assem::instrs_t& instrs, bool trace = false); write /** \brief Construct an InterferenceGraph. \param name its name, hopefully based on the function name \param instrs the code snippet to study \param trace trace flag */ InterferenceGraph(const std::string& name, const assem::instrs_t& instrs, bool trace = false); 2.5 Tests ========= As stated in *note Rules of the Game::, writing a test framework and tests is part of the exercise. As a starting point, we provide a tarball containing a few Tiger files, see *note Given Test Cases::. *They are not enough*: your test suite should be continually expanding. 2.5.1 Writing Tests ------------------- In three occasions tests are “easy” to write: − The specifications of the language are a fine source for many tests. For instance the specification of integer literals show several cases to exercise. − If your compiler crashes or fails, before even trying to fix it, include the test case in your test suite. − If you are developing a component for the compiler, you can certainly feel the weak points. Immediately write a test for these. *Note Testing student-made compilers::, for many hints on what tests you need to write. 2.5.2 Generating the Test Driver -------------------------------- Unless your whole test infrastructure is embedded in a single file (which is not a good idea), we advise you to generate any script used to run your tests so that they can be run from a directory other than the source directory where they reside. This is especially useful to maintain several builds (e.g. with different compilers or compiler flags) in parallel (see the section on VPATH Builds(1) in Automake’s manual) and when running ‘make distcheck’ (see the section on Checking the Distribution(2)), as source and build directories are distinct in these circumstances. The simplest way to generate a script is to rely on ‘configure’. For instance, the following line in ‘configure.ac’ generates a script ‘tests/testsuite’ from the input ‘tests/testsuite.in’, while performing variables substitutions (in particular ‘@srcdir@’ and similar variables): AC_CONFIG_FILES([tests/testsuite], [chmod a=rx tests/testsuite]) The template file ‘tests/testsuite.in’ can then leverage this information to find data in the source directory. E.g., if tests are located in the ‘tests/’ subdirectory of the top source directory, the beginning of ‘tests/testsuite.in’ might look like this: #! /bin/sh # @configure_input@ # Where the tests can be found. testdir="@abs_top_srcdir@/tests" # ... Another strategy to generate scripts is to use ‘make’, as suggested by Autoconf’s manual (see the section on Installation Directory Variables(3)). ---------- Footnotes ---------- (1) . (2) . (3) . 2.6 Submission ============== We use two kinds of project submissions in the project. − For PTHL (*note PTHL (TC-0)::), your _sources_ must be pushed through the ‘master’ branch to the central Git repository at submission time. Follow the instructions given by the teaching assistants. − From TC-1 on, your code must still be submitted through git, following indications given on the assistants’s intranet. However, the assistants "moulinette" will use the _tarball_ built by the command ‘make distcheck’. Be sure that the created tarball has a correct name. If ‘bardec_f’ is the head of your group, the tarball must be ‘bardec_f-tc-N.tar.bz2’ where N is the number of the “release” (*note Package Name and Version::). The following commands must work properly: $ bunzip2 -cd bardec_f-tc-N.tar.bz2 | tar xvf - $ cd bardec_f-tc-N $ export CC=gcc+-5.0 CXX=g++-5.0 $ mkdir _build $ cd _build $ ../configure $ make $ src/tc /tmp/test.tig $ make distcheck For more information on the tools, see *note The GNU Build System::, *note GCC::. 2.7 Evaluation ============== Some stages are evaluated only by a program, and others are evaluated both by humans, and a program. 2.7.1 Automated Evaluation -------------------------- Each stage of the compiler will be evaluated by an automatic corrector. Soon after your work is submitted, the logs are available on the assistants’ intranet. Automated evaluation enforces the requirements: you _must_ stick to what is being asked. For instance, for TC-E it is explicitly asked to display something like: var /* escaping */ i : int := 2 so if you display any of the following outputs var i : int /* escaping */ := 2 var i /* escaping */ : int := 2 var /* Escapes */ i : int := 2 be sure to fail all the tests, even if the computation is correct. 2.7.2 During the Examination ---------------------------- When you are defending your projects, here are a few rules to follow: _Don’t talk_ Don’t talk unless you are asked to: when a person is asked a question, s/he is the only one to answer. You must not talk to each other either: often, when one cannot answer a question, the question is asked to another member. It is then obvious why the members of the group shall not talk. _Don’t touch the screen_ Don’t touch my display! You have nice fingers, but I don’t need their prints on my screen. _Tell the truth_ If there is something the examiner must know (someone did not work on the project at all, some files are coming from another group etc.), _say it immediately_, for, if we discover that by ourselves, you will be severely sanctioned. _Learn_ It is explicitly stated that you _can_ not have worked on a stage _provided this was an agreement with the group_. But it is also explicitly stated that _you must have learned what was to be learned from that compiler stage_, which includes C++ techniques, Bison and Flex mastering, object oriented concepts, design patterns and so forth. _Complain now!_ If you don’t agree with the notation, say it immediately. Private messages about “this is unfair: I worked much more than bardec_f but his grade is better than mine” are _thrown away_. Conversely, there is something we wish to make clear: examiners will probably be harsh (maybe even very harsh), but this does not mean they disrespect you, or judge you badly. You are here to defend your project and knowledge, they are here to stress them, to make sure they are right. Learning to be strong under pressure is part of the exercise. Don’t burst into tears, react! Don’t be shy, that’s not the proper time: you are selling them something, and they will never buy something from someone who cries when they are criticizing his product. You should also understand that human examination is the moment where we try to evaluate who, or what group, needs help. We are here to diagnose your project and provide solutions to your problems. If you know there is a problem in your project, but you failed to fix it, tell it to the examiner! _Work with her/him_ to fix your project. 2.7.3 Human Evaluation ---------------------- The point of this evaluation is to measure, among other things: the quality of the code How clean it is, amount of code duplication, bad hacks, standards violations (e.g., ‘stderr’ is forbidden in proper C++ code) and so forth. It also aims at detecting cheaters, who will be severely punished (mark = -42). the knowledge each member acquired While we do not require that each member worked on a stage, we do require that each member (i) knows how the stage works and (ii) has perfectly understood the (C++, Bison etc.) techniques needed to implement the stage. Each stage comes with a set of goals (see *note PTHL Goals::, for instance) on which you will be interrogated. *Examiners: the human grade.* The examiner should not take (too much) the automated tests into account to decide the mark: the mark is computed later, taking this into account, so don’t do it twice. *Examiners: broken tarballs.* If you fixed the tarball or made whatever modification, _run_ ‘make distcheck’ again, and update the delivered tarball. Do not keep old tarballs, do not install them in a special place: just replace the first tarball with it, but say so in the ‘eval’ file. The rationale is simple: only tarballs pass the tests, and every tarball must be able to pass the tests. If you don’t do that, then someone else will have to do it again. 2.7.4 Marks Computation ----------------------- Because the Tiger Compiler is a project with stages, the computation of the marks depends on the stages too. To spell it out explicitly: _A stage is penalized by bad results on tests performed for previous stages._ It means, for instance, that a TC-3 compiler will be exercised on TC-1, TC-2, and TC-3. If there are still errors on TC-1 and TC-2 tests, they will pessimize the result of TC-3 tests. The older the errors are, the more expensive they are. 3 Source Code ************* 3.1 Given Code ============== Starting with TC-1, code with gaps is provided through the tc-base public Git repository(1). We used to provide code through tarballs and patches before, but we only rely on Git now. This approach is the best one, as ‘git merge’ is arguably simpler than ‘patch’ and has other advantages (like preserving the execution bit of scripts, identifying the origin of every line of code using ‘git blame’, etc.). Each commit containing the contents of a new stage is labeled with a ‘CLASS-tc-base-X.Y’ tag. Here is the recommended strategy to use this repository. 1. At TC-1, subscribe to the repository, fetch its contents and integrate the given code using ‘git merge’ with the commit labeled ‘2020-tc-base-1.0’ into your ‘master’ branch: $ git remote add tc-base https://gitlab.lrde.epita.fr/tiger/tc-base.git $ git fetch tc-base $ git merge 2020-tc-base-1.0 Fix the conflicts and record the merge commit: $ git add src/tc.cc ... $ git commit 2. For any subsequent stage ‘m’, all you will need to do is fetch the new commits from the ‘tc-base’ repository and merge the code given at stage ‘m’ into yours (and of course, fix the conflicts). E.g.: $ git fetch tc-base $ git merge 2020-tc-base-M.0 ---------- Footnotes ---------- (1) . 3.2 Project Layout ================== This section describes the _mandatory_ layout of the package. 3.2.1 The Top Level ------------------- ‘AUTHORS.txt’ In the top level of the distribution, there must be a file ‘AUTHORS.txt’ which contents is as follows: Fabrice Bardèche Jean-Paul Sartre Jean-Paul Deux Jean-Paul Belmondo The group leader is first. Do not include emails other than those of EPITA. We repeat: give the ‘login@epita.fr’ address. Starting from TC-1, the file ‘AUTHORS.txt’ is distributed thanks to the ‘EXTRA_DIST’ variable in the top-level ‘Makefile.am’, but pay attention to the spelling. ‘ChangeLog’ Optional. The list of the changes made in the compiler, with the dates and names of the people who worked on it. See the Emacs key binding ‘C-x 4 a’. ‘README.txt’ Various free information. ‘NEWS.txt’ Optional. Summary of changes introduced by each release. ‘lib/’ This directory contains helping tools, that are not specific to the project. ‘src/’ All the sources are in this directory. ‘tests/’ Your own test suite. You should make it part of the project, and ship it like the rest of the package. Actually, it is abnormal not to have a test suite here. 3.2.2 The ‘build-aux’ Directory ------------------------------- -- File: bison++.in (build-aux/bin) This is a wrapper around Bison, tailored to produce C++ parsers. Compared to ‘bison’, ‘bison++’ updates the output files only if changed. For a file such as ‘location.hh’, virtually included by the whole front-end, this is a big win. Also, ‘bison’ outputs ‘\file location.hh’ in Doxygen documentation, which clashes with ‘ast/location.hh’. ‘bison++’ changes this into ‘\file parse/location.hh’. -- File: flex++.in (build-aux/bin) A wrapper around Flex, to simplify and improve the generation of C++ scanners. -- File: monoburg++.in (build-aux/bin) Likewise for MonoBURG. -- File: rebox.el (build-aux/) This file provides two new Emacs functions, ‘M-x rebox-comment’ and ‘M-x rebox-region’. They build and maintain nice looking boxed comments in most languages. Once installed (read it for instructions), write a simple comment such as: // Comments end with a period. then move your cursor into this comment and press ‘C-u 2 2 3 M-q’ to get: /*-----------------------------. | Comments end with a period. | `-----------------------------*/ ‘2 2 3’ specifies the style of the comment you want to build. Once the comment built, ‘M-q’ suffices to refill it. Run ‘C-u - M-q’ for an interactive interface. -- File: tiger.el (build-aux) -- File: panther.el (build-aux) Theses files provide Emacs major modes for Tiger programs (‘*.tig’) and Panther (“object-less” Tiger) programs (‘*.pan’ files). Read them to get installation instructions. -- File: tiger-ftdetect.vim (build-aux) -- File: tiger-syntax.vim (build-aux) Vim scripts to detect and enable syntax hilighting for Tiger files. 3.2.3 The ‘lib’ Directory ------------------------- 3.2.4 The ‘lib/misc’ Directory ------------------------------ Convenient C++ tools. -- File: contract.* (lib/misc/) A useful improvement over ‘cassert’. -- File: error.* (lib/misc/) The class ‘misc::error’ implements an error register. Because libraries are expected to be pure, they cannot issue error messages to the error output, nor exit with failure. One could pass call-backs (as functions or as objects) to set up error handling. Instead, we chose to register the errors in an object, and have the library functions return this register: it is up to the caller to decide what to do with these errors. Note also that direct calls to ‘std::exit’ bypass stack unwinding. In other words, with ‘std::exit’ (instead of ‘throw’) your application leaks memory. An instance of ‘misc::error’ can be used as if it were a stream to output error messages. It also keeps the current exit status until it is “triggered”, i.e., until it is thrown. Each module has its own error handler. For instance, the ‘Binder’ has an ‘error_’ attribute, and uses it to report errors: void Binder::error(const ast::Ast& loc, const std::string& msg) { error_ << misc::error::bind << loc.location_get() << ": " << msg << std::endl; } Then the task system fetches the local error handler, and merges it into the global error handler ‘error’ (see ‘common.*’). Some tasks trigger the error handler: if errors were registered, an exception is raised to exit the program cleanly. The following code demonstrates both aspects. void bindings_compute() { // bind::bind returns the local error handler. error << ::bind::bind(*ast::tasks::the_program); error.exit_on_error(); } -- File: escape.* (lib/misc/) This file implements a means to output string while escaping non printable characters. An example: std::cout << "escape(\"\111\") = " << escape("\"\111\"") << std::endl; Understanding how ‘escape’ works is required starting from TC-2. -- File: flex-lexer.hh (lib/misc/) The skeleton of the C++ scanner. Adapted from Flex’s ‘FlexLexer.h’ and used as a replacement, thanks to ‘flex++’ (*note flex++.in::). -- File: graph.* (lib/misc/) This file contains a generic implementation of oriented and undirected graphs. Understanding how ‘graph’ works is required starting from TC-8. -- File: indent.* (lib/misc/) Exploiting regular ‘std::ostream’ to produce indented output. -- File: ref.* (lib/misc/) Smart pointers implementing reference counting. -- File: set.* (lib/misc/) A wrapper around ‘std::set’ that introduce convenient operators (‘operator+’ and so forth). -- File: scoped-map.* (lib/misc/) The handling of ‘misc::scoped_map’, generic scoped map, serving as a basis for symbol tables used by the ‘Binder’. ‘misc::scoped_map’ maps a ‘Key’ to a ‘Data’ (that should ring a bell...). You are encouraged to implement something simple, based on stacks (see ‘std::stack’, or better yet, ‘std::vector’) and maps (see ‘std::map’). It must provide this interface: -- void: put (const Key& key, const Data& value) Associate VALUE to KEY in the current scope. -- Data: get (const Key& key) const If KEY was associated to some ‘Data’ in the open scopes, return the most recent insertion. Otherwise, if ‘Data’ is a pointer type, then return the empty pointer, else throw a ‘std::range_error’. To implement this feature, see (1) -- std::ostream&: dump (std::ostream& ostr) const Send the content of this table on OSTR _in a human-readable manner_, and return the stream. -- void: scope_begin () Open a new scope. -- void: scope_end () Close the last scope, forgetting everything since the latest ‘scope_begin()’. -- File: symbol.* (lib/misc/) In a program, the rule for identifiers is to be used many times: at least once for its definition, and once for each use. Just think about the number of occurrences of ‘size_t’ in a C program for instance. To save space one keeps a single copy of each identifier. This provides additional benefits: the address of this single copy can be used as a key: comparisons (equality or order) are much faster. The class ‘misc::symbol’ is an implementation of this idea. See the lecture notes, ‘scanner.pdf’(2). ‘misc::symbol’ is based on ‘misc::unique’. -- File: timer.* (lib/misc/) A class that makes it possible to have timings of processes, similarly to ‘gcc’’s ‘--time-report’, or ‘bison’’s ‘--report=time’. It is used in the ‘Task’ machinery, but can be used to provide better timings (e.g., separating the scanner from the parser). -- File: unique.* (lib/misc/) A generic class implementing the Flyweight design pattern. It maps identical objects to a unique reference. -- File: variant.* (lib/misc/) A wrapper over ‘std::variant’ supporting conversion operators. ---------- Footnotes ---------- (1) . (2) . 3.2.5 The ‘src’ Directory ------------------------- -- File: common.hh (src/) Used throughout the project. -- File: tc (src/) Your compiler. -- File: tc.cc (src/) Main entry. Called, the “driver”. 3.2.6 The ‘src/task’ Directory ------------------------------ No namespace for the time being, but it should be ‘task’. Delivered for TC-1. A generic scheme to handle the components of our compiler, and their dependencies. 3.2.7 The ‘src/parse’ Directory ------------------------------- Namespace ‘parse’. Delivered during TC-1. -- File: scantiger.ll (src/parse/) The scanner. -- File: parsetiger.yy (src/parse/) The parser. -- File: position.hh (src/ast/) Keeping track of a point (cursor) in a file. -- File: location.hh (src/ast/) Keeping track of a range (two cursors) in a (or two) file. -- File: libparse.hh (src/ast/) which prototypes what ‘tc.cc’ needs to know about the module ‘parse’. 3.2.8 The ‘src/ast’ Directory ----------------------------- Namespace ‘ast’, delivered for TC-2. Implementation of the abstract syntax tree. The file ‘ast/README’ gives an overview of the involved class hierarchy. -- File: location.hh (src/ast/) Imports Bison’s ‘parse::location’. -- File: visitor.hh (src/ast/) Abstract base class of the compiler’s visitor hierarchy. Actually, it defines a class template ‘GenVisitor’, which expects an argument which can be either ‘misc::constify_traits’ or ‘misc::id_traits’. This allows to define two parallel hierarchies: ‘ConstVisitor’ and ‘Visitor’, similar to ‘iterator’ and ‘const_iterator’. The understanding of the template programming used _is not required at this stage_ as it is quite delicate, and goes far beyond your (average) current understanding of templates. -- File: default-visitor.* (src/ast/) Implementation of the ‘GenDefaultVisitor’ class template, which walks the abstract syntax tree, doing nothing. This visitor does not define visit methods for nodes related to object-oriented constructs (classes, methods, etc.); thus it is an abstract class, and is solely used as a basis for deriving other visitors. It is instantiated twice: ‘GenDefaultVisitor’ and ‘GenDefaultVisitor’. -- File: non-object-visitor.* (src/ast/) Implementation of the ‘GenNonObjectVisitor’ class template, which walks the abstract syntax tree, doing nothing, but aborting on nodes related to object-oriented constructs (classes, methods, etc.). This visitor is abstract and is solely used as a basis for deriving other visitors (*note TC-2 FAQ::). It is instantiated twice: ‘GenNonObjectVisitor’ and ‘GenNonObjectVisitor’. -- File: object-visitor.* (src/ast/) Implementation of the ‘GenObjectVisitor’ class template, which walks object-related nodes of an abstract syntax tree, doing nothing. This visitor is abstract and is solely used as a basis for deriving other visitors. It is instantiated twice: ‘GenObjectVisitor’ and ‘GenObjectVisitor’. -- File: pretty-printer.* (src/ast/) The ‘PrettyPrinter’ class, which pretty-prints an AST back into Tiger concrete syntax. -- File: typable.* (src/ast/) This class is not needed before TC-4 (*note TC-4::). Auxiliary class from which typable AST node classes should derive. It has a simple interface made to manage a pointer to the type of the node: -- void: type_set (const type::Type*) -- const: type::Type* type_get () const Accessors to the type of this node. -- void: accept (ConstVisitor& v) const -- void: accept (Visitor& v) These methods are abstract, as in ‘ast::Ast’. -- File: type-constructor.* (src/ast/) This class is not needed before TC-4 (*note TC-4::). Auxiliary class from which should derive AST nodes that construct a type (e.g., ‘ast::ArrayTy’). Its interface is similar to that of ‘ast::Typable’ with one big difference: ‘ast::TypeConstructor’ is responsible for de-allocating that type. -- void: created_type_set (const type::Type*) -- const: type::Type* created_type_get () const Accessors to the _created_ type of this node. -- void: accept (ConstVisitor& v) const -- void: accept (Visitor& v) It is convenient to be able to visit these, but it is not needed. -- File: escapable.* (src/ast/) This class is needed only for TC-E (*note TC-E::). Auxiliary class from which AST node classes that denote the declaration of variables and formal arguments should derive. Its role is to encode a single Boolean value: whether the variable escapes or not. The natural interface includes ‘escape_get’ and ‘escape_set’ methods. 3.2.9 The ‘src/bind’ Directory ------------------------------ Namespace ‘bind’. Binding uses to definitions. -- File: binder.* (src/bind/) The ‘bind::Binder’ visitor. Binds uses to definitions (works on syntax _without_ object). -- File: renamer.* (src/bind/) The ‘bind::Renamer’ visitor. Renames every identifier to a unique name (works on syntax _without_ object). 3.2.10 The ‘src/escapes’ Directory ---------------------------------- Namespace ‘escapes’. Compute the escaping variables. -- File: escapes-visitor.* (src/escapes/) The ‘escapes::EscapesVisitor’. 3.2.11 The ‘src/type’ Directory ------------------------------- Namespace ‘type’. Type checking. -- File: libtype.* (src/type/) The interface of the Type module. It exports a single procedure, ‘types_check’. -- File: types.hh (src/type/) -- File: type.* (src/type/) -- File: array.* (src/type/) -- File: attribute.* (src/type/) -- File: builtin-types.* (src/type/) -- File: class.* (src/type/) -- File: field.* (src/type/) -- File: function.* (src/type/) -- File: method.* (src/type/) -- File: named.* (src/type/) -- File: record.* (src/type/) The definitions of all the types. Built-in types (‘Int’, ‘String’ and ‘Void’) are defined in ‘src/type/builtin-types.*’. -- File: nil.* (src/type/) The ‘Nil’ type is holding information about the real record type that it’s hiding. The ‘record_type’ represents the actual type that the ‘nil’ was meant to be used with. The ‘record_type’ is set during the type-checker in the parent nodes of the node holding a ‘Nil’ type. -- File: type-checker.* (src/type/) The ‘type::TypeChecker’ visitor. Computes the types of an AST and adds type labels to the corresponding nodes (works on syntax _without_ object). -- File: pretty-printer.* (src/type/) The ‘type::PrettyPrinter’ visitor which pretty-prints ‘type::Type’s in a human-readable way. Used to output nice type errors. 3.2.12 The ‘src/object’ Directory --------------------------------- -- File: binder.* (src/object/) The ‘object::Binder’ visitor. Binds uses to definitions (works on syntax with objects). Inherits from ‘bind::Binder’. -- File: type-checker.* (src/object/) The ‘object::TypeChecker’ visitor. Computes the types of an AST and adds type labels to the corresponding nodes (works on syntax with objects). Inherits from ‘type::TypeChecker’. -- File: renamer.* (src/object/) The ‘object::Renamer’ visitor. Renames every identifier to a unique name (works on syntax with objects), and keeps a record of the names of the renamed classes. Inherits from ‘bind::Renamer’. -- File: desugar-visitor.* (src/object/) The ‘object::DesugarVisitor’ visitor. Transforms an AST with objects into an AST without objects. 3.2.13 The ‘src/overload’ Directory ----------------------------------- Namespace ‘overload’. Overloading function support. 3.2.14 The ‘src/astclone’ Directory ----------------------------------- -- File: cloner.* (src/astclone) The ‘astclone::Cloner’ visitor. Duplicate an AST. This copy is purely structural: the clone is similar to the original tree, but any existing binding or type information is _not_ preserved. 3.2.15 The ‘src/desugar’ Directory ---------------------------------- -- File: desugar-visitor.* (src/desugar) The ‘desugar::DesugarVisitor’ visitor. Remove constructs that can be considered as syntactic sugar using other language constructs. For instance, turn ‘for’ loops into ‘while’ loops, string comparisons into function calls. Inherits from ‘astclone::Cloner’, so the desugared AST is a modified copy of the initial tree. -- File: bounds-checking-visitor.* (src/desugar) The ‘desugar::BoundsCheckingVisitor’ visitor. Add dynamic array bounds checks while duplicating an ‘ast’. Inherits from ‘astclone::Cloner’, so the result is a modified copy of the input AST. 3.2.16 The ‘src/inlining’ Directory ----------------------------------- -- File: inliner.* (src/inlining) The ‘desugar::Inliner’ visitor. Perform inline expansion of functions. -- File: pruner.* (src/inlining) The ‘desugar::Pruner’ visitor. Prune useless function declarations within an ‘ast’. 3.2.17 The ‘src/temp’ Directory ------------------------------- Namespace ‘temp’, delivered for TC-5. -- File: identifier.* (src/temp/) Provides the class template ‘Identifier’ built upon ‘misc::variant’ and used to implement ‘temp::Temp’ and ‘temp::Label’. Also contains the generic ‘IdentifierCompareVisitor’, used to compare two identifiers. ‘Identifier’ handles maps of ‘Identifiers’. For instance, the ‘Temp’ ‘t5’ might be allocated the register ‘$t2’, in which case, when outputting ‘t5’, we should print ‘$t2’. Maps stored in the xalloc’d slot ‘Identifier::map’ of streams implements such a correspondence. In addition, the ‘operator<<’ of the ‘Identifier’ class template itself "knows" when such a mapping is active, and uses it. -- File: label.* (src/temp/) We need labels for ‘jump’s, for functions, strings etc. Implemented as an instantiation of the ‘temp::Identifier’ scheme. -- File: temp.* (src/temp/) So called “temporaries” are pseudo-registers: we may allocate as many temporaries as we want. Eventually the register allocator will map those temporaries to either an actual register, or it will allocate a slot in the activation block (aka frame) of the current function. Implemented as an instantiation of the ‘temp::Identifier’ scheme. -- File: temp-set.* (src/temp/) A set of temporaries, along with its ‘operator<<’. 3.2.18 The ‘src/tree’ Directory ------------------------------- Namespace ‘tree’, delivered for TC-5. The implementation of the intermediate representation. The file ‘tree/README’ should give enough explanations to understand how it works. Reading the corresponding explanations in Appel’s book is mandatory. It is worth noting that contrary to A. Appel, just as we did for ‘ast’, we use n-ary structures. For instance, where Appel uses a binary ‘seq’, we have an n-ary ‘seq’ which allows us to put as many statements as we want. To avoid gratuitous name clashes, what Appel denotes ‘exp’ is denoted ‘sxp’ (Statement Expression), implemented in ‘translate::Sxp’. Please, pay extra attention to the fact that there are ‘temp::Temp’ used to create unique temporaries (similar to ‘misc::symbol’), and ‘tree::Temp’ which is the intermediate representation instruction denoting a temporary (hence a ‘tree::Temp’ needs a ‘temp::Temp’). Similarly, on the one hand, there is ‘temp::Label’ which is used to create unique labels, and on the other hand there are ‘tree::Label’ which is the IR statement to _define_ to a label, and ‘tree::Name’ used to _refer_ to a label (typically, a ‘tree::Jump’ needs a ‘tree::Name’ which in turn needs a ‘temp::Label’). -- File: fragment.* (src/tree/) It implements ‘tree::Fragment’, an abstract class, ‘tree::DataFrag’ to store the literal strings, and ‘tree::ProcFrag’ to store the routines. -- File: fragments.* (src/tree/) Lists of ‘tree::Fragment’. -- File: visitor.* (src/tree/) Implementation of ‘tree::Visitor’ and ‘tree::ConstVisitor’ to implement function objects on ‘tree::Fragments’. In other words, these visitors implement polymorphic operations on ‘tree::Fragment’. 3.2.19 The ‘src/frame’ Directory -------------------------------- Namespace ‘frame’, delivered for TC-5. -- File: access.* (src/frame/) An ‘Access’ is a location of a variable: on the stack, or in a temporary. -- File: frame.* (src/frame/) A ‘Frame’ knows only what are the “variables” it contains. 3.2.20 The ‘src/translate’ Directory ------------------------------------ Namespace ‘translate’. Translation to intermediate code translation. It includes: -- File: libtranslate.* (src/translate/) The interface. -- File: access.* (src/translate/) Static link aware versions of ‘level::Access’. -- File: level.* (src/translate/) ‘translate::Level’ are wrappers ‘frame::Frame’ that support the static links, so that we can find an access to the variables of the “parent function”. -- File: exp.hh (src/translate/) Implementation of ‘translate::Ex’ (expressions), ‘Nx’ (instructions), ‘Cx’ (conditions), and ‘Ix’ (‘if’) shells. They wrap ‘tree::Tree’ to delay their translation until the actual use is known. -- File: translation.hh (src/translate/) functions used by the ‘translate::Translator’ to translate the AST into HIR. For instance, it contains ‘Exp* simpleVar(const Access& access, const Level& level)’, ‘Exp* callExp(const temp::Label& label, std::list args)’ etc. which are routines that produce some ‘Tree::Exp’. They handle all the ‘unCx’ etc. magic. -- File: translator.hh (src/translate/) Implements the class ‘Translator’ which performs the IR generation thanks to ‘translation.hh’. It must not be polluted with translation details: it is only coordinating the AST traversal with the invocation of translation routines. For instance, here is the translation of an ‘ast::SimpleVar’: virtual void operator()(const SimpleVar& e) { exp_ = simpleVar(*var_access_[e.def_get()], *level_); } 3.2.21 The ‘src/canon’ Directory -------------------------------- Namespace ‘canon’. 3.2.22 The ‘src/assem’ Directory -------------------------------- Namespace ‘assem’, delivered for TC-7. This directory contains the implementation of the Assem language: yet another intermediate representation that aims at encoding an assembly language, plus a few needed features so that register allocation can be performed afterward. Given in full. -- File: instr.* (src/assem/) -- File: move.* (src/assem/) -- File: oper.* (src/assem/) -- File: label.* (src/assem/) -- File: comment.* (src/assem/) Implementation of the basic types of assembly instructions. -- File: fragment.* (src/assem/) Implementation of ‘assem::Fragment’, ‘assem::ProcFrag’, and ‘assem::DataFrag’. They are very similar to ‘tree::Fragment’: aggregate some information that must remain together, such as a ‘frame::Frame’ and the instructions (a list of ‘assem::Instr’). -- File: visitor.hh (src/assem/) The root of assembler visitors. -- File: layout.hh (src/assem/) A pretty printing visitor for ‘assem::Fragment’. -- File: libassem.* (src/assem/) The interface of the module, and its implementation. 3.2.23 The ‘src/target’ Directory --------------------------------- Namespace ‘target’, delivered for TC-7. Some data on the back end. -- File: cpu.* (src/target/) Description of a CPU: everything about its registers, and its word size. -- File: target.* (src/target/) Description of a target (language): its CPU, its assembly (‘target::Assembly’), and it translator (‘target::Codegen’). -- File: assembly.* (src/target/) The abstract class ‘target::Assembly’, the interface for elementary assembly instructions generation. -- File: codegen.* (src/target/) The abstract class ‘target::Codegen’, the interface for all our back ends. -- Directory: mips (src/target/) -- Directory: ia32 (src/target/) -- Directory: arm (src/target/) The instruction selection per se split into a generic part, and a target specific (MIPS, IA-32 and ARM) part. *Note src/target/mips::, *note src/target/ia32:: and *note src/target/arm::. -- File: libtarget.* (src/target/) Converting ‘tree::Fragment’s into ‘assem::Fragment’s. -- File: tiger-runtime.c (src/target/) This is the Tiger runtime, written in C, based on Andrew Appel’s ‘runtime.c’(1). The actual ‘runtime.s’ file for MIPS was written by hand, but the IA-32 was a compiled version of this file. It should be noted that: Strings Strings are implemented as 4 bytes to encode the length, and then a 0-terminated à la C string. The length part is due to conformance to the Tiger Reference Manual, which specifies that 0 is a regular character that can be part of the strings, but it is nevertheless terminated by 0 to be compliant with SPIM/Nolimips’ ‘print’ syscall. This might change in the future. Special Strings There are some special strings: 0 and 1 character long strings are all implemented via a singleton. That is to say there is only one allocated string ‘""’, a single ‘"1"’ etc. These singletons are allocated by ‘main’. It is essential to preserve this invariant/convention in the whole runtime. ‘strcmp’ vs. ‘stringEqual’ We don’t know how Appel wants to support ‘"bar" < "foo"’ since he doesn’t provide ‘strcmp’. We do. His implementation of equality is more efficient than ours though, since he can decide just be looking at the lengths. That could be improved in the future... ‘main’ The runtime has some initializations to make, such as strings singletons, and then calls the compiled program. This is why the runtime provides ‘main’, and calls ‘tc_main’, which is the “main” that your compiler should provide. ---------- Footnotes ---------- (1) . 3.2.24 The ‘src/target/mips’ Directory -------------------------------------- Namespace ‘target::mips’, delivered for TC-7. Code generation for MIPS R2000. -- File: cpu.* (src/target/mips/) The description of the MIPS (actually, SPIM/Nolimips) CPU. -- File: spim-assembly.* (src/target/mips/) Our assembly language (syntax, opcodes and layout); it abstracts the generation of MIPS R2000 instructions. ‘target::mips::SpimAssembly’ derives from ‘target::Assembly’. -- File: spim-layout.* (src/target/mips/) How MIPS (and SPIM/Nolimips) fragments are to be displayed. In other words, that’s where the (global) syntax of the target assembly file is selected. -- File: codegen.* (src/target/mips/) -- File: tree.brg (src/target/mips/) -- File: exp.brg (src/target/mips/) -- File: binop.brg (src/target/mips/) -- File: call.brg (src/target/mips/) -- File: temp.brg (src/target/mips/) -- File: mem.brg (src/target/mips/) -- File: stm.brg (src/target/mips/) -- File: move.brg (src/target/mips/) -- File: move_load.brg (src/target/mips/) -- File: move_store.brg (src/target/mips/) -- File: cjump.brg (src/target/mips/) -- File: prologue.hh (src/target/mips/) -- File: epilogue.cc (src/target/mips/) A translator from LIR to ASSEM using the MIPS R2000 instruction set defined by ‘target::mips::SpimAssembly’. It is implemented as a dynamic programming algorithm generated by MonoBURG from a set of ‘brg’ files. ‘target::mips::Codegen’ derives from ‘target::Codegen’. -- File: target.* (src/target/mips/) The main back end, based on a MIPS CPU and a MIPS code generator. -- File: runtime.s (src/target/mips/) -- File: runtime.cc (src/target/mips/) The Tiger runtime in MIPS assembly language: ‘print’ etc. The C++ file ‘runtime.cc’ is built from ‘runtime.s’: do not edit the former. *Note src/target::, ‘tiger-runtime.c’. 3.2.25 The ‘src/target/ia32’ Directory -------------------------------------- Namespace ‘target::ia32’, delivered for TC-7. Code generation for IA-32. This is not part of the student project, but it is left to satisfy their curiosity. In addition its presence is a sane invitation to respect the constraints of a multi-back-end compiler. -- File: cpu.* (src/target/ia32) Description of the i386 CPU. -- File: gas-assembly.* (src/target/ia32/) The IA-32 assembly language (syntax, opcodes and layout); it abstracts the generation of IA-32 instructions using the GNU Assembler (Gas) syntax. ‘target::ia32::GasAssembly’ derives from ‘target::Assembly’. -- File: gas-layout.* (src/target/ia32/) How IA-32 fragments are to be displayed. In other words, that’s where the (global) syntax of the target assembly file is selected. -- File: codegen.* (src/target/ia32/) -- File: tree.brg (src/target/ia32/) -- File: exp.brg (src/target/ia32/) -- File: binop.brg (src/target/ia32/) -- File: call.brg (src/target/ia32/) -- File: temp.brg (src/target/ia32/) -- File: mem.brg (src/target/ia32/) -- File: stm.brg (src/target/ia32/) -- File: move.brg (src/target/ia32/) -- File: move_load.brg (src/target/ia32/) -- File: move_store.brg (src/target/ia32/) -- File: cjump.brg (src/target/ia32/) -- File: prologue.hh (src/target/ia32/) -- File: epilogue.cc (src/target/ia32/) A translator from LIR to ASSEM using the IA-32 instruction set defined by ‘target::ia32::GasAssembly’. It is implemented as a dynamic programming algorithm generated by MonoBURG from a set of ‘brg’ files. ‘target::ia32::Codegen’ derives from ‘target::Codegen’. -- File: target.* (src/target/ia32/) The IA-32 back-end, based on an IA-32 CPU and an IA-32 code generator. -- File: runtime-gnu-linux.s (src/target/ia32/) -- File: runtime-freebsd.s (src/target/ia32/) The GNU/Linux and FreeBSD Tiger runtimes in IA-32 assembly language: ‘print’ etc. The C++ files ‘runtime-gnu-linux.cc’ and ‘runtime-freebsd.cc’ are built from ‘runtime-gnu-linux.s’ and ‘runtime-freebsd.s’: do not edit the former. *Note src/target::, ‘tiger-runtime.c’. 3.2.26 The ‘src/target/arm’ Directory ------------------------------------- Namespace ‘target::arm’, delivered for TC-7. Code generation for ARM. This is not part of the student project, but it is left to satisfy their curiosity. In addition its presence is a sane invitation to respect the constraints of a multi-back-end compiler. -- File: cpu.* (src/target/arm) Description of the ARMV7 CPU. -- File: arm-assembly.* (src/target/arm/) The ARM assembly language (syntax, opcodes and layout); it abstracts the generation of ARM instructions. ‘target::arm::ArmAssembly’ derives from ‘target::Assembly’. -- File: arm-layout.* (src/target/arm/) How ARM fragments are to be displayed. In other words, that’s where the (global) syntax of the target assembly file is selected. -- File: arm-codegen.* (src/target/arm/) -- File: tree.brg (src/target/arm/) -- File: exp.brg (src/target/arm/) -- File: binop.brg (src/target/arm/) -- File: call.brg (src/target/arm/) -- File: temp.brg (src/target/arm/) -- File: mem.brg (src/target/arm/) -- File: stm.brg (src/target/arm/) -- File: move.brg (src/target/arm/) -- File: move_load.brg (src/target/arm/) -- File: move_store.brg (src/target/arm/) -- File: cjump.brg (src/target/arm/) -- File: prologue.hh (src/target/arm/) -- File: epilogue.cc (src/target/arm/) A translator from LIR to ASSEM using the ARM instruction set defined by ‘target::arm::ArmAssembly’. It is implemented as a dynamic programming algorithm generated by MonoBURG from a set of ‘brg’ files. ‘target::arm::Codegen’ derives from ‘target::Codegen’. -- File: target.* (src/target/arm/) The ARM back-end, based on an ARM CPU and an ARM code generator. -- File: runtime.s (src/target/arm/) The Tiger runtime in ARM assembly language: ‘print’ etc. 3.2.27 The ‘src/liveness’ Directory ----------------------------------- Namespace ‘liveness’, delivered for TC-8. -- File: flowgraph.* (src/liveness/) ‘FlowGraph’ implementation. -- File: test-flowgraph.cc (src/liveness/) ‘FlowGraph’ test. -- File: liveness.* (src/liveness/) Computing the live-in and live-out information from the ‘FlowGraph’. -- File: interference-graph.* (src/liveness/) Computing the ‘InterferenceGraph’ from the live-in/live-out information. 3.2.28 The ‘src/llvmtranslate’ Directory ---------------------------------------- Namespace ‘llvmtranslate’, delivered for TC-5. Translate the AST to LLVM intermediate code using the LLVM libraries. -- File: escapes-collector.* (src/llvmtranslate/) The ‘FrameBuilder’ and the ‘EscapesCollector’. LLVM IR doesn’t support static link and nested functions. In order to translate those functions to LLVM IR, we use Lambda Lifting, which consists in passing a pointer to the escaped variables to the nested function using that variable. In order to do that, we need a visitor to collect these kind of variables and associate them to each function. This visitor is the ‘EscapesCollector’. In order for the ‘EscapesCollector’ to work properly, the variables located in the function’s frame have to be excluded. The ‘FrameBuilder’ is building a frame for the ‘EscapesCollector’ to use. -- File: libllvmtranslate.* (src/llvmtranslate/) The interface. -- File: llvm-type-visitor.* (src/llvmtranslate/) The LLVM IR is a typed language. In order to ensure type safety, the Tiger types (‘type::Type’) have to be translated to LLVM types (‘llvm::Type’). In order to do that, this visitor defined in *note src/llvmtranslate:: is used to traverse the type hierarcy and translate it to LLVM types. -- File: translator.hh (src/llvmtranslate/) Implements the class ‘Translator’ which performs the LLVM IR generation using the LLVM API. For instance, here is the translation of a ‘ast::SimpleVar’: virtual void operator()(const SimpleVar& e) { value_ = builder_.CreateLoad(access_var(e), e.name_get().get()); } -- File: tiger-runtime.c (src/llvmtranslate/) This is the specific runtime for *note TC-L::. It is based on the original runtime, with some adaptations for LLVM. It is compiled to LLVM IR in ‘$(build_dir)/src/llvmtranslate/runtime.ll’, then a function ‘llvmtranslate::runtime_string()’ is generated in ‘$(build_dir)/src/llvmtranslate/runtime.cc’. This function is used by the task ‘--llvm-runtime-display’ to print the runtime along the LLVM IR. Strings Strings are implemented as ‘char*’ 0-terminated buffers, like C strings. Functions Most of the built-ins are just calls to the C standard library functions. Characters Since the type ‘char’ doesn’t exist in TC, a ‘char’ is nothing more than a ‘string’ of length 1. In order to avoid allocations every time a character is asked for, an array containing all the characters followed by a ‘\0’ is initialized at the beginning of the program. ‘main’ The runtime initializes the one-character strings, then calls ‘tc_main’, which is the ‘main’ that your compiler should have provided. 3.2.29 The ‘src/regalloc’ Directory ----------------------------------- Namespace ‘regalloc’, register allocation, delivered for TC-9. -- File: color.* (src/regalloc/) Coloring an interference graph. -- File: regallocator.* (src/regalloc/) Repeating the coloration until it succeeds (no spills). -- File: libregalloc.* (src/regalloc/) Removing useless ‘move’s once the register allocation performed, and allocating the register for fragments. -- File: test-regalloc.cc (src/regalloc/) Exercising this. 3.3 Given Test Cases ==================== We provide a few test cases: _you must write your own tests_. _Writing tests is part of the project._ Do not just copy test cases from other groups, as you will not understand why they were written. The initial test suite is available for download at ‘tests.tgz’(1). It contains the following directories: ‘good’ These programs are correct. ‘bind’ These programs have bind mismatches. ‘syntax’ These programs have syntactial errors. ‘type’ These programs contain type mismatches. ---------- Footnotes ---------- (1) . 4 Compiler Stages ***************** The compiler will be written in several steps, described below. 4.1 Stage Presentation ====================== The following sections adhere to a standard layout in order to present each stage N: Introduction The first few lines specify the last time the section was updated, the class for which it is written, and the submission dates. It also briefly describes the stage. TN Goals, What this stage teaches This section details the goals of the stage as a teaching exercise. Be sure that examiners will make sure you understood these points. They also have instructions to ask questions about previous stages. TN Samples, See TN work Actual examples generated from the reference compilers are exhibited to present and “specify” the stage. TN Given Code, Explanation on the provided code This subsection points to the on line material we provide, introduces its components, quickly presents their designs and so forth. Check out the developer documentation of the Tiger Compiler(1) for more information, as the code is (hopefully) properly documented. TN Code to Write, Explanation on what you have to write But of course, this code is not complete; this subsection provides hints on what is expected, and where. TN Options, Want some more? During some stages, those who find the main task too easy can implement more features. These sections suggest possible additional features. TN FAQ, Questions not to ask Each stage sees a blossom of new questions, some of which being extremely pertinent. We selected the most important ones, those that you should be aware of, contrary to many more questions that you ought to find and ask yourselves. These sections answer this few questions. And since they are already answered, you should not ask them... TN Improvements, Other Designs The Tiger Compiler is an instructional project the audience of which is _learning_ C++. Therefore, although by the end of the development, in the latter stages, we can expect able C++ programmers, most of the time we have to refrain from using advanced designs, or intricate C++ techniques. These sections provide hints on what could have been done to improve the stage. You can think of these sections as material you ought to read once the project is over and you are a grown-up C++ programmer. ---------- Footnotes ---------- (1) . 4.2 PTHL (TC-0), Naive Scanner and Parser ========================================= *2020-PTHL submission is Wednesday, February 1st 2018 at 19:42*. This section has been updated for EPITA-2020 on 2015-11-16. TC-0 is a weak form of TC-1: the scanner and the parser are written, but the framework is simplified (*note TC-1 Code to Write::). The grammar is also simpler: object-related productions are not to be supported at this stage (*note PTHL Improvements::). No command line option is supported. 4.2.1 PTHL Goals ---------------- Things to learn during this stage that you should remember: − Writing/debugging a scanner with Flex. − Using start conditions to handle non-regular issues within the scanner. − Using ‘yy::parser::make_SYMBOL’ to build whole symbols (containing a token type, a location, and possibly a semantic value) and pass them to the parser. − Writing/debugging a parser with Bison. − Resolving simple conflicts due to precedences and associativities thanks to directives (e.g., ‘%left’ etc.). − Resolving hard conflicts with loop unrolling. The case of lvalue vs. array instantiation is of first importance. 4.2.2 PTHL Samples ------------------ First, please note that all the samples, including in this section, are generated with a TC-1+ compliant compiler: its behavior differs from that of a TC-0 compiler. In particular, for the time being, forget about the options (‘-X’ and ‘--parse’). Running TC-0 basically consists in looking at exit values: print("Hello, World!\n") File 4.1: ‘simple.tig’ $ tc simple.tig Example 4.1: ‘tc simple.tig’ The following example demonstrates the scanner and parser tracing. The glyphs “error→” and “⇒” are typographic conventions to specify respectively the standard error stream and the exit status. _They are not part of the output per se_. $ SCAN=1 PARSE=1 tc -X --parse simple.tig error→Parsing file: "simple.tig" error→Starting parse error→Entering state 0 error→Reading a token: --(end of buffer or a NUL) error→--accepting rule at line 196("print") error→Next token is token "identifier" (simple.tig:1.1-5: print) error→Shifting token "identifier" (simple.tig:1.1-5: print) error→Entering state 2 error→Reading a token: --accepting rule at line 138("(") error→Next token is token "(" (simple.tig:1.6: ) error→Reducing stack 0 by rule 100 (line 626): error→ $1 = token "identifier" (simple.tig:1.1-5: print) error→-> $$ = nterm funid (simple.tig:1.1-5: print) error→Entering state 36 error→Next token is token "(" (simple.tig:1.6: ) error→Shifting token "(" (simple.tig:1.6: ) error→Entering state 85 error→Reading a token: --accepting rule at line 197(""") error→--accepting rule at line 266("Hello, World!") error→--accepting rule at line 253("\n") error→--accepting rule at line 228(""") error→Next token is token "string" (simple.tig:1.7-23: Hello, World! error→) error→Shifting token "string" (simple.tig:1.7-23: Hello, World! error→) error→Entering state 1 error→Reducing stack 0 by rule 4 (line 296): error→ $1 = token "string" (simple.tig:1.7-23: Hello, World! error→) error→-> $$ = nterm exp (simple.tig:1.7-23: "Hello, World!\n") error→Entering state 131 error→Reading a token: --accepting rule at line 139(")") error→Next token is token ")" (simple.tig:1.24: ) error→Reducing stack 0 by rule 45 (line 417): error→ $1 = nterm exp (simple.tig:1.7-23: "Hello, World!\n") error→-> $$ = nterm args.1 (simple.tig:1.7-23: "Hello, World!\n") error→Entering state 133 error→Next token is token ")" (simple.tig:1.24: ) error→Reducing stack 0 by rule 44 (line 412): error→ $1 = nterm args.1 (simple.tig:1.7-23: "Hello, World!\n") error→-> $$ = nterm args (simple.tig:1.7-23: "Hello, World!\n") error→Entering state 132 error→Next token is token ")" (simple.tig:1.24: ) error→Shifting token ")" (simple.tig:1.24: ) error→Entering state 174 error→Reducing stack 0 by rule 6 (line 304): error→ $1 = nterm funid (simple.tig:1.1-5: print) error→ $2 = token "(" (simple.tig:1.6: ) error→ $3 = nterm args (simple.tig:1.7-23: "Hello, World!\n") error→ $4 = token ")" (simple.tig:1.24: ) error→-> $$ = nterm exp (simple.tig:1.1-24: print("Hello, World!\n")) error→Entering state 25 error→Reading a token: --(end of buffer or a NUL) error→--accepting rule at line 134(" error→") error→--(end of buffer or a NUL) error→--EOF (start condition 0) error→Now at end of input. error→Reducing stack 0 by rule 1 (line 287): error→ $1 = nterm exp (simple.tig:1.1-24: print("Hello, World!\n")) error→-> $$ = nterm program (simple.tig:1.1-24: ) error→Entering state 24 error→Now at end of input. error→Shifting token "end of file" (simple.tig:2.1: ) error→Entering state 63 error→Cleanup: popping token "end of file" (simple.tig:2.1: ) error→Cleanup: popping nterm program (simple.tig:1.1-24: ) error→Parsing string: function _main() = (_exp(0); ()) error→Starting parse error→Entering state 0 error→Reading a token: --(end of buffer or a NUL) error→--accepting rule at line 164("function") error→Next token is token "function" (:1.1-8: ) error→Shifting token "function" (:1.1-8: ) error→Entering state 8 error→Reading a token: --accepting rule at line 133(" ") error→--accepting rule at line 195("_main") error→Next token is token "identifier" (:1.10-14: _main) error→Shifting token "identifier" (:1.10-14: _main) error→Entering state 43 error→Reading a token: --accepting rule at line 138("(") error→Next token is token "(" (:1.15: ) error→Shifting token "(" (:1.15: ) error→Entering state 93 error→Reading a token: --accepting rule at line 139(")") error→Next token is token ")" (:1.16: ) error→Reducing stack 0 by rule 95 (line 605): error→-> $$ = nterm funargs (:1.16: ) error→Entering state 144 error→Next token is token ")" (:1.16: ) error→Shifting token ")" (:1.16: ) error→Entering state 186 error→Reading a token: --accepting rule at line 133(" ") error→--accepting rule at line 152("=") error→Next token is token "=" (:1.18: ) error→Reducing stack 0 by rule 86 (line 567): error→-> $$ = nterm typeid.opt (:1.17: ) error→Entering state 215 error→Next token is token "=" (:1.18: ) error→Shifting token "=" (:1.18: ) error→Entering state 231 error→Reading a token: --accepting rule at line 133(" ") error→--accepting rule at line 138("(") error→Next token is token "(" (:1.20: ) error→Shifting token "(" (:1.20: ) error→Entering state 12 error→Reading a token: --accepting rule at line 191("_exp") error→Next token is token "_exp" (:1.21-24: ) error→Shifting token "_exp" (:1.21-24: ) error→Entering state 21 error→Reading a token: --accepting rule at line 138("(") error→Next token is token "(" (:1.25: ) error→Shifting token "(" (:1.25: ) error→Entering state 60 error→Reading a token: --accepting rule at line 113("0") error→Next token is token "integer" (:1.26: 0) error→Shifting token "integer" (:1.26: 0) error→Entering state 106 error→Reading a token: --accepting rule at line 139(")") error→Next token is token ")" (:1.27: ) error→Shifting token ")" (:1.27: ) error→Entering state 164 error→Reducing stack 0 by rule 37 (line 397): error→ $1 = token "_exp" (:1.21-24: ) error→ $2 = token "(" (:1.25: ) error→ $3 = token "integer" (:1.26: 0) error→ $4 = token ")" (:1.27: ) error→-> $$ = nterm exp (:1.21-27: print("Hello, World!\n")) error→Entering state 48 error→Reading a token: --accepting rule at line 148(";") error→Next token is token ";" (:1.28: ) error→Reducing stack 0 by rule 48 (line 424): error→ $1 = nterm exp (:1.21-27: print("Hello, World!\n")) error→-> $$ = nterm exps.1 (:1.21-27: print("Hello, World!\n")) error→Entering state 49 error→Next token is token ";" (:1.28: ) error→Shifting token ";" (:1.28: ) error→Entering state 99 error→Reading a token: --accepting rule at line 133(" ") error→--accepting rule at line 138("(") error→Next token is token "(" (:1.30: ) error→Shifting token "(" (:1.30: ) error→Entering state 12 error→Reading a token: --accepting rule at line 139(")") error→Next token is token ")" (:1.31: ) error→Reducing stack 0 by rule 52 (line 436): error→-> $$ = nterm exps.0.2 (:1.31: ) error→Entering state 51 error→Next token is token ")" (:1.31: ) error→Shifting token ")" (:1.31: ) error→Entering state 100 error→Reducing stack 0 by rule 11 (line 321): error→ $1 = token "(" (:1.30: ) error→ $2 = nterm exps.0.2 (:1.31: ) error→ $3 = token ")" (:1.31: ) error→-> $$ = nterm exp (:1.30-31: ()) error→Entering state 153 error→Reading a token: --(end of buffer or a NUL) error→--accepting rule at line 139(")") error→Next token is token ")" (:1.32: ) error→Reducing stack 0 by rule 51 (line 431): error→ $1 = nterm exps.1 (:1.21-27: print("Hello, World!\n")) error→ $2 = token ";" (:1.28: ) error→ $3 = nterm exp (:1.30-31: ()) error→-> $$ = nterm exps.2 (:1.21-31: print("Hello, World!\n"), ()) error→Entering state 50 error→Reducing stack 0 by rule 53 (line 437): error→ $1 = nterm exps.2 (:1.21-31: print("Hello, World!\n"), ()) error→-> $$ = nterm exps.0.2 (:1.21-31: print("Hello, World!\n"), ()) error→Entering state 51 error→Next token is token ")" (:1.32: ) error→Shifting token ")" (:1.32: ) error→Entering state 100 error→Reducing stack 0 by rule 11 (line 321): error→ $1 = token "(" (:1.20: ) error→ $2 = nterm exps.0.2 (:1.21-31: print("Hello, World!\n"), ()) error→ $3 = token ")" (:1.32: ) error→-> $$ = nterm exp (:1.20-32: ( error→ print("Hello, World!\n"); error→ () error→)) error→Entering state 239 error→Reading a token: --(end of buffer or a NUL) error→--EOF (start condition 0) error→Now at end of input. error→Reducing stack 0 by rule 93 (line 598): error→ $1 = token "function" (:1.1-8: ) error→ $2 = token "identifier" (:1.10-14: _main) error→ $3 = token "(" (:1.15: ) error→ $4 = nterm funargs (:1.16: ) error→ $5 = token ")" (:1.16: ) error→ $6 = nterm typeid.opt (:1.17: ) error→ $7 = token "=" (:1.18: ) error→ $8 = nterm exp (:1.20-32: ( error→ print("Hello, World!\n"); error→ () error→)) error→-> $$ = nterm fundec (:1.1-32: error→function _main() = error→ ( error→ print("Hello, World!\n"); error→ () error→ )) error→Entering state 35 error→Now at end of input. error→Reducing stack 0 by rule 91 (line 593): error→ $1 = nterm fundec (:1.1-32: error→function _main() = error→ ( error→ print("Hello, World!\n"); error→ () error→ )) error→-> $$ = nterm fundecs (:1.1-32: error→function _main() = error→ ( error→ print("Hello, World!\n"); error→ () error→ )) error→Entering state 34 error→Now at end of input. error→Reducing stack 0 by rule 54 (line 447): error→-> $$ = nterm decs (:1.33: ) error→Entering state 83 error→Reducing stack 0 by rule 57 (line 451): error→ $1 = nterm fundecs (:1.1-32: error→function _main() = error→ ( error→ print("Hello, World!\n"); error→ () error→ )) error→ $2 = nterm decs (:1.33: ) error→-> $$ = nterm decs (:1.1-32: error→function _main() = error→ ( error→ print("Hello, World!\n"); error→ () error→ )) error→Entering state 27 error→Reducing stack 0 by rule 2 (line 289): error→ $1 = nterm decs (:1.1-32: error→function _main() = error→ ( error→ print("Hello, World!\n"); error→ () error→ )) error→-> $$ = nterm program (:1.1-32: ) error→Entering state 24 error→Now at end of input. error→Shifting token "end of file" (:1.33: ) error→Entering state 63 error→Cleanup: popping token "end of file" (:1.33: ) error→Cleanup: popping nterm program (:1.1-32: ) Example 4.2: ‘SCAN=1 PARSE=1 tc -X --parse simple.tig’ A lexical error must be properly diagnosed _and reported_. The following (generated) examples display the location: _this is not required for TC-0_; nevertheless, an error message on the standard error output is required. "\z does not exist." File 4.2: ‘back-zee.tig’ $ tc -X --parse back-zee.tig error→back-zee.tig:1.1-3: unrecognized escape: \z ⇒2 Example 4.3: ‘tc -X --parse back-zee.tig’ Similarly for syntactical errors. a++ File 4.3: ‘postinc.tig’ $ tc -X --parse postinc.tig error→postinc.tig:1.3: syntax error, unexpected + ⇒3 Example 4.4: ‘tc -X --parse postinc.tig’ 4.2.3 PTHL Code to Write ------------------------ We don’t need several directories, you can program in the top level of the package. You must write: ‘src/scantiger.ll’ The scanner. ‘lval’ supports strings, integers and even symbols. Nevertheless, symbols (i.e., identifiers) are returned as plain C++ strings for the time being: the class ‘misc::symbol’ is introduced in TC-1. If the environment variable ‘SCAN’ is defined (to whatever value) Flex scanner debugging traces are enabled, i.e., set the variable ‘yy_flex_debug’ to 1. ‘src/parsetiger.yy’ The parser, and maybe ‘main’ if you wish. Bison advanced features will be used in TC-1. − Use C++ (e.g., C++ I/O streams, strings, etc.) − Use C++ features of Bison. − Use locations. − Use ‘%expect 0’ to have Bison report conflicts are genuine errors. − Use the ‘%require "3.0"’ directive to prevent any problem due to old versions of Bison. − Use the ‘%define api.value.type variant’ directive to ask Bison for C++ object support in the semantic values. Without this, Bison uses ‘union’, which can be used to store objects (just Plain Old Data), hence pointers and dynamic allocation must be used. − Use the ‘%define api.token.constructor’ directive to request that symbols be handled as a whole (token type, location, and possibly semantic value) in the scanner through ‘parse::parser::make_SYMBOL’ routines. − Use the environment variable ‘PARSE’ to enable parser traces, i.e., to set ‘yydebug’ to 1, run: PARSE=1 tc foo.tig − Use ‘%printer’ to improve the tracing of semantic values. For instance, %define api.value.type variant %token INT "integer" %printer { yyo << $$; } ‘Makefile’ This file is mandatory. Running ‘make’ _must build an executable ‘tc’ in the root directory_. The GNU Build System is not mandatory: TC-1 introduces Autoconf, Automake etc. You may use it, in which case we will run ‘configure’ before ‘make’. 4.2.4 PTHL FAQ -------------- Translating escapes in the scanner (or not) Escapes in string can be translated at the scanning stage, or kept as is. That is, the string ‘"\n"’ can produce a token ‘STRING’ with the semantic value ‘\n’ (translation) or ‘\\n’ (no translation). You are free to choose your favorite implementation, but keep in mind that if you translate, you’ll have to “untranslate” later (i.e., convert ‘\n’ back to ‘\\n’). We encourage you to do this translation, but the other solution is also correct, as long as the next steps of your compiler follow the same conventions as your input. You must check for bad escapes whatever solution you choose. Must lexical and syntactic extensions be implemented? No. Language extensions (*note (tiger)Language Extensions::) such as metavariables keywords (‘_decs’, ‘_exp’, ‘_lvalue’, ‘_namety’) and casts (‘_cast’) are not required for PTHL. Handling metavariables constructs becomes mandatory at TC-2 (*note TC-2 Code to Write::) where they are used within TWEASTs (Text With Embedded AST, see ‘ast.pdf’(1)), while casts are only needed for the optional bounds checking assignment (*note TC-B::). What values can be represented by an ‘int’? The set of valid integer values is the set of signed 32-bit integers in 2’s complement, that is the integer interval [-2^{31}, 2^{31}-1]. What values can be represented by an integer literal? Although an integer _value_ can be any number in [-2^{31}, 2^{31}-1], it is however not possible to represent the _literal_ -2^{31} (= -2147483648) for technical reasons. It is however possible to create an integer value representing this number. To put it in nutshell, the following declaration is not valid: var i := -2147483648 whereas this one is: var i := -2147483647 - 1 ---------- Footnotes ---------- (1) . 4.2.5 PTHL Improvements ----------------------- Possible improvements include: Using ‘%destructor’ You may use ‘%destructor’ to reclaim the memory lost during the error recovery. It is mandated in TC-2, see *note TC-2 FAQ::. Parser driver You may implement a parser driver to handle the parsing context (flags, open files, etc.). Note that a driver class will be (partially) provided at TC-1. Handling object-related constructs from PTHL Your scanner and parser are not required to support OO constructs at PTHL, but you can implement them in your LALR(1) parser if you want. (Fully supporting them at TC-2 is highly recommended though, during the conversion of your LALR(1) parser to a GLR one.) Object-related productions from the Tiger grammar(1) are: # Class definition (canonical form). ty ::= ‘class’ [ ‘extends’ type-id ] ‘{’ classfields ‘}’ # Class definition (alternative form). dec ::= ‘class’ id [ ‘extends’ type-id ] ‘{’ classfields ‘}’ classfields ::= { classfield } # Class fields. classfield ::= # Attribute declaration. vardec # Method declaration. | ‘method’ id ‘(’ tyfields ‘)’ [ ‘:’ type-id ] ‘=’ exp # Object creation. exp ::= ‘new’ type-id # Method call. exp ::= lvalue ‘.’ id ‘(’ [ exp { ‘,’ exp }] ‘)’ ---------- Footnotes ---------- (1) . 4.3 TC-1, Scanner and Parser ============================ *2020-TC-1 submission for Ing1 students is Sunday, February 4th 2018 at 11:42.* This section has been updated for EPITA-2020 on 2016-01-27. Scanner and parser are properly running, but the abstract syntax tree is not built yet. Differences with PTHL (TC-0) include: GNU Build System Autoconf, Automake are used. Options, Tasks The compiler supports basic options via in the Task module. *Note (tiger)Invoking tc::, for the list of options to support. Locations The locations are properly computed and reported in the error messages. Relevant lecture notes include ‘dev-tools.pdf’(1) and ‘scanner.pdf’(2). ---------- Footnotes ---------- (1) . (2) . 4.3.1 TC-1 Goals ---------------- Things to learn during this stage that you should remember: Basic use of the GNU Build System Autoconf, Automake. The initial set up of the project will best be done via ‘autoreconf -fvim’, but once the project initiated (i.e., ‘configure’ and the ‘Makefile.in’s exist) you should depend on ‘make’ only. *Note The GNU Build System::. Integration into an existing framework Putting your own code into the provided code base. Basic C++ classes The classes ‘Location’ and ‘Position’ provide a good start to study foreign C++ classes. Your understanding them will be controlled, including the ‘operator’s. Location Tracking Issues within the scanner and the parser. Implementation of a few simple C++ classes The code for ‘misc::symbol’ and ‘misc::unique’ is incomplete. A first standard container: ‘std::set’ The implementation of the ‘misc::unique’ class relies on ‘std::set’. The Flyweight design pattern The ‘misc::unique’ class is an implementation of the Flyweight design pattern. Version Control System Using the Git version control system is mandatory. Your understanding of it will be checked. 4.3.2 TC-1 Samples ------------------ The only information the compiler provides is about lexical and syntax errors. If there are no errors, the compiler shuts up, and exits successfully: /* An array type and an array variable. */ let type arrtype = array of int var arr1 : arrtype := arrtype [10] of 0 in arr1[2] end File 4.4: ‘test01.tig’ $ tc -X --parse test01.tig Example 4.5: ‘tc -X --parse test01.tig’ If there are lexical errors, the exit status is 2, and an error message is output on the standard error output. Its format is standard and mandatory: file, (precise) location, and then the message (*note (tiger)Errors::). 1 /* This comments starts at /* 2.2 */ File 4.5: ‘unterminated-comment.tig’ $ tc -X --parse unterminated-comment.tig error→unterminated-comment.tig:2.2-3.0: unexpected end of file in a comment ⇒2 Example 4.6: ‘tc -X --parse unterminated-comment.tig’ If there are syntax errors, the exit status is set to 3: let var a : nil := () in 1 end File 4.6: ‘type-nil.tig’ $ tc -X --parse type-nil.tig error→type-nil.tig:1.13-15: syntax error, unexpected nil, expecting identifier or _namety ⇒3 Example 4.7: ‘tc -X --parse type-nil.tig’ If there are errors which are non lexical, nor syntactic (Windows will not pass by me): $ tc C:/TIGER/SAMPLE.TIG error→tc: cannot open `C:/TIGER/SAMPLE.TIG': No such file or directory ⇒1 Example 4.8: ‘tc C:/TIGER/SAMPLE.TIG’ The option ‘--parse-trace’, which relies on Bison’s ‘%debug’ and ‘%printer’ directives, must work properly(1): a + "a" File 4.7: ‘a+a.tig’ $ tc -X --parse-trace --parse a+a.tig error→Parsing file: "a+a.tig" error→Starting parse error→Entering state 0 error→Reading a token: Next token is token "identifier" (a+a.tig:1.1: a) error→Shifting token "identifier" (a+a.tig:1.1: a) error→Entering state 2 error→Reading a token: Next token is token "+" (a+a.tig:1.3: ) error→Reducing stack 0 by rule 90 (line 585): error→ $1 = token "identifier" (a+a.tig:1.1: a) error→-> $$ = nterm varid (a+a.tig:1.1: a) error→Entering state 33 error→Reducing stack 0 by rule 38 (line 402): error→ $1 = nterm varid (a+a.tig:1.1: a) error→-> $$ = nterm lvalue (a+a.tig:1.1: a) error→Entering state 26 error→Next token is token "+" (a+a.tig:1.3: ) error→Reducing stack 0 by rule 35 (line 395): error→ $1 = nterm lvalue (a+a.tig:1.1: a) error→-> $$ = nterm exp (a+a.tig:1.1: a) error→Entering state 25 error→Next token is token "+" (a+a.tig:1.3: ) error→Shifting token "+" (a+a.tig:1.3: ) error→Entering state 74 error→Reading a token: Next token is token "string" (a+a.tig:1.5-7: a) error→Shifting token "string" (a+a.tig:1.5-7: a) error→Entering state 1 error→Reducing stack 0 by rule 4 (line 296): error→ $1 = token "string" (a+a.tig:1.5-7: a) error→-> $$ = nterm exp (a+a.tig:1.5-7: "a") error→Entering state 119 error→Reading a token: Now at end of input. error→Reducing stack 0 by rule 29 (line 376): error→ $1 = nterm exp (a+a.tig:1.1: a) error→ $2 = token "+" (a+a.tig:1.3: ) error→ $3 = nterm exp (a+a.tig:1.5-7: "a") error→-> $$ = nterm exp (a+a.tig:1.1-7: (a + "a")) error→Entering state 25 error→Now at end of input. error→Reducing stack 0 by rule 1 (line 287): error→ $1 = nterm exp (a+a.tig:1.1-7: (a + "a")) error→-> $$ = nterm program (a+a.tig:1.1-7: ) error→Entering state 24 error→Now at end of input. error→Shifting token "end of file" (a+a.tig:2.1: ) error→Entering state 63 error→Cleanup: popping token "end of file" (a+a.tig:2.1: ) error→Cleanup: popping nterm program (a+a.tig:1.1-7: ) error→Parsing string: function _main() = (_exp(0); ()) error→Starting parse error→Entering state 0 error→Reading a token: Next token is token "function" (:1.1-8: ) error→Shifting token "function" (:1.1-8: ) error→Entering state 8 error→Reading a token: Next token is token "identifier" (:1.10-14: _main) error→Shifting token "identifier" (:1.10-14: _main) error→Entering state 43 error→Reading a token: Next token is token "(" (:1.15: ) error→Shifting token "(" (:1.15: ) error→Entering state 93 error→Reading a token: Next token is token ")" (:1.16: ) error→Reducing stack 0 by rule 95 (line 605): error→-> $$ = nterm funargs (:1.16: ) error→Entering state 144 error→Next token is token ")" (:1.16: ) error→Shifting token ")" (:1.16: ) error→Entering state 186 error→Reading a token: Next token is token "=" (:1.18: ) error→Reducing stack 0 by rule 86 (line 567): error→-> $$ = nterm typeid.opt (:1.17: ) error→Entering state 215 error→Next token is token "=" (:1.18: ) error→Shifting token "=" (:1.18: ) error→Entering state 231 error→Reading a token: Next token is token "(" (:1.20: ) error→Shifting token "(" (:1.20: ) error→Entering state 12 error→Reading a token: Next token is token "_exp" (:1.21-24: ) error→Shifting token "_exp" (:1.21-24: ) error→Entering state 21 error→Reading a token: Next token is token "(" (:1.25: ) error→Shifting token "(" (:1.25: ) error→Entering state 60 error→Reading a token: Next token is token "integer" (:1.26: 0) error→Shifting token "integer" (:1.26: 0) error→Entering state 106 error→Reading a token: Next token is token ")" (:1.27: ) error→Shifting token ")" (:1.27: ) error→Entering state 164 error→Reducing stack 0 by rule 37 (line 397): error→ $1 = token "_exp" (:1.21-24: ) error→ $2 = token "(" (:1.25: ) error→ $3 = token "integer" (:1.26: 0) error→ $4 = token ")" (:1.27: ) error→-> $$ = nterm exp (:1.21-27: (a + "a")) error→Entering state 48 error→Reading a token: Next token is token ";" (:1.28: ) error→Reducing stack 0 by rule 48 (line 424): error→ $1 = nterm exp (:1.21-27: (a + "a")) error→-> $$ = nterm exps.1 (:1.21-27: (a + "a")) error→Entering state 49 error→Next token is token ";" (:1.28: ) error→Shifting token ";" (:1.28: ) error→Entering state 99 error→Reading a token: Next token is token "(" (:1.30: ) error→Shifting token "(" (:1.30: ) error→Entering state 12 error→Reading a token: Next token is token ")" (:1.31: ) error→Reducing stack 0 by rule 52 (line 436): error→-> $$ = nterm exps.0.2 (:1.31: ) error→Entering state 51 error→Next token is token ")" (:1.31: ) error→Shifting token ")" (:1.31: ) error→Entering state 100 error→Reducing stack 0 by rule 11 (line 321): error→ $1 = token "(" (:1.30: ) error→ $2 = nterm exps.0.2 (:1.31: ) error→ $3 = token ")" (:1.31: ) error→-> $$ = nterm exp (:1.30-31: ()) error→Entering state 153 error→Reading a token: Next token is token ")" (:1.32: ) error→Reducing stack 0 by rule 51 (line 431): error→ $1 = nterm exps.1 (:1.21-27: (a + "a")) error→ $2 = token ";" (:1.28: ) error→ $3 = nterm exp (:1.30-31: ()) error→-> $$ = nterm exps.2 (:1.21-31: (a + "a"), ()) error→Entering state 50 error→Reducing stack 0 by rule 53 (line 437): error→ $1 = nterm exps.2 (:1.21-31: (a + "a"), ()) error→-> $$ = nterm exps.0.2 (:1.21-31: (a + "a"), ()) error→Entering state 51 error→Next token is token ")" (:1.32: ) error→Shifting token ")" (:1.32: ) error→Entering state 100 error→Reducing stack 0 by rule 11 (line 321): error→ $1 = token "(" (:1.20: ) error→ $2 = nterm exps.0.2 (:1.21-31: (a + "a"), ()) error→ $3 = token ")" (:1.32: ) error→-> $$ = nterm exp (:1.20-32: ( error→ (a + "a"); error→ () error→)) error→Entering state 239 error→Reading a token: Now at end of input. error→Reducing stack 0 by rule 93 (line 598): error→ $1 = token "function" (:1.1-8: ) error→ $2 = token "identifier" (:1.10-14: _main) error→ $3 = token "(" (:1.15: ) error→ $4 = nterm funargs (:1.16: ) error→ $5 = token ")" (:1.16: ) error→ $6 = nterm typeid.opt (:1.17: ) error→ $7 = token "=" (:1.18: ) error→ $8 = nterm exp (:1.20-32: ( error→ (a + "a"); error→ () error→)) error→-> $$ = nterm fundec (:1.1-32: error→function _main() = error→ ( error→ (a + "a"); error→ () error→ )) error→Entering state 35 error→Now at end of input. error→Reducing stack 0 by rule 91 (line 593): error→ $1 = nterm fundec (:1.1-32: error→function _main() = error→ ( error→ (a + "a"); error→ () error→ )) error→-> $$ = nterm fundecs (:1.1-32: error→function _main() = error→ ( error→ (a + "a"); error→ () error→ )) error→Entering state 34 error→Now at end of input. error→Reducing stack 0 by rule 54 (line 447): error→-> $$ = nterm decs (:1.33: ) error→Entering state 83 error→Reducing stack 0 by rule 57 (line 451): error→ $1 = nterm fundecs (:1.1-32: error→function _main() = error→ ( error→ (a + "a"); error→ () error→ )) error→ $2 = nterm decs (:1.33: ) error→-> $$ = nterm decs (:1.1-32: error→function _main() = error→ ( error→ (a + "a"); error→ () error→ )) error→Entering state 27 error→Reducing stack 0 by rule 2 (line 289): error→ $1 = nterm decs (:1.1-32: error→function _main() = error→ ( error→ (a + "a"); error→ () error→ )) error→-> $$ = nterm program (:1.1-32: ) error→Entering state 24 error→Now at end of input. error→Shifting token "end of file" (:1.33: ) error→Entering state 63 error→Cleanup: popping token "end of file" (:1.33: ) error→Cleanup: popping nterm program (:1.1-32: ) Example 4.9: ‘tc -X --parse-trace --parse a+a.tig’ Note that (i), ‘--parse’ is needed, (ii), it cannot see that the variable is not declared nor that there is a type checking error, since type checking... is not implemented, and (iii), the output might be slightly different, depending upon the version of Bison you use. But what matters is that one can see the items: ‘"identifier" a’, ‘"string" a’. ---------- Footnotes ---------- (1) For the time being, forget about ‘-X’. 4.3.3 TC-1 Given Code --------------------- Some code is provided through the ‘tc-base’ repository; use tags ‘2020-tc-base-1.0’ to integrate it with your existing code base. See *note Given Code:: for more information on using the ‘tc-base’ Git repository. See *note The Top Level::, *note src::, *note src/parse::, *note lib/misc::. 4.3.4 TC-1 Code to Write ------------------------ Be sure to read Flex and Bison documentations and tutorials, see *note Flex & Bison::. ‘configure.ac’ ‘Makefile.am’ Include your own test suite in the ‘tests’ directory, and hook it to ‘make check’. ‘src/parse/scantiger.ll’ The scanner must be completed to read strings, identifiers etc. and track locations. − Strings will be stored as C++ ‘std::string’. See the following code for the basics. ... \" grown_string.clear(); BEGIN SC_STRING; { /* Handling of the strings. Initial " is eaten. */ \" { BEGIN INITIAL; return TOKEN_VAL(STRING, grown_string); } ... \\x[0-9a-fA-F]{2} { grown_string.append(1, strtol(yytext + 2, 0, 16)); } ... } − Symbols (i.e., identifiers) must be returned as ‘misc::symbol’ objects, not strings. − The locations are tracked. The class ‘Location’ to use is produced by Bison: ‘src/parse/location.hh’. To track of locations, adjust your scanner, use ‘YY_USER_ACTION’ and the ‘yylex’ prologue: ... %% %{ // Everything here is run each time yylex is _invoked_. %} "if" return TOKEN(IF); ... %% ... See the lecture notes, and read the C++ chapter of (1), GNU Bison’s documentation. Note that the version being used for the Tiger project may differ from the latest public release, thus students should build their own documentation by running ‘make html’ in the provided Bison tarball. Pay special attention to its “Complete C++ Example(2)” which is _very much_ like our set up. ‘src/parse/parsetiger.yy’ − The grammar must be complete but without actions. − Use ‘%printer’ to implement ‘--parse-trace’ support for terminals (*note TC-1 Samples::) ‘src/parse/tiger-parser.cc’ The class ‘TigerParser’ drives the lexing and parsing of input file. Its implementation in ‘src/parse/tiger-parser.cc’ is incomplete. ‘lib/misc/symbol.*’ ‘lib/misc/unique.*’ The class ‘misc::symbol’ keeps a single copy of identifiers, see *note lib/misc::. Its implementation in ‘lib/misc/symbol.hxx’ and ‘lib/misc/symbol.cc’ is incomplete. Note that running ‘make check’ in ‘lib/misc’ exercises ‘lib/misc/test-symbol.cc’: having this unit test pass should be a goal by itself. As a matter of fact, unit tests were left to help you: once they pass successfully you may proceed to the rest of the compiler. ‘misc::symbol’’s implementation is based on ‘misc::unique’, a generic class implementing the Flyweight design pattern. The definition of this class, ‘lib/misc/unique.hxx’, is also to be completed. ‘lib/misc/variant.*’ The implementation of the class template ‘misc::variant’ lacks a couple of conversion operators that you have to supply. ---------- Footnotes ---------- (1) . (2) . 4.3.5 TC-1 FAQ -------------- Bison reports type clashes Bison may report type clashes for some actions. For instance, if you have given a type to ‘"string"’, but none to ‘exp’, then it will choke on: exp: "string"; because, unless you used ‘%define variant’, it actually means exp: "string" { $$ = $1; }; which is not type consistent. So write this instead: exp: "string" {}; Where is ‘ast::Exp’? Its real definition will be provided with TC-2, so meanwhile you have to provide a fake. We recommend for a forward declaration of ‘ast::Exp’ in ‘libparse.hh’. Finding ‘prelude.tih’ When run, the compiler needs the file ‘prelude.tih’ that includes the signature of all the primitives. But the executable ‘tc’ is typically run in two very different contexts: installed An installed binary will look for an installed ‘prelude.tih’, typically in ‘/usr/local/share/tc/’. The ‘cpp’ macro ‘PKGDATADIR’ is set to this directory. Its value depends on the use of ‘configure’’s option ‘--prefix’, defaulting to ‘/usr/local’. compiled, not installed When compiled, the binary will look for the installed ‘prelude.tih’, and of course will fail if it has never been installed. There are two means to address this issue: The environment variable ‘TC_PKGDATADIR’ If set, it overrides the value of ‘PKGDATADIR’. The option ‘--library-prepend’/‘-p’ Using this option you may set the library file search path to visit the given directory _before_ the built-in default value. For instance ‘tc -p /tmp foo.tig’ will first look for ‘prelude.tih’ in ‘/tmp’. Must import be functional? Yes. Read the previous item. 4.3.6 TC-1 Improvements ----------------------- Possible improvements include: 4.4 TC-2, Building the Abstract Syntax Tree =========================================== *2020-TC-2 submission for Ing1 students is Sunday, February 25th 2018 at 11:42.* This section has been updated for EPITA-2020 on 2016-01-27. At the end of this stage, the compiler can build abstract syntax trees of Tiger programs and pretty-print them. The parser is now a GLR parser and equipped with error recovery. The memory is properly deallocated on demand. The code must follow our coding style and be documented, see *note Coding Style::, and *note Doxygen::. Relevant lecture notes include ‘dev-tools.pdf’(1), ‘ast.pdf’(2). ---------- Footnotes ---------- (1) . (2) . 4.4.1 TC-2 Goals ---------------- Things to learn during this stage that you should remember: Strict Coding Style Following a strict coding style is an essential part of collaborative work. Understanding the rationales behind rules is even better. *Note Coding Style::. Memory Leak Trackers Using tools such as Valgrind (*note Valgrind::) to track memory leaks. Understanding the use of a GLR Parser The parser should now use all the possibilities of a GLR parser. Error recovery with Bison Using the ‘error’ token, and building usable ASTs in spite of lexical/syntax errors. Using STL containers The AST uses ‘std::vector’, ‘misc::symbol’ uses ‘std::set’. Inheritance The AST hierarchy is typical example of a proper use of inheritance, together with... Inclusion polymorphism An intense use of inclusion polymorphism for ‘accept’. Use of constructors and destructors In particular using the destructors to reclaim memory bound to components. ‘virtual’ Dynamic and static bindings. ‘misc::indent’ ‘misc::indent’ extends ‘std::ostream’ with indentation features. Use it in the ‘PrettyPrinter’ to pretty-print. Understanding how ‘misc::indent’ works will be checked later, see *note TC-3 Goals::. The Composite design pattern The AST hierarchy is an implementation of the Composite pattern. The Visitor design pattern The ‘PrettyPrinter’ is an implementation of the Visitor pattern. Writing good developer documentation (using Doxygen) The AST must be properly documented. 4.4.2 TC-2 Samples ------------------ Here are a few samples of the expected features. 4.4.2.1 TC-2 Pretty-Printing Samples .................................... The parser builds abstract syntax trees that can be output by a pretty-printing module: /* Define a recursive function. */ let /* Calculate n!. */ function fact (n : int) : int = if n = 0 then 1 else n * fact (n - 1) in fact (10) end File 4.8: ‘simple-fact.tig’ $ tc -XA simple-fact.tig /* == Abstract Syntax Tree. == */ function _main() = ( let function fact(n : int) : int = (if (n = 0) then 1 else (n * fact((n - 1)))) in fact(10) end; () ) Example 4.10: ‘tc -XA simple-fact.tig’ The pretty-printed output must be _valid_ and _equivalent_. _Valid_ means that any Tiger compiler must be able to parse with success your output. Pay attention to the banners such as ‘== Abstract...’: you should use comments: ‘/* == Abstract... */’. Pay attention to special characters too. print("\"\x45\x50ITA\"\n") File 4.9: ‘string-escapes.tig’ $ tc -XA string-escapes.tig /* == Abstract Syntax Tree. == */ function _main() = ( print("\"EPITA\"\n"); () ) Example 4.11: ‘tc -XA string-escapes.tig’ _Equivalent_ means that, except for syntactic sugar, the output and the input are equal. Syntactic sugar refers to ‘&’, ‘|’, unary ‘-’, etc. 1 = 1 & 2 = 2 File 4.10: ‘1s-and-2s.tig’ $ tc -XA 1s-and-2s.tig /* == Abstract Syntax Tree. == */ function _main() = ( (if (1 = 1) then ((2 = 2) <> 0) else 0); () ) Example 4.12: ‘tc -XA 1s-and-2s.tig’ $ tc -XA 1s-and-2s.tig >output.tig Example 4.13: ‘tc -XA 1s-and-2s.tig >output.tig’ $ tc -XA output.tig /* == Abstract Syntax Tree. == */ function _main() = ( (if (1 = 1) then ((2 = 2) <> 0) else 0); () ) Example 4.14: ‘tc -XA output.tig’ Beware that ‘for’ loops are encoded using a ‘ast::VarDec’: do not display the ‘var’: for i := 0 to 100 do (print_int (i)) File 4.11: ‘for-loop.tig’ $ tc -XA for-loop.tig /* == Abstract Syntax Tree. == */ function _main() = ( (for i := 0 to 100 do print_int(i)); () ) Example 4.15: ‘tc -XA for-loop.tig’ Parentheses must not stack for free; you must even remove them as the following example demonstrates. ((((((((((0)))))))))) File 4.12: ‘parens.tig’ $ tc -XA parens.tig /* == Abstract Syntax Tree. == */ function _main() = ( 0; () ) Example 4.16: ‘tc -XA parens.tig’ This is not a pretty-printer trick: the ASTs of this program and that of ‘0’ are exactly the same: a single ‘ast::IntExp’. As a result, _anything output by ‘tc -A’ is equal to what ‘tc -A | tc -XA -’ displays!_ 4.4.2.2 TC-2 Chunks ................... The type checking rules of Tiger, or rather its binding rules, justify the contrived parsing of declarations. This is why this section uses ‘-b’/‘--bindings-compute’, implemented later (*note TC-3::). In Tiger, to support recursive types and functions, continuous declarations of functions and continuous declarations of types are considered “simultaneously”. For instance in the following program, ‘foo’ and ‘bar’ are visible in each other’s scope, and therefore the following program is correct wrt type checking. let function foo() : int = bar() function bar() : int = foo() in 0 end File 4.13: ‘foo-bar.tig’ $ tc -b foo-bar.tig Example 4.17: ‘tc -b foo-bar.tig’ In the following sample, because ‘bar’ is not declared in the same bunch of declarations, it is not visible during the declaration of ‘foo’. The program is invalid. let function foo() : int = bar() var stop := 0 function bar() : int = foo() in 0 end File 4.14: ‘foo-stop-bar.tig’ $ tc -b foo-stop-bar.tig error→foo-stop-bar.tig:1.28-32: undeclared function: bar ⇒4 Example 4.18: ‘tc -b foo-stop-bar.tig’ The same applies to types. We shall name “chunk” a continuous series of type (or function) declaration. A single name cannot be defined more than once in a chunk. let function foo() : int = 0 function bar() : int = 1 function foo() : int = 2 var stop := 0 function bar() : int = 3 in 0 end File 4.15: ‘fbfsb.tig’ $ tc -b fbfsb.tig error→fbfsb.tig:3.5-28: redefinition: foo error→fbfsb.tig:1.5-28: first definition ⇒4 Example 4.19: ‘tc -b fbfsb.tig’ It behaves exactly as if chunks were part of embedded ‘let in end’, i.e., as if the previous program was syntactic sugar for the following one (in fact, in 2006-tc used to desugar it that way). let function foo() : int = 0 function bar() : int = 1 in let function foo() : int = 2 in let var stop := 0 in let function bar() : int = 3 in 0 end end end end File 4.16: ‘fbfsb-desugared.tig’ Given the type checking rules for variables, whose definitions cannot be recursive, chunks of variable declarations are reduced to a single variable. 4.4.2.3 TC-2 Error Recovery ........................... Your parser must be robust to (some) syntactic errors. Observe that on the following input several parse errors are reported, not merely the first one: ( 1; (2, 3); (4, 5); 6 ) File 4.17: ‘multiple-parse-errors.tig’ $ tc multiple-parse-errors.tig error→multiple-parse-errors.tig:3.5: syntax error, unexpected ",", expecting ; error→multiple-parse-errors.tig:4.5: syntax error, unexpected ",", expecting ; ⇒3 Example 4.20: ‘tc multiple-parse-errors.tig’ Of course, the exit status still reveals the parse error. Error recovery must not break the rest of the compiler. $ tc -XA multiple-parse-errors.tig error→multiple-parse-errors.tig:3.5: syntax error, unexpected ",", expecting ; error→multiple-parse-errors.tig:4.5: syntax error, unexpected ",", expecting ; /* == Abstract Syntax Tree. == */ function _main() = ( ( 1; (); (); 6 ); () ) ⇒3 Example 4.21: ‘tc -XA multiple-parse-errors.tig’ 4.4.3 TC-2 Given Code --------------------- Code is provided through the ‘tc-base’ repository, using tag ‘2020-tc-base-2.0’. For a description of the new modules, see *note lib/misc::, and *note src/ast::. 4.4.4 TC-2 Code to Write ------------------------ What is to be done: ‘src/parse/parsetiger.yy’ Build the AST Complete actions to instantiate AST nodes. Support object-related syntax Supporting object constructs, an improvement suggested for TC-0 (*note PTHL Improvements::), is highly recommended. Support metavariable constructs Augment your scanner and your parser to support the (reserved) keywords ‘_decs’, ‘_exp’, ‘_lvalue’ and ‘_namety’ and implement the corresponding grammar rules (*note (tiger)Language Extensions::). The semantic actions of these productions shall use the ‘metavar’ function template to fetch the right AST subtree from the ‘parse::Tweast’ object attached to the parsing context (‘parse::TigerParser’ instance). Implement error recovery. There should be at least three uses of the token ‘error’. Read the Bison documentation about it. Use ‘%printer’ Extend the use of ‘%printer’ to display non-terminals. Use ‘%destructor’ Use ‘%destructor’ to reclaim the memory bound to semantic values thrown away during error recovery. GLR Change your skeleton to ‘glr.cc’, use the ‘%glr-parser’ directive. Thanks to GLR, conflicts (S/R and/or R/R) can be accepted. Use ‘%expect’ and ‘%expect-rr’ to specify their number. For information, we have no R/R conflicts, and two S/R: one related to the “big lvalue” issue, and the other to the implementation of the two ‘_cast’ operators (*note (tiger)Additional Syntactic Specifications::). Chunks In order to implement easily the type checking of declarations and to simplify following modules, adjust your grammar _to parse declarations by chunks_. The implementations of these chunks are in ‘ast::FunctionDecs’, ‘ast::MethodDecs’, ‘ast::VarDecs’, and ‘ast::TypeDecs’; they are implemented thanks to ‘ast::AnyDecs’). Note that an ‘ast::VarDecs’ node appearing in a declaration list shall contain exactly one ‘ast::VarDec’ object (*note TC-2 Chunks::); however, an ‘ast::VarDecs’ used to implement a function’s formal arguments may of course contain several ‘ast::VarDec’ (one per formal). ‘src/ast’ Complete the abstract syntax tree module: no ‘FIXME:’ should be left. Several files are missing in full. See ‘src/ast/README’ for additional information on the missing classes. ‘src/ast/default-visitor.hxx’ Complete the ‘GenDefaultVisitor’ class template. It is the basis for following visitors in the Tiger compiler. ‘src/ast/object-visitor.hxx’ Likewise, complete ‘GenObjectVisitor’. This class template is used to instantiate visitors factoring common code (default traversals of object-related nodes) and serves as a base class of ‘ast::PrettyPrinter’ (and later ‘bind::Binder’). ‘src/ast/pretty-printer.hh’ ‘src/ast/pretty-printer.cc’ The ‘PrettyPrinter’ class must be written entirely. It must use the ‘misc::xalloc’ features to support indentation. 4.4.5 TC-2 FAQ -------------- A ‘NameTy’, or a ‘symbol’ At some places, you may use one or the other. Just ask yourself which is the most appropriate given the context. Appel is not always right. Bison Be sure to read its dedicated section: *note Flex & Bison::. Memory leaks in the parser during error recovery To reclaim the memory during error recovery, use the ‘%destructor’ directive: %type exp %type lvalue %destructor { delete $$; } /* ... */; Memory leaks in the standard containers *Note Valgrind::, for a pointer to the explanation and solution. How do I use ‘misc::error’ *Note misc/error::, for a description of this component. In the case of the parse module, ‘TigerParser’ aggregates the local error handler. From ‘scan_open’, for instance, your code should look like: if(!yyin) error_ << misc::error::failure << program_name << ": cannot open `" << name << "': " << strerror(errno) << std::endl << &misc::error::exit; ‘ast::fields_type’ vs. ‘ast::VarDecs’ Record definition vs. Function declaration The grammar of the Tiger language (*note (tiger)Syntactic Specifications::) includes: # Function, primitive and method declarations. ::= "function" "(" ")" [ ":" ] "=" | "primitive" "(" ")" [ ":" ] ::= "method" "(" ")" [ ":" ] "=" # Record type declaration. ::= "{" "}" # List of “id : type”. ::= [ ":" { "," ":" } ] This grammar snippet shows that we used ‘tyfields’ several times, in two very _different_ contexts: a list of formal arguments of a function, primitive or method; and a list of record fields. The fact that the syntax is similar in both cases is an “accident”: it is by no means required by the language. A. Appel could have chosen to make them different, but what would have been the point then? It does make sense, sometimes, to make two different things look alike, that’s a form of economy — a sane engineering principle. If the concrete syntaxes were chosen to be identical, should it be the case for abstract too? We would say it depends: the inert data is definitely the same, but the behaviors (i.e., the handling in the various visitors) are very different. So if your language features “inert data”, say C or ML, then keeping the same abstract syntax makes sense; if your language features “active data” — let’s call this... objects — then _it is a mistake_. Sadly enough, the first edition of Red Tiger book made this mistake, and we also did it for years. The second edition of the Tiger in Java introduces a dedicated abstract syntax for formal arguments; we made a different choice: there is little difference between formal arguments and local variables, so we use a ‘VarDecs’, which fits nicely with the semantics of chunks. Regarding the abstract syntax of a record type declaration, we use a list of ‘Field’s (aka ‘fields_type’). Of course this means that you will have to _duplicate_ your parsing of the ‘tyfields’ non-terminal in your parser. ‘ast::DefaultVisitor’ and ‘ast::NonObjectVisitor’ The existence of ‘ast::NonObjectVisitor’ is the result of a reasonable compromise between (relative) safety and complexity. The problem is: as object-aware programs are to be desugared into object-free ones, (a part of) our front-end infrastructure must support two kinds of traversals: − Traversals dealing with AST with objects: ‘ast::PrettyPrinter’, ‘object::Binder’, ‘object::TypeChecker’, ‘object::DesugarVisitor’. − Traversals dealing with AST without objects ‘bind::Binder’, ‘type::TypeChecker’, and all other AST visitors. The first category has visit methods for all type of nodes of our (object-oriented) AST, so they raise no issue. On the other hand, the second category of visitors knows nothing about objects, and should either be unable to visit AST w/ objects (static solution) or raise an error if they encounter objects (dynamic solution). Which led us to several solutions: 1. Consider that we have two kinds of visitors, and thus two _hierarchies_ of visitors. Two hierarchies might confuse the students, and make the maintenance harder. Hooks in the AST nodes (‘accept’ methods) must be duplicated, too. 2. Have a single hierarchy of visitors, but equip all concrete visitors traversing ASTs w/o objects with methods visiting object-related node aborting at run time. 3. Likewise, but factor the aborting methods in a single place, namely ‘ast::NonObjectVisitor’. That is the solution we chose. Solutions 2 and 3 let us provide a default visitor for ASTs without objects, but it’s harder to have a meaningful default visitor for ASTs _with_ objects: indeed, concrete visitors on ASTs w/ objects inherit from their non-object counterparts, where methods visiting object nodes are _already_ defined! (Though they abort at run time.) We have found that having two visitors (‘ast::DefaultVisitor’ and ‘ast::NonObjectVisitor’) to solve this problem was more elegant, rather than merging both of them in ‘ast::DefaultVisitor’. The pros are that ‘ast::DefaultVisitor’ remains a default visitor; the cons are that this visitor is now abstract, since object-related nodes have no visit implementation. Therefore, we also introduced an ‘ast::ObjectVisitor’ performing default visits of the remaining node types; the combined inheritance of both ‘ast::DefaultVisitor’ and ‘ast::ObjectVisitor’ provides a complete default visitor. 4.4.6 TC-2 Improvements ----------------------- Possible improvements include: Desugar Boolean operators and unary minus in concrete syntax In the original version of the exercise, the ‘|’ and ‘&’ operators and the unary minus operator are _desugared_ in abstract syntax (i.e., using explicit instantiations of AST nodes). Using ‘TigerInput’, you can desugar using Tiger’s concrete syntax instead. This second solution is advised. Introduce an ‘Error’ class When syntactic errors are caught, a valid AST must be built anyway, hence a critical question is: what value should be given to the missing bits? If your error recovery is not compatible with what the user meant, you are likely to create artificial type errors with your invented value. While this behavior is compliant with the assignment, you may improve this by introducing an ‘Error’ class (one?), which will never trigger type checking errors. Using Generic Visitors Andrei Alexandrescu has done a very interesting work on generic implementation of Visitors, see *note Modern C++ Design::. It does require advanced C++ skills, since it is based on type lists, which requires heavy use of templates. Using Visitor Combinators Going even further that Andrei Alexandrescu, Nicolas Tisserand proposes an implementation of Visitor combinators, see *note Generic Visitors in C++::. 4.5 TC-3, Bindings ================== *2020-TC-3 submission for Ing1 students is Sunday, March 11th 2018 at 11:42.* **note TC-R:: is part of the mandatory assignment of 2020-TC-3.* This section has been updated for EPITA-2020 on 2016-01-27. At the end of this stage, the compiler must be able to compute and display the bindings. These features are triggered by the options ‘-b’/‘--bindings-compute’, ‘--object-bindings-compute’ and ‘-B’/‘--bindings-display’. Relevant lecture notes include: ‘names.pdf’(1). ---------- Footnotes ---------- (1) . 4.5.1 TC-3 Goals ---------------- Things to learn during this stage that you should remember: The Command design pattern The ‘Task’ module is based on the Command design pattern. Writing a Container Class Template Class template are most useful to implement containers such as ‘misc::scoped_map’. Using methods from parents classes ‘super_type’ and qualified method invocation to factor common code. Traits Traits are a useful technique that allows to write (compile time) functions ranging over types. *Note Glossary::. The implementation of both hierarchies of visitors (const or not) relies on traits. You are expected to understand the code. Streams’ internal extensible arrays C++ streams allows users to dynamically store information within themselves thanks to ‘std::ios::xalloc’, ‘std::stream::iword’, and ‘std::stream::pword’ (see ‘ios_base’ documentation by Cplusplus Ressources(1)). Indented output can use it directly in ‘operator<<’, see ‘lib/misc/indent.*’ and ‘lib/misc/test-indent.cc’. More generally, if you have to resort to using ‘print’ because you need additional arguments than the sole stream, consider using this feature instead. Use this feature so that the ‘PrettyPrinter’ can be told _from the ‘std::ostream’_ whether escapes and bindings should be displayed. ---------- Footnotes ---------- (1) . 4.5.2 TC-3 Samples ------------------ “Binding” is relating a name use to its definition. let var me := 0 in me end File 4.18: ‘me.tig’ $ tc -XbBA me.tig /* == Abstract Syntax Tree. == */ function _main /* 0x563048f78b00 */() = ( let var me /* 0x563048f7b5b0 */ := 0 in me /* 0x563048f7b5b0 */ end; () ) Example 4.22: ‘tc -XbBA me.tig’ This is harder when there are several occurrences of the same name. Note that primitive types are accepted, but have no pre-declaration, contrary to primitive functions. let var me := 0 function id(me : int) : int = me in me end File 4.19: ‘meme.tig’ $ tc -XbBA meme.tig /* == Abstract Syntax Tree. == */ function _main /* 0x5566cd725b00 */() = ( let var me /* 0x5566cd7285b0 */ := 0 function id /* 0x5566cd7272d0 */(me /* 0x5566cd7261e0 */ : int /* 0 */) : int /* 0 */ = me /* 0x5566cd7261e0 */ in me /* 0x5566cd7285b0 */ end; () ) Example 4.23: ‘tc -XbBA meme.tig’ TC-3 is in charge of incorrect uses of the names, such as undefined names, me File 4.20: ‘nome.tig’ $ tc -bBA nome.tig error→nome.tig:1.1-2: undeclared variable: me ⇒4 Example 4.24: ‘tc -bBA nome.tig’ or redefined names. let type me = {} type me = {} function twice(a: int, a: int) : int = a + a in me {} = me {} end File 4.21: ‘tome.tig’ $ tc -bBA tome.tig error→tome.tig:3.3-14: redefinition: me error→tome.tig:2.3-14: first definition error→tome.tig:4.25-31: redefinition: a error→tome.tig:4.18-23: first definition ⇒4 Example 4.25: ‘tc -bBA tome.tig’ In addition to binding names, ‘--bindings-compute’ is also in charge of binding the ‘break’ to their corresponding loop construct. let var x := 0 in while 1 do ( for i := 0 to 10 do ( x := x + i; if x >= 42 then break ); if x >= 51 then break ) end File 4.22: ‘breaks-in-embedded-loops.tig’ $ tc -XbBA breaks-in-embedded-loops.tig /* == Abstract Syntax Tree. == */ function _main /* 0x55a197e92b00 */() = ( let var x /* 0x55a197e955e0 */ := 0 in (while /* 0x55a197e960a0 */ 1 do ( (for /* 0x55a197e94ae0 */ i /* 0x55a197e94280 */ := 0 to 10 do ( (x /* 0x55a197e955e0 */ := (x /* 0x55a197e955e0 */ + i /* 0x55a197e94280 */)); (if (x /* 0x55a197e955e0 */ >= 42) then break /* 0x55a197e94ae0 */ else ()) )); (if (x /* 0x55a197e955e0 */ >= 51) then break /* 0x55a197e960a0 */ else ()) )) end; () ) Example 4.26: ‘tc -XbBA breaks-in-embedded-loops.tig’ break File 4.23: ‘break.tig’ $ tc -b break.tig error→break.tig:1.1-5: `break' outside any loop ⇒4 Example 4.27: ‘tc -b break.tig’ Embedded loops show that there is scoping for ‘break’s. Beware that there are places, apparently inside loops, where ‘break’s make no sense too. Although it is a matter of definitions and uses of names, record members are not bound here, because it is easier to implement during type checking. Likewise, duplicate fields are to be reported during type checking. let type box = { value : int } type dup = { value : int, value : string } var box := box { value = 51 } in box.head end File 4.24: ‘box.tig’ $ tc -XbBA box.tig /* == Abstract Syntax Tree. == */ function _main /* 0x55c8f9b45ed0 */() = ( let type box /* 0x55c8f9b44db0 */ = { value : int /* 0 */ } type dup /* 0x55c8f9b440a0 */ = { value : int /* 0 */, value : string /* 0 */ } var box /* 0x55c8f9b44750 */ := box /* 0x55c8f9b44db0 */ { value = 51 } in box /* 0x55c8f9b44750 */.head end; () ) Example 4.28: ‘tc -XbBA box.tig’ $ tc -T box.tig error→box.tig:3.33-46: identifier multiply defined: value error→box.tig:6.3-10: invalid field: head ⇒5 Example 4.29: ‘tc -T box.tig’ But apart from these field-specific checks delayed at TC-4, TC-3 should report other name-related errors. In particular, a field with an invalid type name _is_ a binding error (related to the field’s type, not the field itself), to be reported at TC-3. let type rec = { a : unknown } in rec { a = 42 } end File 4.25: ‘unknown-field-type.tig’ $ tc -XbBA unknown-field-type.tig error→unknown-field-type.tig:2.20-26: undeclared type: unknown ⇒4 Example 4.30: ‘tc -XbBA unknown-field-type.tig’ Likewise, class members (both attributes and methods) are not to be bound at *note TC-3::, but at the type-checking stage (*note TC-4::). Therefore, no bindings are to be displayed in regards to object at *note TC-3::. let type C = class {} var c := new C in c.missing_method(); c.missing_attribute end File 4.26: ‘bad-member-bindings.tig’ $ tc -X --object-bindings-compute -BA bad-member-bindings.tig /* == Abstract Syntax Tree. == */ function _main /* 0x55914f1e3b00 */() = ( let type C /* 0x55914f1e4610 */ = class extends Object /* 0 */ { } var c /* 0x55914f1e3bd0 */ := new C /* 0x55914f1e4610 */ in ( c /* 0x55914f1e3bd0 */.missing_method(); c /* 0x55914f1e3bd0 */.missing_attribute ) end; () ) Example 4.31: ‘tc -X --object-bindings-compute -BA bad-member-bindings.tig’ $ tc --object-types-compute bad-member-bindings.tig error→bad-member-bindings.tig:5.3-20: unknown method: missing_method error→bad-member-bindings.tig:6.3-21: unknown attribute: missing_attribute ⇒5 Example 4.32: ‘tc --object-types-compute bad-member-bindings.tig’ Concerning the super class type, the compiler should just check that this type exists in the environment at *note TC-3::. Other checks are left to TC-4 (*note TC-4 Samples::). let /* Super class doesn't exist. */ class Z extends Ghost {} in end File 4.27: ‘missing-super-class.tig’ $ tc -X --object-bindings-compute -BA missing-super-class.tig error→missing-super-class.tig:3.19-23: undeclared type: Ghost ⇒4 Example 4.33: ‘tc -X --object-bindings-compute -BA missing-super-class.tig’ 4.5.3 TC-3 Given Code --------------------- Code is provided through the ‘tc-base’ repository, using tag ‘2020-tc-base-3.0’. For a description of the new module, see *note src/bind::. 4.5.4 TC-3 Code to Write ------------------------ ‘misc::scoped_map’ Complete the class template ‘misc::scoped_map’ in ‘lib/misc/scoped-map.hh’ and ‘lib/misc/scoped-map.hxx’. *Note lib/misc::, *Note scoped_map::, for more details. Equip ‘ast’ Augment constructs “using” an identifier, such as ‘CallExp’, with ‘def_’, ‘def_get’, and ‘def_set’ to be able to set a reference to their definition, here a ‘FunctionDec’. ‘ast::PrettyPrinter’ Implement ‘--bindings-display’ support in the ‘PrettyPrinter’. Be sure to display the addresses exactly as displayed in this document: immediately after the identifier. Complete the ‘bind::Binder’ Most of the assignment is here... Complete the ‘object::Binder’ ...and here. ‘object::Binder’ inherits from ‘bind::Binder’ so as to factor common parts. Implement renaming to unique identifiers. TC-R is a mandatory assignment. Once TC-3 completed, implementing TC-R is straightforward, see *note TC-R::. Note that ‘--rename’ is helpful to write a test suite for TC-3. Complete auxiliary code Write the tasks, ‘libbind.*’ etc. 4.5.5 TC-3 FAQ -------------- Ambiguous resolution of ‘operator<<’ for ‘ast::VarDec’ Starting from TC-3, ‘ast::VarDec’ inherits both from ‘ast::VarDec’ and ‘ast::Escapable’. Printing an ‘ast::VarDec’ using ‘operator<<’ can be troublesome as this operator may be overloaded for both ‘ast::VarDec’’s base classes, but not for ‘ast::VarDec’ itself, resulting in an ambiguous overload resolution. The simplest way to get rid of this ambiguity is to convert the ‘ast::VarDec’ object to the type of one of its base classes (“upcast”) before printing it, either by creating a alias or (more simply) by using the ‘static_cast’ operator: const ast::VarDec& vardec = ... // Printing VARDEC as an ast::Dec using an intermediate // variable (alias). const ast::Dec& dec = vardec; ostr << dec; // Printing VARDEC as an ast::Escapable using an // on-the-fly conversion. ostr << static_cast(vardec); What is the purpose of the ‘bound’ task? The computation of name bindings can be carried out in different ways, depending on the input language: Tiger without object constructs (“Panther”), Tiger with object constructs and Tiger with support for function overloading. These different flavors of the binding computation are performed by options ‘--bindings-compute’, ‘--object-bindings-compute’ and ‘--overfun-bindings-compute’ respectively (*note (tiger)Invoking tc::). However, some subsequent task may later just require that an AST is annotated with bindings (“bound”) regardless of the technique used to compute these bindings. The purpose of the ‘bound’ task is to address this need: ensuring that one of the bindings task has been executed. This task can be considered as a disjunction (logical “or”) of the ‘bindings-compute’, ‘object-bindings-compute’ and ‘overfun-bindings-compute’ tasks, the first one being the default binding strategy. 4.5.6 TC-3 Improvements ----------------------- Possible improvements include: Factoring the binding interface In the ‘ast’ module, several classes need to be changed to be “bindable”, i.e., to have new data and function members to set, store, and retrieve their associated definition. Instead of changing several classes in a very similar fashion, introduce a ‘Bindable’ template class and derive from its instantiation. Hash tables How about using true hash tables (aka “unordered associative containers” in Boost parlance) instead of trees? You might also want to try Google’s Sparse Hash Tables(1). Escaping Variables Computation Once TC-3 completed, you might consider the TC-E option now, see *note TC-E::. It takes about 100 lines to make it. ---------- Footnotes ---------- (1) . 4.6 TC-R, Unique Identifiers ============================ *2020-TC-3 submission for Ing1 students is Sunday, February 25th 2018 at 11:42.* **note TC-R:: is part of the mandatory assignment of 2020-TC-3.* This section has been updated for EPITA-2020 on 2016-01-27. At the end of this stage, when given the option ‘--rename’, the compiler produces an AST such that no identifier is defined twice. Relevant lecture notes include: ‘names.pdf’(1). ---------- Footnotes ---------- (1) . 4.6.1 TC-R Samples ------------------ Note that the transformation does not apply to field names. let type a = { a: int } function a(a: a): a = a{ a = a + a } var a : a := a(1, 2) in a.a end File 4.28: ‘as.tig’ $ tc -X --rename -A as.tig /* == Abstract Syntax Tree. == */ function _main() = ( let type a_0 = { a : int } function a_2(a_1 : a_0) : a_0 = a_0 { a = (a_1 + a_1) } var a_3 : a_0 := a_2(1, 2) in a_3.a end; () ) Example 4.34: ‘tc -X --rename -A as.tig’ 4.6.2 TC-R Given Code --------------------- No additional code is provided, see *note TC-3 Given Code::. 4.6.3 TC-R Code to Write ------------------------ ‘bind::Renamer’ Write it from scratch. Complete auxiliary code Write the tasks, ‘libbind.*’ etc. 4.6.4 TC-R FAQ -------------- Should I rename primitives (builtins) or ‘_main’? No, you shall not rename them; you have to keep the interface of the Tiger runtime. Likewise for ‘_main’. 4.7 TC-E, Computing the Escaping Variables ========================================== *2020-TC-E submission is Sunday, March 25th 2018 at 11:42.* **note TC-E:: is part of the mandatory assignment of 2020-TC-4.* This section has been updated for EPITA-2020 on 2015-01-27. At the end of this stage, the compiler must be able to compute and display the escaping variables. These features are triggered by the options ‘--escapes-compute’/‘-e’ and ‘--escapes-display’/‘-E’. Relevant lecture notes include: ‘names.pdf’(1) and ‘intermediate.pdf’(2). ---------- Footnotes ---------- (1) . (2) . 4.7.1 TC-E Goals ---------------- Things to learn during this stage that you should remember: Understanding escaping variables In TC-E, we consider the case of non-local variables, i.e., variables that are defined in a function, but used (at least once) in _another_ function, nested in the first one. This possibility for an inner function to use variables declared in outer functions is called _block structure_. Because such variables are used outside of their host function, they are qualified as “escaping”. This information will be necessary during the translation to the intermediate representation (*note TC-5::) when variables (named temporaries a that stage) are assigned a location (in the stack or in a register). Escaping variables shall indeed be stored in memory, so that non-local uses of such variables can actually have a means to access them. Writing a Visitor from scratch The ‘escapes::EscapesVisitor’ provided is almost empty. A goal of TC-E is to write a complete visitor (though a small one). Do not forget to use ‘ast::DefaultVisitor’ to factor as much code as possible. 4.7.2 TC-E Samples ------------------ This example demonstrates the computation and display of escaping variables (and formal arguments). By default, all the variables must be considered as escaping, since it is safe to put a non escaping variable onto the stack, while the converse is unsafe. let var one := 1 var two := 2 function incr(x: int) : int = x + one in incr(two) end File 4.29: ‘variable-escapes.tig’ $ tc -XEAeEA variable-escapes.tig /* == Abstract Syntax Tree. == */ function _main() = ( let var /* escaping */ one := 1 var /* escaping */ two := 2 function incr(/* escaping */ x : int) : int = (x + one) in incr(two) end; () ) /* == Abstract Syntax Tree. == */ function _main() = ( let var /* escaping */ one := 1 var two := 2 function incr(x : int) : int = (x + one) in incr(two) end; () ) Example 4.35: ‘tc -XEAeEA variable-escapes.tig’ Compute the escapes after binding, so that the AST is known to be sane enough (type checking is irrelevant): the ‘EscapeVisitor’ should not bother with undeclared entities. undeclared File 4.30: ‘undefined-variable.tig’ $ tc -e undefined-variable.tig error→undefined-variable.tig:1.1-10: undeclared variable: undeclared ⇒4 Example 4.36: ‘tc -e undefined-variable.tig’ Run your compiler on ‘merge.tig’ and to study its output. There is a number of silly mistakes that people usually make on TC-E: they are all easy to defeat when you do have a reasonable test suite, and once you understood that _torturing your project is a good thing to do_. 4.7.3 TC-E Given Code --------------------- No additional code is provided, see *note TC-3 Given Code::. 4.7.4 TC-E Code to Write ------------------------ See *note src/ast::, and *note src/escapes::. ‘ast::PrettyPrinter’ Implement ‘--escapes-display’ support in the ‘PrettyPrinter’. Follow strictly the output format, since we parse your output to check it. Display the ‘/* escaping */’ flag where needed, and _only_ where needed: each definition of an escaping variable/formal is _preceded_ by the comment ‘/* escaping */’. Do not display meaningless flags due to implementation details. How this pretty-printing is implemented is left to you, but factor common code. ‘escapes::EscapesVisitor’ Write the class ‘escapes::EscapesVisitor’ in ‘src/escapes/escapes-visitor.hh’ and ‘src/escapes/escapes-visitor.cc’. Introduce ‘ast::Escapable’ Ensure ‘ast::VarDec’ inherits from ‘ast::Escapable’. *Note Escapable::. 4.7.5 TC-E FAQ -------------- 4.7.6 TC-E Improvements ----------------------- Possible improvements include: 4.8 TC-4, Type Checking ======================= *2020-TC-4 submission is Sunday, May 25th 2018 at 11:42.* **note TC-E:: is part of the mandatory assignment of 2020-TC-4.* This section has been updated for EPITA-2020 on 2016-01-27. At the end of this stage, the compiler type checks Tiger programs, and annotates the AST. Clear error messages are required. Relevant lecture notes include ‘names.pdf’(1), ‘type-checking.pdf’(2). ---------- Footnotes ---------- (1) . (2) . 4.8.1 TC-4 Goals ---------------- Things to learn during this stage that you should remember: Function template and member function templates Functions template are quite convenient to factor code that looks alike but differs by the nature of its arguments. Member function templates are used to factor error handling the ‘TypeChecker’. Virtual member function templates You will be asked why there can be no such thing in C++. Template specialization Although quite different in nature, types and functions are processed in a similar fashion in a Tiger compiler: first one needs to visit the headers (to introduce the names in the scope, and to check that names are only defined once), and then to visit the bodies (to bind the names to actual values). We use templates and template specialization to factor this. See also the Template Method. The Template Method design pattern The Template Method allows to factor a generic algorithm, the steps of which are specific. This is what we use to type check function and type declarations. Do not confuse Template Method with member function template, the order matters. Remember that in English the noun is usually last, preceded by qualifier. Type-checking What it is, how to implement it. Stack unwinding What it means, and when the C++ standard requires it from the compiler. 4.8.2 TC-4 Samples ------------------ Type checking is optional, invoked by ‘--types-compute’. As for the computation of bindings, this option only handles programs with no object construct. To perform the type-checking of programs with objects, use ‘--object-types-compute’. Implementing overloaded functions in Tiger is an option, which requires the implementation of a different type checker, triggered by ‘--overfun-types-compute’ (*note TC-A::). The option ‘--typed’/‘-T’ makes sure one of them was run. 1 + "2" File 4.31: ‘int-plus-string.tig’ $ tc int-plus-string.tig Example 4.37: ‘tc int-plus-string.tig’ $ tc -T int-plus-string.tig error→int-plus-string.tig:1.5-7: type mismatch error→ right operand type: string error→ expected type: int ⇒5 Example 4.38: ‘tc -T int-plus-string.tig’ The type checker shall ensure loop index variables are read-only. /* error: index variable erroneously assigned to. */ for i := 10 to 1 do i := i - 1 File 4.32: ‘assign-loop-var.tig’ $ tc -T assign-loop-var.tig error→assign-loop-var.tig:3.3-12: variable is read only ⇒5 Example 4.39: ‘tc -T assign-loop-var.tig’ When there are several type errors, it is admitted that some remain hidden by others. unknown_function(unknown_variable) File 4.33: ‘unknowns.tig’ $ tc -T unknowns.tig error→unknowns.tig:1.1-34: undeclared function: unknown_function ⇒4 Example 4.40: ‘tc -T unknowns.tig’ Be sure to check the type of all the constructs. if 1 then 2 File 4.34: ‘bad-if.tig’ $ tc -T bad-if.tig error→bad-if.tig:1.1-11: type mismatch error→ then clause type: int error→ else clause type: void ⇒5 Example 4.41: ‘tc -T bad-if.tig’ Be aware that type and function declarations are recursive by chunks. For instance: let type one = { hd : int, tail : two } type two = { hd : int, tail : one } function one(hd : int, tail : two) : one = one { hd = hd, tail = tail } function two(hd : int, tail : one) : two = two { hd = hd, tail = tail } var one := one(11, two(22, nil)) in print_int(one.tail.hd); print("\n") end File 4.35: ‘mutuals.tig’ $ tc -T mutuals.tig Example 4.42: ‘tc -T mutuals.tig’ In case you are interested, the result is: $ tc -H mutuals.tig >mutuals.hir Example 4.43: ‘tc -H mutuals.tig >mutuals.hir’ $ havm mutuals.hir 22 Example 4.44: ‘havm mutuals.hir’ The type-checker must catch erroneous inheritance relations. let /* Mutually recursive inheritance. */ type A = class extends A {} /* Mutually recursive inheritance. */ type B = class extends C {} type C = class extends B {} /* Class inherits from a non-class type. */ type E = class extends int {} in end File 4.36: ‘bad-super-type.tig’ $ tc --object-types-compute bad-super-type.tig error→bad-super-type.tig:3.12-29: recursive inheritance: A error→bad-super-type.tig:6.12-29: recursive inheritance: C error→bad-super-type.tig:10.26-28: class type expected, got: int ⇒5 Example 4.45: ‘tc --object-types-compute bad-super-type.tig’ Handle the type-checking of ‘TypeDecs’ with care in ‘object::TypeChecker’: they are processed in three steps, while other declarations use a two-step visit. The ‘object::TypeChecker’ visitor proceeds as follows when it encounters a ‘TypeDecs’: 1. Visit the headers of all types in the block. 2. Visit the bodies of all types in the block, but ignore members for each type being a class. 3. For each type of the block being a class, visit its members. This three-pass visit allows class members to make forward references to other types defined in the same block of types, for instance, instantiate a class ‘B’ from a class ‘A’ (defined in the same block), even if ‘B’ is defined _after_ ‘A’. let /* A block of types. */ class A { /* Valid forward reference to B, defined in the same block as the class enclosing this member. */ var b := new B } type t = int class B { } in end File 4.37: ‘forward-reference-to-class.tig’ $ tc --object-types-compute forward-reference-to-class.tig Example 4.46: ‘tc --object-types-compute forward-reference-to-class.tig’ (See ‘object::TypeChecker::operator()(ast::TypeDecs&)’ for more details. 4.8.3 TC-4 Given Code --------------------- Some code is provided through the ‘tc-base’ repository, using tag ‘2020-tc-base-4.0’. For a description of the new module, see *note src/type::. 4.8.4 TC-4 Code to Write ------------------------ What is to be done. ‘ast::Typable’ ‘ast::TypeConstructor’ Because many AST nodes will be annotated with their type, the feature is factored by these two classes. *Note Typable::, and *note TypeConstructor::, for details. ‘ast::Exp’, ‘ast::Dec’, ‘ast::Ty’ These are typable. ‘ast::FunctionDec’, ‘ast::TypeDec’, ‘ast::Ty’ These build types. ‘src/type/type.*’, ‘src/type/array.*’, ‘src/type/builtin-types.*’, ‘src/type/class.*’, ‘src/type/function.*’, ‘src/type/method.*’, ‘src/type/named.*’, ‘src/type/nil.*’, ‘src/type/record.*’ Implement the Singletons ‘type::String’, ‘type::Int’, and ‘type::Void’. Using templates would be particularly appreciated to factor the code between the three singleton classes, see *note TC-4 Options::. The remaining classes are incomplete. Pay extra attention to ‘type::operator==(const Type& a, const Type& b)’ and ‘type::Type::compatible_with’. ‘type::TypeChecker’ ‘object::TypeChecker’ Of course this is the most tricky part. We hope there are enough comments in there so that you understand what is to be done. Please, post your questions and help us improve it. It is also the ‘type::TypeChecker’’s job to set the ‘record_type’ in the ‘type::Nil’ class. ‘record_type’ is holding some information about the ‘type::Record’ type associated to the ‘type::Nil’ type. We choose to handle the ‘record_type’ only when no error occured in the type-checking process. ‘type::GenVisitor’ ‘type::GenDefaultVisitor’ ‘type::Type’s are visitable. You must implement the default visitor class template, which walks through the tree of types doing nothing. It’s used as a base class for the type visitors. ‘type::PrettyPrinter’ In order to output nice error messages, the types need to be printed. You must implement a visitor that prints the types, similar to ‘ast::PrettyPrinter’. Computing the Escaping Variables The implementation of *note TC-E::, suggested at *note TC-3::, becomes a mandatory assignment at *note TC-4::. 4.8.5 TC-4 Options ------------------ These are features that you might want to implement in addition to the core features. ‘type::Error’ One problem is that type error recovery can generate false errors. For instance our compiler usually considers that the type for incorrect constructs is ‘Int’, which can create cascades of errors: "666" = if 000 then 333 else "666" File 4.38: ‘is_devil.tig’ $ tc -T is_devil.tig error→is_devil.tig:1.9-34: type mismatch error→ then clause type: int error→ else clause type: string error→is_devil.tig:1.1-34: type mismatch error→ left operand type: string error→ right operand type: int ⇒5 Example 4.47: ‘tc -T is_devil.tig’ One means to avoid this issue consists in introducing a new type, ‘type::Error’, that the type checker would never complain about. This can be a nice complement to ‘ast::Error’. Various Desugaring *Note TC-D::, for more details. This is quite an easy option, and a very interesting one. Note that implementing desugaring makes TC-5 easier. Bounds Checking If you felt TC-D was easy, then implementing bounds checking should be easy too. *Note TC-B::. Overloaded Tiger *Note TC-A::, for a description of this ambitious option. Renaming object-oriented constructs Like TC-R, this task consists in writing a visitor renaming AST nodes holding names (either defined or used), this time with support for object-oriented constructs (option ‘--object-rename’). This visitor, ‘object::Renamer’, shall also update named types (‘type::Named’) and collect the names of all (renamed) classes. This option is essentially a preliminary step of TC-O (see the next item). Desugaring Tiger to Panther If your compiler is complete w.r.t. object constructs (in particular, the type-checking and the renaming of objects is a requirement), then you can implement this very ambitious option, whose goal is to convert a Tiger program with object constructs into a program with none of them (i.e., in the subset of Tiger called _Panther_). This work consists in completing the ‘object::DesugarVisitor’ and implementing the ‘--object-desugar’ option. *Note TC-O::. 4.8.6 TC-4 FAQ -------------- Stupid Types One can legitimately wonder whether the following program is correct: let type weirdo = array of weirdo in print("I'm a creep.\n") end the answer is "yes", as nothing prevents this in the Tiger specifications. This type is not usable though. Is ‘type::Field’ useful? Using ‘std::pair’ in ‘type::Record’ is probably enough, and simpler. Is ‘nil’ compatible with objects? For instance, is the following example valid? var a : Object := nil The answer is yes: ‘nil’ is both compatible with records _and_ objects. Can one redefine the built-in class ‘Object’? Yes, if the rules of the Tiger Compiler Reference Manual are honored, notably: − Every class has a super class, defaulting to the built-in class ‘Object’ (syntactic sugar of ‘class’ without an ‘extends’ clause). − Recursive inheritance (within the same block of types) is forbidden. For example, let class Object {} in end is invalid, since it is similar to let class Object extends Object {} in end and recursive inheritance is invalid. One can try and introduce a ‘Dummy’ type as a workaround let class Dummy {} class Object extends Dummy {} in end but this is just postponing the problem, since the code above is the same as the following: let class Dummy extends Object {} class Object extends Dummy {} in end where there is still a recursive inheritance. The one solution is to define our ‘Dummy’ type beforehand (i.e., in its own block of type declarations), then to redefine ‘Object’. /* Valid. */ let class Dummy {} in let class Object extends Dummy {} in end end Take care: this new ‘Object’ type is _different_ from the built-in one. The code below gives an example of an invalid mix of these two types. let class Dummy {} function get_builtin_object() : Object = new Object /* builtin */ in let class Object extends Dummy {} /* custom */ /* Invalid assignment, since an instance of the builtin Object is *not* an instance of the custom Object. */ var o : Object /* custom */ := get_builtin_object() /* builtin */ in end end 4.8.7 TC-4 Improvements ----------------------- Possible improvements include: A Singleton template Implementations of the Singleton design pattern are frequently needed; the ‘type’ module alone requires three instances! Therefore a template to generate such singletons is desirable. There are two ways to address this issue: tailored to ‘type’ (directly in ‘src/type/builtin-types.*’), or in a completely generic way (in ‘lib/misc/singleton.*’). *Note Modern C++ Design::, for a topnotch implementation. A more verbose type display When reporting a type, one must be careful with recursive definitions that could produce never ending outputs. The suggested simple implementation ensure this by limiting the ‘Named’-depth (i.e., the number of ‘Named’ objects traversed) to one. Another, nicer possibility, would be to limit the expansion to once _per ‘Named’_. A Graphical User Interface ‘tcsh’ is up and running. You might want to use it to implement a GUI using Python’s Tkinter(1). ---------- Footnotes ---------- (1) . 4.9 TC-D, Removing the syntactic sugar from the Abstract Syntax Tree ==================================================================== *TC-D is an optional assignment.* This section has been updated for EPITA-2009 on 2007-04-26. At the end of this stage, the compiler must be able to remove syntactic sugar from a type-checked AST. These features are triggered by the options ‘--desugar’ and ‘--overfun-desugar’. 4.9.1 TC-D Samples ------------------ String comparisons can be translated to an equivalent AST using function calls, before the translation to HIR. "foo" = "bar" File 4.39: ‘string-equality.tig’ $ tc --desugar-string-cmp --desugar -A string-equality.tig /* == Abstract Syntax Tree. == */ primitive print(string_0 : string) primitive print_err(string_1 : string) primitive print_int(int_2 : int) primitive flush() primitive getchar() : string primitive ord(string_3 : string) : int primitive chr(code_4 : int) : string primitive size(string_5 : string) : int primitive streq(s1_6 : string, s2_7 : string) : int primitive strcmp(s1_8 : string, s2_9 : string) : int primitive substring(string_10 : string, start_11 : int, length_12 : int) : string primitive concat(fst_13 : string, snd_14 : string) : string primitive not(boolean_15 : int) : int primitive exit(status_16 : int) function _main() = ( streq("foo", "bar"); () ) Example 4.48: ‘tc --desugar-string-cmp --desugar -A string-equality.tig’ "foo" < "bar" File 4.40: ‘string-less.tig’ $ tc --desugar-string-cmp --desugar -A string-less.tig /* == Abstract Syntax Tree. == */ primitive print(string_0 : string) primitive print_err(string_1 : string) primitive print_int(int_2 : int) primitive flush() primitive getchar() : string primitive ord(string_3 : string) : int primitive chr(code_4 : int) : string primitive size(string_5 : string) : int primitive streq(s1_6 : string, s2_7 : string) : int primitive strcmp(s1_8 : string, s2_9 : string) : int primitive substring(string_10 : string, start_11 : int, length_12 : int) : string primitive concat(fst_13 : string, snd_14 : string) : string primitive not(boolean_15 : int) : int primitive exit(status_16 : int) function _main() = ( (strcmp("foo", "bar") < 0); () ) Example 4.49: ‘tc --desugar-string-cmp --desugar -A string-less.tig’ ‘for’ loops can be seen as sugared ‘while’ loops, and be transformed as such. for i := 0 to 10 do print_int(i) File 4.41: ‘simple-for-loop.tig’ $ tc --desugar-for --desugar -A simple-for-loop.tig /* == Abstract Syntax Tree. == */ primitive print(string_0 : string) primitive print_err(string_1 : string) primitive print_int(int_2 : int) primitive flush() primitive getchar() : string primitive ord(string_3 : string) : int primitive chr(code_4 : int) : string primitive size(string_5 : string) : int primitive streq(s1_6 : string, s2_7 : string) : int primitive strcmp(s1_8 : string, s2_9 : string) : int primitive substring(string_10 : string, start_11 : int, length_12 : int) : string primitive concat(fst_13 : string, snd_14 : string) : string primitive not(boolean_15 : int) : int primitive exit(status_16 : int) function _main() = ( let var _lo := 0 var _hi := 10 var i_17 := _lo in (if (_lo <= _hi) then (while 1 do ( print_int(i_17); (if (i_17 = _hi) then break else ()); (i_17 := (i_17 + 1)) )) else ()) end; () ) Example 4.50: ‘tc --desugar-for --desugar -A simple-for-loop.tig’ 4.10 TC-I, Function inlining ============================ *TC-I is an optional assignment.* This section has been updated for EPITA-2009 on 2007-04-26. At the end of this stage, the compiler inlines function bodies where functions are called. In a later pass, useless functions can be pruned from the AST. These features are triggered by the options ‘--inline’ and ‘--prune’. If you also implemented function overloading (*note TC-A::), use the options ‘--overfun-inline’ and ‘--overfun-prune’. 4.10.1 TC-I Samples ------------------- let function sub(i: int, j: int) :int = i + j in sub(1, 2) end File 4.42: ‘sub.tig’ $ tc -X --inline -A sub.tig /* == Abstract Syntax Tree. == */ function _main() = ( let function sub_2(i_0 : int, j_1 : int) : int = (i_0 + j_1) in let var i_0 : int := 1 var j_1 : int := 2 var res : int := (i_0 + j_1) in res end end; () ) Example 4.51: ‘tc -X --inline -A sub.tig’ Recursive functions cannot be inlined. 4.11 TC-B, Array bounds checking ================================ *TC-B is an optional assignment.* This section has been updated for EPITA-2020 on 2015-01-31. At the end of this stage, the compiler adds dynamic checks of the bounds of arrays to the AST. Every access (either on read or write) is checked, and the program should stops with the runtime exit code (120) on out-of-bounds access. This feature is triggered by the options ‘--bounds-checks-add’ and ‘--overfun-bounds-checks-add’. 4.11.1 TC-B Samples ------------------- Here is an example with an out-of-bounds array subscript, run with HAVM. let type int_array = array of int var foo := int_array [10] of 3 in /* Out-of-bounds access. */ foo[20] end File 4.43: ‘subscript-read.tig’ $ tc --bounds-checks-add -A subscript-read.tig /* == Abstract Syntax Tree. == */ primitive print(string_0 : string) primitive print_err(string_1 : string) primitive print_int(int_2 : int) primitive flush() primitive getchar() : string primitive ord(string_3 : string) : int primitive chr(code_4 : int) : string primitive size(string_5 : string) : int primitive streq(s1_6 : string, s2_7 : string) : int primitive strcmp(s1_8 : string, s2_9 : string) : int primitive substring(string_10 : string, start_11 : int, length_12 : int) : string primitive concat(fst_13 : string, snd_14 : string) : string primitive not(boolean_15 : int) : int primitive exit(status_16 : int) function _main() = let type __int_array = array of int type _int_array = { arr : __int_array, size : int } function _check_bounds(a : _int_array, index : int, location : string) : int = ( (if (if (index < 0) then 1 else ((index >= a.size) <> 0)) then ( print_err(location); print_err(": array index out of bounds.\n"); exit(120) ) else ()); index ) in ( let type _box_int_array_17 = { arr : int_array_17, size : int } type int_array_17 = array of int var foo_18 := let var _size := 10 in _box_int_array_17 { arr = int_array_17 [_size] of 3, size = _size } end in foo_18.arr[_check_bounds(_cast(foo_18, _int_array), 20, "1.1")] end; () ) end Example 4.52: ‘tc --bounds-checks-add -A subscript-read.tig’ $ tc --bounds-checks-add -L subscript-read.tig >subscript-read.lir Example 4.53: ‘tc --bounds-checks-add -L subscript-read.tig >subscript-read.lir’ $ havm subscript-read.lir error→1.1: array index out of bounds. ⇒120 Example 4.54: ‘havm subscript-read.lir’ And here is an example with an out-of-bounds assignment to an array cell, tested with Nolimips. let type int_array = array of int var foo := int_array [10] of 3 in /* Out-of-bounds assignment. */ foo[42] := 51 end File 4.44: ‘subscript-write.tig’ $ tc --bounds-checks-add -A subscript-write.tig /* == Abstract Syntax Tree. == */ primitive print(string_0 : string) primitive print_err(string_1 : string) primitive print_int(int_2 : int) primitive flush() primitive getchar() : string primitive ord(string_3 : string) : int primitive chr(code_4 : int) : string primitive size(string_5 : string) : int primitive streq(s1_6 : string, s2_7 : string) : int primitive strcmp(s1_8 : string, s2_9 : string) : int primitive substring(string_10 : string, start_11 : int, length_12 : int) : string primitive concat(fst_13 : string, snd_14 : string) : string primitive not(boolean_15 : int) : int primitive exit(status_16 : int) function _main() = let type __int_array = array of int type _int_array = { arr : __int_array, size : int } function _check_bounds(a : _int_array, index : int, location : string) : int = ( (if (if (index < 0) then 1 else ((index >= a.size) <> 0)) then ( print_err(location); print_err(": array index out of bounds.\n"); exit(120) ) else ()); index ) in ( let type _box_int_array_17 = { arr : int_array_17, size : int } type int_array_17 = array of int var foo_18 := let var _size := 10 in _box_int_array_17 { arr = int_array_17 [_size] of 3, size = _size } end in (foo_18.arr[_check_bounds(_cast(foo_18, _int_array), 42, "1.1")] := 51) end; () ) end Example 4.55: ‘tc --bounds-checks-add -A subscript-write.tig’ $ tc --bounds-checks-add -S subscript-write.tig >subscript-write.s Example 4.56: ‘tc --bounds-checks-add -S subscript-write.tig >subscript-write.s’ $ nolimips -l nolimips -Nue subscript-write.s error→1.1: array index out of bounds. ⇒120 Example 4.57: ‘nolimips -l nolimips -Nue subscript-write.s’ 4.11.2 TC-B FAQ --------------- The bounds checking extension relies on the use of casts (*note TC-B Samples::), see *Note (tiger)Language Extensions::. However, a simplistic implementation of casts introduces ambiguities in the grammar that even a GLR parser cannot resolve dynamically. Consider the following example, where ‘foo’ is an l-value : _cast(foo, string) This piece of code can be parsed in two different ways: 1. ‘exp -> cast-exp -> exp -> lvalue (foo)’ 2. ‘exp -> lvalue -> cast-lvalue -> lvalue (foo)’ As the cast must preserve the l-value nature of ‘foo’, it must itself produce an l-value. Hence we want the latter interpretation. This is a true ambiguity, not a local ambiguity that GLR can resolve simply by “waiting for enough look-ahead”. To help it take the right decision, you can favor the right path by assigning _dynamic_ priorities to relevant rules, using Bison’s ‘%dprec’ keyword. See Bison’s manual (*note Flex & Bison::) for more information on this feature. 4.12 TC-A, Ad Hoc Polymorphism (Function Overloading) ===================================================== *TC-A is an optional assignment.* This section has been updated for EPITA-2009 on 2007-04-26. At the end of this stage, the compiler must be able to resolve overloaded function calls. These features are triggered by the options ‘--overfun-bindings-compute’ and ‘--overfun-types-compute’/‘-O’. Relevant lecture notes include: ‘names.pdf’(1). ---------- Footnotes ---------- (1) . 4.12.1 TC-A Samples ------------------- Overloaded functions are not supported in regular Tiger. let function null(i: int) : int = i = 0 function null(s: string) : int = s = "" in null("123") = null(123) end File 4.45: ‘sizes.tig’ $ tc -Xb sizes.tig error→sizes.tig:3.3-41: redefinition: null error→sizes.tig:2.3-40: first definition ⇒4 Example 4.58: ‘tc -Xb sizes.tig’ Instead of regular binding, overloaded binding binds each function call to the _set_ of active function definitions. Unfortunately displaying this set is not implemented, so we cannot see them in the following example: $ tc -X --overfun-bindings-compute -BA sizes.tig /* == Abstract Syntax Tree. == */ function _main /* 0x55940eca7890 */() = ( let function null /* 0x55940eca7c20 */(i /* 0x55940ecaa700 */ : int /* 0 */) : int /* 0 */ = (i /* 0x55940ecaa700 */ = 0) function null /* 0x55940eca8610 */(s /* 0x55940eca9ce0 */ : string /* 0 */) : int /* 0 */ = (s /* 0x55940eca9ce0 */ = "") in (null /* 0 */("123") = null /* 0 */(123)) end; () ) Example 4.59: ‘tc -X --overfun-bindings-compute -BA sizes.tig’ The selection of the right binding cannot be done before type-checking, since precisely overloading relies on types to distinguish the actual function called. Therefore it is the type checker that “finishes” the binding. $ tc -XOBA sizes.tig /* == Abstract Syntax Tree. == */ function _main /* 0x55ce2d384890 */() = ( let function null /* 0x55ce2d386ed0 */(i /* 0x55ce2d385e00 */ : int /* 0 */) : int /* 0 */ = (i /* 0x55ce2d385e00 */ = 0) function null /* 0x55ce2d384b00 */(s /* 0x55ce2d3850a0 */ : string /* 0 */) : int /* 0 */ = (s /* 0x55ce2d3850a0 */ = "") in (null /* 0x55ce2d384b00 */("123") = null /* 0x55ce2d386ed0 */(123)) end; () ) Example 4.60: ‘tc -XOBA sizes.tig’ There can be ambiguous (overloaded) calls. let type foo = {} function empty(f: foo) : int = f = nil type bar = {} function empty(b: bar) : int = b = nil in empty(foo {}); empty(bar {}); empty(nil) end File 4.46: ‘over-amb.tig’ $ tc -XO over-amb.tig error→over-amb.tig:9.3-12: nil ambiguity calling `empty' error→matching declarations: error→ empty @ error→ { error→ f : foo = error→ { error→ } error→ } error→ empty @ error→ { error→ b : bar = error→ { error→ } error→ } ⇒5 Example 4.61: ‘tc -XO over-amb.tig’ The spirit of plain Tiger is kept: a “chunk” is not allowed to redefine a function with the same signature: let function foo(i: int) = () function foo(i: int) = () in foo(42) end File 4.47: ‘over-duplicate.tig’ $ tc -XO over-duplicate.tig error→over-duplicate.tig:3.3-27: function complete redefinition: foo error→over-duplicate.tig:2.3-27: first definition ⇒5 Example 4.62: ‘tc -XO over-duplicate.tig’ but a signature can be defined twice in different blocks of function definitions, in which case the last defined function respecting the calling signature is used.. let function foo(i: int) = () in let function foo(i: int) = () in foo(51) end end File 4.48: ‘over-scoped.tig’ $ tc -XOBA over-scoped.tig /* == Abstract Syntax Tree. == */ function _main /* 0x55818121a110 */() = ( let function foo /* 0x55818121bed0 */(i /* 0x55818121ae00 */ : int /* 0 */) = () in let function foo /* 0x55818121a230 */(i /* 0x55818121a0a0 */ : int /* 0 */) = () in foo /* 0x55818121a230 */(51) end end; () ) Example 4.63: ‘tc -XOBA over-scoped.tig’ 4.12.2 TC-A Given Code ---------------------- No additional code is provided. 4.12.3 TC-A Code to Write ------------------------- See *note src/ast::, and *note src/overload::. 4.13 TC-O, Desugaring object constructs ======================================= *TC-O is an optional assignment.* This section has been updated for EPITA-2012 on 2015-01-21. At the end of this stage, the compiler must be able to desugar object constructs into plain Tiger without objects, a.k.a. Panther. This feature is triggered by the option ‘--object-desugar’. Do not forget that you need to complete and write all missing parts of the object support (parser, ast, binder, type-checker, etc...). Make sure that all of these are correctly working before starting this bonus. This a very hard assignment. If you plan to work on it, start with very simple programs, and progressively add new desugaring patterns. Be sure to keep a complete test suite to cover all cases and avoid regressions. Achieving a faithful and complete translation from Tiger to Panther requires a lot of work. Even the reference implementation of the object-desugar pass (about 1,000 lines of code) is not perfect, as some inputs may generate invalid Tiger code after desugaring objects (in particular when playing with scopes). 4.13.1 TC-O Samples ------------------- Be warned: even Small object-oriented Tiger programs may generate complicated desugared outputs. let class A {} in end File 4.49: ‘empty-class.tig’ $ tc -X --object-desugar -A empty-class.tig /* == Abstract Syntax Tree. == */ function _main() = let type _variant_Object = { exact_type : int } type _variant_A_0 = { exact_type : int } var _id_Object := 0 var _id_A_0 := 1 function _new_Object() : _variant_Object = _variant_Object { exact_type = _id_Object } in ( let function _new_A_0() : _variant_A_0 = let in _variant_A_0 { exact_type = _id_A_0 } end function _upcast_A_0_to_Object(source : _variant_A_0) : _variant_Object = _variant_Object { exact_type = _id_A_0 } in () end; () ) end Example 4.64: ‘tc -X --object-desugar -A empty-class.tig’ let class B { var a := 42 method m() : int = self.a } var b := new B in b.a := 51 end File 4.50: ‘simple-class.tig’ $ tc -X --object-desugar -A simple-class.tig /* == Abstract Syntax Tree. == */ function _main() = let type _variant_Object = { exact_type : int, field_B_1 : _contents_B_1 } type _contents_B_1 = { a : int } type _variant_B_1 = { exact_type : int, field_B_1 : _contents_B_1 } var _id_Object := 0 var _id_B_1 := 1 function _new_Object() : _variant_Object = _variant_Object { exact_type = _id_Object, field_B_1 = nil } in ( let function _new_B_1() : _variant_B_1 = let var contents_B_1 := _contents_B_1 { a = 42 } in _variant_B_1 { exact_type = _id_B_1, field_B_1 = contents_B_1 } end function _upcast_B_1_to_Object(source : _variant_B_1) : _variant_Object = _variant_Object { exact_type = _id_B_1, field_B_1 = source.field_B_1 } function _method_B_1_m(self : _variant_B_1) : int = self.field_B_1.a function _dispatch_B_1_m(self : _variant_B_1) : int = _method_B_1_m(self) var b_2 := _new_B_1() in (b_2.field_B_1.a := 51) end; () ) end Example 4.65: ‘tc -X --object-desugar -A simple-class.tig’ let class C { var a := 0 method m() : int = self.a } class D extends C { var b := 9 /* Override C.m(). */ method m() : int = self.a + self.b } var d : D := new D /* Valid upcast due to inclusion polymorphism. */ var c : C := d in c.a := 42; /* Note that accessing `c.b' is not allowed, since `c' is statically known as a `C', even though it is actually a `D' at run time. */ let /* Polymorphic call. */ var res := c.m() in print_int(res); print("\n") end end File 4.51: ‘override.tig’ $ tc --object-desugar -A override.tig /* == Abstract Syntax Tree. == */ primitive print(string_0 : string) primitive print_err(string_1 : string) primitive print_int(int_2 : int) primitive flush() primitive getchar() : string primitive ord(string_3 : string) : int primitive chr(code_4 : int) : string primitive size(string_5 : string) : int primitive streq(s1_6 : string, s2_7 : string) : int primitive strcmp(s1_8 : string, s2_9 : string) : int primitive substring(string_10 : string, start_11 : int, length_12 : int) : string primitive concat(fst_13 : string, snd_14 : string) : string primitive not(boolean_15 : int) : int primitive exit(status_16 : int) function _main() = let type _variant_Object = { exact_type : int, field_C_18 : _contents_C_18, field_D_20 : _contents_D_20 } type _contents_C_18 = { a : int } type _variant_C_18 = { exact_type : int, field_C_18 : _contents_C_18, field_D_20 : _contents_D_20 } type _contents_D_20 = { b : int } type _variant_D_20 = { exact_type : int, field_D_20 : _contents_D_20, field_C_18 : _contents_C_18 } var _id_Object := 0 var _id_C_18 := 1 var _id_D_20 := 2 function _new_Object() : _variant_Object = _variant_Object { exact_type = _id_Object, field_C_18 = nil, field_D_20 = nil } in ( let function _new_C_18() : _variant_C_18 = let var contents_C_18 := _contents_C_18 { a = 0 } in _variant_C_18 { exact_type = _id_C_18, field_C_18 = contents_C_18, field_D_20 = nil } end function _upcast_C_18_to_Object(source : _variant_C_18) : _variant_Object = _variant_Object { exact_type = _id_C_18, field_C_18 = source.field_C_18, field_D_20 = source.field_D_20 } function _downcast_C_18_to_D_20(source : _variant_C_18) : _variant_D_20 = _variant_D_20 { exact_type = _id_D_20, field_D_20 = source.field_D_20, field_C_18 = source.field_C_18 } function _method_C_18_m(self : _variant_C_18) : int = self.field_C_18.a function _dispatch_C_18_m(self : _variant_C_18) : int = (if (self.exact_type = _id_D_20) then _method_D_20_m(_downcast_C_18_to_D_20(self)) else _method_C_18_m(self)) function _new_D_20() : _variant_D_20 = let var contents_D_20 := _contents_D_20 { b = 9 } var contents_C_18 := _contents_C_18 { a = 0 } in _variant_D_20 { exact_type = _id_D_20, field_D_20 = contents_D_20, field_C_18 = contents_C_18 } end function _upcast_D_20_to_C_18(source : _variant_D_20) : _variant_C_18 = _variant_C_18 { exact_type = _id_D_20, field_C_18 = source.field_C_18, field_D_20 = source.field_D_20 } function _upcast_D_20_to_Object(source : _variant_D_20) : _variant_Object = _variant_Object { exact_type = _id_D_20, field_C_18 = source.field_C_18, field_D_20 = source.field_D_20 } function _method_D_20_m(self : _variant_D_20) : int = (self.field_C_18.a + self.field_D_20.b) function _dispatch_D_20_m(self : _variant_D_20) : int = _method_D_20_m(self) var d_21 : _variant_D_20 := _new_D_20() var c_22 : _variant_C_18 := _upcast_D_20_to_C_18(d_21) in ( (c_22.field_C_18.a := 42); let var res_23 := _dispatch_C_18_m(c_22) in ( print_int(res_23); print("\n") ) end ) end; () ) end Example 4.66: ‘tc --object-desugar -A override.tig’ $ tc --object-desugar -L override.tig >override.lir Example 4.67: ‘tc --object-desugar -L override.tig >override.lir’ $ havm override.lir 51 Example 4.68: ‘havm override.lir’ 4.14 TC-5, Translating to the High Level Intermediate Representation ==================================================================== *2020-TC-5 submission is Saturday, April 29th 2018 at 11:42* This section has been updated for EPITA-2020 on 2016-01-27. At the end of this stage the compiler translates the AST into the high level intermediate representation, HIR for short. Relevant lecture notes include ‘intermediate.pdf’(1). ---------- Footnotes ---------- (1) . 4.14.1 TC-5 Goals ----------------- Things to learn during this stage that you should remember: Smart pointers The techniques used to implement reference counting via the redefinition of ‘operator->’ and ‘operator*’. ‘std::unique_ptr’s are also smart pointers. ‘std::unique_ptr’ The intermediate translation is stored in an ‘unique_ptr’ to guarantee it is released (‘delete’) at the end of the run. Reference counting The class template ‘misc::ref’ provides reference counting smart pointers to ease the memory management. It is used to handle nodes of the intermediate representation, especially because during TC-6 some rewriting might transform this tree into an DAG, in which case memory deallocation is complex. Variants C++ features the ‘union’ keyword, inherited from C. Not only is ‘union’ not type safe, it also forbids class members. Some people have worked hard to implement ‘union’ à la C++, i.e., with type safety, polymorphism etc. These union are called “discriminated unions” or “variants” to follow the vocabulary introduced by Caml. See the papers from Andrei Alexandrescu: Discriminated Unions (i)(1), Discriminated Unions (ii)(2), Generic: Discriminated Unions (iii)(3) for an introduction to the techniques. We use ‘misc::variant’ in ‘temp’. I (Akim) strongly encourage you to read these enlightening articles. Default copy constructor, default assignment operator The C++ standard specifies that unless specified, default implementations of the copy constructor and assignment operator must be provided by the compiler. There are some pitfalls though, clearly exhibited in the implementation of ‘misc::ref’. You must be able to explain these pitfalls. Template template parameters C++ allows several kinds of entities to be used as template parameters. The most well known kind is “type”: you frequently parameterize class templates with types via ‘template ’ or ‘template ’. But you may also parameterize with a class template. The ‘temp’ module heavily uses this feature: understand it, and be ready to write similar code. Explicit template instantiations You must be able to explain how templates are “compiled”. In addition, you know how to explicitly instantiate templates, and explain what it can be used for. The implementation of ‘temp::Identifier’ (and ‘temp::Temp’ and ‘temp::Label’) is based on these ideas. See the corresponding rule in *note File Conventions:: for some explanations on this topic. Covariant return C++ supports covariance of the method return type. This feature is crucial to implement methods such as ‘clone’, as in ‘frame::Access::clone()’. Understand return type covariance. Lazy/delayed computation The ‘Ix’, ‘Cx’, ‘Nx’, and ‘Ex’ classes delay computation to address context-depend issues in a context independent way. Intermediate Representations A different approach of hierarchies In this project, the AST is composed of different classes related by inheritance (as if the kinds of the nodes were _class_ members). Here, the nodes are members of a single class, but their nature is specified by the object itself (as if the kinds of the nodes were _object_ members). Stack Frame, Activation Record The implementation of recursion and automatic variables. Inner functions and their impact on memory management at runtime Reaching non local variables. ---------- Footnotes ---------- (1) . (2) . (3) . 4.14.2 TC-5 Samples ------------------- TC-5 can be started (and should be started if you don’t want to finish it in a hurry) by first making sure your compiler can handle code that uses no variables. Then, you can complete your compiler to support more and more Tiger features. 4.14.2.1 TC-5 Primitive Samples ............................... This example is probably the simplest Tiger program. 0 File 4.52: ‘0.tig’ $ tc --hir-display 0.tig /* == High Level Intermediate representation. == */ # Routine: _main label main # Prologue # Body seq sxp const 0 sxp const 0 seq end # Epilogue label end Example 4.69: ‘tc --hir-display 0.tig’ You should then probably try to make more difficult programs with literals only. Arithmetics is one of the easiest tasks. 1 + 2 * 3 File 4.53: ‘arith.tig’ $ tc -H arith.tig /* == High Level Intermediate representation. == */ # Routine: _main label main # Prologue # Body seq sxp binop add const 1 binop mul const 2 const 3 sxp const 0 seq end # Epilogue label end Example 4.70: ‘tc -H arith.tig’ Use ‘havm’ to exercise your output. $ tc -H arith.tig >arith.hir Example 4.71: ‘tc -H arith.tig >arith.hir’ $ havm arith.hir Example 4.72: ‘havm arith.hir’ Unfortunately, without actually printing something, you won’t see the final result, which means you need to implement function calls. Fortunately, you can ask ‘havm’ for a verbose execution: $ havm --trace arith.hir error→checkingLow error→plaining error→unparsing error→checking error→evaling error→ call ( name main ) [] error→9.6-9.13: const 1 error→11.8-11.15: const 2 error→12.8-12.15: const 3 error→10.6-12.15: binop mul 2 3 error→8.4-12.15: binop add 1 6 error→7.2-12.15: sxp 7 error→14.4-14.11: const 0 error→13.2-14.11: sxp 0 error→ end call ( name main ) [] = 0 Example 4.73: ‘havm --trace arith.hir’ If you look carefully, you will find an ‘sxp 7’ in there... Then you are encouraged to implement control structures. if 101 then 102 else 103 File 4.54: ‘if-101.tig’ $ tc -H if-101.tig /* == High Level Intermediate representation. == */ # Routine: _main label main # Prologue # Body seq seq cjump ne const 101 const 0 name l0 name l1 label l0 sxp const 102 jump name l2 label l1 sxp const 103 label l2 seq end sxp const 0 seq end # Epilogue label end Example 4.74: ‘tc -H if-101.tig’ And even more difficult control structure uses: while 101 do (if 102 then break) File 4.55: ‘while-101.tig’ $ tc -H while-101.tig /* == High Level Intermediate representation. == */ # Routine: _main label main # Prologue # Body seq seq label l1 cjump ne const 101 const 0 name l2 name l0 label l2 seq cjump ne const 102 const 0 name l3 name l4 label l3 jump name l0 jump name l5 label l4 sxp const 0 label l5 seq end jump name l1 label l0 seq end sxp const 0 seq end # Epilogue label end Example 4.75: ‘tc -H while-101.tig’ Beware that _HAVM has some known bugs_ with its handling of ‘break’, see *note HAVM Bugs::. 4.14.2.2 TC-5 Optimizing Cascading If ..................................... Optimize the number of jumps needed to compute nested ‘if’, using ‘translate::Ix’. A plain use of ‘translate::Cx’ is possible, but less efficient. Consider the following sample: if if 11 < 22 then 33 < 44 else 55 < 66 then print("OK\n") File 4.56: ‘boolean.tig’ a naive implementation will probably produce too many ‘cjump’ instructions(1): $ tc --hir-naive -H boolean.tig /* == High Level Intermediate representation. == */ label l7 "OK\n" # Routine: _main label main # Prologue # Body seq seq cjump ne eseq seq cjump lt const 11 const 22 name l0 name l1 label l0 move temp t0 eseq seq move temp t1 const 1 cjump lt const 33 const 44 name l3 name l4 label l4 move temp t1 const 0 label l3 seq end temp t1 jump name l2 label l1 move temp t0 eseq seq move temp t2 const 1 cjump lt const 55 const 66 name l5 name l6 label l6 move temp t2 const 0 label l5 seq end temp t2 jump name l2 label l2 seq end temp t0 const 0 name l8 name l9 label l8 sxp call name print name l7 call end jump name l10 label l9 sxp const 0 jump name l10 label l10 seq end sxp const 0 seq end # Epilogue label end Example 4.76: ‘tc --hir-naive -H boolean.tig’ $ tc --hir-naive -H boolean.tig >boolean-1.hir Example 4.77: ‘tc --hir-naive -H boolean.tig >boolean-1.hir’ $ havm --profile boolean-1.hir error→/* Profiling. */ error→fetches from temporary : 2 error→fetches from memory : 0 error→binary operations : 0 error→function calls : 1 error→stores to temporary : 2 error→stores to memory : 0 error→jumps : 2 error→conditional jumps : 3 error→/* Execution time. */ error→number of cycles : 19 OK Example 4.78: ‘havm --profile boolean-1.hir’ An analysis of this pessimization reveals that it is related to the computation of an intermediate expression (the value of ‘if 11 < 22 then 33 < 44 else 55 < 66’) later decoded as a condition. A better implementation will produce: $ tc -H boolean.tig /* == High Level Intermediate representation. == */ label l0 "OK\n" # Routine: _main label main # Prologue # Body seq seq seq cjump lt const 11 const 22 name l4 name l5 label l4 cjump lt const 33 const 44 name l1 name l2 label l5 cjump lt const 55 const 66 name l1 name l2 seq end label l1 sxp call name print name l0 call end jump name l3 label l2 sxp const 0 label l3 seq end sxp const 0 seq end # Epilogue label end Example 4.79: ‘tc -H boolean.tig’ $ tc -H boolean.tig >boolean-2.hir Example 4.80: ‘tc -H boolean.tig >boolean-2.hir’ $ havm --profile boolean-2.hir error→/* Profiling. */ error→fetches from temporary : 0 error→fetches from memory : 0 error→binary operations : 0 error→function calls : 1 error→stores to temporary : 0 error→stores to memory : 0 error→jumps : 1 error→conditional jumps : 2 error→/* Execution time. */ error→number of cycles : 13 OK Example 4.81: ‘havm --profile boolean-2.hir’ ---------- Footnotes ---------- (1) The option ‘--hir-naive’ is not to be implemented. 4.14.2.3 TC-5 Builtin Calls Samples ................................... The game becomes more interesting with primitive calls (which are easier to compile than function definitions and function calls). (print_int(101); print("\n")) File 4.57: ‘print-101.tig’ $ tc -H print-101.tig >print-101.hir Example 4.82: ‘tc -H print-101.tig >print-101.hir’ $ havm print-101.hir 101 Example 4.83: ‘havm print-101.hir’ Complex values, arrays and records, also need calls to the runtime system: let type ints = array of int var ints := ints [51] of 42 in print_int(ints[ints[0]]); print("\n") end File 4.58: ‘print-array.tig’ $ tc -H print-array.tig /* == High Level Intermediate representation. == */ label l0 "\n" # Routine: _main label main # Prologue move temp t1 temp fp move temp fp temp sp move temp sp binop sub temp sp const 4 # Body seq seq move mem temp fp eseq move temp t0 call name init_array const 51 const 42 call end temp t0 seq sxp call name print_int mem binop add mem temp fp binop mul mem binop add mem temp fp binop mul const 0 const 4 const 4 call end sxp call name print name l0 call end seq end seq end sxp const 0 seq end # Epilogue move temp sp temp fp move temp fp temp t1 label end Example 4.84: ‘tc -H print-array.tig’ $ tc -H print-array.tig >print-array.hir Example 4.85: ‘tc -H print-array.tig >print-array.hir’ $ havm print-array.hir 42 Example 4.86: ‘havm print-array.hir’ The case of record is more subtle. Think carefully about the following example let type list = { h: int, t: list } var list := list { h = 1, t = list { h = 2, t = nil } } in print_int(list.t.h); print("\n") end File 4.59: ‘print-record.tig’ 4.14.2.4 TC-5 Samples with Variables .................................... The following example demonstrates the usefulness of information about escapes: when it is not computed, all the variables are stored on the stack. let var a := 1 var b := 2 var c := 3 in a := 2; c := a + b + c; print_int(c); print("\n") end File 4.60: ‘vars.tig’ $ tc -H vars.tig /* == High Level Intermediate representation. == */ label l0 "\n" # Routine: _main label main # Prologue move temp t0 temp fp move temp fp temp sp move temp sp binop sub temp sp const 12 # Body seq seq move mem temp fp const 1 move mem binop add temp fp const -4 const 2 move mem binop add temp fp const -8 const 3 seq move mem temp fp const 2 move mem binop add temp fp const -8 binop add binop add mem temp fp mem binop add temp fp const -4 mem binop add temp fp const -8 sxp call name print_int mem binop add temp fp const -8 call end sxp call name print name l0 call end seq end seq end sxp const 0 seq end # Epilogue move temp sp temp fp move temp fp temp t0 label end Example 4.87: ‘tc -H vars.tig’ Once escaping variable computation implemented, we _know_ none escape in this example, hence they can be stored in temporaries: $ tc -eH vars.tig /* == High Level Intermediate representation. == */ label l0 "\n" # Routine: _main label main # Prologue # Body seq seq move temp t0 const 1 move temp t1 const 2 move temp t2 const 3 seq move temp t0 const 2 move temp t2 binop add binop add temp t0 temp t1 temp t2 sxp call name print_int temp t2 call end sxp call name print name l0 call end seq end seq end sxp const 0 seq end # Epilogue label end Example 4.88: ‘tc -eH vars.tig’ $ tc -eH vars.tig >vars.hir Example 4.89: ‘tc -eH vars.tig >vars.hir’ $ havm vars.hir 7 Example 4.90: ‘havm vars.hir’ Then, you should implement the declaration of functions: let function fact(i: int) : int = if i = 0 then 1 else i * fact(i - 1) in print_int(fact(15)); print("\n") end File 4.61: ‘fact15.tig’ $ tc -H fact15.tig /* == High Level Intermediate representation. == */ # Routine: fact label l0 # Prologue move temp t1 temp fp move temp fp temp sp move temp sp binop sub temp sp const 8 move mem temp fp temp i0 move mem binop add temp fp const -4 temp i1 # Body move temp rv eseq seq cjump eq mem binop add temp fp const -4 const 0 name l1 name l2 label l1 move temp t0 const 1 jump name l3 label l2 move temp t0 binop mul mem binop add temp fp const -4 call name l0 mem temp fp binop sub mem binop add temp fp const -4 const 1 call end label l3 seq end temp t0 # Epilogue move temp sp temp fp move temp fp temp t1 label end label l4 "\n" # Routine: _main label main # Prologue # Body seq seq sxp call name print_int call name l0 temp fp const 15 call end call end sxp call name print name l4 call end seq end sxp const 0 seq end # Epilogue label end Example 4.91: ‘tc -H fact15.tig’ $ tc -H fact15.tig >fact15.hir Example 4.92: ‘tc -H fact15.tig >fact15.hir’ $ havm fact15.hir 2004310016 Example 4.93: ‘havm fact15.hir’ Note that the result of 15! (1307674368000) does not fit on a signed 32-bit integer, and is therefore wrapped (to 2004310016). And finally, you should support escaping variables (*note File 4.29: variable-escapes.tig.). $ tc -eH variable-escapes.tig /* == High Level Intermediate representation. == */ # Routine: incr label l0 # Prologue move temp t2 temp fp move temp fp temp sp move temp sp binop sub temp sp const 4 move mem temp fp temp i0 move temp t1 temp i1 # Body move temp rv binop add temp t1 mem mem temp fp # Epilogue move temp sp temp fp move temp fp temp t2 label end # Routine: _main label main # Prologue move temp t3 temp fp move temp fp temp sp move temp sp binop sub temp sp const 4 # Body seq sxp eseq seq move mem temp fp const 1 move temp t0 const 2 seq end call name l0 temp fp temp t0 call end sxp const 0 seq end # Epilogue move temp sp temp fp move temp fp temp t3 label end Example 4.94: ‘tc -eH variable-escapes.tig’ 4.14.3 TC-5 Given Code ---------------------- Some code is provided through the ‘tc-base’ repository, using tag ‘2020-tc-base-5.0’. For a description of the new modules, see *note src/temp::, *note src/tree::, *note src/frame::, *note src/translate::. 4.14.4 TC-5 Code to Write ------------------------- You are encouraged to first try very simple examples: ‘nil’, ‘1 + 2’, ‘"foo" < "bar"’ etc. Then consider supporting variables, and finally handle the case of the functions. ‘temp::Identifier’ Their implementations are to be finished. This task is independent of others. Passing ‘test-temp.cc’ is probably the sign you completed correctly the implementation. You are invited to follow the best practices for variants, in particular, avoid “type switching” by hand, rather use variant visitors. For instance the ‘IdentifierEqualVisitor’ can be used this way: template