Everything exposed in this document is expected to be known.
This document, revision $Id: b970f4b392dca8b7749dbf5a08db938d01c9f27c $ of June 7, 2010, details the various tasks EPITA students must complete. It is available under various forms:
--- The Detailed Node Listing ---
Introduction
History
Instructions
Coding Style
Evaluation
Tarballs
Project Layout
Compiler Stages
TC-0, Naive Scanner and Parser
TC-1, Scanner and Parser
TC-2, Building the Abstract Syntax Tree
TC-2 Samples
TC-3, Bindings
TC-R, Unique Identifiers
TC-E, Computing the Escaping Variables
TC-4, Type Checking
TC-D, Removing the syntactic sugar from the Abstract Syntax Tree
TC-I, Function inlining
TC-B, Array bound checking
TC-A, Ad Hoc Polymorphism (Function Overloading)
TC-O, Desugaring object constructs
TC-5, Translating to the High Level Intermediate Representation
TC-5 Samples
TC-5 Options
TC-6, Translating to the Low Level Intermediate Representation
TC-6 Samples
TC-7, Instruction Selection
TC-8, Liveness Analysis
TC-9, Register Allocation
Tools
Modern Compiler Implementation
The gnu Build System
Appendices
This document presents the Tiger Project as part of the EPITA curriculum. It aims at the implementation of a Tiger compiler (see Modern Compiler Implementation) in C++.
If you are a newcomer, you might be afraid by its sheer size. Don't worry, but in any case, do not give up: as stated in the very beginning of this document,
That is to say everything exposed in this document is considered to be known. If it is written but you didn't know, you are wrong. If it is not written and was not clearly reported in the news, I am wrong.
Basically this document contains three kinds of information:
There is additional material on the Internet:
This project is quite different from most other EPITA projects, and has aims at several different goals, in different areas:
This also means that you have to design a test suite, and maintain it
through out the project. The test suite is an integral part of
the project.
Note, however, that implementing an industrial strength compiler in C++
makes a lot of sense1.
Bjarne Stroustrup's list of C++ Applications mentions Metrowerks
(CodeWarrior), HP, Sun, Intel, M$ as examples.
Q: What is your opinion, is knowing assembly language useful for programmers nowadays?BS: It is useful to understand how machines work and knowing assembler is almost essential for that.
English has an important role as a common language for programmers, and I suspect that it would be unwise to abandon that without serious consideration.
Any attempt to break the importance of English is wrong. For instance,
do not translate this document nor any other. Ask support to the
Yakas, or to the English team. By the past, some oral and written
examinations were made in English. It may well be back some day. Some
books will help you to improve your English, see The Elements of Style.
The Tiger project is not unique in these regards, see Cool: The Classroom Object-Oriented Compiler, for instance, with many strikingly similar goals, and some profound differences. See also Making Compiler Design Relevant for Students who will (Most Likely) Never Design a Compiler, for an explanation of why compilation techniques have a broader influence than they seem.
This section could have been named “What Akim did not say”, or “Common misinterpretations”.
The first and foremost misinterpretation would be “Akim says C sucks and is useless”. Wrong. C sucks, definitely, but today C is probably the first employer of programmers in the world, so let's face it: C is mandatory in your education. The fact that C++ is studied afterward does not mean that learning C is a loss of time, it means that since C is basically a subset of C++ it makes sense to learn it first, it also means that (let it be only because it is a superset) C++ provides additional services so it is often a better choice, but even more often you don't have the choice.
C++ is becoming a common requirement for programmers, so you also have to learn it, although it “features” many defects (but heredity was not in its favor...). It's an industrial standard, so learn it, and learn it well: know its strengths and weaknesses.
And by the way, of course C++ sucks++.
Another common rumor in EPITA has it that “C/Unix programming does not deserve attention after the first period”. Wrong again. First of all its words are wrong: it is a legacy belief that C and Unix require each other: you can implement advanced system features using other languages than C (starting with C++, of course), and of course C can be used for other tasks than just system programming. For instance Bjarne Stroustrup's list of C++ Applications includes:
- Apple
- OS X is written in a mix of language, but a few important parts are C++. The two most interesting are:
- Finder
- IOKit device drivers. (IOKit is the only place where we use C++ in the kernel, though.)[...]
- Ericsson
- TelORB - Distributed operating system with object oriented
- Microsoft
- Literally everything at Microsoft is built using various flavors of Visual C++ - mostly 6.0 and 7.0 but we do have a few holdouts still using 5.0 :-( and some products like Windows XP use more recent builds of the compiler. The list would include major products like:
- Windows XP
- Windows NT (NT4 and 2000)
- Windows 9x (95, 98, Me)
- Microsoft Office (Word, Excel, Access, PowerPoint, Outlook)[...]
- CDE
- The CDE desktop (the standard desktop on many UNIX systems) is written in C++.
Know C. Learn when it is adequate, and why you need it.
Know C++. Learn when it is adequate, and why you need it.
Know other languages. Learn when they are adequate, and why you need them.
And then, if you are asked to choose, make an educated choice. If there is no choice to be made, just deal with Real Life.
The Tiger Compiler Project evolves every year, so as to improve its infrastructure, to demonstrate more instructional material and so forth. This section tries to keep a list of these changes, together with the most constructive criticisms from students (or ourselves).
If you have information, including criticisms, that should be mentioned here, please send it to us.
The years correspond to the class, e.g., Tiger 2005 refers to epita class 2005, i.e., the project ran from October 2002 to July (previously September) 2003.
Before diving into the history of the Tiger Compiler Project in EPITA, a whole project in itself for ourselves, with experimental tries and failures, it might be good to review some constraints that can explain why things are the way they are. Understanding these constraints will make it easier to criticize actual flaws, instead of focusing on issues that are mandated by other factors.
Bear in mind that Tiger is an instructional project, the purpose of which is detailed above, see Why the Tiger Project. Because the input is a stream of students with virtually no knowledge whatsoever in C++, and our target is a stream of students with good fluency in many constructs and understanding of complex matters, we have to gradually transform them via intermediate forms with increasing skills. In particular this means that by the end of the project, evolved techniques can and should be used, but at the beginning only introductory knowledge should be needed. As an example of a consequence, we cannot have a nice and high-tech ast.
Because the insight of compilers is not the primary goal, when a choice is to be made between (i) more interesting work on compiler internals with little C++ novelty, and (ii) providing most of this work and focusing on something else, then we are most likely to select the second option. This means that the Tiger Project is doomed to be a low-tech featureless compiler, with no call graph, no default optimization, no debugging support, no bells, no whistles, and even no etc. Hence, most interested students will sometimes feel we “stole” the pleasure to write nice pieces of code from them; understand that we actually provided code to the other students: you are free to rewrite everything if you wish.
std name space unqualified etc.). In
addition, we were using hash_map which is an sgi
extension that is not available in standard C++. It was therefore
decided to upgrade the compiler in 2003, and to upgrade the programming
style.
all target as first running clean and then the
actual build.
As a result I grew tired of fixing the tarballs, and in order to have a robust, efficient (albeit some piece of pain in the neck sometimes) distribution 2 we moved to using Automake, and hence Autoconf.
There are reasons not to be happy with it, agreed. But there are many more reasons to be sad without it. So Autoconf and Automake are here to stay.
Note, however, that you are free to use another system if you wish.
Just obey the standard package interface (see Submission).
SemantVisitor is a nightmare to maintainSemantVisitor, which performs both the type checking and the
translation to intermediate code, was near to impossible to deliver in
pieces to the students: because type checking and translation were so
much intertwined, it was not possible to deliver as a first step the
type checking machinery template, and then the translation pieces.
Students had to fight with non applicable patches. This was fixed in
Tiger 2003 by splitting the SemantVisitor into
TypeVisitor and TranslationVisitor. The negative impact,
of course, is a performance loss.
During this year, I was helped by:
Submission dates were:
| Stage | Submission
|
|---|---|
| TC-1 | Monday, December 18th 2000 at noon
|
| TC-2 | Friday, February 23th 2001 at noon
|
| TC-3 | Friday, March 30th 2001 at noon
|
| TC-4 | Tuesday, June 12th 2001 at noon
|
| TC-5 | Monday, September 17th 2001 at noon
|
Some groups have reached TC-6.
Criticisms include:
main). We had to revert to using the bad native
C++ compiler.
It is to be noted that some funny guy once replaced the g++ executable from my account into ‘rm -rf ~’. Some students and myself have been bitten. The funny thing is that this is when the system administration realized the teacher accounts were not backed up.
Fortunately, since that time, we have decent compilers made available by
students, and the Tiger Compiler is now written in strictly standard
C++.
During this year, I was helped by:
Submission dates were:
| Stage | Submission
|
|---|---|
| TC-2 | Tuesday, March 4th 2002 at noon
|
| TC-3 | Friday, March 15th 2002 at noon
|
| TC-4 | Friday, April 12th 2002 at noon
|
| TC-5 | Friday, June 14th 2002, at noon
|
| TC-6 | Monday, July 15th 2002 at noon
|
Criticisms include:
Task model.
EscapeVisitor
“optional” (actually it became a rush).
Though a garbage collector is tempting and well suited for our tasks,
its pedagogical content is less interesting: students should be taught
how to properly manage the memory.
A lot of the following material is the result of discussion with several people, including, but not limited to3:
I here thank all the people who participated to this edition of this project. It has been a wonderful vintage, thanks to the students, the assistants, and the members of the lrde.
Deliveries were:
| Stage | Submission
|
|---|---|
| TC-0 | Friday, January 24th 2003 12:00
|
| TC-1 | Friday, February 14th 2003 12:00
|
| TC-2 | Friday, March 14th 2003 12:00
|
| TC-4 | Friday, April 25th 2003 12:00
|
| TC-3 | Rush from Saturday, May 24th at 18:00 to Sunday 12:00
|
| TC-56 | Friday, June 20th 2003, 12:00
|
| TC-7 | Friday, July 4th 2003 12:00
|
| TC-78 | Friday, July 18th 2003 12:00
|
| TC-9 | Monday, September 8th 2003 12:00
|
Criticisms about Tiger 2005 include:
The factors that had pushed to a weak memory management is mainly a lack of coordination between developers: we should have written more things. So don't do as we did: define the memory management policy for each module, and write it.
The 2006 edition pays strict attention to memory allocation.
The interfaces between modules have also been cleaned to avoid excessive inter dependencies. Also, when possible, opaque types are used to avoid additional includes. Each module exports forward declarations in a fwd.hh file to promote this. For instance, ast/tasks.hh today includes:
// Forward declarations of ast:: items.
#include "ast/fwd.hh"
// ...
/// Global root node of abstract syntax tree.
extern ast::Exp* the_program;
// ...
where it used to include all the ast headers to define exactly
the type ast::Exp.
Void (in which case the translation must not
issue an actual assignment), or whether ‘a < b’ is about strings
(in which case the translation will issue a hidden call to
strcmp), or the type of a variable (needed when implementing
object oriented Tiger), etc., etc.
As you can see, the list is virtually infinite. So we would need an
extensible system of annotation of the ast. As of September
2003 no solution has been chosen. But we must be cautious not to
complicate TC-2 too much (it is already a very steep step).
To avoid this:
We are considering splitting TC-5 into two: TC-5- which would be limited to
programs without escaping variables, and TC-5+ with escaping variables
and the computation of the escapes.
Since Tiger 2006, the coding style enforces a more
conventional style.
I have been helped by:
Deliveries:
| Stage | Kind | Submission | Supervisor
|
|---|---|---|---|
| TC-0 | Wednesday, 2004-02-04 12:00 | Anne-Lise Brourhant
| |
| TC-1 | Sunday, 2004-02-08 12:00 | Tristan Lanfrey
| |
| TC-2 | Sunday, 2004-03-07 12:00 | Anne-Lise Brourhant, Tristan Lanfrey
| |
| TC-3 | Rush | Fr., 2004-03-19 18:30 to Sun., 2004-03-21 19:00 | Fabrice Hesling
|
| TC-4 | Sunday, 2004-04-11 19:00 | Tristan Lanfrey
| |
| TC-5 | Sunday, 2004-06-06 12:00 | Fabrice Hesling
| |
| TC-6 | Sunday, 2004-06-27 12:00 | Marco Tessari
| |
| TC-7 | Opt | Sunday, 2004-07-11 12:00 | Marco Tessari or Fabrice Hesling
|
| TC-89 | Opt | Thursday, 2004-07-29 12:00 | Marco Tessari
|
Criticisms about Tiger 2006 include:
symbol::Table should be providedThe problem actually disappeared: Tiger 2007 no longer depends so
heavily on scoped symbol tables.
let
type rec = {}
in
rec {}
end
where the type rec escapes its scope since the type checker will
assign the type rec to the let construct. Given the
suggested implementation, which reclaims memory allocated by the
declarations when closing the scope, the compiler dumps core.
The new implementation, tested with 2005b, copes with this gracefully: types are destroyed when the AST is. This does not cure the example, which should be invalid IMHO. The following example, from Arnaud Fabre, amplifies the problem.
let
var box :=
let
type box = {val: string}
var box := box {val = "42\n"}
in
box
end
in
print (box.val)
end
TranslateVisitor: the Binder did it all).
Escapable,
Bindable etc.).
Amongst other noteworthy changes, after five years of peaceful existence, the stages of the compiler were renamed from T1, T4 etc. to TC-1, TC-4... EPITA moved from “periods” (P1, P2...) to “trimesters” and they stole T1 and so forth from Tiger.
I have been helped by:
Deliveries:
| Stage | Kind | Submission
|
|---|---|---|
| TC-1 | Sun 2004-10-10 12:00
| |
| TC-2 | Sun 2004-10-24 12:00
| |
| TC-3 | Sun 2004-11-7 12:00
| |
| TC-4 | Sun 2004-11-28 12:00
|
Criticisms about Tiger 2006 include:
misc::identPrintVisitor code
includes a few examples.
I have been helped by:
Deliveries:
| Stage | Kind | Launch | Submission | Supervisor
|
|---|---|---|---|---|
| TC-0 | Wed 2005-03-09 | Tue 2005-03-15 23:42 | Bastien Gueguen
| |
| TC-1 | Rush | Fri 2005-03-18 | Sun 2005-03-19 9:00 | Guillaume Bousquet
|
| TC-2 | Mon 2005-03-21 | Sun 2005-04-03 | Nicolas Rateau
| |
| TC-3 | Rush | Fri 2005-04-08 20:00 | Sun 2005-04-10 12:00 | Fanny Ricour
|
| TC-4 | Mon 2005-04-18 | Sun 2005-05-01 | Julien Nesme
| |
| TC-5 | Mon 2005-05-09 | Sun 2005-06-05 | Benoît Monin
| |
| TC-6 | Mon 2005-06-06 | Sun 2005-06-12 | Philippe Kajmar
| |
| TC-7 | Mon 2005-06-13 | Sun 2005-06-19 | Gilles Walbrou
| |
| TC-8 | Mon 2005-06-20 | Mon 2005-06-27 | Arnaud Fabre
| |
| TC-9 | Mon 2005-06-20 | Sun 2005-07-03 | Arnaud Fabre
| |
| Final submission | Wed 2005-07-06 |
|
Criticisms about Tiger 2007 include:
misc:: toolsSome groups would have liked to have the files earlier: in the future we will publish them on the Wednesday, instead of the last minute.
Some groups have found it very difficult to be several working together
on the same file (binder.cc of course). This is also a
problem in the group management, and use of version control: when tasks
are properly assigned, and using a tool such as Subversion, such
problems should be minimal. In particular, merges resulting from
updates should not be troublesome! Difficult updates result from
disordered edition of the files. Dropping the use of a version control
manager is not an answer: you will be bitten one day if two people edit
concurrently the same file. One option is to split the file, say
binder-exp.cc and binder-dec.cc for instance.
I leave this to students.
Binder::decs_visit, but the majority prefers: we will stay
on this version, but we will emphasize that students are free not to
follow our suggestions.
People agree it is harder, and mainly because of compiler construction issues, not C++ issues. But many students prefer to keep it this way, rather than completely giving away the answers to compiler construction related problems.
We have been helped by:
Deliveries:
| Stage | Kind | Launch | Submission | Supervisor
|
|---|---|---|---|---|
| TC-0 | Tue 01-03 | Fri 01-13 23:42 | Christophe Duong
| |
| TC-1 | Rush | Fri 03-17 | Sun 03-19 12:12 | Renaud Lienhart
|
| TC-2 | Mon 03-20 | Thu 03-30 23:42 | David Doukhan
| |
| TC-3 | Rush | Fri 03-31 | Sun 04-02 12:12 | Frederick Mousnier-Lompre
|
| TC-4 | Tue 04-04 | Mon 04-24 23:42 | Guillaume Deslandes
| |
| TC-5 | Mon 05-01 | Sun 05-28 23:42 | Alexis Sebbane
| |
| TC-6 | Mon 05-29 | Sun 06-11 23:42 | Christophe Duong
| |
| TC-7 | Wed 06-14 | Wed 06-21 12:00 |
| |
| TC-8 | Wed 06-21 | Sun 07-2 12:00 |
| |
| TC-9 | Mon 07-03 | Sun 07-16 12:00 |
| |
| Final |
|
Some of the noteworthy changes compared to Tiger 2007:
let <decs> end, is simplified into <decs>.
We also use GLR starting at TC-2. &, | and the unary
minus operator are desugared using concrete syntax transformations.
DesugarVisitor, BoundCheckingVisitor and InlineVisitor.
We have been helped by:
Deliveries:
| Stage | Kind | Launch | Submission | Supervisor
|
|---|---|---|---|---|
| LC-0 | Mon 03-05 | Fri 03-16 12:00 |
| |
| LC-1 | Rush | Fri 03-23 | Sun 03-25 12:00 |
|
| LC-2 | Mon 03-26 | Fri 04-06 12:00 |
| |
| LC-3 & LC-R | Rush | Fri 04-06 | Sun 04-08 12:00 |
|
| LC-4 | Mon 04-23 | Sun 05-06 12:00 |
| |
| LC-5 | Mon 05-15 | Sun 06-03 12:00 |
| |
| LC-6 | Mon 06-04 | Sun 06-10 12:00 |
| |
| LC-7 | Mon 06-11 | Wed 06-20 12:00 |
| |
| LC-8 | Thu 06-21 | Sun 07-01 12:00 |
| |
| LC-9 | Mon 07-02 | Sun 07-15 12:00 |
|
Some of the noteworthy changes compared to Tiger 2008:
We have been helped by:
Deliveries:
| Stage | Kind | Launch | Submission | Supervisor
|
|---|---|---|---|---|
| TC-0 | Mon Nov 05, 2007 | Sun Nov 25, 2007 12:00 |
| |
| TC-1 | Mon Dec 10, 2007 | Sun Dec 16, 2007 12:00 |
| |
| TC-2 | Mon Feb 25, 2008 | Wed Mar 05, 2008 12:00 |
| |
| TC-3 & TC-R | Rush | Fri Mar 07, 2008 | Sun Mar 09, 2008 12:00 |
|
| TC-4 | Mon Mar 10, 2008 | Sun Mar 23, 2008 12:00 |
| |
| TC-5 | Mon Mar 24, 2008 | Sun Apr 06, 2008 12:00 |
| |
| TC-6 | Mon Apr 14, 2008 | Sun Apr 20, 2008 12:00 |
| |
| TC-7 | Mon Apr 21, 2008 | Sun May 04, 2008 12:00 |
| |
| TC-8 | Mon May 05, 2008 | Sun May 18, 2008 12:00 |
| |
| TC-9 | Mon May 19, 2008 | Sun Jun 01, 2008 12:00 |
|
Some of the noteworthy changes compared to Leopard 2009:
This is the tenth year of the Tiger Project.
We have been helped by:
Deliveries:
| Stage | Kind | Launch | Submission | Supervisor
|
|---|---|---|---|---|
| .tig | Rush | Dec 20, 2008 | Dec 21, 2008 |
|
| TC-0 | Jan 05, 2009 | Jan 16, 2009 at 12:00 |
| |
| TC-1 | Rush | Jan 16, 2009 | Jan 18, 2009 at 12:00 |
|
| TC-2 | Feb 16, 2009 | Feb 25, 2009 at 23:42 |
| |
| TC-3 & TC-R | Rush | Feb 27, 2009 | Mar 01, 2009 at 11:42 |
|
| TC-4 & TC-E | Mar 02, 2009 | Mar 15, 2009 at 11:42 |
| |
| TC-5 | Mar 16, 2009 | Mar 25, 2009 at 23:42 |
| |
| TC-6 | Apr 23, 2009 | May 03, 2009 at 12:00 |
| |
| TC-7 | May 04, 2009 | May 17, 2009 |
| |
| TC-8 | May 18, 2009 | May 31, 2009 |
| |
| TC-9 | Jun 29, 2009 | Jul 12, 2009 |
|
Some of the noteworthy changes compared to Tiger 2010:
.tig project: The Bistromatig.
It consists in implementing an arbitrary-radix infinite-precision
calculator. The project is an adaptation of the famous Bistromathic
project, that used to be one of the first C assignments at EPITA in the
Old Days. The name was borrowed from
Douglas Adams's
invention from
Life, the Universe and Everything.
This is the eleventh year of the Tiger Project.
We have been helped by:
Deliveries:
| Stage | Kind | Launch | Submission | Supervisor
|
|---|---|---|---|---|
| .tig | Rush | Dec 02, 2009 | Dec 04, 2009 |
|
| TC-0 | Dec 11, 2009 | Dec 20, 2009 |
| |
| TC-1 | Jan 11, 2010 | Jan 17, 2010 |
| |
| TC-2 | Feb 01, 2010 | Feb 17, 2010 |
| |
| TC-3 & TC-R | Rush | Feb 19, 2010 | Feb 26, 2010 |
|
| TC-4 & TC-E | Feb 22, 2010 | Mar 07, 2010 |
| |
| TC-5 | Mar 11, 2010 | Mar 22, 2010 |
| |
| TC-6 | Apr 19, 2010 | May 02, 2010 |
| |
| TC-7 | May 12, 2010 | May 25, 2010 |
| |
| TC-8 | May 25, 2010 | Jun 06, 2010 |
| |
| TC-9 | Jun 07, 2010 | Jun 12, 2010 |
|
Some of the noteworthy changes compared to Tiger 2011:
Bear in mind that if you are writing, it is to be read, so pay attention to your reader.
epita.cours.compile. You need to have a very good reason to send
a message to the assistants or to Akim, as it usually annoys us, which is
not in your interest.
The news group epita.cours.compile is dedicated to the Compiler
Construction lecture, the Tiger project, and related matters (e.g.,
assignments in Tiger itself). Any other material is off topic.
| Don't do that | Do this
|
|---|---|
| Problem in TC-1 | Cannot generate location.hh
|
| make check | make check fails on test-ref
|
This includes the test cases. While posting a simple test case is
tolerated, sending many of them, or simply one that addresses a specific
common failure (e.g., some obscure cases for escapes) is strictly
forbidden.
As any other assignment, the Tiger Project comes with its rules to follow.
It is strictly forbidden to possess code that is not yours. You are encouraged to work with others, but don't get a copy of their code. See How not to go about a programming assignment, for more hints on what will not be accepted.
Test cases and test engines development are parts of the Tiger Project. As such the same rules apply as for code.
If something illegal happened in the course of a stage, let us know, arrangements might be possible. If we find out, the rules will be strictly applied. It already happened that third year students have had to redo the Tiger Project because their code was found in another group: -42/20 is seldom benign.
Don't bother everybody instead of trying first. Conversely, once you did your best, don't hesitate working with others.
Starting with TC-1, assignments are to be done by groups of four.
The first cause of failures to the Tiger project is human problems within the groups. I cannot stress too much the importance of constituting a good group of four people. The Tiger project starts way before your first line of code: it begins with the selection of your partners.
Here are a few tips, collected wisdom from the previous failures.
At the first stage, the leader assigns you a task. You try and fail, for weeks. In the meanwhile, the other members teach you lots of facts, but (i) you can't memorize everything and end up saying “hum hum” without having understood, and (ii) because they don't understand what you don't understand, they are often poor teachers. The day before the submission, the leader does your assignments to save the group. You learned nothing, or quite. Second stage: same beginning, you are left with your assignment, but the other members are now bothered by your asking questions: why should they answer, since you don't understand what they say (remember: they are poor teachers because they don't understand your problems), and you don't seem to remember anything! The day before the submission, they do your work. From now on, they won't even ask you for anything: “fixing” you is much more time consuming than just doing it by themselves. Oral examinations reveal you neither understand nor do anything, hence your grades are bad, and you win another round of first year...
Take my advice: if you have difficulties with programming, be with other
people like you. Your chances are better together, and anyway you are
allowed to ask for assistance from other groups.
This section could have been named “Strong and Weak Requirements”, as it includes not only mandatory features from your compiler (memory management), but also tips and advice. As the captain Barbossa would put it, “actually, it's more of a guideline than a rule.”
The code you deliver must be clean. In particular, when some code is provided, and you have to fill in the blanks denoted by ‘FIXME: Some code has been deleted.’. Sometimes you will have to write the code from scratch.
In any case, dead code and dead comments must be removed. You are free to leave comments spotting places where you fixed a ‘FIXME:’, but never leave a fixed ‘FIXME:’ in your code. Nor any irrelevant comment.
The official compiler for this project, is GNU C++ Compiler, 4.1 or higher (see GCC).
If, and only if, you already have enough fluency in C++ to be willing to try something wilder, then the following exception is made for you. Be warned: along the years the Tiger project was polished to best fit the typical epitean learning curve, trying to escape this curve is also taking a major risk. By the past, some students tried different approaches, and ended with unmaintainable pieces of code.
If you and your group are sure you can afford some additional difficulty (for additional benefits), then you may use the following extra tools. You have to warn the examiners that you use these tools. You also have to take care of harnessing configure.ac to make sure that what you need is available on the testing environment. Be also aware that you are likely to obtain less help from us if you use tools that we don't master: You are on your own, but, hey!, that's what you're looking for, ain't it?
libboost-*.
See Boost.org.
If you think about something not listed here, please send us your proposal; acceptance is required to use them.
There are some strict conventions to obey wrt the files and their contents.
LikeThis per files like-this.*Each class
LikeThisis implemented in a single set of file named like-this.*. Note that the mixed case class names are mapped onto lower case words separated by dashes.There can be exceptions, for instance auxiliary classes used in a single place do not need a dedicated set of files.
The *.hh should contain only declarations, i.e., prototypes,
externfor variables etc. Inlined short methods are accepted when there are few of them, otherwise, create an *.hxx file. The documentation should be here too.There is no good reason for huge objects to be defined here.
As much as possible, avoid including useless headers (GotW007, GotW034):
- when detailed knowledge of a class is not needed, instead of
#include "foo.hh"write
// Fwd decl. class Foo;or better yet: use the appropriate fwd.hh file (read below).
- if you need output streams, then include ostream, not iostream. Actually, if you merely need to declare the existence of streams, you might want to include iosfwd.
Some definitions should be loaded in different places: templates, inline functions etc. Declare and document them in the *.hh file, and implement them in the *.hxx file. The *.hh file last includes the *.hxx file, conversely *.hxx first includes *.hh. Read below.
Big objects should be defined in the *.cc file corresponding to the declaration/documentation file *.hh.
There are less clear cut cases between *.hxx and *.cc. For instance short but time consuming functions should stay in the *.cc files, since inlining is not expected to speed up significantly. As another example features that require massive header inclusions are better defined in the *.cc file.
As a concrete example, consider the
acceptmethods of the ast classes. They are short enough to be eligible for an *.hxx file:void LetExp::accept (Visitor& v) { v (*this); }We will leave them in the *.cc file though, since this way only the *.cc file needs to load ast/visitor.hh; the *.hh is kept short, both directly (its contents) and indirectly (its includes).
There are several strategies to compile templates. The most common strategy consists in leaving the code in a *.hxx file, and letting every user of the class template instantiate the code. While correct, this approach has several drawbacks:
- Because the *.hh file includes the *.hxx file, each time a simple declaration of a template is needed, the full implementation comes with it. And if the implementation requires other declarations such as
std::iostream, you force all the client code to parse the iostream header!- The instantiation is performed several times, which is time and space consuming.
- The dependencies are tight: the clients of the template depend upon its implementation.
To circumvent these problems, we introduce this fourth type of file, *.hcc: files that must be compiled once for each concrete template parameter.
A basic example is probably the easiest means to introduce the concept. The class template
Boxwill be used for several parameters, sayintandstd::string. box.hh defines its interface./** ** \file box.hh ** \brief Declaration of Box. **/ #ifndef BOX_HH # define BOX_HH template <typename T> class Box { public: Box (const T& t); virtual ~Box (); // And many others... }; #endif // !BOX_HHThen we define box.hcc which resembles what box.hxx would have been, except that (i) box.hh does not include box.hcc, and (ii) there are no
inlinehere./** ** \file box.hcc ** \brief Implementation of Box. **/ #ifndef BOX_HCC # define BOX_HCC #include <iostream> #include "box.hh" template <typename T> Box::Box (const T& t) { // Implementation details. } template <typename T> Box::~Box () { // Implementation details. } #endif // !BOX_HCCLast, we create files that perform the explicit template instantiations. In our running example, we chose to have a single file for both instantiations:
/** ** \file box.cc ** \brief Instantiations of Box. **/ #include "box.hcc" template class Box<int>; template class Box<std::string>;Neither the headers string and iostream nor box.hcc have “leaked” into box.hh: the rest of the program is independent of them.
Use the preprocessor to ensure the contents of a file is read only once. This is critical for *.hh and *.hxx files that include one another.
One typically has:
/** ** \file sample/sample.hxx ** \brief Declaration of sample::Sample. **/ #ifndef SAMPLE_SAMPLE_HH # define SAMPLE_SAMPLE_HH // ... #include "sample/sample.hxx" #endif // !SAMPLE_SAMPLE_HH/** ** \file sample/sample.hxx ** \brief Inlined definition of sample::Sample. **/ #ifndef SAMPLE_SAMPLE_HXX # define SAMPLE_SAMPLE_HXX #include "sample/sample.hh" // ... #endif // !SAMPLE_SAMPLE_HXX
Dependencies can be a major problem during big project developments. It is not acceptable to “recompile the world” when a single file changes. To fight this problem, you are encouraged to use fwd.hh files that contain simple forward declarations. Everything that defeat the interest of fwd.hh file must be avoided, e.g., including actual header files. These forward files should be included by the *.hh instead of more complete headers.
The expected benefit is manifold:
- A forward declaration is much shorter.
- Usually actual definitions rely on other classes, so other ‘#include’s etc. Forward declarations need nothing.
- While it is not uncommon to change the interface of a class, changing its name is infrequent.
Consider for example ast/visitor.hh, which is included directly or indirectly by many other files. Since it needs a declaration of each ast node one could be tempted to use ast/all.hh which includes virtually all the headers of the
astmodule. Hence all the files including ast/visitor.hh will bring in the wholeastmodule, where the much shorter and much simpler ast/fwd.hh would suffice.Of course, usually the *.cc files need actual definitions.
likethisThe compiler is composed of several modules that are dedicated to a set of coherent specific tasks (e.g., parsing, AST handling, register allocation etc.). A module name is composed of lower case letters exclusively,
likethis, notlike_thisnorlike-this. This module's files are stored in the directory with the same name, which is also that of the namespace in which all the symbols are defined.Contrary to file names, we do not use dashes to avoid clashes with Swig and
namespace.
The interface of the module module contains only pure functions: these functions should not depend upon globals, nor have side effects of global objects. Global variable are forbidden here.
Tasks are the place for side effects. That's where globals such as the current ast, the current assembly program, etc., are defined and modified.
The standard reserves a number of identifier classes, most notably ‘_*’ [17.4.3.1.2]:
Each name that begins with an underscore is reserved to the implementation for use as a name in the global namespace.Using ‘_*’ is commonly used for CPP guards (‘_FOO_HH_’), private members (‘_foo’), and internal functions (‘_foo ()’): don't.
LikeThisClass should be named in mixed case; for instance
Exp,StringExp,TempMap,InterferenceGraphetc. This applies to class templates. See CStupidClassName.
like_thisNo upper case letters, and words are separated by an underscore.
like_this_It is extremely convenient to have a special convention for private and protected members: you make it clear to the reader, you avoid gratuitous warnings about conflicts in constructors, you leave the “beautiful” name available for public members etc. We used to write
_like_this, but this goes against the standard, see Stay out of reserved names.For instance, write:
class IntPair { public: IntPair (int first, int second) : first_ (first), second_ (second) { } protected: int first_, second_; }See CStupidClassName.
typedef foo_typeWhen declaring a
typedef, name the type foo_type(where foo is obviously the part that changes). For instance:typedef std::map<const Symbol, Entry_T> map_type; typedef std::list<map_type> symtab_type;We used to use foo
_t, unfortunately this (pseudo) name space is reserved by posix.
super_typeIt is often handy to define the type of “the” super class (when there is a single one); use the name
super_typein that case. For instance most Visitors of the ast start with:class TypeChecker: public ast::DefaultVisitor { public: typedef ast::DefaultVisitor super_type; using super_type::operator(); // ...(Such
usingclauses are subject to the current visibility modifier, hence thepublicbeforehand.)
Hide auxiliary/helper classes (i.e., classes private to a single compilation unit, not declared in a header) in functions, or in an anonymous namespace. Instead of:
struct Helper { ... }; void doit () { Helper h; ... }write:
namespace { struct Helper { ... }; } void doit () { Helper h; ... }or
void doit () { struct Helper { ... } h; ... }The risk otherwise is to declare two classes with the same name: the linker will ignore one of the two silently. The resulting bugs are often difficult to understand.
Use every possible means to release the resources you consume, especially memory. Valgrind can be a nice assistant to track memory leaks (see Valgrind). To demonstrate different memory management styles, you are invited to use different features in the course of your development: proper use of destructors for the ast, use of a factory for
Symbol,Tempetc., use ofstd::auto_ptrstarting with theTranslatemodule, and finally use of reference counting via smart pointers for the intermediate representation.
Code duplication is your enemy: the code is less exercised (if there are two routines instead of one, then the code is run half of the time only), and whenever an update is required, you are likely to forget to update all the other places. Strive to prevent code duplication from sneaking into your code. Every C++ feature is good to prevent code duplication: inheritance, templates etc.
dynamic_cast of referencesOf the following two snippets, the first is preferred:
const IntExp& ie = dynamic_cast<const IntExp&> (exp); int val = ie.value_get ();const IntExp* iep = dynamic_cast<const IntExp*> (&exp); assert (iep); int val = iep->value_get ();While upon type mismatch the second
aborts, the first throws astd::bad_cast: they are equally safe.
Do not use type cases: if you want to dispatch by hand to different routines depending upon the actual class of objects, you probably have missed some use of virtual functions. For instance, instead of
bool compatible_with (const Type& lhs, const Type& rhs) { if (&lhs == &rhs) return true; if (dynamic_cast<Record*> (&lhs)) if (dynamic_cast<Nil*> (&rhs)) return true; if (dynamic_cast<Record*> (&rhs)) if (dynamic_cast<Nil*> (&lhs)) return true; return false; }write
bool Record::compatible_with (const Type& rhs) { return &rhs == this || dynamic_cast<Nil*> (&rhs); } bool Nil::compatible_with (const Type& rhs) { return &rhs == this || dynamic_cast<Record*> (&rhs); } bool compatible_with (const Type& lhs, const Type& rhs) { return lhs->compatible_with (rhs); }
dynamic_cast for type casesDid you read the previous item, “Use virtual methods, not type cases”? If not, do it now.
If you really need to write type dispatching, carefully chose between
typeidanddynamic_cast. In the case of tc, where we sometimes need to down cast an object or to check its membership to a specific subclass, we don't needtypeid, so usedynamic_castonly.They address different needs:
dynamic_castfor (sub-)membership,typeidfor exact type- The semantics of testing a
dynamic_castvs. a comparison of atypeidare not the same. For instance, think of a classAwith subclassBwith subclassC; then compare the meaning of the following two snippets:// Is `a' containing an object of exactly the type B? bool test1 = typeid (a) == typeid (B); // Is `a' containing an object of type B, or a subclass of B? bool test2 = dynamic_cast<B*> (&a);- Non polymorphic entities
typeidworks on hierarchies withoutvtable, or even builtin types (intetc.).dynamic_castrequires a dynamic hierarchy. Beware oftypeidon static hierarchies; for instance consider the following code, courtesy from Alexandre Duret-Lutz:#include <iostream> struct A { // virtual ~A () {}; }; struct B: A { }; int main () { A* a = new B; std::cout << typeid (*a).name () << std::endl; }it will “answer” that the
typeidof ‘*a’ isA(!). Usingdynamic_casthere will simply not compile4. If you provideAwith a virtual function table (e.g., uncomment the destructor), then thetypeidof ‘*a’ isB.- Compromising the future for the sake of speed
- Because the job performed by
dynamic_castis more complex, it is also significantly slower thattypeid, but hey! better slow and safe than fast and furious.You might consider that today, a strict equality test of the object's class is enough and faster, but can you guarantee there will never be new subclasses in the future? If there will be, code based
dynamic_castwill probably behave as expected, while code basedtypeidwill probably not.More material can be found the chapter 9 of see Thinking in C++ Volume 2: Run-time type identification.
We use const references in arguments (and return value) where otherwise a passing by value would have been adequate, but expensive because of the copy. As a typical example, accessors ought to return members by const reference:
const Exp& OpExp::lhs_get () const { return lhs_; }Small entities can be passed/returned by value.
When you need to have several names for a single entity (this is the definition of aliasing), use references to create aliases. Note that passing an argument to a function for side effects is a form of aliasing. For instance:
template <typename T> void swap (T& a, T& b) { T c = a; a = b; b = c; }
When an object is created, or when an object is given (i.e., when its owner leaves the management of the object's memory to another entity), use pointers. This is consistent with C++:
newcreates an object, returns it together with the responsibility to calldelete: it uses pointers. For instance, note the three pointers below, one for the return value, and two for the arguments:OpExp* opexp_builder (OpExp::Oper oper, Exp* lhs, Exp* rhs) { return new OpExp (oper, lhs, rhs); }
More generally, “Ensure that non-local static objects are initialized before they're used”, as reads the title of ec47.
Non local static objects (such as
std::coutetc.) are initialized by the C++ system even beforemainis called. Unfortunately there is no guarantee on the order of their initialization, so if you happen to have a static object which initialization depends on that of another object, expect the worst. Fortunately this limitation is easy to circumvent: just use a simple Singleton implementation, that relies on a local static variable.This is covered extensively in ec47.
foo_get, not get_fooAccessors have standardized names:
foo_getandfoo_set.There is an alternative attractive standard, which we don't follow:
class Class { public: int foo (); void foo (int foo); private: int foo_; }or even
class Class { public: int foo (); Class& foo (int foo); // Return *this. private: int foo_; }which enables idioms such as:
{ Class obj; obj.foo (12) .bar (34) .baz (56) .qux (78) .quux (90); }
dump as a member function returning a streamYou should always have a means to print a class instance, at least to ease debugging. Use the regular
operator<<for standalone printing functions, butdumpas a member function. Use this kind of prototype:std::ostream& Class::dump (std::ostream& ostr [, ...]) constwhere the ellipsis denote optional additional arguments.
dumpreturns the stream.
For instance, instead of declaring
typedef set::set<const Temp*> temp_set_type;declare
/// Object function to compare two Temp*. struct temp_compare: public binary_function<const Temp* , const Temp*, bool> { bool operator() (const Temp* s1, const Temp* s2) const { return *s1 < *s2; } }; typedef set::set<const Temp* , temp_compare> temp_set_type;Scott Meyers mentions several good reasons, but leaves implicit a very important one: if you don't, since the outputs will be based on the order of the pointers in memory, and since (i) this order may change if your allocation pattern changes and (ii) this order depends of the environment you run, then you cannot compare outputs (including traces). Needless to say that, at least during development, this is a serious misfeature.
When you write unary or binary predicates to use in interaction with stl, derive from
std::unary_functionorstd::binary_function. For instance:/// Object function to compare two Temp*. struct temp_ptr_less: public std::binary_function<const Temp*, const Temp*, bool> { bool operator() (const Temp* s1, const Temp* s2) const; };
Using
for_each,find,find_if,transformetc. is preferred over explicit loops. This is for (i) efficiency, (ii) correctness, and (iii) maintainability. Knowing these algorithms is mandatory for who claims to be a C++ programmer.
For instance, prefer ‘my_set.find (my_item)’ to ‘find (my_item, my_set.begin (), my_set.end ())’. This is for efficiency: the former has a logarithmic complexity, versus... linear for the latter! You may find the Item 44 of Effective STL on the Internet.
The following items are more a matter of style than the others. Nevertheless, you are asked to follow this style.
Stick to 80 column programming. As a matter of fact, stick to 76 or 78 columns most of the time, as it makes it easier to keep the diffs within the limits. And if you post/mail these diffs, people are likely to reply to the message, hence the suggestion of 76 columns, as for emails.
When declaring a class, start with public members, then protected, and last private members. Inside these groups, you are invited to group by category, i.e., methods, types, and members that are related should be grouped together. The motivation is that private members should not even be visible in the class declaration (but of course, it is mandatory that they be there for the compiler), and therefore they should be “hidden” from the reader.
This is an example of what should not be done:
class Foo { public: Foo (std::string, int); virtual ~Foo (); private: typedef std::string string_type; public: std::string bar_get () const; void bar_set (std::string); private: string_type bar_; public: int baz_get () const; void baz_set (int); private: int baz_; }rather, write:
class Foo { public: Foo (std::string, int); virtual ~Foo (); std::string bar_get () const; void bar_set (std::string); int baz_get () const; void baz_set (int); private: typedef std::string string_type; string_type bar_; int baz_; }and add useful Doxygen comments.
When declaring a derived class, try to keep its list of superclasses on the same line. Leave a space at least on the right hand side of the colon. If there is not enough room to do so, leave the colon on the class declaration line (the opposite applies for constructor, see Put initializations below the constructor declaration).
class Derived: public Base { // ... }; /// Object function to compare two Temp*. struct temp_ptr_less: public std::binary_function<const Temp*, const Temp*, bool> { bool operator() (const Temp* s1, const Temp* s2) const; };
inline in declarationsUse
inlinein implementations (i.e., *.hxx, possibly *.cc)), not during declarations (*.hh files).
virtual in subclass declarationsIf a method was once declared
virtual, it remains virtual. Nevertheless, as an extra bit of documentation to your fellow developers, repeat thisvirtual:class Base { public: // ... virtual foo (); }; class Derived: public Base { public: // ... virtual foo (); };
Pointers and references are part of the type, and should be put near the type, not near the variable.
int* p; // not `int *p;' list& l; // not `list &l;' void* magic(); // not `void *magic();'
Use
int* p; int* q;instead of
int *p, *q;The former declarations also allow you to describe each variable.
Write
std::list<int> l; std::pair<std::list<int>, int> p;with a space after the comma, and of course between two closing ‘>’:
std::list<std::list<int> > ls;These rules apply for casts:
// Come on baby, light my fire. int* p = static_cast<int*> (42);
Write
template <class T1, class T2> struct pair;with one space separating the keyword
templatefrom the list of formal parameters.
int foo (int n) { return bar (n); }The ‘()’ operator is not a list of arguments.
class Foo { public: Foo (); virtual ~Foo (); bool operator() (int n); };
Don't put or initializations or constructor invocations on the same line as you declare the constructor. As a matter of fact, don't even leave the colon on that line. Instead of ‘A::A (): B (), C()’, write either:
A::A () : B (), C () { }or
A::A () : B (), C () { }The rationale is that the initialization belongs more to the body of the constructor than its signature. And when dealing with exceptions leaving the colon above would yield a result even worse than the following.
A::A () try : B (), C () { } catch (...) { }
Nowadays most editors provide interactive spell checking, including for sources (strings and comments). For instance, see
flyspell-modein Emacs, and in particular theflyspell-prog-mode. To trigger this automatically, install the following in your ~/.emacs.el:(add-hook 'c-mode-hook 'flyspell-prog-mode 1) (add-hook 'c++-mode-hook 'flyspell-prog-mode 1) (add-hook 'cperl-mode-hook 'flyspell-prog-mode 1) (add-hook 'makefile-mode-hook 'flyspell-prog-mode 1) (add-hook 'python-mode-hook 'flyspell-prog-mode 1) (add-hook 'sh-mode-hook 'flyspell-prog-mode 1)and so forth.
End comments with a period.
For documentation as for any other kind of writing, the shorter, the better: hunt useless words. See The Elements of Style, for an excellent set of writing guidelines.
Here are a few samples of things to avoid:
- Don't document the definition instead of its object
- Don't write:
/// Declaration of the Foo class. class Foo { ... };Of course you're documenting the definition of the entities! “Declaration of the” is totally useless, just use ‘/// Foo class’. But read bellow.
- Don't qualify obvious entity kinds
- Don't write:
/// Foo class. class Foo { public: /// Construct a Foo object. Foo (Bar& bar) ... };It is so obvious that you're documenting the class and the constructor that you should not write it down. Instead of documenting the kind of an entity (class, function, namespace, destructor...), document its goal.
/// Wrapper around Bar objects. class Foo { public: /// Bind to \a bar. Foo (Bar& bar) ... };
Use the imperative when documenting, as if you were giving order to the function or entity you are describing. When describing a function, there is no need to repeat “function” in the documentation; the same applies obviously to any syntactic category. For instance, instead of:
/// \brief Swap the reference with another. /// The method swaps the two references and returns the first. ref& swap (ref& other);write:
/// \brief Swap the reference with another. /// Swap the two references and return the first. ref& swap (ref& other);The same rules apply to ChangeLogs.
Often one wants to leave a clear markup to separate different matters. For declarations, this is typically done using the Doxygen ‘\name ... \{ ... \}’ sequence; for implementation files use rebox.el (see rebox.el).
Documentation is a genuine part of programming, just as testing. We use Doxygen (see Doxygen) to maintain the developer documentation of the Tiger Compiler. The quality of this documentation can change the grade.
Beware that Doxygen puts the first letter of documentation in upper case. As a result,
/// \file ast/arrayexp.hh /// \brief ast::ArrayExp declaration.will not work properly, since Doxygen will transform
ast::ArrayExpinto ‘Ast::ArrayExp’, which will not be recognized as an entity name. As a workaround, write the slightly longer:/// \file ast/arrayexp.hh /// \brief Declaration of ast::ArrayExp.Of course, Doxygen documentation is not appropriate everywhere.
There must be a single location, that's our standard.
Prefer C comments (‘/** ... */’) to C++ comments (‘/// ...’). This is to ensure consistency with the style we use.
Because it is lighter, instead of
/** \brief Name of this program. */ extern const char* program_name;prefer
/// Name of this program. extern const char* program_name;For instance, instead of
/* Construct an InterferenceGraph. */ InterferenceGraph (const std::string& name, const assem::instrs_t& instrs, bool trace = false);or
/** @brief Construct an InterferenceGraph. ** @param name its name, hopefully based on the function name ** @param instrs the code snippet to study ** @param trace trace flag **/ InterferenceGraph (const std::string& name, const assem::instrs_t& instrs, bool trace = false);or
/// \brief Construct an InterferenceGraph. /// \param name its name, hopefully based on the function name /// \param instrs the code snippet to study /// \param trace trace flag InterferenceGraph (const std::string& name, const assem::instrs_t& instrs, bool trace = false);write
/** \brief Construct an InterferenceGraph. \param name its name, hopefully based on the function name \param instrs the code snippet to study \param trace trace flag */ InterferenceGraph (const std::string& name, const assem::instrs_t& instrs, bool trace = false);
As stated in Rules of the Game, writing a test framework and tests is part of the exercise.
As a starting point, we provide a tarball containing a few Tiger files, see Given Test Cases. They are not enough: your test suite should be continually expanding.
In three occasions tests are “easy” to write:
See Testing student-made compilers, for many hints on what tests you need to write.
If bardec_f is the head of your group, the tarball must be
bardec_f-tc-n.tar.bz2 where n is the number of the
“release” (see Package Name and Version). The following commands
must work properly:
$ bunzip2 -cd bardec_f-tc-n.tar.bz2 | tar xvf -
$ cd bardec_f-tc-n
$ export CXX=g++-4.1
$ mkdir _build
$ cd _build
$ ../configure
$ make
$ src/tc /tmp/test.tig
$ make distcheck
For more information on the tools, see The GNU Build System, GCC.
Your tarball must be done via ‘make distcheck’ (see Making a Tarball). Any tarball which is not built thanks to ‘make distcheck’ (this is easy to see: they include files we don't want, and don't contain some files we need...) will be severely penalized.
Once the tarball passed these tests, upload it as specified on the Assistants' Submission Page.
Some stages are evaluated only by a program, and others are evaluated both by humans, and a program.
Each stage of the compiler will be evaluated by an automatic corrector. As soon as the tarball are submitted, the logs are available on the assistants' intranet.
Automated evaluation enforces the requirements: you must stick to what is being asked. For instance, for TC-E it is explicitly asked to display something like:
var /* escaping */ i : int := 2
so if you display any of the following outputs
var i : int /* escaping */ := 2
var i /* escaping */ : int := 2
var /* Escapes */ i : int := 2
be sure to fail all the tests, even if the computation is correct.
If you find some unexpected errors (your project does compile with the reference compiler, some files are missing, your output is slightly incorrect etc.), you should consider uploading again.
When you are defending your projects, here are a few rules to follow:
Conversely, there is something I wish to make clear: I, Akim, and the other examiners, will probably be harsh (maybe even very harsh), but this does not mean I disrespect you, or judge you badly.
You are here to defend your project and knowledge, I'm here to stress them, to make sure they are right. Learning to be strong under pressure is part of the exercise. Don't burst into tears, react! Don't be shy, that's not the proper time: you are selling me something, and I will never buy something from someone who cries when I'm criticizing his product.
You should also understand that human examination is the moment where we try to evaluate who, or what group, needs help. We are here to diagnose your project and provide solutions to your problems. If you know there is a problem in your project, but you failed to fix it, tell it to the examiner! Work with her/him to fix your project.
The point of this evaluation is to measure, among other things:
Examiners: the human grade.
The examiner should not take (too much) the automated tests into account to decide the mark: the mark is computed later, taking this into account, so don't do it twice.
Examiners: broken tarballs.
If you fixed the tarball or made whatever modification, run ‘make distcheck’ again, and update the delivered tarball. Do not keep old tarballs, do not install them in a special place: just replace the first tarball with it, but say so in the ‘eval’ file.
The rationale is simple: only tarballs pass the tests, and every tarball must be able to pass the tests. If you don't do that, then someone else will have to do it again.
Because the Tiger Compiler is a project with stages, the computation of the marks depends on the stages too. To spell it out explicitly:
A stage is penalized by bad results on tests performed for previous stages.
It means, for instance, that a TC-3 compiler will be exercised on TC-1, TC-2, and TC-3. If there are still errors on TC-1 and TC-2 tests, they will pessimize the result of TC-3 tests. The older the errors are, the more expensive they are.
As an example, here are the formulas to compute the global success rate of TC-3 and TC-5:
global-rate-TC-3 := rate-TC-3 * (+ 2 * rate-TC-1
+ 1 * rate-TC-2) / 3
global-rate-TC-5 := rate-TC-5 * (+ 4 * rate-TC-1
+ 3 * rate-TC-2
+ 2 * rate-TC-3
+ 1 * rate-TC-4) / 10
Because a project which fail half of the time is not a project that deserves half of 20, the global-rate is elevated to 1.7 before computing the mark:
mark-TC-3 := roundup (power (global-rate-TC-3, 1.7) * 20 - malus-TC-3, 1)
where ‘roundup (x, 1)’ is x rounded up to one decimal (‘roundup (15, 1) = 15’, ‘roundup (15.01, 1) = 15.1’).
When the project is also evaluated by a human, ‘power’ is not used. Rather, the success rate modifies the mark given by the examiner:
mark-TC-2 := roundup (eval-TC-2 * global-rate-TC-2 - malus-TC-2, 1)
The naming scheme for provided tarballs is different from the scheme you must follow (see Submission). Our naming scheme looks like 2004-tc-2.0.tar.bz2. If we update the tarballs, they will be named 2004-tc-2.x.tar.bz2. But your tarball must be named login-tc-2.tar.bz2, even if you send a second version of your project.
We also (try to) provide patches from one tarball to another. For instance 2010-tc-1.0-2.0.diff is the
difference
from 2010-tc-1.0.tar.bz2 to
2010-tc-2.0.tar.bz2 (and 2010-tc-1.1-tc-2.0.diff is
the difference from 2010-tc-1.1.tar.bz2 to
2010-tc-2.0.tar.bz2). You are encouraged to read this file
as understanding a patch is expected from any Unix programmer. Just run
‘bzless 2010-tc-1.0-2.0.diff’.
To apply the patch:
You might need to repeat the process to jump from a version x to x + 2 via version x + 1.
This section describes the mandatory layout of the tarball.
Fabrice Bardčche <bardec_f@epita.fr>
Jean-Paul Sartre <sartre_j@epita.fr>
Jean-Paul Deux <deux_j@epita.fr>
Jean-Paul Belmondo <belmon_j@epita.fr>
The group leader is first. Do not include emails other than those of
EPITA. I repeat: give the ‘6_1@epita.fr’ address. The file
AUTHORS is automatically distributed, but pay attention to the
spelling.
This is a wrapper around Bison, tailored to produce C++ parsers. Compared to bison,
bison++updates the output files only if changed. For a file such as location.hh, virtually included by the whole front-end, this is a big win.Also, bison outputs ‘\file location.hh’ in Doxygen documentation, which clashes with ast/location.hh. bison++ changes this into ‘\file parse/location.hh’.
This file provides two new Emacs functions, ‘M-x rebox-comment’ and ‘M-x rebox-region’. They build and maintain nice looking boxed comments in most languages. Once installed (read it for instructions), write a simple comment such as:
// Comments end with a period.then move your cursor into this comment and press ‘C-u 2 2 3 M-q’ to get:
/*-----------------------------. | Comments end with a period. | `-----------------------------*/‘2 2 3’ specifies the style of the comment you want to build. Once the comment built, ‘M-q’ suffices to refill it. Run ‘C-u - M-q’ for an interactive interface.
The command line parser we use.
Convenient C++ routines.
The class
misc::errorimplements an error register. Because libraries are expected to be pure, they cannot issue error messages to the error output, nor exit with failure. One could pass call-backs (as functions or as objects) to set up error handling. Instead, we chose to register the errors in an object, and have the library functions return this register: it is up to the caller to decide what to do with these errors. Note also that direct calls tostd::exitbypass stack unwinding. In other words, withstd::exit(instead ofthrow) your application leaks memory.An instance of
misc::errorcan be used as if it were a stream to output error messages. It also keeps the current exit status until it is “triggered”, i.e., until it is thrown. Each module has its own error handler. For instance, theBinderhas anerror_attribute, and uses it to report errors:void Binder::error (const ast::Ast& loc, const std::string& msg) { error_ << misc::error::bind << loc.location_get () << ": " << msg << std::endl; }Then the task system fetches the local error handler, and merges it into the global error handler
error(see common.*). Some tasks trigger the error handler: if errors were registered, an exception is raised to exit the program cleanly. The following code demonstrates both aspects.void bindings_compute () { // bind::bind returns the local error handler. error << ::bind::bind (*ast::tasks::the_program); error.exit_on_error (); }
This file implements a means to output string while escaping non printable characters. An example:
cout << "escape (\"\111\") = " << escape ("\"\111\"") << endl;Understanding how
escapeworks is required starting from TC-2.
This file contains a generic implementation of oriented and undirected graphs.
Understanding how
graphworks is required starting from TC-8.
A wrapper around
std::setthat introduce convenient operators (operator+and so forth).
The handling of
misc::scoped_map<Key, Data>, generic scoped map, serving as a basis for symbol tables used by theBinder.misc::scoped_mapmaps aKeyto aData(that should ring a bell...). You are encouraged to implement something simple, based on stacks (seestd::stack, or better yet,std::list) and maps (seestd::map).It must provide this interface:
— Data: get (const Key& key) const
If key was associated to some
Datain the open scopes, return the most recent insertion. Otherwise, ifDatais a pointer type, then return the empty pointer, else throw astd::range_error. To implement this feature, see misc/traits.
In a program, the rule for identifiers is to be used many times: at least once for its definition, and once for each use. Just think about the number of occurrences of
size_tin a C program for instance.To save space one keeps a single copy of each identifier. This provides additional benefits: the address of this single copy can be used as a key: comparisons (equality or order) are much faster.
The class
misc::symbolis an implementation of this idea. See the lecture notes, scanner.pdf.misc::symbolis built on top ofmisc::unique.
A class that makes it possible to have timings of processes, similarly to gcc's --time-report, or bison's --report=time. It is used in the
Taskmachinery, but can be used to provide better timings (e.g., separating the scanner from the parser).
A simple traits to learn whether a type is a pointer type. See Traits, for more about traits.
A generic class implementing the Flyweight design pattern. It maps identical objects to a unique reference.
No namespace for the time being, but it should be task.
Delivered for TC-1. A generic scheme to handle the
components of our compiler, and their dependencies.
Namespace ‘parse’. Delivered during TC-1.
Namespace ‘ast’, delivered for TC-2. Implementation of the abstract syntax tree. The file ast/README gives an overview of the involved class hierarchy.
Abstract base class of the compiler's visitor hierarchy. Actually, it defines a class template
GenVisitor, which expects an argument which can be eithermisc::constify_traitsormisc::id_traits. This allows to define two parallel hierarchies:ConstVisitorandVisitor, similar toiteratorandconst_iterator.The understanding of the template programming used is not required at this stage as it is quite delicate, and goes far beyond your (average) current understanding of templates.
Implementation of the
GenDefaultVisitorclass template, which walks the abstract syntax tree, doing nothing. This visitor do not define visit methods for nodes related to object-oriented constructs (classes, methods, etc.); thus it is an abstract class, and is solely used as a basis for deriving other visitors. It is instantiated twice:GenDefaultVisitor<misc::constify_traits>andGenDefaultVisitor<misc::id_traits>.
Implementation of the
GenNonObjectVisitorclass template, which walks the abstract syntax tree, doing nothing, but aborting on nodes related to object-oriented constructs (classes, methods, etc.). This visitor is abstract and is solely used as a basis for deriving other visitors. It is instantiated twice:GenNonObjectVisitor<misc::constify_traits>andGenNonObjectVisitor<misc::id_traits>.
The
PrettyPrinterclass, which pretty-prints an AST back into Tiger concrete syntax.
This class is not needed before TC-4 (see TC-4).
Auxiliary class from which typable AST node classes should derive. It has a simple interface made to manage a pointer to the type of the node:
This class is not needed before TC-4 (see TC-4).
Auxiliary class from which should derive AST nodes that construct a type (e.g.,
ast::ArrayTy). Its interface is similar to that ofast::Typablewith one big difference:ast::TypeConstructoris responsible for de-allocating that type.
This class is needed only for TC-E (see TC-E).
Auxiliary class from which AST node classes that denote the declaration of variables and formal arguments should derive. Its role is to encode a single Boolean value: whether the variable escapes or not. The natural interface includes
escape_getandescape_setmethods.
Namespace ‘bind’. Binding uses to definitions.
The
bind::Bindervisitor. Bind uses to definitions (works on syntax without object).
The
bind::Renamervisitor. Rename every identifier to a unique name (works on syntax without object).
Namespace ‘escapes’. Compute the escaping variables.
Namespace ‘type’. Type checking.
The interface of the Type module. It exports a single procedure,
type_check.
The definitions of all the types. Built-in types (
Int,String,NilandVoid) are defined in src/type/builtin-types.*.
The
type::TypeCheckervisitor. Compute the types of an ast and add type labels to the corresponding nodes (works on syntax without object).
The
object::Bindervisitor. Bind uses to definitions (works on syntax with objects). Inherits frombind::Binder.
The
object::TypeCheckervisitor. Compute the types of an ast and add type labels to the corresponding nodes (works on syntax with objects). Inherits fromtype::TypeChecker.
The
object::Renamervisitor. Rename every identifier to a unique name (works on syntax with objects), and keep a record of the names of the renamed classes. Inherits frombind::Renamer.
The
object::DesugarVisitorvisitor. Transforms an ast with objects into an ast without objects.
Namespace ‘overload’. Overloading function support.
The
desugar::DesugarVisitorvisitor. Remove constructs that can be considered as syntactic sugar using other language constructs. For instance, turnforloops intowhileloops, string comparisons into function calls.
The
desugar::BoundCheckingVisitorvisitor. Add dynamic array bound checks while duplicating anast.
The
desugar::Inlinervisitor. Perform inline expansion of functions.
The
desugar::Prunervisitor. Prune useless function declarations within anast.
Namespace temp, delivered for TC-5.
Provides the class template
Identifierbuilt uponboost::variantand used to implementtemp::Tempandtemp::Label. Also contains the genericIdentifierCompareVisitor, used to compare two identifiers.
Identifierhandles maps ofIdentifiers. For instance, theTempt5might be allocated the register$t2, in which case, when outputtingt5, we should print$t2. Maps stored in the xalloc'd slotIdentifier::mapof streams implements such a correspondence. In addition, theoperator<<of theIdentifierclass template itself "knows" when such a mapping is active, and uses it.
We need labels for
jumps, for functions, strings etc. Implemented as an instantiation of thetemp::Identifierscheme.
So called temporaries are pseudo-registers: we may allocate as many temporaries as we want. Eventually the register allocator will map those temporaries to either an actual register, or it will allocate a slot in the activation block (aka frame) of the current function. Implemented as an instantiation of the
temp::Identifierscheme.
Namespace tree, delivered for TC-5. The
implementation of the intermediate representation. The file
tree/README should give enough explanations to understand how it
works.
Reading the corresponding explanations in Appel's book is mandatory.
It is worth noting that contrary to A. Appel, just as we did for
ast, we use n-ary structures. For instance, where Appel uses a
binary seq, we have an n-ary seq which allows us to put as
many statements as we want.
To avoid gratuitous name clashes, what Appel denotes exp is
denoted sxp (Statement Expression), implemented in
translate::Sxp.
Please, pay extra attention to the fact that there are temp::Temp
used to create unique temporaries (similar to misc::symbol),
and tree::Temp which is the intermediate representation
instruction denoting a temporary (hence a tree::Temp needs a
temp::Temp). Similarly, on the one hand, there is
temp::Label which is used to create unique labels, and on the
other hand there are tree::Label which is the IR statement to
define to a label, and tree::Name used to refer to
a label (typically, a tree::Jump needs a tree::Name which
in turn needs a temp::Label).
It implements
tree::Fragment, an abstract class,tree::DataFragto store the literal strings, andtree::ProcFragto store the routines.
Implementation of
tree::Visitorandtree::ConstVisitorto implement function objects ontree::Fragments. In other words, these visitors implement polymorphic operations ontree::Fragment.
Namespace ‘frame’, delivered for TC-5.
An
Accessis a location of a variable: on the stack, or in a temporary.
Namespace ‘translate’. Translation to intermediate code translation. It includes:
translate::Levelare wrappersframe::Framethat support the static links, so that we can find an access to the variables of the “parent function”.
Implementation of
translate::Ex(expressions),Nx(instructions),Cx(conditions), andIx(if) shells. They wraptree::Treeto delay their translation until the actual use is known.
functions used by the
translate::Translatorto translate the ast into hir. For instance, it contains ‘Exp* simpleVar (const Access& access, const Level& level)’, ‘Exp* callExp (const temp::Label& label, std::list<Exp*> args)’ etc. which are routines that produce some ‘Tree::Exp’. They handle all theunCxetc. magic.
Implements the class ‘Translator’ which performs the IR generation thanks to translation.hh. It must not be polluted with translation details: it is only coordinating the AST traversal with the invocation of translation routines. For instance, here is the translation of a ‘ast::SimpleVar’:
virtual void operator() (const SimpleVar& e) { exp_ = simpleVar (*var_access_[e.def_get ()], *level_); }
Namespace canon.
Namespace assem, delivered for TC-7.
This directory contains the implementation of the Assem language: yet another intermediate representation that aims at encoding an assembly language, plus a few needed features so that register allocation can be performed afterward. Given in full.
Implementation of the basic types of assembly instructions.
Implementation of
assem::Fragment,assem::ProcFrag, andassem::DataFrag. They are very similar totree::Fragment: aggregate some informations that must remain together, such as aframe::Frameand the instructions (a list ofassem::Instr).
Namespace target, delivered for TC-7. Some data on
the back end.
Description of a cpu: everything about its registers, and its word size.
Description of a target (language): its cpu, its assembly (
target::Assembly), and it translator (target::Codegen).
The description of the mips (actually, spim/Nolimips) target.
Description of the i386. This is not part of the project, it is left only as an incomplete source of inspiration.
The instruction selection per se split into a generic part, and a target specific (mips and ia32) part. See src/target/mips, and src/target/ia32.
The abstract class
target::Assembly, the interface for elementary assembly instructions generation.
The abstract class
target::Codegen, the interface for all our back ends.
This is the Tiger runtime, written in C, based on Andrew Appel's runtime.c. The actual runtime.s file for mips was written by hand, but the ia32 was a compiled version of this file. It should be noted that:
- Strings
- Strings are implemented as 4 bytes to encode the length, and then a 0-terminated a` la C string. The length part is due to conformance to the Tiger Reference Manual, which specifies that 0 is a regular character that can be part of the strings, but it is nevertheless terminated by 0 to be compliant with spim/Nolimips's
- Special Strings
- There are some special strings: 0 and 1 character long strings are all implemented via a singleton. That is to say there is only one allocated string ‘""’, a single ‘"1"’ etc. These singletons are allocated by
main. It is essential to preserve this invariant/convention in the whole runtime.strcmpvs.stringEqual- I don't know how Appel wants to support ‘"bar" < "foo"’ since he doesn't provide
strcmp. We do. His implementation of equality is more efficient than ours though, since he can decide just be looking at the lengths. That could be improved in the future...main- The runtime has some initializations to make, such as strings singletons, and then calls the compiled program. This is why the runtime provides
main, and callst_main, which is the “main” that your compiler should provide.
Namespace target::mips, delivered for TC-7. Code
generation for mips R2000.
The Tiger runtime in mips assembly language:
Our assembly language (syntax, opcodes and layout); it abstracts the generation of mips 2000 instructions.
target::mips::SpimAssemblyderives fromtarget::Assembly.
Our real and only back end: a translator from lir to assem using the mips 2000 instruction set defined by
target::mips::SpimAssembly. It is implemented as a maximal munch.target::mips::Codegenderives fromtarget::Codegen.
How mips (and spim/Nolimips) fragments are to be displayed. In other words, that's where the (global) syntax of the target assembly file is selected.
Namespace target::ia32, delivered for TC-7. Code
generation for ia32. This is not part of the student project,
but it is left to satisfy their curiosity. In addition its presence is
a sane invitation to respect the constraints of a multi-back-end
compiler.
The Tiger runtime in ia32 assembly language:
Our assembly language (syntax, opcodes and layout); it abstracts the generation of ia32 instructions using Gas' syntax.
target::ia32::GasAssemblyderives fromtarget::Assembly.
The ia32 back-end: a translator from lir to assem using the ia32 instruction set defined by
target::ia32::GasAssembly. It is implemented as a maximal munch.target::ia32::Codegenderives fromtarget::Codegen.
How ia32 fragments are to be displayed. In other words, that's where the (global) syntax of the target assembly file is selected.
Namespace liveness, delivered for TC-8.
Computing the live-in and live-out information from the
FlowGraph.
Computing the
InterferenceGraphfrom the live-in/live-out information.
Namespace regalloc, register allocation, delivered for
TC-9.
Removing useless
moves once the register allocation performed, and allocating the register for fragments.
We provide a few test cases: you must write your own tests. Writing tests is part of the project. Do not just copy test cases from other groups, as you will not understand why they were written.
The initial test suite is available for download at tests.tgz. It contains the following directories:
The compiler will be written in several steps, described below.
The following sections adhere to a standard layout in order to present each stage n:
This section has been updated for EPITA-2012 on 2009-12-16.
TC-0 is a weak form of TC-1: the scanner and the parser are written, but the framework is simplified (see TC-1 Code to Write). The grammar is also simpler: object-related productions are not to be supported at this stage (see TC-0 Improvements). No command line option is supported.
Things to learn during this stage that you should remember:
lval (aka yylval) to pass token values to the parser.
First, please note that all the samples, including in this section, are generated with a TC-1+ compliant compiler: its behavior differs from that of a TC-0 compiler. In particular, for the time being, forget about the options (-X and --parse).
Running TC-0 basically consists in looking at exit values:
print ("Hello, World!\n")
$ tc simple.tig
The following example demonstrates the scanner and parser tracing. The glyphs “error-->” and “⇒” are typographic conventions to specify respectively the standard error stream and the exit status. They are not part of the output per se.
$ SCAN=1 PARSE=1 tc -X --parse simple.tig
error-->Parsing file: simple.tig
error-->Starting parse
error-->Entering state 0
error-->Reading a token: --(end of buffer or a NUL)
error-->--accepting rule at line 171 ("print")
error-->Next token is token "identifier" (simple.tig:1.1-5: print)
error-->Shifting token "identifier" (simple.tig:1.1-5: print)
error-->Entering state 2
error-->Reading a token: --accepting rule at line 89 (" ")
error-->--accepting rule at line 94 ("(")
error-->Next token is token "(" (simple.tig:1.7: )
error-->Reducing stack 0 by rule 105 (line 683):
error--> $1 = token "identifier" (simple.tig:1.1-5: print)
error-->-> $$ = nterm funid (simple.tig:1.1-5: print)
error-->Entering state 40
error-->Next token is token "(" (simple.tig:1.7: )
error-->Shifting token "(" (simple.tig:1.7: )
error-->Entering state 92
error-->Reading a token: --accepting rule at line 172 (""")
error-->--accepting rule at line 242 ("Hello, World!")
error-->--accepting rule at line 229 ("\n")
error-->--accepting rule at line 204 (""")
error-->Next token is token "string" (simple.tig:1.8-24: Hello, World!
error-->)
error-->Shifting token "string" (simple.tig:1.8-24: Hello, World!
error-->)
error-->Entering state 1
error-->Reducing stack 0 by rule 22 (line 373):
error--> $1 = token "string" (simple.tig:1.8-24: Hello, World!
error-->)
error-->-> $$ = nterm exp (simple.tig:1.8-24: "Hello, World!\n")
error-->Entering state 140
error-->Reading a token: --accepting rule at line 95 (")")
error-->Next token is token ")" (simple.tig:1.25: )
error-->Reducing stack 0 by rule 53 (line 476):
error--> $1 = nterm exp (simple.tig:1.8-24: "Hello, World!\n")
error-->-> $$ = nterm args.1 (simple.tig:1.8-24: "Hello, World!\n")
error-->Entering state 142
error-->Next token is token ")" (simple.tig:1.25: )
error-->Reducing stack 0 by rule 52 (line 471):
error--> $1 = nterm args.1 (simple.tig:1.8-24: "Hello, World!\n")
error-->-> $$ = nterm args (simple.tig:1.8-24: "Hello, World!\n")
error-->Entering state 141
error-->Next token is token ")" (simple.tig:1.25: )
error-->Shifting token ")" (simple.tig:1.25: )
error-->Entering state 185
error-->Reducing stack 0 by rule 24 (line 379):
error--> $1 = nterm funid (simple.tig:1.1-5: print)
error--> $2 = token "(" (simple.tig:1.7: )
error--> $3 = nterm args (simple.tig:1.8-24: "Hello, World!\n")
error--> $4 = token ")" (simple.tig:1.25: )
error-->-> $$ = nterm exp (simple.tig:1.1-25: print ("Hello, World!\n"))
error-->Entering state 27
error-->Reading a token: --(end of buffer or a NUL)
error-->--accepting rule at line 90 ("
error-->")
error-->--(end of buffer or a NUL)
error-->--EOF (start condition 0)
error-->Now at end of input.
error-->Reducing stack 0 by rule 1 (line 313):
error--> $1 = nterm exp (simple.tig:1.1-25: print ("Hello, World!\n"))
error-->-> $$ = nterm program (simple.tig:1.1-25: )
error-->Entering state 26
error-->Now at end of input.
error-->Shifting token "end of file" (simple.tig:2.1: )
error-->Entering state 69
error-->Cleanup: popping token "end of file" (simple.tig:2.1: )
error-->Cleanup: popping nterm program (simple.tig:1.1-25: )
error-->Parsing string: function _main () = (_exp (0); ())
error-->Starting parse
error-->Entering state 0
error-->Reading a token: --accepting rule at line 122 ("function")
error-->Next token is token "function" (:1.1-8: )
error-->Shifting token "function" (:1.1-8: )
error-->Entering state 8
error-->Reading a token: --accepting rule at line 89 (" ")
error-->--accepting rule at line 171 ("_main")
error-->Next token is token "identifier" (:1.10-14: _main)
error-->Shifting token "identifier" (:1.10-14: _main)
error-->Entering state 47
error-->Reading a token: --accepting rule at line 89 (" ")
error-->--accepting rule at line 94 ("(")
error-->Next token is token "(" (:1.16: )
error-->Shifting token "(" (:1.16: )
error-->Entering state 100
error-->Reading a token: --accepting rule at line 95 (")")
error-->Next token is token ")" (:1.17: )
error-->Reducing stack 0 by rule 101 (line 663):
error-->-> $$ = nterm funargs (:1.17-16: )
error-->Entering state 153
error-->Next token is token ")" (:1.17: )
error-->Shifting token ")" (:1.17: )
error-->Entering state 197
error-->Reading a token: --accepting rule at line 89 (" ")
error-->--accepting rule at line 108 ("=")
error-->Next token is token "=" (:1.19: )
error-->Shifting token "=" (:1.19: )
error-->Entering state 227
error-->Reading a token: --accepting rule at line 89 (" ")
error-->--accepting rule at line 94 ("(")
error-->Next token is token "(" (:1.21: )
error-->Shifting token "(" (:1.21: )
error-->Entering state 12
error-->Reading a token: --accepting rule at line 161 ("_exp")
error-->Next token is token "_exp" (:1.22-25: )
error-->Shifting token "_exp" (:1.22-25: )
error-->Entering state 21
error-->Reading a token: --accepting rule at line 89 (" ")
error-->--accepting rule at line 94 ("(")
error-->Next token is token "(" (:1.27: )
error-->Shifting token "(" (:1.27: )
error-->Entering state 64
error-->Reading a token: --accepting rule at line 143 ("0")
error-->Next token is token "integer" (:1.28: 0)
error-->Shifting token "integer" (:1.28: 0)
error-->Entering state 113
error-->Reading a token: --accepting rule at line 95 (")")
error-->Next token is token ")" (:1.29: )
error-->Shifting token ")" (:1.29: )
error-->Entering state 173
error-->Reducing stack 0 by rule 44 (line 445):
error--> $1 = token "_exp" (:1.22-25: )
error--> $2 = token "(" (:1.27: )
error--> $3 = token "integer" (:1.28: 0)
error--> $4 = token ")" (:1.29: )
error-->-> $$ = nterm exp (:1.22-29: print ("Hello, World!\n"))
error-->Entering state 52
error-->Reading a token: --accepting rule at line 104 (";")
error-->Next token is token ";" (:1.30: )
error-->Reducing stack 0 by rule 56 (line 483):
error--> $1 = nterm exp (:1.22-29: print ("Hello, World!\n"))
error-->-> $$ = nterm exps.1 (:1.22-29: print ("Hello, World!\n"))
error-->Entering state 53
error-->Next token is token ";" (:1.30: )
error-->Shifting token ";" (:1.30: )
error-->Entering state 106
error-->Reading a token: --accepting rule at line 89 (" ")
error-->--accepting rule at line 94 ("(")
error-->Next token is token "(" (:1.32: )
error-->Shifting token "(" (:1.32: )
error-->Entering state 12
error-->Reading a token: --accepting rule at line 95 (")")
error-->Next token is token ")" (:1.33: )
error-->Reducing stack 0 by rule 60 (line 495):
error-->-> $$ = nterm exps.0.2 (:1.33-32: )
error-->Entering state 55
error-->Next token is token ")" (:1.33: )
error-->Shifting token ")" (:1.33: )
error-->Entering state 107
error-->Reducing stack 0 by rule 4 (line 322):
error--> $1 = token "(" (:1.32: )
error--> $2 = nterm exps.0.2 (:1.33-32: )
error--> $3 = token ")" (:1.33: )
error-->-> $$ = nterm exp (:1.32-33: ())
error-->Entering state 162
error-->Reading a token: --(end of buffer or a NUL)
error-->--accepting rule at line 95 (")")
error-->Next token is token ")" (:1.34: )
error-->Reducing stack 0 by rule 59 (line 490):
error--> $1 = nterm exps.1 (:1.22-29: print ("Hello, World!\n"))
error--> $2 = token ";" (:1.30: )
error--> $3 = nterm exp (:1.32-33: ())
error-->-> $$ = nterm exps.2 (:1.22-33: print ("Hello, World!\n"), ())
error-->Entering state 54
error-->Reducing stack 0 by rule 61 (line 496):
error--> $1 = nterm exps.2 (:1.22-33: print ("Hello, World!\n"), ())
error-->-> $$ = nterm exps.0.2 (:1.22-33: print ("Hello, World!\n"), ())
error-->Entering state 55
error-->Next token is token ")" (:1.34: )
error-->Shifting token ")" (:1.34: )
error-->Entering state 107
error-->Reducing stack 0 by rule 4 (line 322):
error--> $1 = token "(" (:1.21: )
error--> $2 = nterm exps.0.2 (:1.22-33: print ("Hello, World!\n"), ())
error--> $3 = token ")" (:1.34: )
error-->-> $$ = nterm exp (:1.21-34: (
error--> print ("Hello, World!\n");
error--> ()
error-->))
error-->Entering state 244
error-->Reading a token: --(end of buffer or a NUL)
error-->--EOF (start condition 0)
error-->Now at end of input.
error-->Reducing stack 0 by rule 97 (line 653):
error--> $1 = token "function" (:1.1-8: )
error--> $2 = token "identifier" (:1.10-14: _main)
error--> $3 = token "(" (:1.16: )
error--> $4 = nterm funargs (:1.17-16: )
error--> $5 = token ")" (:1.17: )
error--> $6 = token "=" (:1.19: )
error--> $7 = nterm exp (:1.21-34: (
error--> print ("Hello, World!\n");
error--> ()
error-->))
error-->-> $$ = nterm fundec (:1.1-34:
error-->function _main () =
error--> (
error--> print ("Hello, World!\n");
error--> ()
error--> ))
error-->Entering state 39
error-->Now at end of input.
error-->Reducing stack 0 by rule 94 (line 646):
error--> $1 = nterm fundec (:1.1-34:
error-->function _main () =
error--> (
error--> print ("Hello, World!\n");
error--> ()
error--> ))
error-->-> $$ = nterm fundecs (:1.1-34:
error-->function _main () =
error--> (
error--> print ("Hello, World!\n");
error--> ()
error--> ))
error-->Entering state 38
error-->Now at end of input.
error-->Reducing stack 0 by rule 20 (line 366):
error-->-> $$ = nterm decs (:1.35-34: )
error-->Entering state 90
error-->Reducing stack 0 by rule 16 (line 359):
error--> $1 = nterm fundecs (:1.1-34:
error-->function _main () =
error--> (
error--> print ("Hello, World!\n");
error--> ()
error--> ))
error--> $2 = nterm decs (:1.35-34: )
error-->-> $$ = nterm decs (:1.1-34:
error-->function _main () =
error--> (
error--> print ("Hello, World!\n");
error--> ()
error--> ))
error-->Entering state 28
error-->Reducing stack 0 by rule 2 (line 315):
error--> $1 = nterm decs (:1.1-34:
error-->function _main () =
error--> (
error--> print ("Hello, World!\n");
error--> ()
error--> ))
error-->-> $$ = nterm program (:1.1-34: )
error-->Entering state 26
error-->Now at end of input.
error-->Shifting token "end of file" (:1.35-34: )
error-->Entering state 69
error-->Cleanup: popping token "end of file" (:1.35-34: )
error-->Cleanup: popping nterm program (:1.1-34: )
A lexical error must be properly diagnosed and reported. The following (generated) examples display the location: this is not required for TC-0; nevertheless, an error message on the standard error output is required.
"\z does not exist."
$ tc -X --parse back-zee.tig
error-->back-zee.tig:1.1-3: unrecognized escape: \z
⇒2
Similarly for syntactical errors.
a++
$ tc -X --parse postinc.tig
error-->postinc.tig:1.3: syntax error, unexpected +
⇒3
We don't need several directories, you can program in the top level of the package.
You must write:
lval supports strings, integers and even symbols. Nevertheless,
symbols (i.e., identifiers) are returned as plain C++ strings for the
time being: the class misc::symbol is introduced in
TC-1.
If the environment variable SCAN is defined (to whatever value)
Flex scanner debugging traces are enabled, i.e., set the variable
yy_flex_debug to 1.
main if you wish. Bison advanced features
will be used in TC-1.
yydebug to 1, run:
PARSE=1 ./tc foo.tig
%printer to improve the tracing of semantic values. For
instance,
%union
{
int ival;
}
%token <ival> INT "integer"
%printer { fprintf (stderr, "%d", $$); } "integer"
main, in this file. Putting it
into parsetiger.yy is OK in TC-0 as it is reduced
to its simplest form with no option support. Of course the exit status
must conform to the standard (see Errors).
The requirements on the tarball are the same as usual, see Tarballs.
"\n" can produce a token STRING
with the semantic value \n (translation) or \\n (no
translation). You are free to choose your favorite implementation, but
keep in mind that if you translate, you'll have to “untranslate”
later (i.e., convert \n back to \\n).
We encourage you to do this translation, but the other solution is also correct, as long as the next steps of your compiler follow the same conventions as your input.
You must check for bad escapes whatever solution you choose.
Possible improvements include:
%destructor%destructor to reclaim the memory lost during the
error recovery. It is mandated in TC-2, see TC-2 FAQ.
Object-related productions from the Tiger grammar are:
# Class definition (canonical form).
ty ::= class [ extends type-id ] { classfields }
# Class definition (alternative form).
dec ::= class id [ extends type-id ] { classfields }
classfields ::= { classfield }
# Class fields.
classfield ::=
# Attribute declaration.
vardec
# Method declaration.
| method id ( tyfields ) [ : type-id ] = exp
# Object creation.
exp ::= new type-id
# Method call.
exp ::= lvalue . id ( [ exp { , exp }] )
This section has been updated for EPITA-2012 on 2010-01-11.
Scanner and parser are properly running, but the abstract syntax tree is not built yet. Differences with TC-0 include:
Relevant lecture notes include dev-tools.pdf and scanner.pdf.
Things to learn during this stage that you should remember:
Location and Position provide a good start to
study foreign C++ classes. Your understanding them will be controlled,
including the ‘operator’s.
misc::symbol and misc::unique is incomplete.
std::setmisc::unique class relies on
std::set.
misc::unique class is an implementation of the Flyweight
design pattern.
The only information the compiler provides is about lexical and syntax errors. If there are no errors, the compiler shuts up, and exits successfully:
/* An array type and an array variable. */
let
type arrtype = array of int
var arr1 : arrtype := arrtype [10] of 0
in
arr1[2]
end
$ tc -X --parse test01.tig
If there are lexical errors, the exit status is 2, and an error message is output on the standard error output. Its format is standard and mandatory: file, (precise) location, and then the message (see Errors).
1
/* This comments starts at /* 2.2 */
$ tc -X --parse unterminated-comment.tig
error-->unterminated-comment.tig:2.2-3.1: unexpected end of file in a comment
⇒2
If there are syntax errors, the exit status is set to 3:
let var a : nil := ()
in
1
end
$ tc -X --parse type-nil.tig
error-->type-nil.tig:1.13-15: syntax error, unexpected nil, expecting identifier or _namety
⇒3
If there are errors which are non lexical, nor syntactic (Windows will not pass by me):
$ tc C:/TIGER/SAMPLE.TIG
error-->/home/levill_r/src/tc/_build/src/.libs/lt-tc: cannot open `C:/TIGER/SAMPLE.TIG': No such file or directory
⇒1
The option --parse-trace, which relies on Bison's %debug
and %printer directives, must work properly5:
a + "a"
$ tc -X --parse-trace --parse a+a.tig
error-->Parsing file: a+a.tig
error-->Starting parse
error-->Entering state 0
error-->Reading a token: Next token is token "identifier" (a+a.tig:1.1: a)
error-->Shifting token "identifier" (a+a.tig:1.1: a)
error-->Entering state 2
error-->Reading a token: Next token is token "+" (a+a.tig:1.3: )
error-->Reducing stack 0 by rule 91 (line 621):
error--> $1 = token "identifier" (a+a.tig:1.1: a)
error-->-> $$ = nterm varid (a+a.tig:1.1: a)
error-->Entering state 35
error-->Reducing stack 0 by rule 45 (line 450):
error--> $1 = nterm varid (a+a.tig:1.1: a)
error-->-> $$ = nterm lvalue (a+a.tig:1.1: a)
error-->Entering state 29
error-->Next token is token "+" (a+a.tig:1.3: )
error-->Reducing stack 0 by rule 42 (line 443):
error--> $1 = nterm lvalue (a+a.tig:1.1: a)
error-->-> $$ = nterm exp (a+a.tig:1.1: a)
error-->Entering state 27
error-->Next token is token "+" (a+a.tig:1.3: )
error-->Shifting token "+" (a+a.tig:1.3: )
error-->Entering state 80
error-->Reading a token: Next token is token "string" (a+a.tig:1.5-7: a)
error-->Shifting token "string" (a+a.tig:1.5-7: a)
error-->Entering state 1
error-->Reducing stack 0 by rule 22 (line 373):
error--> $1 = token "string" (a+a.tig:1.5-7: a)
error-->-> $$ = nterm exp (a+a.tig:1.5-7: "a")
error-->Entering state 128
error-->Reading a token: Now at end of input.
error-->Reducing stack 0 by rule 36 (line 420):
error--> $1 = nterm exp (a+a.tig:1.1: a)
error--> $2 = token "+" (a+a.tig:1.3: )
error--> $3 = nterm exp (a+a.tig:1.5-7: "a")
error-->-> $$ = nterm exp (a+a.tig:1.1-7: (a + "a"))
error-->Entering state 27
error-->Now at end of input.
error-->Reducing stack 0 by rule 1 (line 313):
error--> $1 = nterm exp (a+a.tig:1.1-7: (a + "a"))
error-->-> $$ = nterm program (a+a.tig:1.1-7: )
error-->Entering state 26
error-->Now at end of input.
error-->Shifting token "end of file" (a+a.tig:2.1: )
error-->Entering state 69
error-->Cleanup: popping token "end of file" (a+a.tig:2.1: )
error-->Cleanup: popping nterm program (a+a.tig:1.1-7: )
error-->Parsing string: function _main () = (_exp (0); ())
error-->Starting parse
error-->Entering state 0
error-->Reading a token: Next token is token "function" (:1.1-8: )
error-->Shifting token "function" (:1.1-8: )
error-->Entering state 8
error-->Reading a token: Next token is token "identifier" (:1.10-14: _main)
error-->Shifting token "identifier" (:1.10-14: _main)
error-->Entering state 47
error-->Reading a token: Next token is token "(" (:1.16: )
error-->Shifting token "(" (:1.16: )
error-->Entering state 100
error-->Reading a token: Next token is token ")" (:1.17: )
error-->Reducing stack 0 by rule 101 (line 663):
error-->-> $$ = nterm funargs (:1.17-16: )
error-->Entering state 153
error-->Next token is token ")" (:1.17: )
error-->Shifting token ")" (:1.17: )
error-->Entering state 197
error-->Reading a token: Next token is token "=" (:1.19: )
error-->Shifting token "=" (:1.19: )
error-->Entering state 227
error-->Reading a token: Next token is token "(" (:1.21: )
error-->Shifting token "(" (:1.21: )
error-->Entering state 12
error-->Reading a token: Next token is token "_exp" (:1.22-25: )
error-->Shifting token "_exp" (:1.22-25: )
error-->Entering state 21
error-->Reading a token: Next token is token "(" (:1.27: )
error-->Shifting token "(" (:1.27: )
error-->Entering state 64
error-->Reading a token: Next token is token "integer" (:1.28: 0)
error-->Shifting token "integer" (:1.28: 0)
error-->Entering state 113
error-->Reading a token: Next token is token ")" (:1.29: )
error-->Shifting token ")" (:1.29: )
error-->Entering state 173
error-->Reducing stack 0 by rule 44 (line 445):
error--> $1 = token "_exp" (:1.22-25: )
error--> $2 = token "(" (:1.27: )
error--> $3 = token "integer" (:1.28: 0)
error--> $4 = token ")" (:1.29: )
error-->-> $$ = nterm exp (:1.22-29: (a + "a"))
error-->Entering state 52
error-->Reading a token: Next token is token ";" (:1.30: )
error-->Reducing stack 0 by rule 56 (line 483):
error--> $1 = nterm exp (:1.22-29: (a + "a"))
error-->-> $$ = nterm exps.1 (:1.22-29: (a + "a"))
error-->Entering state 53
error-->Next token is token ";" (:1.30: )
error-->Shifting token ";" (:1.30: )
error-->Entering state 106
error-->Reading a token: Next token is token "(" (:1.32: )
error-->Shifting token "(" (:1.32: )
error-->Entering state 12
error-->Reading a token: Next token is token ")" (:1.33: )
error-->Reducing stack 0 by rule 60 (line 495):
error-->-> $$ = nterm exps.0.2 (:1.33-32: )
error-->Entering state 55
error-->Next token is token ")" (:1.33: )
error-->Shifting token ")" (:1.33: )
error-->Entering state 107
error-->Reducing stack 0 by rule 4 (line 322):
error--> $1 = token "(" (:1.32: )
error--> $2 = nterm exps.0.2 (:1.33-32: )
error--> $3 = token ")" (:1.33: )
error-->-> $$ = nterm exp (:1.32-33: ())
error-->Entering state 162
error-->Reading a token: Next token is token ")" (:1.34: )
error-->Reducing stack 0 by rule 59 (line 490):
error--> $1 = nterm exps.1 (:1.22-29: (a + "a"))
error--> $2 = token ";" (:1.30: )
error--> $3 = nterm exp (:1.32-33: ())
error-->-> $$ = nterm exps.2 (:1.22-33: (a + "a"), ())
error-->Entering state 54
error-->Reducing stack 0 by rule 61 (line 496):
error--> $1 = nterm exps.2 (:1.22-33: (a + "a"), ())
error-->-> $$ = nterm exps.0.2 (:1.22-33: (a + "a"), ())
error-->Entering state 55
error-->Next token is token ")" (:1.34: )
error-->Shifting token ")" (:1.34: )
error-->Entering state 107
error-->Reducing stack 0 by rule 4 (line 322):
error--> $1 = token "(" (:1.21: )
error--> $2 = nterm exps.0.2 (:1.22-33: (a + "a"), ())
error--> $3 = token ")" (:1.34: )
error-->-> $$ = nterm exp (:1.21-34: (
error--> (a + "a");
error--> ()
error-->))
error-->Entering state 244
error-->Reading a token: Now at end of input.
error-->Reducing stack 0 by rule 97 (line 653):
error--> $1 = token "function" (:1.1-8: )
error--> $2 = token "identifier" (:1.10-14: _main)
error--> $3 = token "(" (:1.16: )
error--> $4 = nterm funargs (:1.17-16: )
error--> $5 = token ")" (:1.17: )
error--> $6 = token "=" (:1.19: )
error--> $7 = nterm exp (:1.21-34: (
error--> (a + "a");
error--> ()
error-->))
error-->-> $$ = nterm fundec (:1.1-34:
error-->function _main () =
error--> (
error--> (a + "a");
error--> ()
error--> ))
error-->Entering state 39
error-->Now at end of input.
error-->Reducing stack 0 by rule 94 (line 646):
error--> $1 = nterm fundec (:1.1-34:
error-->function _main () =
error--> (
error--> (a + "a");
error--> ()
error--> ))
error-->-> $$ = nterm fundecs (:1.1-34:
error-->function _main () =
error--> (
error--> (a + "a");
error--> ()
error--> ))
error-->Entering state 38
error-->Now at end of input.
error-->Reducing stack 0 by rule 20 (line 366):
error-->-> $$ = nterm decs (:1.35-34: )
error-->Entering state 90
error-->Reducing stack 0 by rule 16 (line 359):
error--> $1 = nterm fundecs (:1.1-34:
error-->function _main () =
error--> (
error--> (a + "a");
error--> ()
error--> ))
error--> $2 = nterm decs (:1.35-34: )
error-->-> $$ = nterm decs (:1.1-34:
error-->function _main () =
error--> (
error--> (a + "a");
error--> ()
error--> ))
error-->Entering state 28
error-->Reducing stack 0 by rule 2 (line 315):
error--> $1 = nterm decs (:1.1-34:
error-->function _main () =
error--> (
error--> (a + "a");
error--> ()
error--> ))
error-->-> $$ = nterm program (:1.1-34: )
error-->Entering state 26
error-->Now at end of input.
error-->Shifting token "end of file" (:1.35-34: )
error-->Entering state 69
error-->Cleanup: popping token "end of file" (:1.35-34: )
error-->Cleanup: popping nterm program (:1.1-34: )
Note that (i), --parse is needed, (ii), it cannot see that the variable is not declared nor that there is a type checking error, since type checking... is not implemented, and (iii), the output might be slightly different, depending upon the version of Bison you use. But what matters is that one can see the items: ‘"identifier" a’, ‘"string" a’.
Some code is provided: 2012-tc-1.0.tar.bz2, 2012-tc-1.1.tar.bz2. The transition between the tarballs can be done thanks to the following diff: 2012-tc-1.0-1.1.diff. See The Top Level, src, src/parse, lib/misc.
Be sure to read Flex and Bison documentations and tutorials, see Flex & Bison.
make check.
std::string. See the following
code for the basics.
...
\" lval->str = new std::string (); BEGIN SC_STRING;
<SC_STRING>{ /* Handling of the strings. Initial " is eaten. */
\" {
BEGIN INITIAL;
return token::STRING;
}
...
\\x[0-9a-fA-F]{2} {
lval->str->append (1, strtol (yytext + 2, 0, 16));
}
...
}
misc::symbol
objects, not strings.
Location to use is produced
by Bison: src/parse/location.hh.
To track of locations, adjust your scanner, use YY_USER_ACTION
and the yylex prologue:
...
%%
%{
// Everything here is run each time yylex is invoked.
%}
"if" return token::IF;
...
%%
...
See the lecture notes, and read the C++ chapter of GNU Bison's documentation. Pay special attention to its “Complete C++ Example” which is very much like our set up.
%expect 0.
%printer to implement --parse-trace
support (see TC-1 Samples). Pay special attention to the display
of strings and identifiers.
%destructor to reclaim the memory bound to symbols thrown
away during error recovery.
TigerParser drives the lexing and parsing of input file.
Its implementation in src/parse/tiger-parser.cc is incomplete.
misc::symbol keeps a single copy of identifiers, see
lib/misc. Its implementation in lib/misc/symbol.hxx and
lib/misc/symbol.cc is incomplete. Note that running ‘make
check’ in lib/misc exercises lib/misc/test-symbol.cc:
having this unit test pass should be a goal by itself. As a matter of
fact, unit tests were left to help you: once they pass successfully you
may proceed to the rest of the compiler. misc::symbol's
implementation is based on misc::unique, a generic class
implementing the Flyweight design pattern. The definition of this
class, lib/misc/unique.hxx, is also to be completed.
"string", but none to exp, then it
will choke on:
exp: "string";
because it actually means
exp: "string" { $$ = $1; };
which is not type coherent. So write this instead:
exp: "string" {};
ast::Exp?PKGDATADIR is set to this directory. Its value depends on the
use of configure's option --prefix, defaulting to
/usr/local.
TC_PKGDATADIRPKGDATADIR.
Possible improvements include:
This section has been updated for EPITA-2012 on 2010-02-01.
At the end of this stage, the compiler can build abstract syntax trees of Tiger programs and pretty-print them. The parser is now a GLR parser and equiped with error recovery. The memory is properly deallocated on demand.
The code must follow our coding style and be documented, see Coding Style, and Doxygen.
Relevant lecture notes include dev-tools.pdf, ast.pdf.
Things to learn during this stage that you should remember:
error token, and building usable asts in spite
of lexical/syntax errors.
std::list, misc::symbol uses
std::set.
accept.
virtualmisc::indentmisc::indent extends std::ostream with indentation
features. Use it in the PrettyPrinter to pretty-print.
Understanding how misc::indent will be checked later, see
TC-3 Goals.
PrettyPrinter is an implementation of the Visitor
pattern.
Here are a few samples of the expected features.
The parser builds abstract syntax trees that can be output by a pretty-printing module:
/* Define a recursive function. */
let
/* Calculate n!. */
function fact (n : int) : int =
if n = 0
then 1
else n * fact (n - 1)
in
fact (10)
end
$ tc -XA simple-fact.tig
/* == Abstract Syntax Tree. == */
function _main () =
(
let
function fact (n : int) : int =
(if (n = 0)
then 1
else (n * fact ((n - 1))))
in
fact (10)
end;
()
)
Passing -D, --ast-delete, reclaims the memory associated to the ast. Valgrind will be used to hunt memory leaks, see Valgrind.
No heroic effort is asked for silly options combinations.
$ tc -D simple-fact.tig
$ tc -DA simple-fact.tig
error-->../../src/ast/tasks.cc:24: Precondition `the_program' failed.
error-->/bin/sh: line 1: 1860 Aborted tc -DA simple-fact.tig
⇒134
The pretty-printed output must be valid and equivalent.
Valid means that any Tiger compiler must be able to parse with success your output. Pay attention to the banners such as ‘== Abstract...’: you should use comments: ‘/* == Abstract... */’. Pay attention to special characters too.
print ("\"\x45\x50ITA\"\n")
$ tc -XAD string-escapes.tig
/* == Abstract Syntax Tree. == */
function _main () =
(
print ("\"EPITA\"\n");
()
)
Equivalent means that, except for syntactic sugar, the output and the input are equal. Syntactic sugar refers to ‘&’, ‘|’, unary ‘-’, etc.
1 = 1 & 2 = 2
$ tc -XAD 1s-and-2s.tig
/* == Abstract Syntax Tree. == */
function _main () =
(
(if (1 = 1)
then ((2 = 2) <> 0)
else 0);
()
)
$ tc -XAD 1s-and-2s.tig >output.tig
$ tc -XAD output.tig
/* == Abstract Syntax Tree. == */
function _main () =
(
(if (1 = 1)
then ((2 = 2) <> 0)
else 0);
()
)
Beware that for loops are encoded using a ast::VarDec: do
not display the ‘var’:
for i := 0 to 100 do
(print_int (i))
$ tc -XAD for-loop.tig
/* == Abstract Syntax Tree. == */
function _main () =
(
(for i := 0 to 100 do
print_int (i));
()
)
Parentheses must not stack for free; you must even remove them as the following example demonstrates.
((((((((((0))))))))))
$ tc -XAD parens.tig
/* == Abstract Syntax Tree. == */
function _main () =
(
0;
()
)
This is not a pretty-printer trick: the asts of this program and
that of ‘0’ are exactly the same: a single ast::IntExp.
As a result, anything output by ‘tc -AD’ is equal to what ‘tc -AD | tc -XAD -’ displays!
The type checking rules of Tiger, or rather its binding rules, justify the contrived parsing of declarations. This is why this section uses -b/--bindings-compute, implemented later (see TC-3).
In Tiger, to support recursive types and functions, continuous
declarations of functions and continuous declarations of types are
considered “simultaneously”. For instance in the following program,
foo and bar are visible in each other's scope, and
therefore the following program is correct wrt type checking.
let function foo () : int = bar ()
function bar () : int = foo ()
in
0
end
$ tc -b foo-bar.tig
In the following sample, because bar is not declared in the same
bunch of declarations, it is not visible during the declaration of
foo. The program is invalid.
let function foo () : int = bar ()
var stop := 0
function bar () : int = foo ()
in
0
end
$ tc -b foo-stop-bar.tig
error-->foo-stop-bar.tig:1.29-34: undeclared function: bar
⇒4
The same applies to types.
We shall name chunk a continuous series of type (or function) declaration.
A single name cannot be defined more than once in a chunk.
let function foo () : int = 0
function bar () : int = 1
function foo () : int = 2
var stop := 0
function bar () : int = 3
in
0
end
$ tc -b fbfsb.tig
error-->fbfsb.tig:3.5-29: redefinition: foo
error-->fbfsb.tig:1.5-29: first definition
⇒4
It behaves exactly as if chunks were part of embedded let in end,
i.e., as if the previous program was syntactic sugar for the following
one (in fact, in 2006-tc used to desugar it that way).
let
function foo () : int = 0
function bar () : int = 1
in
let
function foo () : int = 2
in
let
var stop := 0
in
let
function bar () : int = 3
in
0
end
end
end
end
Given the type checking rules for variables, whose definitions cannot be recursive, chunks of variable declarations are reduced to a single variable.
Your parser must be robust to (some) syntactic errors. Observe that on the following input several parse errors are reported, not merely the first one:
(
1;
(2, 3);
(4, 5);
6
)
$ tc multiple-parse-errors.tig
error-->multiple-parse-errors.tig:3.5: syntax error, unexpected ",", expecting ;
error-->multiple-parse-errors.tig:4.5: syntax error, unexpected ",", expecting ;
⇒3
Of course, the exit status still reveals the parse error. Error recovery must not break the rest of the compiler.
$ tc -XAD multiple-parse-errors.tig
error-->multiple-parse-errors.tig:3.5: syntax error, unexpected ",", expecting ;
error-->multiple-parse-errors.tig:4.5: syntax error, unexpected ",", expecting ;
/* == Abstract Syntax Tree. == */
function _main () =
(
(
1;
();
();
6
);
()
)
⇒3
Code is provided: 2012-tc-2.0.tar.bz2. The transition from the previous versions can be done thanks to the following diffs: 2012-tc-1.0-2.0.diff, 2012-tc-1.1-2.0.diff.
For a description of the new modules, see lib/misc, and src/ast.
What is to be done:
error. Read the
Bison documentation about it.
glr.cc, use the %glr-parser
directive. Thanks to GLR, conflicts (S/R and/or R/R) can be accepted.
Use %expect and %expect-rr to specify their number. For
information, we have no R/R conflicts, and two S/R: one related to the
“big lvalue” issue, and the other to the implementation of the two
_cast operators (see Additional Syntactic Specifications).
ast::FunctionDecs, ast::VarDecs, and ast::TypeDecs;
they are implemented thanks to ast::AnyDecs).
GenDefaultVisitor class template. It is the basis
for following visitors in the Tiger compiler.
PrettyPrinter class must be written entirely. It must use the
misc::xalloc features to support indentation.
NameTy, or a Symbolstack used
by the parser. This file is no longer generated nor used when the C++
GLR skeleton is used, which shall be the case starting from
TC-2 (see TC-2 Code to Write). As you must write and
maintain an LALR(1) parser during the TC-0 and
TC-1 stages, the code given at TC-1
(see TC-1 Given Code) distributes the file
src/parse/stack.hh. To avoid distribution issues at TC-2 with
the GLR parser, you have to adjust the list of distributed files in
src/parse/Makefile.am to ignore src/parse/stack.hh (see
the variable FROM_PARSETIGER_YY).
%destructor
directive:
%union
{
std::string *str;
}
%token <str> STRING "string"
%destructor { delete $$; } "string"
misc::errorTigerParser aggregates the local error handler.
From scan_open, for instance, your code should look like:
if (!yyin)
error_ << misc::error::failure
<< program_name << ": cannot open `" << name << "': "
<< strerror (errno) << std::endl
<< &misc::error::exit;
ast::fields_type vs. ast::VarDecs # Function declaration
<dec> ::= <fun-id> "(" <tyfields> ")" [ ":" <type-id> ] "=" <exp>
# Record declaration
<ty> ::= "{" <tyfields> "}"
# List of ``id : type''
<tyfields> ::= [ <id> ":" <type-id> { "," <id> ":" <type-id> } ]
This grammar snippet shows that we used tyfields twice, in two
very different contexts: a list of formal arguments of a
function, and a list of record fields. The fact that the syntax is
similar in both cases is an “accident”: it is by no means required by
the language. A. Appel could have chosen to make them different, but
what would have been the point then? It does make sense, sometimes, to
make two different things look alike, that's a form of economy — a
sane engineering principle.
If the concrete syntaxes were chosen to be identical, should it be the case for abstract too? I would say it depends: the innert data is definitely the same, but the behaviors (i.e., the handling in the various visitors) are very different. So if your language features “innert data”, say C or ML, then keeping the same abstract syntax makes sense; if your language features “active data” — let's call this... objects — then it is a mistake. Sadly enough, the first edition of Red Tiger book made this mistake, and we also did it for years.
The second edition of the Tiger in Java introduces a dedicated
abstract syntax for formal arguments; we made a different choice: there
is little difference between formal arguments and local variables, so we
use a VarDecs, which fits nicely with the semantics of chunks.
Of course this means that you will have to duplicate your parsing
of the tyfields non-terminal in your parser.
ast::DefaultVisitor and ast::NonObjectVisitorast::NonObjectVisitor is the result of a
reasonable compromise between (relative) safety and complexity.
The problem is: as object-aware programs are to be desugared into object-free ones, (a part of) our front-end infrastructure must support two kinds of traversals:
ast::PrettyPrinter,
object::Binder, object::TypeChecker,
object::DesugarVisitor.
bind::Binder,
type::TypeChecker, and all other ast visitors.
The first category has visit methods for all type of nodes of our (object-oriented) ast, so they raise no issue. On the other hand, the second category of visitors knows nothing about objects, and should either be unable to visit ast w/ objects (static solution) or raise an error if they encounter objects (dynamic solution).
Which led us to several solutions:
accept methods) must be duplicated,
too.
ast::NonObjectVisitor. That is the solution we chose.
Solutions 2 and 3 let us provide a default visitor for asts without objects, but it's harder to have a meaningful default visitor for asts with objects: indeed, concrete visitors on asts w/ objects inherit from their non-object counterparts, where methods visiting object nodes are already defined! (Though they abort at run time.)
We have found that having two visitors (ast::DefaultVisitor and
ast::NonObjectVisitor) to solve this problem was more elegant,
rather than merging both of them in ast::DefaultVisitor. The
pros are that ast::DefaultVisitor remains a default visitor; the
cons are that this visitor is now abstract, since object-related nodes
have no visit implementation. Of course, we could derive an
ast::DefaultObjectVisitor from ast::DefaultVisitor to add
the missing methods, but as we said earlier, it would probably be
useless.
We might reconsider this design in the future.
Possible improvements include:
| and &
operators and the unary minus operator are desugared in abstract
syntax (i.e., using explicit instantiations of AST nodes). Using
TigerInput, you can desugar using Tiger's concrete syntax
instead. This second solution is advised.
Error classWhile this behavior is compliant with the assignment, you may improve
this by introducing an Error class (one?), which will never
trigger type checking errors.
The treecc program is designed to assist in the development of compilers and other language-based tools. It manages the generation of code to handle abstract syntax trees and operations upon the trees.
Treecc, the Tree (= ast) Compiler Compiler, brings aspect
programming to compiler writers. Using Treecc completely changes the
implementation of Tiger; for instance, there are no longer Visitors,
since aspect-orientation makes them useless. If you are interested with
this option, (i) read the
“Treecc: An Aspect-Oriented Approach to Writing Compilers” paper, (ii) contact
Akim. This will be accepted only for groups with good C++
fluency.
This section has been updated for EPITA-2012 on 2010-02-22.
At the end of this stage, the compiler must be able to compute and display the bindings. These features are triggered by the options -b/--bindings-compute and -B/--bindings-display.
Relevant lecture notes include: names.pdf.
Things to learn during this stage that you should remember:
Task module is based on the Command design pattern.
misc::scoped_map.
super_type and qualified method invocation to factor common
code.
std::ios::xalloc, std::stream::iword,
and std::stream::pword (see
ios_base documentation by Cplusplus Ressources). Indented output can use it
directly in operator<<, see lib/misc/indent.* and
lib/misc/test-indent.cc. More generally, if have to resort to
using print because you need additional arguments than the sole
stream, consider using this feature instead.
Use this feature so that the PrettyPrinter can be told from
the std::ostream whether escapes and bindings should be displayed.
Binding is relating a name use to its definition.
let
var me := 0
in
me
end
$ tc -XbBA me.tig
/* == Abstract Syntax Tree. == */
function _main /* 0x8d980a8 */ () =
(
let
var me /* 0x8d92e58 */ := 0
in
me /* 0x8d92e58 */
end;
()
)
This is harder when there are several occurrences of the same name. Note that primitive types are accepted, but have no pre-declaration, contrary to primitive functions.
let
var me := 0
function id (me : int) : int = me
in
me
end
$ tc -XbBA meme.tig
/* == Abstract Syntax Tree. == */
function _main /* 0x8a5b0a8 */ () =
(
let
var me /* 0x8a55e70 */ := 0
function id /* 0x8a5efe0 */ (me /* 0x8a5eed8 */ : int /* 0 */) : int /* 0 */ =
me /* 0x8a5eed8 */
in
me /* 0x8a55e70 */
end;
()
)
TC-3 is in charge of incorrect uses of the names, such as undefined names,
me
$ tc -bBA nome.tig
error-->nome.tig:1.1-2: undeclared variable: me
⇒4
or redefined names.
let
type me = {}
type me = {}
function twice (a: int, a: int) : int = a + a
in
me {} = me {}
end
$ tc -bBA tome.tig
error-->tome.tig:3.3-14: redefinition: me
error-->tome.tig:2.3-14: first definition
error-->tome.tig:4.26-32: redefinition: a
error-->tome.tig:4.19-24: first definition
⇒4
In addition to binding names, --bindings-compute is also in charge of binding the
break to their corresponding loop construct.
break
$ tc -b break.tig
error-->break.tig:1.1-5: `break' outside any loop
⇒4
Embedded loops show that there is scoping for breaks. Beware
that there are places, apparently inside loops, where breaks make
no sense too.
Although it is a matter of definitions and uses of names, record members are not bound here, because it is easier to implement during type checking. Nevertheless, for future classes, TC-3 might also be in charge of binding field uses to record definitions.
let
type box = {value : int}
var box := box { value = 51 }
in
box.head
end
$ tc -XbBA box.tig
/* == Abstract Syntax Tree. == */
function _main /* 0x9b9a0d0 */ () =
(
let
type box /* 0x9b9de70 */ = { value : int /* 0 */ }
var box /* 0x9b9e000 */ := box /* 0x9b9de70 */ { value = 51 }
in
box /* 0x9b9e000 */.head
end;
()
)
$ tc -T box.tig
error-->box.tig:5.3-10: invalid field: head
⇒5
Likewise, class members (both attributes and methods) are not to be bound at TC-3, but at the type-checking stage (see TC-4).
let
type C = class {}
var c := new C
in
c.missing_method ();
c.missing_attribute
end
$ tc -X --object-bindings-compute -BA bad-member-bindings.tig
/* == Abstract Syntax Tree. == */
function _main /* 0x8182158 */ () =
(
let
type C /* 0x8185ed0 */ =
class extends Object /* 0 */
{
}
var c /* 0x8186020 */ := new C /* 0x8185ed0 */
in
(
c /* 0x8186020 */.missing_method /* 0 */ ();
c /* 0x8186020 */.missing_attribute
)
end;
()
)
$ tc --object-types-compute bad-member-bindings.tig
error-->bad-member-bindings.tig:5.3-21: unknown method: missing_method
error-->bad-member-bindings.tig:6.3-21: unknown attribute: missing_attribute
⇒5
Concerning the super class type, the compiler should just check that this type exists in the environment at TC-3. Other checks are left to TC-4 (see TC-4 Samples).
let
/* Super class doesn't exist. */
class Z extends Ghost {}
in
end
$ tc -X --object-bindings-compute -BA missing-super-class.tig
error-->missing-super-class.tig:3.19-23: undeclared type: Ghost
⇒4
Code is provided: 2012-tc-3.0.tar.bz2. The transition from the previous versions can be done thanks to the following diff: 2012-tc-2.0-3.0.diff.
misc::scoped_map<Key, Data>misc::scoped_map in
lib/misc/scoped-map.hh and
lib/misc/scoped-map.hxx. See lib/misc, See scoped_map,
for more details.
astCallExp, with
def_, def_get, and def_set to be able to set a
reference to their definition, here a FunctionDec.
ast::PrettyPrinterPrettyPrinter. Be sure to display the addresses exactly as
displayed in this document: immediately after the identifier.
bind::Binderobject::Binderobject::Binder inherits from bind::Binder so
as to factor common parts.
Possible improvements include:
ast module, several classes need to be changed to be
“bindable”, i.e., to have new data and function members to set, store,
and retrieve their associated definition. Instead of changing several
classes in a very similar fashion, introduce a Bindable template
class and derive from its instantiation.
This section has been updated for EPITA-2012 on 2010-02-22.
At the end of this stage, when given the option --rename, the compiler produces an AST such that no identifier is defined twice.
Relevant lecture notes include: names.pdf.
Note that the transformation does not apply to field names.
let
type a = { a: int }
function a (a: a): a = a{ a = a + a }
var a : a := a (1, 2)
in
a.a
end
$ tc -X --rename -A as.tig
/* == Abstract Syntax Tree. == */
function _main () =
(
let
type a_0 = { a : int }
function a_2 (a_1 : a_0) : a_0 =
a_0 { a = (a_1 + a_1) }
var a_3 : a_0 := a_2 (1, 2)
in
a_3.a
end;
()
)
bind::Renamer_main?_main.
This section has been updated for EPITA-2012 on 2010-02-22.
At the end of this stage, the compiler must be able to compute and display the escaping variables. These features are triggered by the options --escapes-compute/-e and --escapes-display/-E.
Relevant lecture notes include: names.pdf and intermediate.pdf.
Things to learn during this stage that you should remember:
union, and that
using template allows for more factoring.
This example demonstrates the computation and display of escaping variables (and formal arguments). By default, all the variables must be considered as escaping, since it is safe to put a non escaping variable onto the stack, while the converse is unsafe.
let
var one := 1
var two := 2
function incr (x: int) : int = x + one
in
incr (two)
end
$ tc -XEAeEA variable-escapes.tig
/* == Abstract Syntax Tree. == */
function _main () =
(
let
var /* escaping */ one := 1
var /* escaping */ two := 2
function incr (/* escaping */ x : int) : int =
(x + one)
in
incr (two)
end;
()
)
/* == Abstract Syntax Tree. == */
function _main () =
(
let
var /* escaping */ one := 1
var two := 2
function incr (x : int) : int =
(x + one)
in
incr (two)
end;
()
)
Compute the escapes after binding, so that the ast is known to be
sane enough (type checking is irrelevant): the EscapeVisitor
should not bother with undeclared entities.
undeclared
$ tc -e undefined-variable.tig
error-->undefined-variable.tig:1.1-10: undeclared variable: undeclared
⇒4
Run your compiler on merge.tig and to study its output. There is a number of silly mistakes that people usually make on TC-E: they are all easy to defeat when you do have a reasonable test suite, and once you understood that torturing your project is a good thing to do.
No additional code is provided, see TC-3 Given Code.
See src/ast, and src/escapes.
ast::PrettyPrinterPrettyPrinter.
Follow strictly the output format, since we parse your output to check
it. Display the ‘/* escaping */’ flag where needed, and
only where needed: each definition of an escaping variable/formal
is preceded by the comment ‘/* escaping */’. Do not display
meaningless flags due to implementation details. How this
pretty-printing is implemented is left to you, but factor common code.
escapes::EscapesVisitorescapes::EscapesVisitor in src/escapes/escapes-visitor.hh.
ast::Escapableast::VarDec inherit from ast::Escapable.
See Escapable.
Possible improvements include:
This section has been updated for EPITA-2012 on 2010-02-25.
At the end of this stage, the compiler type checks Tiger programs, and annotates the AST. Clear error messages are required.
Relevant lecture notes include names.pdf, type-checking.pdf.
Things to learn during this stage that you should remember:
TypeChecker.
Type checking is optional, invoked by --types-compute. As for the computation of bindings, this option only handles programs with no object construct. To perform the type-checking of programs with objects, use --object-types-compute.
Implementing overloaded functions in Tiger is an option, which requires the implementation of a different type checker, triggered by --overfun-types-compute (see TC-A). The option --typed/-T makes sure one of them was run.
Currently, the compiler cannot perform the type-checking with both overloading and object support enabled.
1 + "2"
$ tc int-plus-string.tig
$ tc -T int-plus-string.tig
error-->int-plus-string.tig:1.5-7: type mismatch
error--> right operand type: string
error--> expected type: int
⇒5
When there are several type errors, it is admitted that some remain hidden by others.
unknown_function (unknown_variable)
$ tc -T unknowns.tig
error-->unknowns.tig:1.1-35: undeclared function: unknown_function
⇒4
Be sure to check the type of all the constructs.
if 1 then 2
$ tc -T bad-if.tig
error-->bad-if.tig:1.1-11: type mismatch
error--> then clause type: int
error--> else clause type: void
⇒5
Be aware that type and function declarations are recursive by chunks. For instance:
let
type one = { hd : int, tail : two }
type two = { hd : int, tail : one }
function one (hd : int, tail : two) : one
= one { hd = hd, tail = tail }
function two (hd : int, tail : one) : two
= two { hd = hd, tail = tail }
var one := one (11, two (22, nil))
in
print_int (one.tail.hd); print ("\n")
end
$ tc -T mutuals.tig
In case you are interested, the result is:
$ tc -H mutuals.tig >mutuals.hir
$ havm mutuals.hir
22
The type-checker must catch erroneous inheritance relations.
let
/* Mutually recursive inheritance. */
type A = class extends A {}
/* Mutually recursive inheritance. */
type B = class extends C {}
type C = class extends B {}
/* Class inherits from a non-class type. */
type E = class extends int {}
in
end
$ tc --object-types-compute bad-super-type.tig
error-->bad-super-type.tig:3.12-29: recursive inheritance: A
error-->bad-super-type.tig:6.12-29: recursive inheritance: C
error-->bad-super-type.tig:10.26-28: class type expected, got: int
⇒5
Handle the type-checking of TypeDecs with care in
object::TypeChecker: they are processed in three steps, while
other declarations use a two-step visit. The object::TypeChecker
visitor proceeds as follows when it encounters a TypeDecs:
This three-pass visit allows class members to make forward references to
other types defined in the same block of types, for instance,
instantiate a class B from a class A (defined in the same
block), even if B is defined after A.
let
/* A block of types. */
class A
{
/* Valid forward reference to B, defined in the same block
as the class enclosing this member. */
var b := new B
}
type t = int
class B
{
}
in
end
$ tc --object-types-compute forward-reference-to-class.tig
(See object::TypeChecker::operator() (ast::TypeDecs&) for more
details.
Some code is provided: 2012-tc-4.0.tar.bz2, 2012-tc-4.1.tar.bz2. The transition from the previous version can be done thanks to the following diffs: 2012-tc-3.0-4.0.diff, 2012-tc-3.0-4.1.diff, 2012-tc-4.0-4.1.diff. See src/type.
What is to be done.
ast::Typableast::TypeConstructorast::Exp, ast::Dec, ast::Tyast::FunctionDec, ast::TypeDec, ast::Tytype::String, type::Int, and
type::Void. Using templates would be particularly appreciated to
factor the code between the four singleton classes, see TC-4 Options.
The remaining classes are incomplete.
Pay extra attention to type::operator== (const Type& a, const
Type& b) and type::Type::compatible_with.
type::TypeCheckerobject::TypeCheckerThese are features that you might want to implement in addition to the core features.
type::ErrorInt, which can create cascades of errors:
"666" = if 000 then 333 else "666"
$ tc -T is_devil.tig
error-->is_devil.tig:1.9-34: type mismatch
error--> then clause type: int
error--> else clause type: string
⇒5
One means to avoid this issue consists in introducing a new type,
type::Error, that the type checker would never complain about.
This can be a nice complement to ast::Error.
object::DesugarVisitor and implementing the
--object-desugar option. See TC-O.
let type weirdo = array of weirdo
in
print ("I'm a creep.\n")
end
the answer is "yes", as nothing prevents this in the Tiger
specifications. This type is not usable though.
type::Field useful?std::pair in type::Record is probably enough, and
simpler.
nil compatible with objects?var a : Object := nil"
The answer is no: nil is restricted to records.
Object?Object (syntactic sugar of class without an extends
clause).
For example,
let class Object {} in end
is invalid, since it is similar to
let class Object extends Object {} in end
and recursive inheritance is invalid.
One can try and introduce a Dummy type as a workaround
let
class Dummy {}
class Object extends Dummy {}
in
end
but this is just postponing the problem, since the code above is the same as the following:
let
class Dummy {} extends Object
class Object {} extends Dummy
in
end
where there is still a recursive inheritance.
The one solution is to define our Dummy type beforehand (i.e., in
its own block of type declarations), then to redefine Object.
/* Valid. */
let
class Dummy {}
in
let
class Object extends Dummy {}
in
end
end
Take care: this new Object type is different from the
built-in one. The code below gives an example of an invalid mix of
these two types.
let
class Dummy {}
function get_builtin_object () : Object = new Object /* builtin */
in
let
class Object extends Dummy {} /* custom */
/* Invalid assignment, since an instance of the builtin Object
is *not* an instance of the custom Object. */
var o : Object /* custom */ := get_builtin_object () /* builtin */
in
end
end
Possible improvements include:
type module alone requires four instances! Therefore
a template to generate such singletons is desirable. There are two ways
to address this issue: tailored to type (directly in
src/type/builtin-types.*), or in a completely generic way (in
lib/misc/singleton.*). See Modern C++ Design, for a topnotch
implementation.
Named-depth (i.e., the
number of Named objects traversed) to one. Another, nicer
possibility, would be to limit the expansion to once per
Named.
tcsh is up and running. You might want to use it to implement a
gui using Python's Tkinter.
This section has been updated for EPITA-2009 on 2007-04-26.
At the end of this stage, the compiler must be able to remove syntactic sugar from a type-checked AST. These features are triggered by the options --desugar and --overfun-desugar.
String comparisons can be translated to an equivalent AST using function calls, before the translation to HIR.
"foo" = "bar"
$ tc --desugar-string-cmp --desugar -A string-equality.tig
/* == Abstract Syntax Tree. == */
primitive print (string_0 : string)
primitive print_err (string_1 : string)
primitive print_int (int_2 : int)
primitive flush ()
primitive getchar () : string
primitive ord (string_3 : string) : int
primitive chr (code_4 : int) : string
primitive size (string_5 : string) : int
primitive streq (s1_6 : string, s2_7 : string) : int
primitive strcmp (s1_8 : string, s2_9 : string) : int
primitive substring (string_10 : string, start_11 : int, length_12 : int) : string
primitive concat (fst_13 : string, snd_14 : string) : string
primitive not (boolean_15 : int) : int
primitive exit (status_16 : int)
function _main () =
(
streq ("foo", "bar");
()
)
"foo" < "bar"
$ tc --desugar-string-cmp --desugar -A string-less.tig
/* == Abstract Syntax Tree. == */
primitive print (string_0 : string)
primitive print_err (string_1 : string)
primitive print_int (int_2 : int)
primitive flush ()
primitive getchar () : string
primitive ord (string_3 : string) : int
primitive chr (code_4 : int) : string
primitive size (string_5 : string) : int
primitive streq (s1_6 : string, s2_7 : string) : int
primitive strcmp (s1_8 : string, s2_9 : string) : int
primitive substring (string_10 : string, start_11 : int, length_12 : int) : string
primitive concat (fst_13 : string, snd_14 : string) : string
primitive not (boolean_15 : int) : int
primitive exit (status_16 : int)
function _main () =
(
(strcmp ("foo", "bar") < 0);
()
)
for loops can be seen as sugared while loops, and be
transformed as such.
for i := 0 to 10 do print_int (i)
$ tc --desugar-for --desugar -A simple-for-loop.tig
/* == Abstract Syntax Tree. == */
primitive print (string_0 : string)
primitive print_err (string_1 : string)
primitive print_int (int_2 : int)
primitive flush ()
primitive getchar () : string
primitive ord (string_3 : string) : int
primitive chr (code_4 : int) : string
primitive size (string_5 : string) : int
primitive streq (s1_6 : string, s2_7 : string) : int
primitive strcmp (s1_8 : string, s2_9 : string) : int
primitive substring (string_10 : string, start_11 : int, length_12 : int) : string
primitive concat (fst_13 : string, snd_14 : string) : string
primitive not (boolean_15 : int) : int
primitive exit (status_16 : int)
function _main () =
(
let
var _lo := 0
var _hi := 10
var i_17 := _lo
in
(if (_lo <= _hi)
then (while 1 do
(
print_int (i_17);
(if (i_17 = _hi)
then break
else ());
(i_17 := (i_17 + 1))
))
else ())
end;
()
)
This section has been updated for EPITA-2009 on 2007-04-26.
At the end of this stage, the compiler inlines function bodies where functions are called. In a later pass, useless functions can be pruned from the AST. These features are triggered by the options --inline and --prune. If you also implemented function overloading (see TC-A), use the options --overfun-inline and --overfun-prune.
let
function sub (i: int, j: int) :int = i + j
in
sub (1, 2)
end
$ tc -X --inline -A sub.tig
/* == Abstract Syntax Tree. == */
function _main () =
(
let
function sub_2 (i_0 : int, j_1 : int) : int =
(i_0 + j_1)
in
let
var a_3 : int := 1
var a_4 : int := 2
in
(a_3 + a_4)
end
end;
()
)
Recursive functions cannot be inlined.
--- topological_sort.hpp.old 2006-01-11 10:02:37.000000000 +0100
+++ topological_sort.hpp 2006-01-11 10:02:14.000000000 +0100
@@ -37,7 +37,7 @@
: m_iter(_iter) { }
template <typename Edge, typename Graph>
- void back_edge(const Edge& u, Graph&) { throw not_a_dag(); }
+ void back_edge(const Edge&, Graph&) { throw not_a_dag(); }
template <typename Vertex, typename Graph>
void finish_vertex(const Vertex& u, Graph&) { *m_iter++ = u; }
This section has been updated for EPITA-2009 on 2007-04-26.
At the end of this stage, the compiler adds dynamic checks of the bounds of arrays to the AST. Every access (either on read or write) is checked, and the program should stops with the runtime exit code (120) on out-of-bound access. This feature is triggered by the options --bound-checks-add and --overfun-bound-checks-add.
Here is an example with an array subscript out of bounds, run with HAVM.
let
type int_array = array of int
var foo := int_array [10] of 3
in
/* Index of bounds. */
foo[20]
end
$ tc --bound-checks-add -A subscript-read.tig
/* == Abstract Syntax Tree. == */
primitive print (string_0 : string)
primitive print_err (string_1 : string)
primitive print_int (int_2 : int)
primitive flush ()
primitive getchar () : string
primitive ord (string_3 : string) : int
primitive chr (code_4 : int) : string
primitive size (string_5 : string) : int
primitive streq (s1_6 : string, s2_7 : string) : int
primitive strcmp (s1_8 : string, s2_9 : string) : int
primitive substring (string_10 : string, start_11 : int, length_12 : int) : string
primitive concat (fst_13 : string, snd_14 : string) : string
primitive not (boolean_15 : int) : int
primitive exit (status_16 : int)
function _main () =
let
type __int_array = array of int
type _int_array = {
arr : __int_array,
size : int
}
function _check_bounds (a : _int_array, index : int, location : string) : int =
(
(if (if (index < 0)
then 1
else ((index >= a.size) <> 0))
then (
print_err (location);
print_err (": array index out of bounds.\n");
exit (120)
)
else ());
index
)
in
(
let
type int_array_17 = array of int
type _box_int_array_17 = {
arr : int_array_17,
size : int
}
var foo_18 := let
var _size := 10
in
_box_int_array_17 {
arr = int_array_17 [_size] of 3,
size = _size
}
end
in
foo_18.arr[_check_bounds (_cast (foo_18, _int_array), 20, "subscript-read.tig:6.3-9")]
end;
()
)
end
$ tc --bound-checks-add -L subscript-read.tig >subscript-read.lir
$ havm subscript-read.lir
error-->subscript-read.tig:6.3-9: array index out of bounds.
⇒120
And here is an example with an out-of-bounds assignment to an array cell, tested with Nolimips.
let
type int_array = array of int
var foo := int_array [10] of 3
in
/* Index of bounds. */
foo[42] := 51
end
$ tc --bound-checks-add -A subscript-write.tig
/* == Abstract Syntax Tree. == */
primitive print (string_0 : string)
primitive print_err (string_1 : string)
primitive print_int (int_2 : int)
primitive flush ()
primitive getchar () : string
primitive ord (string_3 : string) : int
primitive chr (code_4 : int) : string
primitive size (string_5 : string) : int
primitive streq (s1_6 : string, s2_7 : string) : int
primitive strcmp (s1_8 : string, s2_9 : string) : int
primitive substring (string_10 : string, start_11 : int, length_12 : int) : string
primitive concat (fst_13 : string, snd_14 : string) : string
primitive not (boolean_15 : int) : int
primitive exit (status_16 : int)
function _main () =
let
type __int_array = array of int
type _int_array = {
arr : __int_array,
size : int
}
function _check_bounds (a : _int_array, index : int, location : string) : int =
(
(if (if (index < 0)
then 1
else ((index >= a.size) <> 0))
then (
print_err (location);
print_err (": array index out of bounds.\n");
exit (120)
)
else ());
index
)
in
(
let
type int_array_17 = array of int
type _box_int_array_17 = {
arr : int_array_17,
size : int
}
var foo_18 := let
var _size := 10
in
_box_int_array_17 {
arr = int_array_17 [_size] of 3,
size = _size
}
end
in
(foo_18.arr[_check_bounds (_cast (foo_18, _int_array), 42, "subscript-write.tig:6.3-9")] := 51)
end;
()
)
end
$ tc --bound-checks-add -S subscript-write.tig >subscript-write.s
$ nolimips -l nolimips -Nue subscript-write.s
error-->subscript-write.tig:6.3-9: array index out of bounds.
⇒120
The bound checking extension relies on the use of casts (see TC-B Samples), see See Language Extensions. However, a simplistic implementation of casts introduces ambiguities in the grammar that even a GLR parser cannot resolve dynamically.
Consider the following example, where foo is an l-value :
_cast (foo, string)
This piece of code can be parsed in two different ways:
exp -> cast-exp -> exp -> lvalue (foo)
exp -> lvalue -> cast-lvalue -> lvalue (foo)
As the cast must preserve the l-value nature of foo, it must
itself produce an l-value. Hence we want the latter interpretation.
But the GLR parser cannot resolve this ambiguity, despite its ability to
postpone conflicts at run time. To help it take the right decision, you
can favor the right path by assigning dynamic priorities to
relevant rules, using Bison's %dprec keyword. See Bison's manual
(see Flex & Bison) for more information on this feature.
This section has been updated for EPITA-2009 on 2007-04-26.
At the end of this stage, the compiler must be able to resolve overloaded function calls. These features are triggered by the options --overfun-bindings-compute and --overfun-types-compute/-O.
Relevant lecture notes include: names.pdf.
Overloaded functions are not supported in regular Tiger.
let
function null (i: int) : int = i = 0
function null (s: string) : int = s = ""
in
null ("123") = null (123)
end
$ tc -Xb sizes.tig
error-->sizes.tig:3.3-42: redefinition: null
error-->sizes.tig:2.3-41: first definition
⇒4
Instead of regular binding, overloaded binding binds each function call to the set of active function definitions. Unfortunately displaying this set is not implemented, so we cannot see them in the following example:
$ tc -X --overfun-bindings-compute -BA sizes.tig
/* == Abstract Syntax Tree. == */
function _main /* 0x91460d8 */ () =
(
let
function null /* 0x9149fa8 */ (i /* 0x9149e38 */ : int /* 0 */) : int /* 0 */ =
(i /* 0x9149e38 */ = 0)
function null /* 0x914a1e0 */ (s /* 0x914a070 */ : string /* 0 */) : int /* 0 */ =
(s /* 0x914a070 */ = "")
in
(null /* 0 */ ("123") = null /* 0 */ (123))
end;
()
)
The selection of the right binding cannot be done before type-checking, since precisely overloading relies on types to distinguish the actual function called. Therefore it is the type checker that “finishes” the binding.
$ tc -XOBA sizes.tig
/* == Abstract Syntax Tree. == */
function _main /* 0x9d7c0b8 */ () =
(
let
function null /* 0x9d7ffa0 */ (i /* 0x9d7fe58 */ : int /* 0 */) : int /* 0 */ =
(i /* 0x9d7fe58 */ = 0)
function null /* 0x9d801f0 */ (s /* 0x9d80080 */ : string /* 0 */) : int /* 0 */ =
(s /* 0x9d80080 */ = "")
in
(null /* 0x9d801f0 */ ("123") = null /* 0x9d7ffa0 */ (123))
end;
()
)
There can be ambiguous (overloaded) calls.
let
type foo = {}
function empty (f: foo) : int = f = nil
type bar = {}
function empty (b: bar) : int = b = nil
in
empty (foo {});
empty (bar {});
empty (nil)
end
$ tc -XO over-amb.tig
error-->over-amb.tig:9.3-13: nil ambiguity calling `empty'
error-->matching declarations:
error--> empty @
error--> {
error--> f : foo =
error--> {
error--> }
error--> }
error--> empty @
error--> {
error--> b : bar =
error--> {
error--> }
error--> }
⇒5
The spirit of plain Tiger is kept: a “chunk” is not allowed to redefine a function with the same signature:
let
function foo (i: int) = ()
function foo (i: int) = ()
in
foo (42)
end
$ tc -XO over-duplicate.tig
error-->over-duplicate.tig:3.3-28: function complete redefinition: foo
error-->over-duplicate.tig:2.3-28: first definition
⇒5
but a signature can be defined twice in different blocks of function definitions.
let
function foo (i: int) = ()
in
let
function foo (i: int) = ()
in
foo (51)
end
end
$ tc -XOBA over-scoped.tig
/* == Abstract Syntax Tree. == */
function _main /* 0x8deb0b8 */ () =
(
let
function foo /* 0x8deeed8 */ (i /* 0x8deee58 */ : int /* 0 */) =
()
in
let
function foo /* 0x8def0c0 */ (i /* 0x8def018 */ : int /* 0 */) =
()
in
foo /* 0x8def0c0 */ (51)
end
end;
()
)
No additional code is provided.
See src/ast, and src/overload.
This section has been updated for EPITA-2012 on 2010-02-24.
At the end of this stage, the compiler must be able to desugar object constructs into plain Tiger without objects, a.k.a. Panther. This feature is triggered by the option --object-desugar.
This a very hard assignment. If you plan to work on it, start with very simple programs, and progressively add new desugaring patterns. Be sure to keep a complete test suite to cover all cases and avoid regressions.
Achieving a faithful and complete translation from Tiger to Panther requires a lot of work. Even the reference implementation of the object-desugar pass (about 1,000 lines of code) is not perfect, as some inputs may generate invalid Tiger code after desugaring objects (in particular when playing with scopes).
Be warned: even Small object-oriented Tiger programs may generate complicated desugared outputs.
let
class A {}
in
end
$ tc -X --object-desugar -A empty-class.tig
/* == Abstract Syntax Tree. == */
function _main () =
let
type _variant_Object = { exact_type : int }
type _variant_A_0 = { exact_type : int }
var _id_Object := 0
var _id_A_0 := 1
function _new_Object () : _variant_Object =
_variant_Object { exact_type = _id_Object }
in
(
let
function _new_A_0 () : _variant_A_0 =
let
in
_variant_A_0 { exact_type = _id_A_0 }
end
function _upcast_A_0_to_Object (source : _variant_A_0) : _variant_Object =
_variant_Object { exact_type = _id_A_0 }
in
()
end;
()
)
end
let
class B
{
var a := 42
method m () : int = self.a
}
var b := new B
in
b.a := 51
end
$ tc -X --object-desugar -A simple-class.tig
/* == Abstract Syntax Tree. == */
function _main () =
let
type _variant_Object = {
exact_type : int,
field_B_1 : _contents_B_1
}
type _contents_B_1 = { a : int }
type _variant_B_1 = {
exact_type : int,
field_B_1 : _contents_B_1
}
var _id_Object := 0
var _id_B_1 := 1
function _new_Object () : _variant_Object =
_variant_Object {
exact_type = _id_Object,
field_B_1 = nil
}
in
(
let
function _new_B_1 () : _variant_B_1 =
let
var contents_B_1 := _contents_B_1 { a = 42 }
in
_variant_B_1 {
exact_type = _id_B_1,
field_B_1 = contents_B_1
}
end
function _upcast_B_1_to_Object (source : _variant_B_1) : _variant_Object =
_variant_Object {
exact_type = _id_B_1,
field_B_1 = source.field_B_1
}
function _method_B_1_m (self : _variant_B_1) : int =
self.field_B_1.a
function _dispatch_B_1_m (self : _variant_B_1) : int =
_method_B_1_m (self)
var b_2 := _new_B_1 ()
in
(b_2.field_B_1.a := 51)
end;
()
)
end
let
class C
{
var a := 0
method m () : int = self.a
}
class D extends C
{
var b := 9
/* Override C.m(). */
method m () : int = self.a + self.b
}
var d : D := new D
/* Valid upcast due to inclusion polymorphism. */
var c : C := d
in
c.a := 42;
/* Note that accessing `c.b' is not allowed, since `c' is
statically known as a `C', even though it is actually a `D'
at run time. */
let
/* Polymorphic call. */
var res := c.m ()
in
print_int(res);
print("\n")
end
end
$ tc --object-desugar -A override.tig
/* == Abstract Syntax Tree. == */
primitive print (string_0 : string)
primitive print_err (string_1 : string)
primitive print_int (int_2 : int)
primitive flush ()
primitive getchar () : string
primitive ord (string_3 : string) : int
primitive chr (code_4 : int) : string
primitive size (string_5 : string) : int
primitive streq (s1_6 : string, s2_7 : string) : int
primitive strcmp (s1_8 : string, s2_9 : string) : int
primitive substring (string_10 : string, start_11 : int, length_12 : int) : string
primitive concat (fst_13 : string, snd_14 : string) : string
primitive not (boolean_15 : int) : int
primitive exit (status_16 : int)
function _main () =
let
type _variant_Object = {
exact_type : int,
field_C_18 : _contents_C_18,
field_D_20 : _contents_D_20
}
type _contents_C_18 = { a : int }
type _variant_C_18 = {
exact_type : int,
field_C_18 : _contents_C_18,
field_D_20 : _contents_D_20
}
type _contents_D_20 = { b : int }
type _variant_D_20 = {
exact_type : int,
field_D_20 : _contents_D_20,
field_C_18 : _contents_C_18
}
var _id_Object := 0
var _id_C_18 := 1
var _id_D_20 := 2
function _new_Object () : _variant_Object =
_variant_Object {
exact_type = _id_Object,
field_C_18 = nil,
field_D_20 = nil
}
in
(
let
function _new_C_18 () : _variant_C_18 =
let
var contents_C_18 := _contents_C_18 { a = 0 }
in
_variant_C_18 {
exact_type = _id_C_18,
field_C_18 = contents_C_18,
field_D_20 = nil
}
end
function _upcast_C_18_to_Object (source : _variant_C_18) : _variant_Object =
_variant_Object {
exact_type = _id_C_18,
field_C_18 = source.field_C_18,
field_D_20 = source.field_D_20
}
function _downcast_C_18_to_D_20 (source : _variant_C_18) : _variant_D_20 =
_variant_D_20 {
exact_type = _id_D_20,
field_D_20 = source.field_D_20,
field_C_18 = source.field_C_18
}
function _method_C_18_m (self : _variant_C_18) : int =
self.field_C_18.a
function _dispatch_C_18_m (self : _variant_C_18) : int =
(if (self.exact_type = _id_C_18)
then _method_C_18_m (self)
else _method_D_20_m (_downcast_C_18_to_D_20 (self)))
function _new_D_20 () : _variant_D_20 =
let
var contents_D_20 := _contents_D_20 { b = 9 }
var contents_C_18 := _contents_C_18 { a = 0 }
in
_variant_D_20 {
exact_type = _id_D_20,
field_D_20 = contents_D_20,
field_C_18 = contents_C_18
}
end
function _upcast_D_20_to_C_18 (source : _variant_D_20) : _variant_C_18 =
_variant_C_18 {
exact_type = _id_D_20,
field_C_18 = source.field_C_18,
field_D_20 = source.field_D_20
}
function _upcast_D_20_to_Object (source : _variant_D_20) : _variant_Object =
_variant_Object {
exact_type = _id_D_20,
field_C_18 = source.field_C_18,
field_D_20 = source.field_D_20
}
function _method_D_20_m (self : _variant_D_20) : int =
(self.field_C_18.a + self.field_D_20.b)
function _dispatch_D_20_m (self : _variant_D_20) : int =
_method_D_20_m (self)
var d_21 : _variant_D_20 := _new_D_20 ()
var c_22 : _variant_C_18 := _upcast_D_20_to_C_18 (d_21)
in
(
(c_22.field_C_18.a := 42);
let
var res_23 := _dispatch_C_18_m (c_22)
in
(
print_int (res_23);
print ("\n")
)
end
)
end;
()
)
end
$ tc --object-desugar -L override.tig >override.lir
$ havm override.lir
51
This section has been updated for EPITA-2012 on 2010-03-11.
At the end of this stage the compiler translates the ast into the high level intermediate representation, hir for short.
Relevant lecture notes include intermediate.pdf.
Things to learn during this stage that you should remember:
operator-> and operator*. std::auto_ptr are
also smart pointers.
std::auto_ptrauto_ptr to
guarantee it is released (delete) at the end of the run. The
translate module interface also uses auto_ptr to specify
sources and sinks.
misc::ref provides reference counting smart
pointers to ease the memory management. It is used to handle nodes of
the intermediate representation, especially because during
TC-6 some rewriting might transform this tree into an
dag, in which case memory deallocation is complex.
union keyword, inherited from C. Not only is
union not type safe, it also forbids class members. Some people
have worked hard to implement union ŕ la C++, i.e., with type
safety, polymorphism etc. These union are called “discriminated
unions” or “variants” to follow the vocabulary introduced by Caml.
See the papers from Andrei Alexandrescu:
Discriminated Unions (i),
Discriminated Unions (ii),
Discriminated Unions (iii) for an introduction to the techniques. We
use boost::variant (see Boost.org) in temp.
I strongly encourage you to read these enlightening articles.
misc::ref. You must be able
to explain these pitfalls.
temp module heavily uses this feature: understand it, and be
ready to write similar code.
temp::Identifier
is based on these ideas.
clone, as in
frame::Access::clone (). Understand return type covariance.
TC-5 can be started (and should be started if you don't want to finish it in a hurry) by first making sure your compiler can handle code that uses no variables. Then, you can complete your compiler to support more and more Tiger features.
This example is probably the simplest Tiger program.
0
$ tc --hir-display 0.tig
/* == High Level Intermediate representation. == */
# Routine: _main
label main
# Prologue
# Body
seq
sxp
const 0
sxp
const 0
seq end
# Epilogue
label end
You should then probably try to make more difficult programs with literals only. Arithmetics is one of the easiest tasks.
1 + 2 * 3
$ tc -H arith.tig
/* == High Level Intermediate representation. == */
# Routine: _main
label main
# Prologue
# Body
seq
sxp
binop add
const 1
binop mul
const 2
const 3
sxp
const 0
seq end
# Epilogue
label end
Use havm to exercise your output.
$ tc -H arith.tig >arith.hir
$ havm arith.hir
Unfortunately, without actually printing something, you won't see the final result, which means you need to implement function calls. Fortunately, you can ask havm for a verbose execution:
$ havm --trace arith.hir
error-->plaining
error-->unparsing
error-->checking
error-->checkingLow
error-->evaling
error--> call ( name main ) []
error-->9.6-9.13: const 1
error-->11.8-11.15: const 2
error-->12.8-12.15: const 3
error-->10.6-12.15: binop mul 2 3
error-->8.4-12.15: binop add 1 6
error-->7.2-12.15: sxp 7
error-->14.4-14.11: const 0
error-->13.2-14.11: sxp 0
error--> end call ( name main ) [] = 0
If you look carefully, you will find an ‘sxp 7’ in there...
Then you are encouraged to implement control structures.
if 101 then 102 else 103
$ tc -H if-101.tig
/* == High Level Intermediate representation. == */
# Routine: _main
label main
# Prologue
# Body
seq
seq
cjump ne
const 101
const 0
name l0
name l1
label l0
sxp
const 102
jump
name l2
label l1
sxp
const 103
label l2
seq end
sxp
const 0
seq end
# Epilogue
label end
And even more difficult control structure uses:
while 101
do (if 102 then break)
$ tc -H while-101.tig
/* == High Level Intermediate representation. == */
# Routine: _main
label main
# Prologue
# Body
seq
seq
label l1
cjump ne
const 101
const 0
name l2
name l0
label l2
seq
cjump ne
const 102
const 0
name l3
name l4
label l3
jump
name l0
jump
name l5
label l4
sxp
const 0
label l5
seq end
jump
name l1
label l0
seq end
sxp
const 0
seq end
# Epilogue
label end
Beware that havm has some known bugs with its handling of
break, see Havm Bugs.
Optimize the number of jumps needed to compute nested if, using
‘translate::Ix’. A plain use of ‘translate::Cx’ is possible,
but less efficient.
Consider the following sample:
if if 11 < 22 then 33 < 44 else 55 < 66 then print ("OK\n")
a naive implementation will probably produce too many cjump
instructions6:
$ tc --hir-naive -H boolean.tig
/* == High Level Intermediate representation. == */
label l7
"OK\n"
# Routine: _main
label main
# Prologue
# Body
seq
seq
cjump ne
eseq
seq
cjump lt
const 11
const 22
name l0
name l1
label l0
move
temp t0
eseq
seq
move
temp t1
const 1
cjump lt
const 33
const 44
name l3
name l4
label l4
move
temp t1
const 0
label l3
seq end
temp t1
jump
name l2
label l1
move
temp t0
eseq
seq
move
temp t2
const 1
cjump lt
const 55
const 66
name l5
name l6
label l6
move
temp t2
const 0
label l5
seq end
temp t2
jump
name l2
label l2
seq end
temp t0
const 0
name l8
name l9
label l8
sxp
call
name print
name l7
call end
jump
name l10
label l9
sxp
const 0
jump
name l10
label l10
seq end
sxp
const 0
seq end
# Epilogue
label end
$ tc --hir-naive -H boolean.tig >boolean-1.hir
$ havm --profile boolean-1.hir
error-->/* Profiling. */
error-->fetches from temporary : 2
error-->fetches from memory : 0
error-->binary operations : 0
error-->function calls : 1
error-->stores to temporary : 2
error-->stores to memory : 0
error-->jumps : 2
error-->conditional jumps : 3
error-->/* Execution time. */
error-->number of cycles : 19
OK
An analysis of this pessimization reveals that it is related to the computation of an intermediate expression (the value of ‘if 11 < 22 then 33 < 44 else 55 < 66’) later decoded as a condition. A better implementation will produce:
$ tc -H boolean.tig
/* == High Level Intermediate representation. == */
label l0
"OK\n"
# Routine: _main
label main
# Prologue
# Body
seq
seq
seq
cjump lt
const 11
const 22
name l4
name l5
label l4
cjump lt
const 33
const 44
name l1
name l2
label l5
cjump lt
const 55
const 66
name l1
name l2
seq end
label l1
sxp
call
name print
name l0
call end
jump
name l3
label l2
sxp
const 0
label l3
seq end
sxp
const 0
seq end
# Epilogue
label end
$ tc -H boolean.tig >boolean-2.hir
$ havm --profile boolean-2.hir
error-->/* Profiling. */
error-->fetches from temporary : 0
error-->fetches from memory : 0
error-->binary operations : 0
error-->function calls : 1
error-->stores to temporary : 0
error-->stores to memory : 0
error-->jumps : 1
error-->conditional jumps : 2
error-->/* Execution time. */
error-->number of cycles : 13
OK
The game becomes more interesting with primitive calls (which are easier to compile than function definitions and function calls).
(print_int (101); print ("\n"))
$ tc -H print-101.tig >print-101.hir
$ havm print-101.hir
101
Complex values, arrays and records, also need calls to the runtime system:
let
type ints = array of int
var ints := ints [51] of 42
in
print_int (ints[ints[0]]); print ("\n")
end
$ tc -H print-array.tig
/* == High Level Intermediate representation. == */
label l0
"\n"
# Routine: _main
label main
# Prologue
move
temp t1
temp fp
move
temp fp
temp sp
move
temp sp
binop sub
temp sp
const 4
# Body
seq
seq
move
mem
temp fp
eseq
move
temp t0
call
name init_array
const 51
const 42
call end
temp t0
seq
sxp
call
name print_int
mem
binop add
mem
temp fp
binop mul
mem
binop add
mem
temp fp
binop mul
const 0
const 4
const 4
call end
sxp
call
name print
name l0
call end
seq end
seq end
sxp
const 0
seq end
# Epilogue
move
temp sp
temp fp
move
temp fp
temp t1
label end
$ tc -H print-array.tig >print-array.hir
$ havm print-array.hir
42
The case of record is more subtle. Think carefully about the following example
let
type list = { h: int, t: list }
var list := list { h = 1,
t = list { h = 2,
t = nil } }
in
print_int (list.t.h); print ("\n")
end
The following example demonstrates the usefulness of information about escapes: when it is not computed, all the variables are stored on the stack.
let
var a := 1
var b := 2
var c := 3
in
a := 2;
c := a + b + c;
print_int (c);
print ("\n")
end
$ tc -H vars.tig
/* == High Level Intermediate representation. == */
label l0
"\n"
# Routine: _main
label main
# Prologue
move
temp t0
temp fp
move
temp fp
temp sp
move
temp sp
binop sub
temp sp
const 12
# Body
seq
seq
move
mem
temp fp
const 1
move
mem
binop add
temp fp
const -4
const 2
move
mem
binop add
temp fp
const -8
const 3
seq
move
mem
temp fp
const 2
move
mem
binop add
temp fp
const -8
binop add
binop add
mem
temp fp
mem
binop add
temp fp
const -4
mem
binop add
temp fp
const -8
sxp
call
name print_int
mem
binop add
temp fp
const -8
call end
sxp
call
name print
name l0
call end
seq end
seq end
sxp
const 0
seq end
# Epilogue
move
temp sp
temp fp
move
temp fp
temp t0
label end
Once escaping variable computation implemented, we know none escape in this example, hence they can be stored in temporaries:
$ tc -eH vars.tig
/* == High Level Intermediate representation. == */
label l0
"\n"
# Routine: _main
label main
# Prologue
# Body
seq
seq
move
temp t0
const 1
move
temp t1
const 2
move
temp t2
const 3
seq
move
temp t0
const 2
move
temp t2
binop add
binop add
temp t0
temp t1
temp t2
sxp
call
name print_int
temp t2
call end
sxp
call
name print
name l0
call end
seq end
seq end
sxp
const 0
seq end
# Epilogue
label end
$ tc -eH vars.tig >vars.hir
$ havm vars.hir
7
Then, you should implement the declaration of functions:
let
function fact (i: int) : int =
if i = 0 then 1
else i * fact (i - 1)
in
print_int (fact (15));
print ("\n")
end
$ tc -H fact15.tig
/* == High Level Intermediate representation. == */
# Routine: fact
label l0
# Prologue
move
temp t1
temp fp
move
temp fp
temp sp
move
temp sp
binop sub
temp sp
const 8
move
mem
temp fp
temp i0
move
mem
binop add
temp fp
const -4
temp i1
# Body
move
temp rv
eseq
seq
cjump eq
mem
binop add
temp fp
const -4
const 0
name l1
name l2
label l1
move
temp t0
const 1
jump
name l3
label l2
move
temp t0
binop mul
mem
binop add
temp fp
const -4
call
name l0
mem
temp fp
binop sub
mem
binop add
temp fp
const -4
const 1
call end
label l3
seq end
temp t0
# Epilogue
move
temp sp
temp fp
move
temp fp
temp t1
label end
label l4
"\n"
# Routine: _main
label main
# Prologue
# Body
seq
seq
sxp
call
name print_int
call
name l0
temp fp
const 15
call end
call end
sxp
call
name print
name l4
call end
seq end
sxp
const 0
seq end
# Epilogue
label end
$ tc -H fact15.tig >fact15.hir
$ havm fact15.hir
2004310016
And finally, you should support escaping variables (see File 4.27).
$ tc -eH variable-escapes.tig
/* == High Level Intermediate representation. == */
# Routine: incr
label l0
# Prologue
move
temp t2
temp fp
move
temp fp
temp sp
move
temp sp
binop sub
temp sp
const 4
move
mem
temp fp
temp i0
move
temp t1
temp i1
# Body
move
temp rv
binop add
temp t1
mem
mem
temp fp
# Epilogue
move
temp sp
temp fp
move
temp fp
temp t2
label end
# Routine: _main
label main
# Prologue
move
temp t3
temp fp
move
temp fp
temp sp
move
temp sp
binop sub
temp sp
const 4
# Body
seq
sxp
eseq
seq
move
mem
temp fp
const 1
move
temp t0
const 2
seq end
call
name l0
temp fp
temp t0
call end
sxp
const 0
seq end
# Epilogue
move
temp sp
temp fp
move
temp fp
temp t3
label end
Some code is provided: 2012-tc-5.0.tar.bz2. The transition from the previous versions can be done thanks to the following diffs: 2012-tc-4.0-5.0.diff, 2012-tc-4.1-5.0.diff. For a description of the new modules, see src/temp, src/tree, src/frame, src/translate.
You are encouraged to first try very simple examples: ‘nil’, ‘1 + 2’, ‘"foo" < "bar"’ etc. Then consider supporting variables, and finally handle the case of the functions.
temp::IdentifierYou are invited to follow the best practices for variants, in
particular, avoid “type switching” by hand, rather use variant
visitors. For instance the IdentifierEqualVisitor can be used
this way:
template <template <typename Tag_> class Traits_>
bool
Identifier<Traits_>::operator== (const Identifier<Traits_>& rhs) const
{
return
rank_get () == rhs.rank_get () &&
boost::apply_visitor (IdentifierEqualToVisitor (), value_, rhs.value_);
}
tree::Fragmenttree::ProcFrag::dump that outputs the
routine themselves plus the glue code (allocating the frame
etc.).
translate::TranslatorThis section documents possible extensions you could implement in TC-5.
The implementation of the bounds checking can be done when generating the IR. Requirements are the same than for the see TC-B option. You can use HAVM to test the success of your bounds checking.
Warning: this optimization is difficult to do it perfectly, and therefore, expect a big bonus.
In a first and conservative extension, the compiler considers that all
the functions (but the builtins!) need a static link. This is correct,
but inefficient: for instance, the traditional fact function will
spend almost as much time handling the static link, than its real
argument.
Some functions need a static link, but don't need to save it on the stack. For instance, in the following example:
let
var foo := 1
function foo () : int = foo
in
foo ()
end
the function foo does need a static link to access the variable
foo, but does not need to store its static link on the stack.
It is suggested to address these problems in the following order:
$ tc -E fact.tig
/* == Escapes. == */
let
function fact (/* escaping sl *//* escaping */ n : int) : int =
if (n = 0)
then 1
else (n * fact ( (n - 1)))
in
fact (10)
end
$ tc -eE fact.tig
/* == Escapes. == */
let
function fact (n : int) : int =
if (n = 0)
then 1
else (n * fact ( (n - 1)))
in
fact (10)
end
call and progFrag prologues.
$ tc -eE escaping-sl.tig
/* == Escapes. == */
let
var toto := 1
function outer (/* escaping sl */) : int =
let function inner (/* sl */) : int = toto
in inner () end
in
outer ()
end
Watch out, it is not trivial to find the minimum. What do you think
about the static link of the function sister below?
let
var var := 1
function outer () : int =
let
function inner () : int = var
in
inner ()
end
function sister () : int = outer ()
in
sister ()
end
That would mean that the target module (see src/target) would
be given during TC-5, which seemed too difficult and
anti-pedagogical, so we used fp and rv where he uses
$fp and $v0. While this does make TC-5 more
target independent and TC-5 tarballs lighter, it slightly
complicates the rest of the compiler.
There remains one target dependent information wired in hard: the word
size is set to 4.
translate::Level reads:
// Install a slot for the static link if needed.
Level::Level (const misc::symbol& name,
const Level* parent,
frame::bool_list_type formal_escapes) :
parent_ (parent),
frame_ (new frame::Frame (name))
{
// FIXME: Some code was deleted here (Allocate a formal for the static link).
// Install translate::Accesses for all the formals.
frame::bool_list_type::const_iterator i;
for (i = formal_escapes.begin (); i != formal_escapes.end (); ++i)
formal_alloc (*i);
}
To allocate a formal for the static link, look at how other formals are allocated, and take these into account:
translate::Level;
Obviously, this won't hold if you plan to optimize the static links
(see TC-5 Optimizing Static Links); you'll have to tweak
translate::Level's constructor.
var i := 0 won't compile?SIGSEGV or a
failed assertion. For instance, the following command probably won't
work: echo 'var i := 0' | tc --hir-compute -.
Variables must be allocated in a level (see
translate::Translator::operator() (const ast::VarDec&)). However,
there is no level for global variable declarations (outside
_main). The current language specification does not address this
case, so you are free to handle it as you wish, though an assertion on
the presence of an enclosing level is probably the easiest solution.
Possible improvements include:
Tree creates new nodes for equal
expressions; for instance two uses of the variable foo lead to
two equal instantiations of tree::Temp. The same applies to more
complex constructs such as the same translation if foo is
actually a frame resident variable etc. Because memory consumption may
have a negative impact on performances, it is desirable to implement
maximal sharing: whenever a Tree is needed, we first check
whether it already exists and then reuse it. This must be done
recursively: the translation of ‘(x + x) * (x + x)’ should have a
single instantiation of ‘x + x’ instead of two, but also a single
instantiation of ‘x’ instead of four.
Node sharing makes some algorithms, such as rewriting, more complex,
especially wrt memory management. Garbage collection is almost
required, but fortunately the node of Tree are reference counted!
Therefore, almost everything is ready to implement maximal node sharing.
See spot, for an explanation on how this approach was successfully
implemented. See
the ATermLibrary for a general implementation of maximally shared trees.
This section has been updated for EPITA-2012 on 2010-04-20.
At the end of this stage, the compiler produces low level intermediate representation: lir. lir is a subset of the hir: some patterns are forbidden. This is why it is also named canonicalization.
Relevant lecture notes include intermediate.pdf.
Things to learn during this stage that you should remember:
std::splice, std::find_if,
std::unary_function, std::not1 etc.
There are several stages in TC-6.
The first task in TC-6 is getting rid of all the
eseq. To do this, you have to move the statement part of an
eseq at the end of the current sequence point, and keeping
the expression part in place.
Compare for instance the hir to the lir in the following case:
let function print_ints (a: int, b: int) =
(print_int (a); print (", "); print_int (b); print ("\n"))
var a := 0
in
print_ints (1, (a := a + 1; a))
end
One possible hir translation is:
$ tc -eH preincr-1.tig
/* == High Level Intermediate representation. == */
label l1
", "
label l2
"\n"
# Routine: print_ints
label l0
# Prologue
move
temp t2
temp fp
move
temp fp
temp sp
move
temp sp
binop sub
temp sp
const 4
move
mem
temp fp
temp i0
move
temp t0
temp i1
move
temp t1
temp i2
# Body
seq
sxp
call
name print_int
temp t0
call end
sxp
call
name print
name l1
call end
sxp
call
name print_int
temp t1
call end
sxp
call
name print
name l2
call end
seq end
# Epilogue
move
temp sp
temp fp
move
temp fp
temp t2
label end
# Routine: _main
label main
# Prologue
# Body
seq
seq
move
temp t3
const 0
sxp
call
name l0
temp fp
const 1
eseq
move
temp t3
binop add
temp t3
const 1
temp t3
call end
seq end
sxp
const 0
seq end
# Epilogue
label end
A possible canonicalization is then:
$ tc -eL preincr-1.tig
/* == Low Level Intermediate representation. == */
label l1
", "
label l2
"\n"
# Routine: print_ints
label l0
# Prologue
move
temp t2
temp fp
move
temp fp
temp sp
move
temp sp
binop sub
temp sp
const 4
move
mem
temp fp
temp i0
move
temp t0
temp i1
move
temp t1
temp i2
# Body
seq
label l3
sxp
call
name print_int
temp t0
call end
sxp
call
name print
name l1
call end
sxp
call
name print_int
temp t1
call end
sxp
call
name print
name l2
call end
label l4
seq end
# Epilogue
move
temp sp
temp fp
move
temp fp
temp t2
label end
# Routine: _main
label main
# Prologue
# Body
seq
label l5
move
temp t3
const 0
move
temp t5
temp fp
move
temp t3
binop add
temp t3
const 1
sxp
call
name l0
temp t5
const 1
temp t3
call end
label l6
seq end
# Epilogue
label end
The example above is simple because ‘1’ commutes with ‘(a := a + 1; a)’: the order does not matter. But if you change the ‘1’ into ‘a’, then you cannot exchange ‘a’ and ‘(a := a + 1; a)’, so the translation is different. Compare the previous lir with the following, and pay attention to
let function print_ints (a: int, b: int) =
(print_int (a); print (", "); print_int (b); print ("\n"))
var a := 0
in
print_ints (a, (a := a + 1; a))
end
$ tc -eL preincr-2.tig
/* == Low Level Intermediate representation. == */
label l1
", "
label l2
"\n"
# Routine: print_ints
label l0
# Prologue
move
temp t2
temp fp
move
temp fp
temp sp
move
temp sp
binop sub
temp sp
const 4
move
mem
temp fp
temp i0
move
temp t0
temp i1
move
temp t1
temp i2
# Body
seq
label l3
sxp
call
name print_int
temp t0
call end
sxp
call
name print
name l1
call end
sxp
call
name print_int
temp t1
call end
sxp
call
name print
name l2
call end
label l4
seq end
# Epilogue
move
temp sp
temp fp
move
temp fp
temp t2
label end
# Routine: _main
label main
# Prologue
# Body
seq
label l5
move
temp t3
const 0
move
temp t5
temp fp
move
temp t6
temp t3
move
temp t3
binop add
temp t3
const 1
sxp
call
name l0
temp t5
temp t6
temp t3
call end
label l6
seq end
# Epilogue
label end
As you can see, the output is the same for the hir and the lir:
$ tc -eH preincr-2.tig >preincr-2.hir
$ havm preincr-2.hir
0, 1
$ tc -eL preincr-2.tig >preincr-2.lir
$ havm preincr-2.lir
0, 1
Be very careful when dealing with
mem. For instance, rewriting
something like:
call (foo, eseq (move (temp t, const 51), temp t))
into
move temp t1, temp t
move temp t, const 51
call (foo, temp t)
is wrong: ‘temp t’ is not a subexpression, rather it is being defined here. You should produce:
move temp t, const 51
call (foo, temp t)
Another danger is the handling of ‘move (mem, )’. For instance:
move (mem foo, x)
must be rewritten into:
move (temp t, foo)
move (mem (temp t), x)
not as:
move (temp t, mem (foo))
move (temp t, x)
In other words, the first subexpression of ‘move (mem (foo), )’ is ‘foo’, not ‘mem (foo)’. The following example is a good crash test against this problem:
let type int_array = array of int
var tab := int_array [2] of 51
in
tab[0] := 100;
tab[1] := 200;
print_int (tab[0]); print ("\n");
print_int (tab[1]); print ("\n")
end
$ tc -eL move-mem.tig >move-mem.lir
$ havm move-mem.lir
100
200
You also ought to get rid of nested calls:
print (chr (ord ("\n")))
$ tc -L nested-calls.tig
/* == Low Level Intermediate representation. == */
label l0
"\n"
# Routine: _main
label main
# Prologue
# Body
seq
label l1
move
temp t1
call
name ord
name l0
call end
move
temp t2
call
name chr
temp t1
call end
sxp
call
name print
temp t2
call end
label l2
seq end
# Epilogue
label end
There are only two valid call forms: ‘sxp (call (...))’, and ‘move (temp (...), call (...))’.
Contrary to C, the hir and lir always denote the same value. For instance the following Tiger code:
let
var a := 1
function a (t: int) : int =
(a := a + 1;
print_int (t); print (" -> "); print_int (a); print ("\n");
a)
var b := a (1) + a (2) * a (3)
in
print_int (b); print ("\n")
end
should always produce:
$ tc -L seq-point.tig >seq-point.lir
$ havm seq-point.lir
1 -> 2
2 -> 3
3 -> 4
14
independently of the what ir you ran. It has nothing to do with operator precedence!
In C, you have no such guarantee: the following program can give different results with different compilers and/or on different architectures.
#include <stdio.h>
int a_ = 1;
int
a (int t)
{
++a_;
printf ("%d -> %d\n", t, a_);
return a_;
}
int
main (void)
{
int b = a (1) + a (2) * a (3);
printf ("%d\n", b);
return 0;
}
Once your eseq and call canonicalized, normalize
cjumps: they must be followed by their “false” label. This
goes in two steps:
A basic block is a sequence of code starting with a label, ending with a jump (conditional or not), and with no jumps, no labels inside.
Now put all the basic blocks into a single sequence.
The following example highlights the need for new labels: at least one for the entry point, and one for the exit point:
1 & 2
$ tc -L 1-and-2.tig
/* == Low Level Intermediate representation. == */
# Routine: _main
label main
# Prologue
# Body
seq
label l3
cjump ne
const 1
const 0
name l0
name l1
label l1
label l2
jump
name l4
label l0
jump
name l2
label l4
seq end
# Epilogue
label end
The following example contains many jumps. Compare the hir to the lir:
while 10 | 20 do if 30 | 40 then break else break
$ tc -H broken-while.tig
/* == High Level Intermediate representation. == */
# Routine: _main
label main
# Prologue
# Body
seq
seq
label l1
seq
cjump ne
const 10
const 0
name l3
name l4
label l3
cjump ne
const 1
const 0
name l2
name l0
label l4
cjump ne
const 20
const 0
name l2
name l0
seq end
label l2
seq
seq
cjump ne
const 30
const 0
name l8
name l9
label l8
cjump ne
const 1
const 0
name l5
name l6
label l9
cjump ne
const 40
const 0
name l5
name l6
seq end
label l5
jump
name l0
jump
name l7
label l6
jump
name l0
label l7
seq end
jump
name l1
label l0
seq end
sxp
const 0
seq end
# Epilogue
label end
$ tc -L broken-while.tig
/* == Low Level Intermediate representation. == */
# Routine: _main
label main
# Prologue
# Body
seq
label l10
label l1
cjump ne
const 10
const 0
name l3
name l4
label l4
cjump ne
const 20
const 0
name l2
name l0
label l0
jump
name l11
label l2
cjump ne
const 30
const 0
name l8
name l9
label l9
cjump ne
const 40
const 0
name l5
name l6
label l6
jump
name l0
label l5
jump
name l0
label l8
cjump ne
const 1
const 0
name l5
name l13
label l13
jump
name l6
label l3
cjump ne
const 1
const 0
name l2
name l14
label l14
jump
name l0
label l11
seq end
# Epilogue
label end
Some code is provided: 2012-tc-6.0.tar.bz2. The transition from the previous versions can be done thanks to 2012-tc-5.0-6.0.diff. For a description of the new module, see src/canon.
It includes most of the canonicalization.
Everything you need.
Possible improvements include:
This section has been updated for EPITA-2012 on 2010-06-07.
At the end of this stage, the compiler produces the very low level
intermediate representation: assem. This language is basically
the target assembly, enhanced with arbitrarily many registers
($x666). This output is obviously target dependent: we aim at
mips, as we use Nolimips to run it.
Relevant lecture notes include instr-selection.pdf.
Things to learn during this stage that you should remember:
ios::xallocInstr are contained in Instrs, itself in Fragment,
itself in Fragments. Suppose you mean to add a debugging flag to
print an Instr, what shall you do? Add another argument to all
the dump methods in these four hierarchies? The problem with
Temp is even worse: they are scattered everywhere, yet we would like
to specify how to output them thanks to a std::map. Should we
pass this map in each and every single call?
Using ios::xalloc, ostream::pword, and
ostream::iword saves the day.
The goal of TC-7 is straightforward: starting from
lir, generate the mips instructions, except that you don't
have actual registers: we still heavily use Temps. Register
allocation will be done in a later stage, TC-9.
1 + 2 * 3
$ tc --inst-display seven.tig
# == Final assembler ouput. == #
# Routine: _main
tc_main:
# Allocate