This document (split and beautiful or in one simple piece) details the various tasks the "Compilation" students must complete.
It was last edited on September 17, 2003, using:
    $ tc --version
    tc (LRDE Tiger Compiler 0.62)
    Revision 0.1004 Tue, 09 Sep 2003 16:44:26 +0200

    This package was written by and with the assistance of

    * Akim Demaille akim@freefriends.org
      - Maintenance.
    * Alexandre Duret-Lutz duret_g@epita.fr
    * Cedric Bail bail_c@epita.fr
      - Initial escaping static link computation framework.
    * Alexis Brouard brouar_a@epita.fr
      - Portability of tc-check to NetBSD.
    * Benoît Perrot benoit@lrde.epita.fr
      - Extensive documentation.
      - Redesign of the Task system.
      - Design and implementation of target handling.
      - Deep clean up of every single module.
    * Daniel Gazard gazard_d@epita.fr
      - Initial framework from LIR to MIPS.
    * Francis Maes
      - Generation of static C++ Tree As Types.
    * Pierre-Yves Strub strub_p@epita.fr
      - Redesign of the AST.
      - Design of Symbol.
    * Quôc Peyrot chojin@lrde.epita.fr
      - Initial Task framework.
    * Raphaël Poss r.poss@online.fr
      - Conversion of AST to using pointers instead of references.
      - Breakup between interfaces and implementations (.hh only -> .hxx, .cc)
      - Miscellaneous former TODO items.
    * Robert Anisko anisko_r@epita.fr
    * Sébastien Broussaud brouss_s@epita.fr
      - Escapes torture tests.
    * Stéphane Molina molina_s@epita.fr
      - Configuration files in tc-check.
    * Thierry Géraud theo@epita.fr
      - Initial idea for visitors.
      - Initial idea for tasks.
      - Initial implementation of AST.
      - Initial implementation of Tree.
    * Valentin David david_v@epita.fr
      - Some additional tests.
    * Yann Popo popo_y@epita.fr
      - Implementation of the Timer class.
    * Yann Régis-Gianas yann@lrde.epita.fr
      - Reimplementation of graphs

    Copyright (C) 2003 LRDE.

Example 1: tc --version

    $ havm --version
    HAVM 0.20
    Written by Robert Anisko.

    Copyright (C) 2003 Laboratoire de Recherche et Développement de l'EPITA.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Example 2: havm --version

    $ mipsy --version
    mipsy (Mipsy) 0.5
    Written by Benoit Perrot.

    Copyright (C) 2003 Benoit Perrot.
    mipsy comes with ABSOLUTELY NO WARRANTY.
    This is free software, and you are welcome to redistribute and modify it
    under certain conditions; see source for details.

Example 3: mipsy --version
This document presents the Tiger Project as part of the EPITA curriculum. It aims at the implementation of a Tiger compiler (see Modern Compiler Implementation) in C++.
If you are a newcomer, you might be frightened by its sheer size. Don't worry, but in any case, do not give up: as stated in the very beginning of this document:
Basically this document contains three kinds of information:
This project is quite different from most other EPITA projects, and aims at several different goals, in different areas:
This also means that you have to design a test suite, and maintain it
throughout the project. The test suite is an integral part of
the project.
In the past, some oral and written examinations were made in English.
They may well be back some day.
The Tiger Compiler Project evolves every year, so as to improve its infrastructure, to demonstrate more instructional material and so forth. This section tries to keep a list of these changes, together with the most constructive criticisms from students (or ourselves).
If you have information, including criticisms, that should be mentioned here, please send it to me.
The years correspond to the class, e.g., Tiger 2005 refers to EPITA class 2005, i.e., the project ran from January 2003 to September 2003.
Before diving into the history of the Tiger Compiler Project at EPITA, a whole project in itself for ourselves, with experimental tries, failures etc., it might be good to review some constraints that can explain why things are the way they are. Understanding these constraints will make it easier to criticize actual flaws, instead of focusing on issues that are mandated by other factors.
Tiger is an instructional project, the purpose of which is detailed above, see Why the Tiger Project. Because the input is a stream of students with virtually no knowledge whatsoever in C++, and our target is a stream of students with good fluency in the many constructs and an understanding of more complex matters, we have to gradually transform them via intermediate forms with increasing skills. In particular this means that by the end of the project, evolved techniques can and should be used, while at the beginning only introductory facts should be needed. As a consequence, we cannot have a nice and high-tech AST for instance.
Because insight into compilers is not the primary goal, when a choice is to be made between (i) more interesting work on compiler internals with little C++ novelty, and (ii) providing most of this work and focusing on something else, we are most likely to select the second option. This means that the Tiger Project is doomed to be a low-tech featureless compiler, with no call graph, no default optimization, no debugging support (outputting comments in the assembly showing the original code), no bells, no whistles etc. This also implies that sometimes good and interested students will feel we "stole" from them the pleasure of writing nice pieces of code; understand that we actually provided code to the other students. However, you are free to rewrite everything if you wish.
*~
, #*#
, etc.).
all
target as first running clean
and then the
actual build.
As a result I grew tired of fixing the tarballs, and in order to have a robust, efficient (albeit sometimes a pain in the neck) distribution, we moved to using Automake, and hence Autoconf.
There are reasons not to be happy with it, agreed. But there are many more reasons to be sad without it. So Autoconf and Automake are here to stay.
Note, however, that you are free to use another system if you wish.
Just obey the standard package interface (see Delivery).
SemantVisitor is a nightmare to maintain
SemantVisitor, which performs both the type checking and the translation to intermediate code, was nearly impossible to deliver in pieces to the students: because type checking and translation were so intertwined, it was not possible to deliver the type checking machinery template as a first step, and then the translation pieces. Students had to fight with patches that did not apply. This was fixed in Tiger 2003 by splitting the SemantVisitor into TypeVisitor and TranslationVisitor. The negative impact, of course, is a performance loss.
Task
model.
EscapeVisitor
"optional" (actually it became a rush).
*.hh
files. Since then the policy wrt file
contents was defined (see File Conventions), and in Tiger 2006 was
adjusted to obey these conventions. Unfortunately, although the
improvement was significant, it was not measured precisely.
The interfaces between modules have also been cleaned to avoid excessive interdependencies. Also, when possible, opaque types are used to avoid additional includes. For instance, ast/ast-tasks.hh today includes:
    namespace ast
    {
      // Forward decl.
      class Exp;

      namespace tasks
      {
        /// Global root node of abstract syntax tree.
        extern ast::Exp* the_program;
        // ...
      }
    }
where it used to include all the ast headers to define exactly the type ast::Exp.
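The effect of such a forward declaration can be sketched in a single file. The names `ast::Exp` and `the_program` mirror the excerpt above; everything else (the `has_program` helper, the tiny `Exp` definition, the `install_and_read` function, and cramming header and implementation into one translation unit) is an assumption made for illustration, not the project's actual code.

```cpp
#include <cassert>

// --- What the header would contain: no AST header is included. ---
namespace ast
{
  // Forward declaration: Exp is an incomplete (opaque) type here.
  class Exp;

  namespace tasks
  {
    /// Global root node of abstract syntax tree.
    extern ast::Exp* the_program;

    /// Hypothetical helper: pointers to incomplete types can be
    /// stored, copied and compared without the class definition.
    inline bool has_program () { return the_program != 0; }
  }
}

// --- What an implementation file would contain. ---
namespace ast
{
  // The full definition is needed only where members are used.
  class Exp
  {
  public:
    explicit Exp (int value) : value_ (value) {}
    int value_get () const { return value_; }
  private:
    int value_;
  };

  namespace tasks
  {
    ast::Exp* the_program = 0;
  }
}

/// Hypothetical driver step: install a root node and read it back.
inline int install_and_read (int value)
{
  static ast::Exp root (0);
  root = ast::Exp (value);
  ast::tasks::the_program = &root;
  assert (ast::tasks::has_program ());
  return ast::tasks::the_program->value_get ();
}
```

Only the code that calls `value_get` needs the complete class; a client that merely passes `the_program` around compiles against the forward declaration alone, which is what keeps the include graph small.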
There are a few mandatory requirements over the tarballs.
The naming scheme for provided tarballs is different from the scheme you must follow (see Delivery). Our naming scheme looks like 2004-tc-2.0.tar.bz2. If we update the tarballs, they will be named 2004-tc-2.x.tar.bz2. But your tarball must be named login-tc-2.tar.bz2, even if you send a second version of your project.
We also (try to) provide patches from one tarball to another. For instance 2005-tc-1.0-2.0.diff.bz2 is the difference from 2005-tc-1.0.tar.bz2 to 2005-tc-2.0.tar.bz2.
You are encouraged to read this file, as understanding a patch is expected from any Unix programmer. Just run bzless 2005-tc-1.0-2.0.diff.bz2.
To apply the patch:
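1. remove the leftovers of previous patches (find . -name '*.orig' -o -name '*.rej' | xargs rm)
2. apply the patch: bzcat 2005-tc-1.0-2.0.diff.bz2 | patch -p1
3. look for the rejected hunks (find . -name '*.rej') and fix them by hand once you have understood why the patch did not apply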
You might need to repeat the process to jump from a version x to x + 2 via version x + 1.
The code you deliver must be clean. In particular, when some code is provided, you have to fill in the blanks denoted by FIXME: Some code has been deleted. Sometimes you will have to write the code from scratch.
In any case, dead code and dead comments must be removed. You are free to leave comments spotting places where you fixed a FIXME:, but never leave a fixed FIXME: in your code, nor any irrelevant comment.
The official compiler for this project is the GNU C++ Compiler, 3.2 or higher (see GCC).
dynamic_cast
of references
    const IntExp &ie = dynamic_cast <const IntExp &> (exp);
    int val = ie.value_get ();

    const IntExp *iep = dynamic_cast <const IntExp *> (&exp);
    assert (iep);
    int val = iep->value_get ();
While upon type mismatch the second aborts, the first throws a std::bad_cast: they are equally safe.
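The two forms can be compared in a small self-contained program. The hierarchy below (`Exp`, `IntExp`, `StringExp`) is a hypothetical miniature, not the project's AST, and the helper names are made up for the occasion.

```cpp
#include <cassert>
#include <typeinfo>   // std::bad_cast

// Hypothetical miniature of an AST hierarchy, for illustration only.
struct Exp
{
  virtual ~Exp () {}
};

struct IntExp : public Exp
{
  explicit IntExp (int value) : value_ (value) {}
  int value_get () const { return value_; }
  int value_;
};

struct StringExp : public Exp
{
};

/// Reference form: a mismatch throws std::bad_cast.
inline int value_by_reference (const Exp &exp)
{
  const IntExp &ie = dynamic_cast <const IntExp &> (exp);
  return ie.value_get ();
}

/// Pointer form: a mismatch yields a null pointer, which must be tested.
inline int value_by_pointer (const Exp &exp)
{
  const IntExp *iep = dynamic_cast <const IntExp *> (&exp);
  assert (iep);
  return iep->value_get ();
}

/// Returns true when the reference form rejects \a exp by throwing.
inline bool reference_cast_throws (const Exp &exp)
{
  try
    {
      value_by_reference (exp);
    }
  catch (const std::bad_cast &)
    {
      return true;
    }
  return false;
}
```

With the reference form there is no pointer to forget to test, which is probably the main practical argument in its favor.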
ESn refers to item n in Effective STL (see Bibliography).
    typedef std::set<const Temp *> temp_set_t;
declare
    /** Object function to compare two Temp*. */
    struct temp_compare
      : public std::binary_function<const Temp *, const Temp *, bool>
    {
      bool
      operator() (const Temp *s1, const Temp *s2) const
      {
        return *s1 < *s2;
      }
    };
    typedef std::set<const Temp *, temp_compare> temp_set_t;
Scott Meyers mentions several good reasons, but leaves implicit a very important one: if you don't, since the outputs will be based on the order of the pointers in memory, and since (i) this order may change if your allocation pattern changes and (ii) this order depends on the environment you run in, then you cannot compare outputs (including traces). Needless to say, at least during development, this is a serious misfeature.
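A minimal sketch of the reproducibility argument follows. The `Temp` class here is a hypothetical stand-in ordered by a numeric identifier, and `ids_in_order` is an invented helper; the comparator is written without `std::binary_function`, which recent C++ standards removed.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for temp::Temp: ordered by a numeric identifier.
struct Temp
{
  explicit Temp (int id) : id_ (id) {}
  bool operator< (const Temp &rhs) const { return id_ < rhs.id_; }
  int id_;
};

/// Compare the pointed-to objects, not the addresses.
struct temp_compare
{
  bool operator() (const Temp *s1, const Temp *s2) const
  {
    assert (s1 && s2);
    return *s1 < *s2;
  }
};

typedef std::set<const Temp *, temp_compare> temp_set_t;

/// Identifiers in iteration order: always sorted by id, wherever
/// the objects happen to live in memory.
inline std::vector<int> ids_in_order (const temp_set_t &temps)
{
  std::vector<int> res;
  for (temp_set_t::const_iterator i = temps.begin ();
       i != temps.end (); ++i)
    res.push_back ((*i)->id_);
  return res;
}
```

Whatever the allocation pattern, iterating over such a set visits the temporaries in identifier order, so two runs produce comparable traces.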
for_each, find, find_if, transform etc. is preferred over explicit loops. This is for (i) efficiency, (ii) correctness, and (iii) maintainability. Knowing these algorithms is mandatory for anyone who claims to be a C++ programmer.
my_set.find (my_item) to find (my_set.begin (), my_set.end (), my_item). This is for efficiency: the former has a logarithmic complexity, versus... linear for the latter!
You may find Item 44 of Effective STL on the Internet.
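The two recommendations above (algorithm calls rather than hand-written loops, member functions rather than same-named algorithms) can be sketched as follows. All function names are hypothetical; only the standard library calls are real.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

/// Predicate for std::find_if: is the value negative?
inline bool is_negative (int value) { return value < 0; }

/// Unary operation for std::transform.
inline int twice (int value) { return 2 * value; }

/// std::find_if instead of a hand-written search loop.
inline bool has_negative (const std::vector<int> &values)
{
  return std::find_if (values.begin (), values.end (), is_negative)
    != values.end ();
}

/// std::transform instead of a hand-written copy loop.
inline std::vector<int> doubled (const std::vector<int> &values)
{
  std::vector<int> res (values.size ());
  // The output range must be at least as long as the input range.
  assert (res.size () == values.size ());
  std::transform (values.begin (), values.end (), res.begin (), twice);
  return res;
}

/// Member find: uses the ordering of the set, logarithmic.
inline bool member_lookup (const std::set<int> &my_set, int my_item)
{
  return my_set.find (my_item) != my_set.end ();
}

/// Generic find: scans the whole range, linear.
inline bool generic_lookup (const std::set<int> &my_set, int my_item)
{
  return std::find (my_set.begin (), my_set.end (), my_item)
    != my_set.end ();
}
```

Both lookups return the same answer; only their cost differs, which is why the member version is the one to use on associative containers.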
There are some strict conventions to obey wrt the files and their contents.
*.hh
*.hh
should contain only declarations, i.e., prototypes,
extern
for variables etc. Inlined short methods are accepted
when there are few of them, otherwise, create an *.hxx
file.
The documentation should be here too.
There is no good reason for huge objects to be defined here.
*.hxx
*.hh
file, and implement them in the
*.hxx
file.
*.cc
*.cc
file corresponding to
the declaration/documentation file *.hh
.
lib*.hh
and lib*.cc
are pure
*-tasks.hh
and *-tasks.cc
are impure
The following items are more a matter of style than the others. Nevertheless, you are asked to follow this style.
\directive
\
) to the commercial at (@
) to specify
directives.
/** ... */
) to C++ comments (/// ...
).
This is to ensure consistency with the style we use.
    /** \brief Name of this program. */
    extern const char *program_name;
prefer
    /// Name of this program.
    extern const char *program_name;
For instance, instead of
    /* Construct an InterferenceGraph. */
    InterferenceGraph (const std::string &name,
                       const assem::instrs_t& instrs, bool trace = false);
or
    /** @brief Construct an InterferenceGraph.
     ** @param name its name, hopefully based on the function name
     ** @param instrs the code snippet to study
     ** @param trace trace flag
     **/
    InterferenceGraph (const std::string &name,
                       const assem::instrs_t& instrs, bool trace = false);
or
    /// \brief Construct an InterferenceGraph.
    /// \param name its name, hopefully based on the function name
    /// \param instrs the code snippet to study
    /// \param trace trace flag
    InterferenceGraph (const std::string &name,
                       const assem::instrs_t& instrs, bool trace = false);
write
    /** \brief Construct an InterferenceGraph.
        \param name its name, hopefully based on the function name
        \param instrs the code snippet to study
        \param trace trace flag */
    InterferenceGraph (const std::string &name,
                       const assem::instrs_t& instrs, bool trace = false);
Of course, Doxygen documentation is not appropriate everywhere.
foo_get
, not get_foo
foo_get
and foo_set
.
Each group must provide a tarball, made via make distcheck. All the information about the delivery per se is given on the Yaka's Delivery Page.
If bardec_f is the head of your group, the tarball must be bardec_f-tc-n.tar.bz2 where n is the number of the "release" (see Package Name and Version). The following commands must work properly:
    $ bunzip2 -cd bardec_f-tc-n.tar.bz2 | tar xvf -
    $ cd bardec_f-tc-n
    $ export CC=gcc-3.2
    $ export CXX=g++-3.2
    $ ./configure
    $ make
    $ cd src
    $ ./tc /tmp/test.tig
    $ cd ..
    $ make distcheck
For more information on the tools, see The GNU Build System, GCC.
Your tarball must be made via make distcheck (see Making a Tarball). Any tarball which is not built by make distcheck (this is easy to see: such tarballs include files we don't want, and lack some files we need...) will be penalized with at least ### tarball_not_clean.
This section describes the mandatory layout of the tarball.
AUTHORS
AUTHORS
whose contents are as follows:
    Fabrice Bardèche <bardec_f@epita.fr>
    Jean-Paul Sartre <sartre_j@epita.fr>
    Jean-Paul Deux <deux_j@epita.fr>
    Jean-Paul Belmondo <belmon_j@epita.fr>
The group leader is the first in the list. Do not include emails other
than those of EPITA. I repeat: give the 6_1@epita.fr
address.
Note that the file AUTHORS
is automatically distributed, but pay
attention to the spelling.
ChangeLog
C-x 4 a
.
README
argp/
src/
tests/
src
Directory
common.hh (src/) | File |
Used throughout the project. |
tc (src/) | File |
Your compiler. |
tc.cc (src/) | File |
Main entry, called the driver. |
src/misc
Directory
Convenient C++ routines.
contract.hh (src/misc/) | File |
A useful improvement over cassert .
|
escape.hh (src/misc/) | File |
This file implements a means to output string while escaping non
printable characters. An example:
cout << "escape (\"\111\") = " << escape ("\"\111\"") << endl; Understanding how |
set.hh (src/misc/) | File |
A wrapper around std::set that introduces convenient operators
(operator+ and so forth).
|
timer.hh (src/misc/) | File |
timer.cc (src/misc/) | File |
A class that makes it possible to have timings of the compilation
process, as when using --time-report with gcc , or
--report=time with bison . It is used in the
Task machinery, but can be used to provide better timings (e.g.,
separating the scanner from the parser).
|
src/task
Directory
No namespace for the time being, but it should be task
.
Delivered for T1. A generic scheme to handle the components of our
compiler, and their dependencies.
src/symbol
Directory
Namespace symbol
, delivered for T1 or T2.
symbol.hh (src/symbol/) | File |
The handling of the symbols. |
table.hh (src/symbol/) | File |
The handling of generic symbol tables, i.e., it is independent of functions, types and variables. |
src/ast
Directory
Namespace ast
, delivered for T2. Implementation of the abstract
syntax tree. The file ast/README
gives an overview of the
involved class hierarchy.
location.hh (src/ast/) | File |
position.hh (src/ast/) | File |
These files are now simply forwarding the definitions of
yy::Position and yy::Location as provided by Bison.
|
visitor.hh (src/ast/) | File |
Abstract base class of the compiler's visitor hierarchy. Actually, it
defines a class template GenVisitor , which expects an argument
which can be either non_const_kind or const_kind. This
allows us to define two parallel hierarchies: ConstVisitor and
Visitor, similar to const_iterator and iterator.
The understanding of the template programming used is not required at this stage as it is quite delicate, and goes far beyond your (average) current understanding of templates. |
default-visitor.hh (src/ast/) | File |
Implementation of the DefaultVisitor class, which walks the
abstract syntax tree, doing nothing. It is mainly used as a basis for
deriving other visitors. Actually, just as above, there is a template,
so that we have two different default visitors:
DefaultVisitor<const_kind> and
DefaultVisitor<non_const_kind> .
|
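One possible way to obtain the two parallel hierarchies from a single template is sketched below. Only the names `GenVisitor`, `const_kind`, `non_const_kind`, `ConstVisitor` and `Visitor` come from the text; the nested `select` template, the one-node AST and the two concrete visitors are assumptions made for illustration, and the project's real implementation may well differ.

```cpp
#include <cassert>

namespace ast
{
  // Hypothetical one-node AST, for illustration only.
  struct IntExp
  {
    explicit IntExp (int value) : value_ (value) {}
    int value_;
  };

  /// Kind selectors: each maps a type to its (possibly const) variant.
  struct const_kind
  {
    template <typename T> struct select { typedef const T type; };
  };

  struct non_const_kind
  {
    template <typename T> struct select { typedef T type; };
  };

  /// One template, two hierarchies, depending on the kind.
  template <typename Kind>
  class GenVisitor
  {
  public:
    typedef typename Kind::template select<IntExp>::type int_exp_type;
    virtual ~GenVisitor () {}
    virtual void visit (int_exp_type &e) = 0;
  };

  typedef GenVisitor<const_kind> ConstVisitor;
  typedef GenVisitor<non_const_kind> Visitor;

  /// A read-only visitor: sums the values it sees.
  class SumVisitor : public ConstVisitor
  {
  public:
    SumVisitor () : sum_ (0) {}
    virtual void visit (const IntExp &e) { sum_ += e.value_; }
    int sum_;
  };

  /// A mutating visitor: negates the values it sees.
  class NegateVisitor : public Visitor
  {
  public:
    virtual void visit (IntExp &e) { e.value_ = -e.value_; }
  };

  /// Usage: modify with one hierarchy, read with the other.
  inline int negate_then_sum (IntExp &e)
  {
    NegateVisitor negate;
    negate.visit (e);
    SumVisitor sum;
    sum.visit (e);
    assert (sum.sum_ == e.value_);
    return sum.sum_;
  }
}
```

The `visit` signatures are written once, in `GenVisitor`; the kind argument alone decides whether nodes are seen as `const` or not.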
print-visitor.hh (src/ast/) | File |
Implementation of the PrintVisitor class, which performs
pretty-printing in the tiger compiler.
|
src/parse
Directory
Namespace parse
. Delivered during T1.
scantiger.ll (src/parse/) | File |
The scanner. |
parsetiger.yy (src/parse/) | File |
The parser. |
position.hh (src/ast/) | File |
Keeping track of a point (cursor) in a file. |
location.hh (src/ast/) | File |
Keeping track of a range (two cursors) in one (or two) files. |
libparse.hh (src/parse/) | File |
which prototypes what tc.cc needs to know about the module
parse .
|
src/type
Directory
Namespace type
. Type checking.
libtype.hh (src/type/) | File |
The interface of the Type module. It exports a single procedure,
type_check .
|
types.hh (src/type/) | File |
The definition of all the types. You are free to use whatever layout
you wish (several files); we have a single types.hh file.
|
type-entry.hh (src/type/) | File |
Definitions of type::TypeEntry , type::VarEntry , and
type::FunEntry , used in type::TypeEnv to associate data to
types, variables, and functions (obviously).
|
type-env.hh (src/type/) | File |
The types environment, comprising three symbol tables: types, functions,
and variables, used by the type::TypeVisitor .
|
src/temp
Directory
Namespace temp
, delivered for T5.
temp.hh (src/temp/) | File |
So called temporaries are pseudo-registers: we may allocate as many temporaries as we want. Eventually the register allocator will map those temporaries to either an actual register, or it will allocate a slot in the activation block (aka frame) of the current function. |
label.hh (src/temp/) | File |
We need labels for jumps, for functions, strings etc.
|
src/tree
Directory
Namespace tree
, delivered for T5. The implementation of the
intermediate representation. The file tree/README
should give
enough explanations to understand how it works.
Reading the corresponding explanations in Appel's book is mandatory.
It is worth noting that contrary to A. Appel, just as we did for ast, we use n-ary structures. For instance, where Appel uses a binary seq, we have an n-ary seq which allows us to put as many statements as we want.
To avoid gratuitous name clashes, what Appel denotes exp is denoted sxp (Statement Expression), implemented in tree::Sxp.
Please, pay extra attention to the fact that there is temp::Temp, used to create unique temporaries (similar to symbol::Symbol), and tree::Temp, which is the intermediate representation instruction denoting a temporary (hence a tree::Temp needs a temp::Temp). Similarly, on the one hand, there is temp::Label, which is used to create unique labels, and on the other hand there are tree::Label, which is the IR statement that defines a label, and tree::Name, used to refer to a label (typically, a tree::Jump needs a tree::Name, which in turn needs a temp::Label).
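The relationships between these classes can be pictured with the following sketch. The class names follow the text, but the members, the naming scheme (`t0`, `l0`, ...) and the textual output of `str` are all assumptions chosen to keep the example short; they are not the project's interfaces.

```cpp
#include <cassert>
#include <sstream>
#include <string>

namespace temp
{
  /// Builds "<prefix><n>" (hypothetical naming scheme).
  inline std::string make_name (const char *prefix, unsigned n)
  {
    assert (prefix);
    std::ostringstream o;
    o << prefix << n;
    return o.str ();
  }

  /// A unique temporary: each default construction yields a fresh name.
  class Temp
  {
  public:
    Temp () : name_ (make_name ("t", next ())) {}
    const std::string &name_get () const { return name_; }
  private:
    static unsigned next () { static unsigned count = 0; return count++; }
    std::string name_;
  };

  /// A unique label.
  class Label
  {
  public:
    Label () : name_ (make_name ("l", next ())) {}
    const std::string &name_get () const { return name_; }
  private:
    static unsigned next () { static unsigned count = 0; return count++; }
    std::string name_;
  };
}

namespace tree
{
  /// IR node denoting a temporary: it needs a temp::Temp.
  struct Temp
  {
    explicit Temp (const temp::Temp &temp) : temp_ (temp) {}
    std::string str () const { return "temp " + temp_.name_get (); }
    temp::Temp temp_;
  };

  /// IR statement defining a label.
  struct Label
  {
    explicit Label (const temp::Label &label) : label_ (label) {}
    std::string str () const { return "label " + label_.name_get (); }
    temp::Label label_;
  };

  /// IR node referring to a label.
  struct Name
  {
    explicit Name (const temp::Label &label) : label_ (label) {}
    std::string str () const { return "name " + label_.name_get (); }
    temp::Label label_;
  };

  /// IR jump: needs a tree::Name, which in turn needs a temp::Label.
  struct Jump
  {
    explicit Jump (const Name &target) : target_ (target) {}
    std::string str () const { return "jump " + target_.str (); }
    Name target_;
  };
}
```

A single `temp::Label` is typically shared by one `tree::Label` (its definition) and any number of `tree::Name` nodes (its uses).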
src/frame
Directory
Namespace frame
, delivered for T5.
access.hh (src/frame/) | File |
access.cc (src/frame/) | File |
An Access is a location of a variable: on the stack, or in a
temporary.
|
frame.hh (src/frame/) | File |
frame.cc (src/frame/) | File |
A Frame knows only what are the "variables" it contains.
|
src/translate
Directory
Namespace translate. Translation to intermediate code. It includes:
libtranslate.hh (src/translate/) | File |
The interface. |
libtranslate.cc (src/translate/) | File |
The compiled module. |
fragment.hh (src/translate/) | File |
It implements translate::Fragment , an abstract class,
translate::DataFrag to store the literal strings, and
translate::ProcFrag to store the routines.
|
access.hh (src/translate/) | File |
access.cc (src/translate/) | File |
Static link aware versions of frame::Access.
|
level.hh (src/translate/) | File |
level.cc (src/translate/) | File |
translate::Level objects are wrappers around frame::Frame that support
the static links, so that we can find an access to the variables of the
"parent function".
|
exp.hh (src/translate/) | File |
Implementation of translate::Ex (expressions), Nx
(instructions), Cx (conditions), and Ix (if )
shells. They wrap tree::Node to delay their translation until
the actual use is known.
|
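The idea of delaying the translation until the use is known can be sketched as below, following Appel's unEx/unNx/unCx scheme. To stay short, strings stand in for `tree::Node`, only the `Ex` and `Cx` shells are shown, and the label and temporary names in `Cx::un_ex` are fixed where a real implementation would create fresh ones; the method names and the textual tree syntax are assumptions.

```cpp
#include <cassert>
#include <string>

namespace translate
{
  // Strings stand in for tree::Node (assumption, to keep the sketch short).
  typedef std::string tree_t;

  /// Abstract shell: the final tree depends on how the result is used.
  class Exp
  {
  public:
    virtual ~Exp () {}
    /// Used as a value.
    virtual tree_t un_ex () const = 0;
    /// Used as a statement: the value is discarded.
    virtual tree_t un_nx () const = 0;
    /// Used as a condition, jumping to \a t or to \a f.
    virtual tree_t un_cx (const std::string &t,
                          const std::string &f) const = 0;
  };

  /// Shell around an expression.
  class Ex : public Exp
  {
  public:
    explicit Ex (const tree_t &exp) : exp_ (exp) {}
    virtual tree_t un_ex () const { return exp_; }
    virtual tree_t un_nx () const { return "sxp (" + exp_ + ")"; }
    virtual tree_t un_cx (const std::string &t, const std::string &f) const
    {
      return "cjump (ne, " + exp_ + ", const 0, " + t + ", " + f + ")";
    }
  private:
    tree_t exp_;
  };

  /// Shell around a comparison.
  class Cx : public Exp
  {
  public:
    Cx (const std::string &op, const tree_t &lhs, const tree_t &rhs)
      : op_ (op), lhs_ (lhs), rhs_ (rhs)
    {
      assert (!op.empty ());
    }
    virtual tree_t un_ex () const
    {
      // Labels t, f and temporary r should be fresh; fixed names here.
      return "eseq (seq (move (temp r, const 1), " + un_cx ("t", "f")
        + ", label f, move (temp r, const 0), label t), temp r)";
    }
    virtual tree_t un_nx () const
    {
      return "seq (sxp (" + lhs_ + "), sxp (" + rhs_ + "))";
    }
    virtual tree_t un_cx (const std::string &t, const std::string &f) const
    {
      return "cjump (" + op_ + ", " + lhs_ + ", " + rhs_
        + ", " + t + ", " + f + ")";
    }
  private:
    std::string op_;
    tree_t lhs_;
    tree_t rhs_;
  };
}
```

A comparison used as the test of an `if` is asked for `un_cx` and yields a single conditional jump; the same comparison assigned to a variable is asked for `un_ex` and yields the longer sequence that materializes 0 or 1.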
level-entry.hh (src/translate/) | File |
All the information that the environment must keep about variables and functions. |
level-env.hh (src/translate/) | File |
The levels environment, containing LevelVarEntry 's and
LevelFunEntry 's. We don't need to store information related to
types here.
|
translation.hh (src/translate/) | File |
Functions used by the translate::TranslateVisitor to translate
the AST into HIR. For instance, it contains
Exp *simpleVar (const Access &access, const Level &level),
Exp *callExp (const temp::Label &label, std::list<Exp *> args)
etc. which are routines that produce some tree::Exp. They handle
all the unCx etc. magic.
|
translate-visitor.hh (src/translate/) | File |
Implements the class TranslateVisitor which performs the IR
generation thanks to translation.hh . It must not be polluted
with translation details: it is only coordinating the AST traversal with
the invocation of translation routines. For instance, here is the
translation of an ast::SimpleVar:
    virtual void visit (const SimpleVar& e)
    {
      const Access &access = _env.var_access_get (e.name_get ());
      _exp = simpleVar (access, *_level);
    }
|
src/canon
Directory
Namespace tree
.
src/assem
Directory
Namespace assem
, delivered for T7.
This directory contains the implementation of the Assem language: yet another intermediate representation that aims at encoding an assembly language, plus a few needed features so that register allocation can be performed afterwards. Given in full.
instr.hh (src/assem/) | File |
move.hh (src/assem/) | File |
oper.hh (src/assem/) | File |
label.hh (src/assem/) | File |
Implementation of the basic types of assembly instructions. |
fragment.hh (src/assem/) | File |
fragment.cc (src/assem/) | File |
Implementation of assem::Fragment, assem::ProcFrag, and
assem::DataFrag. They are comparable to
translate::Fragment: they aggregate some information that must remain
together, such as a frame::Frame and the instructions (a list of
assem::Instr).
|
visitor.hh (src/assem/) | File |
The root of assembler visitors. |
layout.hh (src/assem/) | File |
A pretty printing visitor for assem::Fragment .
|
libassem.hh (src/assem/) | File |
libassem.cc (src/assem/) | File |
The interface of the module, and its implementation. |
src/target
Directory
Namespace target
, delivered for T7. Some data on the back end.
Given in full.
cpu.hh (src/target/) | File |
Description of a CPU: everything about its registers, and its word size. |
target.hh (src/target/) | File |
Description of a target (language): its CPU, its assembly
(codegen::Assembly), and its translator (codegen::Codegen).
|
mips-cpu.hh (src/target/) | File |
mips-target.hh (src/target/) | File |
The description of the MIPS (actually, SPIM/Mipsy) target. |
ia32-cpu.hh (src/target/) | File |
ia32-target.hh (src/target/) | File |
Description of the i386. This is not part of the project, it is left only as an incomplete source of inspiration. |
target-tasks.cc (src/target/) | File |
target-tasks.hh (src/target/) | File |
The command line interface to specify the target architecture. |
src/codegen
Directory
Namespace codegen
, delivered for T7.
mips (src/codegen/) | File |
ia32 (src/codegen/) | File |
The instruction selection per se split into a generic part, and a target specific (MIPS and IA32) part. See src/codegen/mips, and src/codegen/ia32. |
assembly.hh (src/codegen/) | File |
The abstract class codegen::Assembly which is the interface for
elementary assembly instructions generation.
|
codegen.hh (src/codegen/) | File |
The abstract class codegen::Codegen which is the interface for
all our back ends.
|
libcodegen.hh (src/codegen/) | File |
libcodegen.cc (src/codegen/) | File |
Converting translate::Fragment s into assem::Fragment s.
|
codegen-tasks.hh (src/codegen/) | File |
codegen-tasks.cc (src/codegen/) | File |
Command line interface. |
tiger-runtime.c (src/codegen/) | File |
This is the Tiger runtime, written in C, based on
Andrew Appel's runtime.c . The actual runtime.s file
for MIPS was written by hand, but the ia32 was a
compiled version of this file. It should be noted that:
|
src/codegen/mips
Directory
Namespace codegen::mips
, delivered for T7. Code generation for
MIPS R2000.
runtime.s (src/codegen/mips/) | File |
runtime.cc (src/codegen/mips/) | File |
The Tiger runtime in MIPS assembly language: print etc.
The C++ file runtime.cc is built from runtime.s : do not
edit the former. See src/codegen, tiger-runtime .
|
spim-assembly.hh (src/codegen/mips/) | File |
spim-assembly.cc (src/codegen/mips/) | File |
Our assembly language (syntax, opcodes and layout); it abstracts the
generation of MIPS 2000 instructions.
codegen::mips::SpimAssembly derives from
codegen::Assembly .
|
codegen.hh (src/codegen/mips/) | File |
codegen.cc (src/codegen/mips/) | File |
Our real and only back end: a translator from LIR to
ASSEM using the MIPS 2000 instruction set defined
by codegen::mips::SpimAssembly . It is implemented as a maximal munch.
codegen::mips::Codegen derives from codegen::Codegen .
|
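Maximal munch can be illustrated on a toy tree language. Everything below is a sketch under strong assumptions: the `Node` structure, the four tiles and the `Muncher` class are invented for the example, and only the instruction mnemonics (`li`, `add`, `addi`, `lw`) are genuine MIPS/SPIM names. At each node the largest matching tile is tried first, then the remaining subtrees are munched recursively.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

/// Toy IR node (hypothetical, much simpler than the real tree module).
struct Node
{
  enum Kind { Const, Temp, Add, Mem };
  Kind kind;
  int value;          // Const
  std::string name;   // Temp
  const Node *left;   // Add, Mem
  const Node *right;  // Add
};

// Builders. Children are referenced, not copied: keep them alive.
inline Node make_const (int value)
{ Node n = { Node::Const, value, "", 0, 0 }; return n; }
inline Node make_temp (const std::string &name)
{ Node n = { Node::Temp, 0, name, 0, 0 }; return n; }
inline Node make_add (const Node &l, const Node &r)
{ Node n = { Node::Add, 0, "", &l, &r }; return n; }
inline Node make_mem (const Node &address)
{ Node n = { Node::Mem, 0, "", &address, 0 }; return n; }

class Muncher
{
public:
  Muncher () : count_ (0) {}

  /// Emit code for \a n, return the register holding its value.
  std::string munch (const Node &n)
  {
    switch (n.kind)
      {
      case Node::Temp:
        return n.name;
      case Node::Const:
        return emit ("li", str (n.value));
      case Node::Add:
        {
          assert (n.left && n.right);
          // Larger tile first: add (e, const k) => addi d, s, k
          if (n.right->kind == Node::Const)
            {
              const std::string s = munch (*n.left);
              return emit ("addi", s + ", " + str (n.right->value));
            }
          const std::string l = munch (*n.left);
          const std::string r = munch (*n.right);
          return emit ("add", l + ", " + r);
        }
      case Node::Mem:
        {
          assert (n.left);
          const Node &a = *n.left;
          // Largest tile first: mem (add (e, const k)) => lw d, k(s)
          if (a.kind == Node::Add && a.right->kind == Node::Const)
            {
              const std::string s = munch (*a.left);
              return emit ("lw", str (a.right->value) + "(" + s + ")");
            }
          const std::string s = munch (a);
          return emit ("lw", "0(" + s + ")");
        }
      }
    assert (!"unreachable");
    return "";
  }

  /// Instructions emitted so far, in order.
  std::vector<std::string> instrs_;

private:
  static std::string str (int value)
  {
    std::ostringstream o;
    o << value;
    return o.str ();
  }

  /// Append "op d, args" with a fresh destination d, return d.
  std::string emit (const std::string &op, const std::string &args)
  {
    const std::string d = "t" + str (count_++);
    instrs_.push_back (op + " " + d + ", " + args);
    return d;
  }

  int count_;
};
```

On `mem (add (temp fp, const 8))` the sketch emits the single instruction `lw t0, 8(fp)`, where a naive node-by-node translation would need an `li`, an `add` and an `lw`.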
spim-layout.hh (src/codegen/mips/) | File |
spim-layout.cc (src/codegen/mips/) | File |
How MIPS (and SPIM/Mipsy) fragments are to be displayed. In other words, that's where the (global) syntax of the target assembly file is selected. |
src/codegen/ia32
Directory
Namespace codegen::ia32
, delivered for T7. Code generation for
IA32. This is not part of the student project, but it is left
to satisfy their curiosity. In addition its presence is a sane
invitation to respect the constraints of a multi-back-end compiler.
runtime.s (src/codegen/ia32/) | File |
runtime.cc (src/codegen/ia32/) | File |
The Tiger runtime in IA32 assembly language: print etc.
The C++ file runtime.cc is built from runtime.s : do not
edit the former. See src/codegen, tiger-runtime .
|
gas-assembly.hh (src/codegen/ia32/) | File |
gas-assembly.cc (src/codegen/ia32/) | File |
Our assembly language (syntax, opcodes and layout); it abstracts the
generation of IA32 instructions using Gas' syntax.
codegen::ia32::GasAssembly derives from
codegen::Assembly .
|
codegen.hh (src/codegen/ia32/) | File |
codegen.cc (src/codegen/ia32/) | File |
The IA32 back-end: a translator from LIR to
ASSEM using the IA32 instruction set defined by
codegen::ia32::GasAssembly . It is implemented as a maximal munch.
codegen::ia32::Codegen derives from codegen::Codegen .
|
gas-layout.hh (src/codegen/ia32/) | File |
gas-layout.cc (src/codegen/ia32/) | File |
How IA32 fragments are to be displayed. In other words, that's where the (global) syntax of the target assembly file is selected. |
src/graph
Directory
Namespace graph
, a generic implementation of graphs. Delivered
for T7.
graph.hh (src/graph/) | File |
graph.hxx (src/graph/) | File |
Directed and undirected graphs. |
handler.hh (src/graph/) | File |
handler.hxx (src/graph/) | File |
Abstractions/indirections for graph nodes and edges. |
iterator.hh (src/graph/) | File |
iterator.hxx (src/graph/) | File |
Iterating over nodes and edges of graphs. |
test-graph.cc (src/graph/) | File |
Exercising this module. |
src/liveness
Directory
Namespace liveness
, delivered for T8.
flowgraph.hh (src/liveness/) | File |
FlowGraph implementation.
|
test-flowgraph.cc (src/liveness/) | File |
FlowGraph test.
|
liveness.hh (src/liveness/) | File |
liveness.cc (src/liveness/) | File |
Computing the live-in and live-out information from the
FlowGraph .
|
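The classical way to compute live-in and live-out sets is to iterate the dataflow equations until a fixed point is reached. The sketch below assumes a hypothetical `FlowNode` structure (one node per instruction, successors given by index) rather than the project's `FlowGraph` class.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::set<std::string> temp_set_t;

/// One instruction of a hypothetical flow graph.
struct FlowNode
{
  temp_set_t def;                  // temporaries written
  temp_set_t use;                  // temporaries read
  std::vector<std::size_t> succ;   // indices of successor nodes
  temp_set_t in;                   // live-in, computed
  temp_set_t out;                  // live-out, computed
};

/// Iterate the dataflow equations until nothing changes:
///   out[n] = union of in[s] for each successor s of n
///   in[n]  = use[n] + (out[n] - def[n])
inline void compute_liveness (std::vector<FlowNode> &nodes)
{
  bool changed = true;
  while (changed)
    {
      changed = false;
      // Backward order usually converges faster; any order is correct.
      for (std::size_t i = nodes.size (); i-- > 0; )
        {
          FlowNode &n = nodes[i];
          temp_set_t out;
          for (std::size_t s = 0; s < n.succ.size (); ++s)
            {
              assert (n.succ[s] < nodes.size ());
              const temp_set_t &succ_in = nodes[n.succ[s]].in;
              out.insert (succ_in.begin (), succ_in.end ());
            }
          temp_set_t in = n.use;
          for (temp_set_t::const_iterator t = out.begin ();
               t != out.end (); ++t)
            if (n.def.find (*t) == n.def.end ())
              in.insert (*t);
          if (in != n.in || out != n.out)
            {
              n.in = in;
              n.out = out;
              changed = true;
            }
        }
    }
}
```

The sets only grow from one pass to the next and are bounded by the set of all temporaries, so the loop terminates.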
interference-graph.hh (src/liveness/) | File |
interference-graph.cc (src/liveness/) | File |
Computing the InterferenceGraph from the live-in/live-out
information.
|
src/regalloc
Directory
Namespace regalloc
, register allocation, delivered for T9.
color.hh (src/regalloc/) | File |
Coloring an interference graph. |
regallocator.hh (src/regalloc/) | File |
Repeating the coloration until it succeeds (no spills). |
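Coloring by simplification can be sketched as follows: nodes with fewer than k neighbors are removed one by one and pushed on a stack, then popped and given the lowest color not used by their already colored neighbors. The graph representation and the function names are assumptions, and the sketch merely reports failure where a real allocator would select a spill candidate, rewrite the code and try again.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::map<std::string, std::set<std::string> > graph_t;
typedef std::map<std::string, int> coloring_t;

/// Add an undirected edge between \a a and \a b.
inline void add_edge (graph_t &g, const std::string &a, const std::string &b)
{
  g[a].insert (b);
  g[b].insert (a);
}

/// Try to color \a graph with \a k colors (simplify, then select).
/// Returns false when no node can be simplified: a spill is needed.
inline bool color (graph_t graph, int k, coloring_t &res)
{
  assert (0 < k);
  const graph_t original = graph;
  std::vector<std::string> stack;
  // Simplify: repeatedly remove a node with fewer than k neighbors.
  while (!graph.empty ())
    {
      graph_t::iterator n = graph.begin ();
      while (n != graph.end () && n->second.size () >= std::size_t (k))
        ++n;
      if (n == graph.end ())
        return false;
      const std::string name = n->first;
      for (std::set<std::string>::const_iterator m = n->second.begin ();
           m != n->second.end (); ++m)
        {
          graph_t::iterator other = graph.find (*m);
          if (other != graph.end () && other != n)
            other->second.erase (name);
        }
      graph.erase (n);
      stack.push_back (name);
    }
  // Select: pop nodes, giving each the lowest color unused around it.
  res.clear ();
  while (!stack.empty ())
    {
      const std::string name = stack.back ();
      stack.pop_back ();
      std::set<int> used;
      const std::set<std::string> &adj = original.find (name)->second;
      for (std::set<std::string>::const_iterator m = adj.begin ();
           m != adj.end (); ++m)
        {
          coloring_t::const_iterator c = res.find (*m);
          if (c != res.end ())
            used.insert (c->second);
        }
      int col = 0;
      while (used.count (col))
        ++col;
      res[name] = col;
    }
  return true;
}
```

Because a node is removed only when it has fewer than k remaining neighbors, a free color is guaranteed to exist when it is popped, provided the graph is symmetric as built by `add_edge`.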
libregalloc.hh (src/regalloc/) | File |
libregalloc.cc (src/regalloc/) | File |
Removing useless moves once the register allocation is performed,
and allocating the registers for fragments.
|
test-regalloc.cc (src/regalloc/) | File |
Exercising this. |
regalloc-tasks.hh (src/regalloc/) | File |
regalloc-tasks.cc (src/regalloc/) | File |
Command line interface. |
We provide a few test cases: you must write your own tests. Writing tests is part of the project. Do not just copy test cases from other groups, as you will not understand why they were written.
The initial test suite is available for download at
tests.tgz
. It contains the following directories:
good
scan
parse
type
Some stages are evaluated only by a program, and others are evaluated both by humans, and a program.
Each stage of the compiler will be evaluated by an automatic corrector.
As soon as the tarballs are delivered, the logs are available on http://www.lrde.epita.fr/~akim/compil, in the directory corresponding to your class and stage. For instance, 2004 students ought to read http://www.lrde.epita.fr/~akim/compil/2004/4/bardec_f-tc-4.log.
We stress that automated evaluation enforces the requirements: you must stick to what is being asked. For instance, for T3 it is explicitly asked to display something like:
var /* escaping */ i : int := 2
so if you display any of the following outputs
    var i : int /* escaping */ := 2
    var i /* escaping */ : int := 2
    var /* Escapes */ i : int := 2
be sure to fail all the tests, even if the computation is correct.
If you find some unexpected errors (your project does not compile with the reference compiler, some files are missing, your output is slightly incorrect etc.) immediately send a new tarball to yaka@epita.fr with [Tiger] as prefix of the subject. This corresponds to ### patch.
Do not wait for the final marks to be computed, this is extremely irritating, and doomed to failure. You must understand that (i) you increase our workload, and (ii) anyway this is the wrong approach, the Tiger Compiler is a big project which must be continuously improved.
If, anyway, you send a tarball to fix your problems long after the initial date, you will be flagged as ### super_late, whose impact on the mark is quite bad...
When you are defending your projects, here are a few rules to follow:
Conversely, there is something I wish to make clear: I, Akim, and the other examiners, will probably be harsh (maybe even very harsh), but this does not mean I disrespect you, or judge you badly.
You are here to defend your project and knowledge, I'm here to stress them, to make sure they are right. Learning to be strong under pressure is part of the exercise. Don't burst into tears, react! Don't be shy, that's not the proper time: you are selling me something, and I will never buy something from someone who cries when I'm criticizing his product.
You should also understand that human examination is the moment where we try to evaluate who, or what group, needs help. We are here to diagnose your project and provide solutions to your problems. If you know there is a problem in your project, but you failed to fix it, tell it to the examiner! Work with her/him to fix your project.
The point of this evaluation is to measure:
stderr
is forbidden in proper C++ code) and so
forth. It also aims at detecting cheaters, who will be severely
punished (mark = -42).
It results in an evaluation file (login-tc-stage.eval, e.g., bardec_f-tc-2.eval), structured as follows.
The format is free, except some lines starting with specific markers, when they are at the first column. The first marker is *, used to denote the login of the group head:
* bardec_f
----------------------------------------
2002-03-07: T2 par akim
tc-check: Summary:
tc-check: == Testing ./tc --parse-trace -l --ast-display ==
Successes: 177/177
Extras:
- --stdin to parse stdin.
- The parser takes care of its leaks, even on Symbols.
- --gcc-ast
- ChangeLog
- Does not die when given a directory as input.
The code is pretty, and fits in 80 columns!
Rather than empty lists, they pass 0.
As you can see, the person who evaluated is allowed to put whatever comment comes to her mind. Specific features, specific failures, and explanations to help the students fix their project should be included. Each time an entry is made, there should be the date, and who wrote it (2002-03-07: T2 par akim).
Then, the mark of the project, aka, the note of the group:
### Note T2 = 20
and optionally:
### late
### super_late
### patch
### super_patch
### not_compile
### tarball_not_clean
### cheater
### reevaluation
### bonus
Important note to the examiners: Note field. The examiner should not take (too much) the automated tests into account to decide the mark. This is because the mark is computed later, taking this into account, so don't do it twice. Similarly, a very nice project, which is super late, shall be flagged as super late, but should have a good ### Note entry.
Important note to the examiners: broken tarballs. If you fixed the tarball (### not_compile, or ### tarball_not_clean, or whatever modification), you must run make distcheck again, and replace the tarball they delivered with the new one. Do not keep the old tarball, do not install it in a special place: just replace the first tarball with it, but say so in the eval file. The rationale is simple: only tarballs pass the tests, and every tarball must be able to pass the tests. If you don't do that, then someone else will have to do it again.
The next bit is the evaluation of each member of the group:
----------------------------------------
## bardec_f:100:120
Parser, locations, scanner.
Wrote practically all of T2.
##sartre_j:100:40
AST, Visitor
Yet understands nothing about the Visitor. Is making fun of me.
##deux_j:80:80
No C++ problems.
FIXME: His T1 mark is to be reviewed (second defense).
##belmon_j:100:100
Actions, the deletes in the parser.
Implemented GCC-compatible AST output.
The formalism is ## login:bonus1:bonus2. If the stage is T2, then bonus1 refers to a bonus on T1, and bonus2 on T2. For T4, it covers T3 and T4, and so forth.
The values can be "negative" (80, 60, 40, 0), or positive if a member deserves it (100, 120, or even more if justifiable).
There should be information about who did what.
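As an illustration of the member-entry formalism, here is a minimal C++ sketch that reads one such line. The names MemberEntry and parse_member_entry are hypothetical, and the sketch assumes that a member entry starts with exactly two # characters, with or without a following space, as in the sample above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical holder for one "## login:bonus1:bonus2" entry.
struct MemberEntry
{
  std::string login;
  int bonus1 = 0;
  int bonus2 = 0;
};

// Parse a line such as "## bardec_f:100:120" or "##sartre_j:100:40".
// Return false when the line is not a member entry (for instance a
// "### Note" line, which starts with three '#').
bool parse_member_entry(const std::string& line, MemberEntry& entry)
{
  if (line.compare(0, 2, "##") != 0 || line.compare(0, 3, "###") == 0)
    return false;
  const std::string::size_type begin = line.find_first_not_of(' ', 2);
  if (begin == std::string::npos)
    return false;
  const std::string::size_type colon1 = line.find(':', begin);
  if (colon1 == std::string::npos)
    return false;
  const std::string::size_type colon2 = line.find(':', colon1 + 1);
  if (colon2 == std::string::npos)
    return false;
  entry.login = line.substr(begin, colon1 - begin);
  // atoi stops at the first non-digit, so trailing text is ignored.
  entry.bonus1 = std::atoi(line.c_str() + colon1 + 1);
  entry.bonus2 = std::atoi(line.c_str() + colon2 + 1);
  return true;
}
```

Lines for which the function returns false are simply free-form comments or ### markers, which matches the "format is free" rule.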
Because the Tiger Compiler is a project with stages, the computation of the marks depends on the stages too. To spell it out explicitly:
A stage is penalized by bad results on tests performed for previous stages.
It means, for instance, that a T3 compiler will be exercised on T1, T2, and T3. If there are still errors on T1 and T2 tests, they will pessimize the result of T3 tests. The older the errors are, the more expensive they are.
As an example, here are the formulas to compute the global success rate of T3 and T5:
global-rate-T3 := rate-T3 * (2 * rate-T1 + 1 * rate-T2) / 3
global-rate-T5 := rate-T5 * (4 * rate-T1 + 3 * rate-T2 + 2 * rate-T3 + 1 * rate-T4) / 10
Because a project which fails half of the time is not a project that deserves half of 20, the global-rate is raised to the power 1.7 before computing the mark:
mark-T3 := roundup (power (global-rate-T3, 1.7) * 20 - malus-T3, 1)
where roundup (x, 1) is x rounded up to one decimal (roundup (15, 1) = 15, roundup (15.01, 1) = 15.1).
When the project is also evaluated by a human, power is not used. Rather, the success rate modifies the mark given by the examiner:
mark-T2 := roundup (eval-T2 * global-rate-T2 - malus-T2, 1)
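The formulas above can be sketched in C++ as follows. This is only an illustration: the function names are hypothetical, the weights are generalized from the T3 and T5 formulas (the oldest stage weighs the most), and treating a stage with no previous stage as unpenalized is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Weighted average of the success rates of the previous stages.
// `previous` holds rate-T1 .. rate-T(n-1), each in [0, 1]; T1 gets
// weight n-1, and the most recent previous stage gets weight 1.
double previous_stages_factor(const std::vector<double>& previous)
{
  if (previous.empty())
    return 1.0; // Assumption: no previous stage means no penalty.
  double weighted = 0.0;
  double weights = 0.0;
  double weight = static_cast<double>(previous.size());
  for (double rate : previous)
  {
    weighted += weight * rate;
    weights += weight;
    weight -= 1.0;
  }
  return weighted / weights;
}

double global_rate(double stage_rate, const std::vector<double>& previous)
{
  return stage_rate * previous_stages_factor(previous);
}

// x rounded up to one decimal; the small epsilon absorbs
// floating-point noise so that 15 stays 15.
double roundup1(double x)
{
  return std::ceil(x * 10.0 - 1e-9) / 10.0;
}

// Stage evaluated by the tests only.
double automated_mark(double global, double malus)
{
  return roundup1(std::pow(global, 1.7) * 20.0 - malus);
}

// Stage also evaluated by a human: power is not used.
double human_mark(double eval, double global, double malus)
{
  return roundup1(eval * global - malus);
}
```

For instance, with these definitions a global rate of 0.5 and no malus gives an automated mark of 6.2, well below half of 20.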
It happens that some groups have big problems with their project. Here is how to solve these issues.
But first of all, you must know that cheating, stealing another group's project, is not a solution, because:
### cheater
).
Because the Tiger Compiler is a long project, you continuously have to improve it, but marks of 8 and above are definitive. If your mark is less than 8, it can be re-examined: you must provide a better tarball, and in the case of human-evaluated stages, another audition will be made.
This new audition will be flagged as ### reevaluation. It cannot be better than 12/20, even if your project is excellent: a good project out of date cannot be judged better than a reasonably good project on time.
Pay attention that when providing an updated tarball for reexamination of stage n, it must follow the naming scheme of stage n, even if the contents go further. For instance, it is perfectly valid to ask for a T2 reevaluation with a compiler that implements T4, but the tarball must be bardec_f-tc-2.tar.bz2.
You have to ask for a reevaluation, we will not look after you.
The compiler will be written in several steps, described below.
This section has been updated for EPITA-2005.
T0 is a weak form of T1: the scanner and the parser are written, but there is a set of simplifications:
int
main (int argc, const char *argv[])
{
  /* Exactly one argument is expected: the file to parse.  */
  assert (argc == 2);
  yyin = fopen (argv[1], "r");
  assert (yyin);
  return !!yyparse ();
}
i.e., there is no support for options at all.
SCAN, PARSE
SCAN and PARSE. I.e., running PARSE=1 ./tc foo.tig will set yydebug to 1, which causes the traces of the parsing to be displayed.
YYPRINT
YYPRINT
support.
yylval
symbol::Symbol
is to be implemented in T1.
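The PARSE=1 behavior described above could be obtained with a helper along these lines. This is a sketch only: the name debug_flag_from_env is hypothetical, and treating any non-zero integer value as "on" is an assumption, since the text only states that PARSE=1 sets yydebug to 1.

```cpp
#include <cassert>
#include <cstdlib>

// Return 1 when the environment variable `name` holds a non-zero
// integer, and 0 otherwise (unset, empty, "0", or not a number).
int debug_flag_from_env(const char* name)
{
  const char* value = std::getenv(name);
  if (!value)
    return 0;
  return std::strtol(value, nullptr, 10) != 0 ? 1 : 0;
}
```

A T0 main could then run something like yydebug = debug_flag_from_env ("PARSE"); before calling yyparse.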
Things to learn during this stage that you should remember:
You must write
scantiger.ll
parsetiger.yy
main
if you wish.
tc.cc
main
, in this file.
Putting it into parsetiger.yy
is OK in T0.
Makefile
make
must build the
binary tc
.
The requirements on the tarball are the same as usual, see Tarballs.
Scanner and parser are properly running, but the abstract syntax tree is not built yet. Differences with T0 include:
Things to learn during this stage that you should remember:
string
class
The only information the compiler will give is about lexical and syntax errors.
If there are no errors, the compiler shuts up, and exits successfully:
/* an array type and an array variable */
let
type arrtype = array of int
var arr1 : arrtype := arrtype [10] of 0
in
arr1[2]
end
File 4: test01.tig
$ tc test01.tig Example 5: tc test01.tig
If there are lexical errors, the exit status is 2, and an error message is output on the standard error output. Note that its format is standard: file, (precise) location, and then the message.
1
/* This comments starts at /* 2.2 */
File 6: unterminated-comment.tig
$ tc unterminated-comment.tig
error-->unterminated-comment.tig:2.1-3.0: unexpected end of file in a comment
=>2
Example 7: tc unterminated-comment.tig
If there are syntax errors, the exit status is set to 3:
let var a : nil := ()
in
1
end
File 8: type-nil.tig
$ tc type-nil.tig
error-->type-nil.tig:1.12-14: syntax error, unexpected "nil", expecting "identifier"
error-->Parsing Failed
=>3
Example 9: tc type-nil.tig
If there are errors which are neither lexical nor syntactic, the exit status is set to 1 (Windows will not pass by me):
$ tc C:/TIGER/SAMPLE.TIG
error-->tc: cannot open `C:/TIGER/SAMPLE.TIG': No such file or directory
=>1
Example 10: tc C:/TIGER/SAMPLE.TIG
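The exit statuses shown in these samples can be gathered in one place, for example as below. The enumerator names and the describe helper are hypothetical; only the numeric values come from the samples above.

```cpp
#include <cassert>
#include <string>

// Exit statuses observed in the T1 samples.  Names are illustrative.
enum ExitStatus
{
  exit_success = 0, // No error: the compiler shuts up.
  exit_failure = 1, // Other errors, e.g. a file that cannot be opened.
  exit_scan = 2,    // Lexical error.
  exit_parse = 3    // Syntax error.
};

// Human-readable name of a status, handy in test-suite reports.
const char* describe(ExitStatus status)
{
  switch (status)
  {
  case exit_success: return "success";
  case exit_failure: return "other error";
  case exit_scan:    return "lexical error";
  case exit_parse:   return "syntax error";
  }
  return "unknown";
}
```

Naming the values once avoids scattering the literals 2 and 3 throughout the scanner and the parser.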
The option --parse-trace, which relies on Bison's %debug directive, and the use of YYPRINT, must work properly:
$ cat foo.tig
a + "a"
$ ./tc --parse-trace foo.tig
Starting parse
Entering state 0
Reading a token: Next token is 258 (ID a)
Shifting token 258 (ID), Entering state 2
Reading a token: Next token is 270 (PLUS)
Reducing via rule 74 (line 318), ID -> varid
state stack now 0
Entering state 16
Reducing via rule 34 (line 196), varid -> lvalue
state stack now 0
Entering state 13
Next token is 270 (PLUS)
Reducing via rule 33 (line 192), lvalue -> exp
state stack now 0
Entering state 12
Next token is 270 (PLUS)
Shifting token 270 (PLUS), Entering state 30
Reading a token: Next token is 257 (STRING a)
Shifting token 257 (STRING), Entering state 1
Reducing via rule 15 (line 151), STRING -> exp
state stack now 0 12 30
Entering state 65
Reading a token: Now at end of input.
Reducing via rule 27 (line 182), exp PLUS exp -> exp
state stack now 0
Entering state 12
Now at end of input.
Reducing via rule 1 (line 106), exp -> program
state stack now 0
Entering state 159
Now at end of input.
Shifting token 0 ($), Entering state 160
Now at end of input.
$ echo $?
=>0
Note that (i) it cannot see that the variable is not declared nor that there is a type checking error, since type checking... is not implemented, and (ii) the output might be slightly different, depending upon the version of Bison you use. But what matters is that one can see the items: ID a, STRING a.
Some code is provided: 2005-tc-1.0.tar.bz2. See The Top Level, src, src/parse, src/misc.
Be sure to read Flex and Bison documentations and tutorials, see Flex & Bison.
src/parse/scantiger.ll
std::string
. See the following code for the basics.
...
\"        yylval->str = new std::string (); BEGIN STATE_STRING;

<STATE_STRING>{ /* Handling of the strings.  Initial " is eaten. */
  \" {
    BEGIN INITIAL;
    return STRING;
  }
  ...
  \\x[0-9a-fA-F]{2}  {
    yylval->str->append (1, strtol (yytext + 2, 0, 16));
  }
  ...
}
The locations are tracked.
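To make the \xNN action of the scanner excerpt concrete, here is a stand-alone C++ sketch that applies the same strtol-based decoding to a whole string, outside of Flex. The function name decode_hex_escapes is hypothetical, and the sketch handles only the \xNN escape; the other escapes of Tiger are deliberately left untouched.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Replace each "\xNN" (two hexadecimal digits) of `text` by the
// character whose code is NN.  Everything else is copied as is.
std::string decode_hex_escapes(const std::string& text)
{
  std::string result;
  for (std::string::size_type i = 0; i < text.size(); ++i)
  {
    const bool is_hex_escape =
      text[i] == '\\'
      && i + 3 < text.size()
      && text[i + 1] == 'x'
      && std::isxdigit(static_cast<unsigned char>(text[i + 2]))
      && std::isxdigit(static_cast<unsigned char>(text[i + 3]));
    if (is_hex_escape)
    {
      // Same idea as strtol (yytext + 2, 0, 16) in the scanner.
      const std::string digits = text.substr(i + 2, 2);
      result.append(1, static_cast<char>(
                         std::strtol(digits.c_str(), nullptr, 16)));
      i += 3; // Skip 'x' and the two digits.
    }
    else
      result += text[i];
  }
  return result;
}
```

Applied to the six characters \x45\x50 followed by ITA, it yields EPITA.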
src/parse/parsetiger.yy
--parse-trace
(see T1 Samples). Pay
special attention to the display of strings and identifiers.
Bison will certainly complain because of a type clash for some actions. For instance, if you have given a type to STRING, but none to exp, then it will choke on:
exp: STRING;
because it actually means
exp: STRING { $$ = $1; };
which is not type coherent. So write this instead:
exp: STRING {};
src/ast/position.hh
ast::Position
is completed.
src/ast/location.hh
ast::Location
must be completed.
src/symbol/symbol.hxx
symbol::Symbol keeps a single copy of identifiers, so that (i) we save space, and (ii) symbol comparison is fast. The file src/symbol/symbol.hh describes the interface of the class symbol::Symbol, but the implementation is to be written in src/symbol/symbol.hxx.
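The "single copy of identifiers" idea can be sketched as follows. This is not the interface declared in src/symbol/symbol.hh: the class below, its pool function, and its name accessor are hypothetical, and only illustrate why comparison becomes a pointer comparison.

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustrative interned symbol: every distinct spelling is stored
// once in a shared pool, and a Symbol is a pointer into that pool.
class Symbol
{
public:
  explicit Symbol(const std::string& name)
    // insert returns the existing element when the name is known.
    // Addresses of std::set elements stay valid across insertions.
    : name_(&*pool().insert(name).first)
  {}

  const std::string& name() const { return *name_; }

  // Fast comparison: two pointers instead of two strings.
  bool operator==(const Symbol& rhs) const { return name_ == rhs.name_; }
  bool operator!=(const Symbol& rhs) const { return name_ != rhs.name_; }

private:
  static std::set<std::string>& pool()
  {
    static std::set<std::string> instance;
    return instance;
  }

  const std::string* name_;
};
```

Two symbols built from the same spelling share their storage, which gives both the space saving and the fast comparison mentioned above.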
./tc foo.tig
-A
, it considered -A
was a file, not an option. This bug is now
fixed. The patch is from Niels Möller:
diff -u -r1.16 -r1.17
--- argp-parse.c 18 Feb 2001 22:40:03 -0000 1.16
+++ argp-parse.c 4 Feb 2003 19:52:30 -0000 1.17
@@ -1021,6 +1021,8 @@
       *arg_ebadkey = 1;
       if (parser->first_nonopt != parser->last_nonopt)
       {
+        exchange(parser);
+
         /* Start processing the arguments we skipped previously. */
         parser->state.next = parser->first_nonopt;
Go into argp/, run patch -p0, paste the patch, type <CTRL>-d.
This section was last updated for EPITA-2005 on 2003-02-25.
Things to learn during this stage that you should remember:
virtual
Here are a few examples of expected features.
The parser builds abstract syntax trees that can be output by a pretty-printing module:
/* define a recursive function */
let
/* calculate n! */
function fact (n : int) : int =
if n = 0
then 1
else n * fact (n - 1)
in
fact (10)
end
File 11: simple-fact.tig
$ tc -A simple-fact.tig /* == Abstract Syntax Tree. == */ let function fact (n : int) : int = if (n = 0) then 1 else (n * fact ((n - 1))) in fact (10) end Example 12: tc -A simple-fact.tig
The output from your pretty-printer must be valid Tiger code, and be equivalent to the input.
By valid, we mean that any Tiger compiler must be able to parse your output successfully. Pay attention to the banners such as == Abstract...: you should use comments: /* == Abstract... */. Pay attention to special characters too.
print ("\"\x45\x50ITA\n\"")
File 13: string-escapes.tig
$ tc -A string-escapes.tig /* == Abstract Syntax Tree. == */ print ("\"EPITA\n\"") Example 14: tc -A string-escapes.tig
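One way to get special characters right when printing a string literal is to re-escape every character on output. The sketch below is an assumption about how this could be done, not the reference implementation: the name escape_for_tiger is hypothetical, and it emits \xNN for non-printable characters, the escape form handled by the scanner excerpt given for T1.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Return the contents of `s` as they should appear between the
// double quotes of a Tiger string literal.
std::string escape_for_tiger(const std::string& s)
{
  std::string out;
  for (unsigned char c : s)
    switch (c)
    {
    case '"':  out += "\\\""; break;
    case '\\': out += "\\\\"; break;
    case '\n': out += "\\n";  break;
    case '\t': out += "\\t";  break;
    default:
      if (std::isprint(c))
        out += static_cast<char>(c);
      else
      {
        // Backslash, 'x', two hexadecimal digits, and the final NUL.
        char buf[5];
        std::snprintf(buf, sizeof buf, "\\x%02x",
                      static_cast<unsigned>(c));
        out += buf;
      }
    }
  return out;
}
```

With this helper, the string printed in string-escapes.tig comes out as \"EPITA\n\", which is what Example 14 displays between the outer quotes.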
By equivalent, we mean that except for syntactic sugar, the output and the input are equal. Syntactic sugar refers to &, |, unary -, and in some cases if then:
1 = 1 & 2 = 2
File 15: 1s-and-2s.tig
$ tc -A 1s-and-2s.tig /* == Abstract Syntax Tree. == */ if (1 = 1) then (2 = 2) else 0 Example 16: tc -A 1s-and-2s.tig
$ tc -A 1s-and-2s.tig >output.tig Example 17: tc -A 1s-and-2s.tig >output.tig
$ tc -A output.tig /* == Abstract Syntax Tree. == */ if (1 = 1) then (2 = 2) else 0 Example 18: tc -A output.tig
For loops must be properly displayed, i.e., although we use an ast::VarDec for the index of the loop, you must not display var:
/* valid let and for */
let
var a := 0
in
for i := 0 to 100 do (a := a+1; ())
end
File 19: for-loop.tig
$ tc -A for-loop.tig /* == Abstract Syntax Tree. == */ let var a := 0 in for i := 0 to 100 do ( a := (a + 1); () ) end Example 20: tc -A for-loop.tig
Notice too that parentheses must not stack for free. In fact, you must even remove them.
% cat parens.tig
(((0)))
% ./tc -A parens.tig
/* == Abstract Syntax Tree. == */
0
As a result, anything output by tc -A is equal to what tc -A | tc -A - displays!
Another part of T2 is the improvement of your parser: it must be robust to some forms of errors. Observe that on the following input:
(
1;
(2, 3);
(4, 5);
6
)
File 21: multiple-parse-errors.tig
several parse errors are reported, not merely the first one:
$ tc multiple-parse-errors.tig
error-->multiple-parse-errors.tig:3.4: syntax error, unexpected ",", expecting ";"
error-->multiple-parse-errors.tig:4.4: syntax error, unexpected ",", expecting ";"
=>3
Example 22: tc multiple-parse-errors.tig
Of course, the exit status still reveals the parse error. Be sure that your error recovery does not break the rest of the compiler...
$ tc -A multiple-parse-errors.tig error-->multiple-parse-errors.tig:3.4: syntax error, unexpected ",", expecting ";" error-->multiple-parse-errors.tig:4.4: syntax error, unexpected ",", expecting ";" /* == Abstract Syntax Tree. == */ ( 1; (); (); 6 ) =>3 Example 23: tc -A multiple-parse-errors.tig
Some code is provided: 2004-tc-2.0.tar.bz2. See src/misc, src/symbol, and src/ast.
What is to be done:
src/symbol.*
src/parse/scantiger.ll
symbol::Symbol
instead of
std::string
for identifiers. Of course, the parser must be
adjusted too.
The scanner must be updated to keep track of locations of tokens in Tiger programs. To adjust your scanner, you are strongly encouraged to use YY_USER_ACTION, and also the yylex prologue:
...
%%
%{
  // Everything here is run each time yylex is invoked.
%}
"if" return IF;
...
%%
...
Have a look at the scanner and parser chapters of this draft.
src/parse/parsetiger.yy
error
. Read the Bison documentation about it.
The grammar must be changed to process declarations by chunks. In Tiger, the following program is invalid:
let function foo () = ()
    function foo () = ()
    var foo := 0
in () end
while the following code is valid:
let function foo () = ()
    var foo := 0
    function foo () = ()
in () end
This is because declarations are cut in "chunks" of declarations of the same kind. In the first example, there are two chunks: one chunk of two function declarations, and one chunk with one variable declaration. In the second example, there are three chunks: one chunk containing only one function declaration, one with one variable declaration, and the third one with a single function declaration again.
The rule is "a chunk cannot define twice the same name".
In order to implement this easily, you must adjust your grammar so that declarations are parsed by chunks. Pay special attention to the implementation of ast::FunctionDecs, ast::VarDecs, and ast::TypeDecs (which are all implemented thanks to ast::AnyDecs): they are these chunks of declarations. Therefore, an ast::LetExp uses a list of chunks.
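The chunk rule can be illustrated independently of Bison with the C++ sketch below. All the names (Kind, Dec, chunks_of, chunk_is_valid) are hypothetical and unrelated to the ast classes. The sketch groups consecutive variable declarations like the other kinds; whether that matches the reference behavior for variables is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class Kind { function, variable, type };

// A declaration reduced to what the chunk rule needs.
struct Dec
{
  Kind kind;
  std::string name;
};

using Chunk = std::vector<Dec>;

// Cut a list of declarations into chunks of consecutive declarations
// of the same kind.
std::vector<Chunk> chunks_of(const std::vector<Dec>& decs)
{
  std::vector<Chunk> chunks;
  for (const Dec& dec : decs)
  {
    if (chunks.empty() || chunks.back().back().kind != dec.kind)
      chunks.push_back(Chunk());
    chunks.back().push_back(dec);
  }
  return chunks;
}

// "A chunk cannot define twice the same name."
bool chunk_is_valid(const Chunk& chunk)
{
  std::set<std::string> seen;
  for (const Dec& dec : chunk)
    if (!seen.insert(dec.name).second)
      return false;
  return true;
}

bool all_chunks_valid(const std::vector<Dec>& decs)
{
  for (const Chunk& chunk : chunks_of(decs))
    if (!chunk_is_valid(chunk))
      return false;
  return true;
}
```

On the two programs above, the first list of declarations yields two chunks and is rejected, while the second yields three chunks and is accepted.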
src/ast
FIXME:
anywhere in the code we gave.
Several files are missing (fieldvar.hh, nilexp.hh, intexp.hh, stringexp.hh, callexp.hh, assignexp.hh, whileexp.hh, breakexp.hh, arrayexp.hh). See src/ast/README for additional information on the missing classes.
src/ast/default-visitor.hh
The DefaultVisitor class must be completed, and must be able to walk whole abstract syntax trees. Do not forget that your DefaultVisitor must be a sound basis for your further work on the Tiger compiler.
src/ast/print-visitor.hh
bison
escapes_get
, etc.)
#if 0
/#endif
.
kind_get
, etc.)
_kind
,
kind_get
and so forth. These are to be used only in T5, you
don't have to complete them now.
This section was updated for Tiger 2004. The project will be taken on Friday, March 15th, at noon.
At the end of this stage, the compiler must be able to compute and display the escaping variables. These features are triggered by the options --escapes-compute/-e and --escapes-display/-E.
Be sure to read the chapter "Escapes" in the lecture notes.
Things to learn during this stage that you should remember:
This example demonstrates the computation and display of escaping variables/formals. Notice that by default, all variables must be considered as escaping, since it is safe to put a non-escaping variable onto the stack, while the converse is unsafe.
let
var escaping := "I rule the world!\n"
var not_escaping := "Peace on Earth for humans of good will.\n"
function print_slogan (not_escaping: string) =
(print (not_escaping); print (escaping))
in
print_slogan (not_escaping)
end
File 24: variable-escapes.tig
$ tc -EeE variable-escapes.tig /* == Escapes. == */ let var /* escaping */ escaping := "I rule the world!\n" var /* escaping */ not_escaping := "Peace on Earth for humans of good will.\n" function print_slogan (/* escaping */ not_escaping : string) = ( print (not_escaping); print (escaping) ) in print_slogan (not_escaping) end /* == Escapes. == */ let var /* escaping */ escaping := "I rule the world!\n" var not_escaping := "Peace on Earth for humans of good will.\n" function print_slogan (not_escaping : string) = ( print (not_escaping); print (escaping) ) in print_slogan (not_escaping) end Example 25: tc -EeE variable-escapes.tig
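The behavior shown in this example follows the usual rule: a variable escapes when it is used inside a function nested more deeply than the one that defines it. Here is a minimal sketch of that rule in C++, with hypothetical names (Variable, define, use) that are unrelated to the classes suggested for this stage.

```cpp
#include <cassert>
#include <string>

// A variable or formal, reduced to what the escape rule needs.
struct Variable
{
  std::string name;
  int depth = 0;       // Function nesting depth of the definition.
  bool escapes = true; // Safe default before any computation.
};

// Called when the definition is visited: record where it lives and
// optimistically assume it does not escape.
void define(Variable& var, const std::string& name, int depth)
{
  var.name = name;
  var.depth = depth;
  var.escapes = false;
}

// Called on each use: a use from a deeper function makes it escape.
void use(Variable& var, int current_depth)
{
  if (current_depth > var.depth)
    var.escapes = true;
}
```

In variable-escapes.tig, the variable escaping is defined at depth 0 and used inside print_slogan at depth 1, so it escapes, whereas both not_escaping names are only used at the depth where they are defined.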
You are strongly encouraged to run your compiler on merge.tig and to study its output. There are a number of silly mistakes that people usually make on T3: they are all easy to defeat when you have a reasonable test suite, and once you have understood that torturing your project is a good thing to do.
ast::PrintVisitor
/* escaping */
flag where needed, and
only where needed. If you don't pay attention, you might display
meaningless flags due to implementation details.
escapes::EscapesVisitor
escapes::EscapesVisitor
in
src/escapes/escapes-visitor.hh
.
You are suggested to implement three additional classes:
Definition
symbol::Table
into Table <Definition>
.
escape_set (void) | virtual void | Sets the escape to true.
int _depth | Variable | Depth at which this object has been created.
depth_get () const | int | Returns the depth associated to this Definition object.
VariableDefinition
Definition. It has one additional attribute, a VarDec &. The method escape_set is implemented, and when invoked, sets the escapes flag of the corresponding VarDec.
FormalDefinition
Definition. To be designed by yourself. Do not forget that the ast class used to register formals is used elsewhere, and it would be a pity that your implementation makes no difference... Be sure to write a test that verifies that your implementation is not abused. I have one such test...
ast
escape_get and escape_set methods. Most probably the code was already given, and is using const_casts; try to use mutable instead.
Modify the code so that each definition of an escaping variable/formal is preceded by the comment /* escaping */ if the flag display_escapes_p is true. See the item "Driver" for an example.
-ggdb
. So don't pass it.
This section was last updated for EPITA-2005 on 2003-04-08.
Things to learn during this stage that you should remember:
Type checking is optional, invoked by --types-check or -T:
1 + "2"
File 26: int-plus-string.tig
$ tc int-plus-string.tig Example 27: tc int-plus-string.tig
$ tc int-plus-string.tig --types-check
error-->int-plus-string.tig:1.0-6: type mismatch
error--> right operand type: string
error--> expected type: int
=>4
Example 28: tc int-plus-string.tig --types-check
When there are several type errors, it is acceptable that some remain hidden by others.
unknown_function (unknown_variable)
File 29: unknowns.tig
$ tc unknowns.tig --types-check
error-->unknowns.tig:1.0-34: unknown function: unknown_function
=>4
Example 30: tc unknowns.tig --types-check
Be sure to check the type of all the constructs.
if 1 then 2
File 31: bad-if.tig
$ tc bad-if.tig --types-check
error-->bad-if.tig:1.0-10: type mismatch
error--> then clause type: int
error--> else clause type: void
=>4
Example 32: tc bad-if.tig --types-check
Be aware that type and function declarations are recursive by chunks. For instance:
let type one = { hd : int, tail : two }
type two = { hd : int, tail : one }
function one (hd : int, tail : two) : one
= one { hd = hd, tail = tail }
function two (hd : int, tail : one) : two
= two { hd = hd, tail = tail }
var one := one (11, two (22, nil))
in
print_int (one.tail.hd); print ("\n")
end
File 33: mutuals.tig
$ tc mutuals.tig --types-check Example 34: tc mutuals.tig --types-check
In case you are interested, the result is:
$ tc -H mutuals.tig >mutuals.hir Example 35: tc -H mutuals.tig >mutuals.hir
$ havm mutuals.hir 22 Example 36: havm mutuals.hir
Some code is provided: 2005-tc-4.3.tar.bz2. The transition from the previous versions can be done thanks to the following diffs: 2005-tc-2.1-4.0.diff, 2005-tc-4.0-4.1.diff, 2005-tc-4.1-4.2.diff, 2005-tc-4.2-4.3.diff.
See src/misc.
What is to be done.
symbol::Table< class Entry_T >
symbol::Table in src/symbol/table.hh which is a table of symbols dedicated to storing some data whose type is Entry_T *. In short, it maps a symbol::Symbol to an Entry_T * (that should ring a bell...). You are encouraged to implement something simple, based on stacks (see std::stack or std::list) and maps (see std::map).
symbol::Table is a class template as it is used by virtually all the AST visitors (e.g., escapes::EscapesVisitor, type::TypeVisitor, translate::TranslateVisitor, etc.).
symbol::Table must provide this interface:
scope_begin () | void | Open a new scope.
scope_end () | void | Close the last scope, forgetting everything since the latest scope_begin ().
put (Symbol key, Entry_T & value) | void | Associate value to key in the current scope.
get (Symbol key) const | Entry_T * | If key was associated to some Entry_T in the open scopes, return the most recent insertion. Otherwise return the empty pointer.
print (std::ostream & ostr) const | void | Send the content of this table on ostr in a readable manner, the top of the stack being displayed last.
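A simple design based on a list of maps, as suggested above, could look like the following sketch. It is only one possible reading of the interface: the Symbol alias is a stand-in for symbol::Symbol, the output format of print is invented, and opening a global scope in the constructor as well as requiring balanced scope_begin/scope_end calls are assumptions.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <ostream>
#include <string>

template <class Entry_T>
class Table
{
  // Stand-in for symbol::Symbol, to keep the sketch self-contained.
  using Symbol = std::string;
  using scope_type = std::map<Symbol, Entry_T*>;

public:
  // Assumption: a global scope is always open.
  Table() { scope_begin(); }

  // Open a new scope.
  void scope_begin() { scopes_.push_back(scope_type()); }

  // Close the last scope (calls are assumed to be balanced).
  void scope_end() { scopes_.pop_back(); }

  // Associate value to key in the current scope.
  void put(const Symbol& key, Entry_T& value)
  {
    scopes_.back()[key] = &value;
  }

  // Most recent insertion for key, or the empty pointer.
  Entry_T* get(const Symbol& key) const
  {
    for (auto scope = scopes_.rbegin(); scope != scopes_.rend(); ++scope)
    {
      auto it = scope->find(key);
      if (it != scope->end())
        return it->second;
    }
    return nullptr;
  }

  // Oldest scope first, so that the top of the stack comes last.
  void print(std::ostream& ostr) const
  {
    for (const scope_type& scope : scopes_)
    {
      ostr << "scope:\n";
      for (const auto& entry : scope)
        ostr << "  " << entry.first << '\n';
    }
  }

private:
  std::list<scope_type> scopes_;
};
```

Looking a key up from the innermost scope outwards is what makes an inner definition hide an outer one until scope_end is called.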
src/type/types.hh
type::String, type::Int, and type::Void are to be implemented. Using templates would be particularly appreciated to factor the code between the four singleton classes.
type::Named is almost entirely given.
type::Array is even simpler than the four Singletons.
type::Record is somewhat incomplete.
Pay extra attention to the implementation of type::operator== (const Type& a, const Type& b), type::Type::assignable_to and type::Type::comparable_to.
src/type/type-entry.hh
src/type/type-env.hh
type::TypeEnv must be completed: it must fill the environment with the definition of builtin types and functions. See the Tiger Reference Manual.
The handling of types is left as an example; you still have to implement the variables and functions support.
type::TypeVisitor
foo/
foo-tasks.hh
. You must clean up your code to use
the latest sources for Tasks
, and make sure that your
configure.ac
no longer includes
foo/lib
foo.hh
in src/modules.hh
.
These are features that you might want to implement in addition to the core features.
type::Error
Int
, which can create cascades of errors:
"666" = if 000 then 333 else "666"
File 37: is_devil.tig
$ tc is_devil.tig --types-check
error-->is_devil.tig:1.8-33: type mismatch
error--> then clause type: int
error--> else clause type: string
error-->is_devil.tig:1.0-33: type mismatch
error--> left operand type: string
error--> right operand type: int
=>4
Example 38: tc is_devil.tig --types-check
One means to avoid this issue consists in introducing a new type, type::Error, that the type checker would never complain about. This can be a nice complement to ast::Error.
let type weirdo = array of weirdo in print ("I'm a creep.\n") end
the answer is "yes", as nothing prevents this in the Tiger
specifications. Note that this type is not usable though.
kind_get
, etc.)
_kind
,
kind_get
and so forth. These are to be used only in T5, you
don't have to complete them now.
TypeVisitor is not a ConstVisitor
Because < is overloaded (for integers and strings), the translation needs to know the types of the arguments. In a traditional compiler, type checking and translation would be performed simultaneously, but our Tiger Compiler, in order to simplify its architecture, has two different passes for each. Hence, the TypeVisitor will have to leave notes on the AST for the TranslateVisitor, therefore it cannot be a const visitor once T5 is implemented. It can perfectly be const during T4.
This section was last updated for EPITA-2005 on 2003-06-10.
Things to learn during this stage that you should remember:
The Ix, Cx, Nx, and Ex classes delay computation to address context-dependent issues in a context-independent way.
In this project, the ast is composed of different classes related by inheritance (as if the kinds of the nodes were class members). Here, the nodes are members of a single class, but their nature is specified by the object itself (as if the kinds of the nodes were object members).
T5 can be started (and should be started if you don't want to finish it in a hurry) by first making sure your compiler can handle code that uses no variables. Then, you can complete your compiler to support more and more Tiger features.
This example is probably the simplest Tiger program.
0
File 39: 0.tig
$ tc --hir-display 0.tig /* == High Level Intermediate representation. == */ # Routine: Main label l`Main' # Prologue # Body sxp const 0 # Epilogue label end Example 40: tc --hir-display 0.tig
You should then probably try to make more difficult programs with literals only. Arithmetic is one of the easiest tasks.
1 + 2 * 3
File 41: arith.tig
$ tc -H arith.tig /* == High Level Intermediate representation. == */ # Routine: Main label l`Main' # Prologue # Body sxp binop (+) const 1 binop (*) const 2 const 3 # Epilogue label end Example 42: tc -H arith.tig
You should use havm to exercise your output.
$ tc -H arith.tig >arith.hir Example 43: tc -H arith.tig >arith.hir
$ havm arith.hir Example 44: havm arith.hir
Unfortunately, without actually printing something, you won't see the final result, which means you need to implement calls. Fortunately, you can ask havm for a verbose execution:
$ havm --trace arith.hir
error-->plaining
error-->unparsing
error-->checking
error-->checkingLow
error-->evaling
error--> call ( name Main ) []
error-->8.8-8.15: const 1
error-->10.12-10.19: const 2
error-->11.12-11.19: const 3
error-->9.8-11.19: binop (*) 2 3
error-->7.4-11.19: binop (+) 1 6
error-->6.0-11.19: sxp 7
error--> end call ( name Main ) [] = 0
Example 45: havm --trace arith.hir
If you look carefully, you will find an sxp 7 in there...
Then you are encouraged to implement control structures.
if 101 then 102 else 103
File 46: if-101.tig
$ tc -H if-101.tig /* == High Level Intermediate representation. == */ # Routine: Main label l`Main' # Prologue # Body seq cjump ne const 101 const 0 name l0 name l1 label l0 sxp const 102 jump name l2 label l1 sxp const 103 label l2 seq end # Epilogue label end Example 47: tc -H if-101.tig
And even more difficult control structure uses:
while 101
do (if 102 then break)
File 48: while-101.tig
$ tc -H while-101.tig /* == High Level Intermediate representation. == */ # Routine: Main label l`Main' # Prologue # Body seq label l1 cjump ne const 101 const 0 name l2 name l0 label l2 seq cjump ne const 102 const 0 name l3 name l4 label l3 jump name l0 jump name l5 label l4 sxp const 0 label l5 seq end jump name l1 label l0 seq end # Epilogue label end Example 49: tc -H while-101.tig
Our compiler optimizes the number of jumps needed to compute nested if, using translate::Ix where a plain use of translate::Cx, Nx, and Ex is possible, but less efficient.
Consider the following sample:
if 11 | 22 then print ("OK\n")
File 50: boolean.tig
a naive implementation will probably produce too many successive cjump instructions:
$ tc --hir-naive -H boolean.tig /* == High Level Intermediate representation. == */ label l3 "OK\n" # Routine: Main label l`Main' # Prologue # Body seq cjump ne eseq seq cjump ne const 11 const 0 name l0 name l1 label l0 move temp t0 const 1 jump name l2 label l1 move temp t0 const 22 jump name l2 label l2 seq end temp t0 const 0 name l4 name l5 label l4 sxp call name l`print' name l3 call end jump name l6 label l5 sxp const 0 jump name l6 label l6 seq end # Epilogue label end Example 51: tc --hir-naive -H boolean.tig
$ tc --hir-naive -H boolean.tig >boolean-1.hir Example 52: tc --hir-naive -H boolean.tig >boolean-1.hir
$ havm --profile boolean-1.hir
error-->/* Profiling. */
error-->fetches from temporary : 1
error-->fetches from memory : 0
error-->binary operations : 0
error-->function calls : 1
error-->stores to temporary : 1
error-->stores to memory : 0
error-->jumps : 2
error-->conditional jumps : 2
error-->/* Execution time. */
error-->number of cycles : 16
OK
Example 53: havm --profile boolean-1.hir
If you carefully analyze the cause of this pessimization, it is related to the computation of an intermediary expression (the value of 11 | 22) which is later decoded as a condition. A proper implementation will produce:
$ tc -H boolean.tig /* == High Level Intermediate representation. == */ label l0 "OK\n" # Routine: Main label l`Main' # Prologue # Body seq seq cjump ne const 11 const 0 name l4 name l5 label l4 cjump ne const 1 const 0 name l1 name l2 label l5 cjump ne const 22 const 0 name l1 name l2 seq end label l1 sxp call name l`print' name l0 call end jump name l3 label l2 sxp const 0 label l3 seq end # Epilogue label end Example 54: tc -H boolean.tig
$ tc -H boolean.tig >boolean-2.hir Example 55: tc -H boolean.tig >boolean-2.hir
$ havm --profile boolean-2.hir
error-->/* Profiling. */
error-->fetches from temporary : 0
error-->fetches from memory : 0
error-->binary operations : 0
error-->function calls : 1
error-->stores to temporary : 0
error-->stores to memory : 0
error-->jumps : 1
error-->conditional jumps : 2
error-->/* Execution time. */
error-->number of cycles : 13
OK
Example 56: havm --profile boolean-2.hir
But the game becomes more interesting when you implement function calls (which is easier than compiling functions). print_int is probably the first builtin to implement:
(print_int (101); print ("\n"))
File 57: print-101.tig
$ tc -H print-101.tig >print-101.hir Example 58: tc -H print-101.tig >print-101.hir
$ havm print-101.hir 101 Example 59: havm print-101.hir
Complex values, arrays and records, also need calls to the runtime system:
let type list = { h: int, t: list }
var list := list { h = 1, t = list { h = 2, t = nil } }
in
print_int (list.t.h); print ("\n")
end
File 60: print-list.tig
$ tc -H print-list.tig /* == High Level Intermediate representation. == */ label l0 "\n" # Routine: Main label l`Main' # Prologue move temp t2 temp fp move temp fp temp sp move temp sp binop (-) temp sp const 4 # Body seq move mem temp $fp eseq seq move temp t1 call name l`malloc' const 8 call end move mem binop (+) temp t1 const 0 const 1 move mem binop (+) temp t1 const 4 eseq seq move temp t0 call name l`malloc' const 8 call end move mem binop (+) temp t0 const 0 const 2 move mem binop (+) temp t0 const 4 const 0 seq end temp t0 seq end temp t1 seq sxp call name l`print_int' mem binop (+) mem binop (+) mem temp $fp const 4 const 0 call end sxp call name l`print' name l0 call end seq end seq end # Epilogue move temp sp temp fp move temp fp temp t2 label end Example 61: tc -H print-list.tig
$ tc -H print-list.tig >print-list.hir Example 62: tc -H print-list.tig >print-list.hir
$ havm print-list.hir 2 Example 63: havm print-list.hir
Here is an example which demonstrates the usefulness of information about escapes: when escaping variables are not computed, they are all stored on the stack:
let var a := 1
var b := 2
var c := 3
in
a := 2;
c := a + b + c;
print_int (c);
print ("\n")
end
File 64: vars.tig
$ tc -H vars.tig /* == High Level Intermediate representation. == */ label l0 "\n" # Routine: Main label l`Main' # Prologue move temp t0 temp fp move temp fp temp sp move temp sp binop (-) temp sp const 12 # Body seq move mem temp $fp const 1 move mem binop (+) temp $fp const -4 const 2 move mem binop (+) temp $fp const -8 const 3 seq move mem temp $fp const 2 move mem binop (+) temp $fp const -8 binop (+) binop (+) mem temp $fp mem binop (+) temp $fp const -4 mem binop (+) temp $fp const -8 sxp call name l`print_int' mem binop (+) temp $fp const -8 call end sxp call name l`print' name l0 call end seq end seq end # Epilogue move temp sp temp fp move temp fp temp t0 label end Example 65: tc -H vars.tig
But once the computation of escaping variables is implemented, we know that none escape in this example, hence they can be stored in temporaries:
$ tc -eH vars.tig /* == High Level Intermediate representation. == */ label l0 "\n" # Routine: Main label l`Main' # Prologue # Body seq move temp t0 const 1 move temp t1 const 2 move temp t2 const 3 seq move temp t0 const 2 move temp t2 binop (+) binop (+) temp t0 temp t1 temp t2 sxp call name l`print_int' temp t2 call end sxp call name l`print' name l0 call end seq end seq end # Epilogue label end Example 66: tc -eH vars.tig
$ tc -eH vars.tig >vars.hir Example 67: tc -eH vars.tig >vars.hir
$ havm vars.hir 7 Example 68: havm vars.hir
Then, you should implement the declaration of functions:
let function fact (i: int) : int =
if i = 0 then 1
else i * fact (i - 1)
in
print_int (fact (15));
print ("\n")
end
File 69: fact15.tig
$ tc -H fact15.tig /* == High Level Intermediate representation. == */ # Routine: fact label l0 # Prologue move temp t1 temp fp move temp fp temp sp move temp sp binop (-) temp sp const 8 move mem temp $fp temp i0 move mem binop (+) temp $fp const -4 temp i1 # Body move temp $v0 eseq seq cjump eq mem binop (+) temp $fp const -4 const 0 name l1 name l2 label l1 move temp t0 const 1 jump name l3 label l2 move temp t0 binop (*) mem binop (+) temp $fp const -4 call name l0 mem temp $fp binop (-) mem binop (+) temp $fp const -4 const 1 call end label l3 seq end temp t0 # Epilogue move temp sp temp fp move temp fp temp t1 label end label l4 "\n" # Routine: Main label l`Main' # Prologue # Body seq sxp call name l`print_int' call name l0 temp $fp const 15 call end call end sxp call name l`print' name l4 call end seq end # Epilogue label end Example 70: tc -H fact15.tig
$ tc -H fact15.tig >fact15.hir Example 71: tc -H fact15.tig >fact15.hir
$ havm fact15.hir 2004310016 Example 72: havm fact15.hir
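The value printed above is not 15!, which is 1307674368000 and does not fit in 32 bits. The following is a minimal sketch, assuming that Tiger integers behave as 32-bit two's complement values that silently wrap (an assumption consistent with the output above, not a statement about havm internals). The name fact32 is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Factorial computed modulo 2^32, then reinterpreted as a signed 32-bit
// value. Unsigned arithmetic is used because its wraparound is well defined.
std::int32_t fact32(std::int32_t n) {
  assert(n >= 0);
  std::uint32_t acc = 1;
  for (std::int32_t i = 2; i <= n; ++i)
    acc *= static_cast<std::uint32_t>(i);  // wraps modulo 2^32
  // Assumption: two's complement conversion (guaranteed since C++20,
  // the usual behavior before that).
  return static_cast<std::int32_t>(acc);
}
```

Under that assumption, 1307674368000 modulo 2^32 is 2004310016, which is the number havm prints.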
And finally, you should support escaping variables (see example 24):
$ tc -eH variable-escapes.tig /* == High Level Intermediate representation. == */ label l0 "I rule the world!\n" label l1 "Peace on Earth for humans of good will.\n" # Routine: print_slogan label l2 # Prologue move temp t2 temp fp move temp fp temp sp move temp sp binop (-) temp sp const 4 move mem temp $fp temp i0 move temp t1 temp i1 # Body seq sxp call name l`print' temp t1 call end sxp call name l`print' mem mem temp $fp call end seq end # Epilogue move temp sp temp fp move temp fp temp t2 label end # Routine: Main label l`Main' # Prologue move temp t3 temp fp move temp fp temp sp move temp sp binop (-) temp sp const 4 # Body seq move mem temp $fp name l0 move temp t0 name l1 sxp call name l2 temp $fp temp t0 call end seq end # Epilogue move temp sp temp fp move temp fp temp t3 label end Example 73: tc -eH variable-escapes.tig
Some code is provided, see T6 Given Code. See src/temp, src/tree, src/frame, src/translate.
You are encouraged to first try very simple examples: nil, 1 + 2, "foo" < "bar", etc. Then consider supporting variables, and finally handle the case of the functions.
--hir-compute, but displays the result iff the option -H was given. Obviously, an input that has not been type-checked cannot be translated, so --hir-compute implies --types-check.
TypeVisitor
TranslateVisitor often needs additional type information to proceed, especially expression versus instruction. Hence, you'll have to update the TypeVisitor to leave notes on the AST using kind_set and so forth.
src/translate/fragment.hh
translate::ProcFrag::print, which outputs the routines themselves plus the glue code (allocating the frame, etc.).
src/translate/level-env.hh
src/translate/translation.hh
src/translate/translate-visitor.hh
This section documents possible extensions you could implement in T5.
Implementing bounds checking is quite simple: it consists in having the program die when it accesses an invalid subscript in an array. For instance, the following code "succeeds" with a non-bounds-checking compiler.
let type int_array = array of int
var size := 2
var arr1 := int_array [size] of 0
var arr2 := int_array [size] of 0
var two := 2
var m_one := -1
in
arr1[two] := 3;
arr2[m_one] := -1;
print_int (arr1[1]);
print ("\n");
print_int (arr2[0]);
print ("\n")
end
File 74: bounds-violation.tig
$ tc -H bounds-violation.tig >bounds-violation.hir Example 75: tc -H bounds-violation.tig >bounds-violation.hir
$ havm bounds-violation.hir -1 3 Example 76: havm bounds-violation.hir
When run with --bound-checking, your compiler produces code that diagnoses such cases and exits with status 120. Something like:
error-->bounds-violation.tig:8.2-17: index out of arr1 bounds (0 .. 1): 2 =>120
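The check itself can be sketched as follows. This is only an illustration of the test that the generated code has to perform, written in C++ rather than in HIR. The names bounds_error and check_or_die are hypothetical, and the message format simply copies the diagnostic shown above.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns an empty string when `index` is a valid
// subscript for an array of `size` cells, and a diagnostic otherwise.
std::string bounds_error(const std::string& location,
                         const std::string& array, int size, int index) {
  if (0 <= index && index < size)
    return "";
  return location + ": index out of " + array + " bounds (0 .. "
         + std::to_string(size - 1) + "): " + std::to_string(index);
}

// What the generated check would do at run time: report, then exit
// with status 120.
void check_or_die(const std::string& location, const std::string& array,
                  int size, int index) {
  const std::string msg = bounds_error(location, array, size, index);
  if (!msg.empty()) {
    std::fprintf(stderr, "%s\n", msg.c_str());
    std::exit(120);
  }
}
```

Both violations of the example are caught by the same test: arr1[two] fails the upper bound, arr2[m_one] fails the lower one.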
Warning: this optimization is difficult to do perfectly; therefore, expect a big bonus.
In a first and conservative extension, the compiler considers that all the functions (but the builtins!) need a static link. This is correct, but inefficient: for instance, the traditional fact function will spend almost as much time handling the static link as handling its real argument.
Some functions need a static link, but don't need to save it on the stack. For instance, in the following:
let var foo := 1
    function foo () : int = foo
in
  foo ()
end
the function foo does need a static link to access the variable foo, but does not need to store its static link on the stack.
It is suggested to address these problems in the following order:
--escapes-display to display /* escaping sl */ before the first formal argument of the functions (declarations) that need the static link:
$ tc -E fact.tig
/* == Escapes. == */
let
  function fact (/* escaping sl *//* escaping */ n : int) : int =
    if (n = 0) then 1 else (n * fact ( (n - 1)))
in
  fact (10)
end
$ tc -eE fact.tig
/* == Escapes. == */
let
  function fact (n : int) : int =
    if (n = 0) then 1 else (n * fact ( (n - 1)))
in
  fact (10)
end
call and progFrag prologues.
$ tc -eE escaping-sl.tig
/* == Escapes. == */
let
  var toto := 1
  function outer (/* escaping sl */) : int =
    let
      function inner (/* sl */) : int = toto
    in
      inner ()
    end
in
  outer ()
end
Watch out, it is not trivial to find the minimum. What do you think about the static link of the function sister below?
let
  var toto := 1
  function outer () : int =
    let function inner () : int = toto
    in inner () end
  function sister () : int = outer ()
in
  sister ()
end
This section was last updated for EPITA-2005 on 2003-05-15.
At the end of this stage, the compiler produces low level intermediate representation: LIR. LIR is a subset of the HIR: some patterns are forbidden. This is why it is also named canonicalization.
Things to learn during this stage that you should remember:
There are several stages in T6.
The first task in T6 is getting rid of all the eseq. To do this, you have to move the statement part of an eseq to the end of the current sequence point, while keeping the expression part in place. Compare for instance the HIR to the LIR in the following case:
let function print_ints (a: int, b: int) =
(print_int (a); print (", "); print_int (b); print ("\n"))
var a := 0
in
print_ints (1, (a := a + 1; a))
end
File 77: preincr-1.tig
One possible HIR translation is:
$ tc -eH preincr-1.tig /* == High Level Intermediate representation. == */ label l1 ", " label l2 "\n" # Routine: print_ints label l0 # Prologue move temp t2 temp fp move temp fp temp sp move temp sp binop (-) temp sp const 4 move mem temp $fp temp i0 move temp t0 temp i1 move temp t1 temp i2 # Body seq sxp call name l`print_int' temp t0 call end sxp call name l`print' name l1 call end sxp call name l`print_int' temp t1 call end sxp call name l`print' name l2 call end seq end # Epilogue move temp sp temp fp move temp fp temp t2 label end # Routine: Main label l`Main' # Prologue # Body seq move temp t3 const 0 sxp call name l0 temp $fp const 1 eseq move temp t3 binop (+) temp t3 const 1 temp t3 call end seq end # Epilogue label end Example 78: tc -eH preincr-1.tig
A possible canonicalization is then:
$ tc -eL preincr-1.tig /* == Low Level Intermediate representation. == */ label l1 ", " label l2 "\n" # Routine: print_ints label l0 # Prologue move temp t2 temp fp move temp fp temp sp move temp sp binop (-) temp sp const 4 move mem temp $fp temp i0 move temp t0 temp i1 move temp t1 temp i2 # Body seq label l3 sxp call name l`print_int' temp t0 call end sxp call name l`print' name l1 call end sxp call name l`print_int' temp t1 call end sxp call name l`print' name l2 call end label l4 seq end # Epilogue move temp sp temp fp move temp fp temp t2 label end # Routine: Main label l`Main' # Prologue # Body seq label l5 move temp t3 const 0 move temp t5 temp $fp move temp t3 binop (+) temp t3 const 1 sxp call name l0 temp t5 const 1 temp t3 call end label l6 seq end # Epilogue label end Example 79: tc -eL preincr-1.tig
But please note the example above is simple because 1 commutes with (a := a + 1; a): the order does not matter. But if you change the 1 into a, then you cannot exchange a and (a := a + 1; a), so the translation is different.
Compare the previous LIR with the following, and pay attention
to
let function print_ints (a: int, b: int) =
(print_int (a); print (", "); print_int (b); print ("\n"))
var a := 0
in
print_ints (a, (a := a + 1; a))
end
File 80: preincr-2.tig
$ tc -eL preincr-2.tig /* == Low Level Intermediate representation. == */ label l1 ", " label l2 "\n" # Routine: print_ints label l0 # Prologue move temp t2 temp fp move temp fp temp sp move temp sp binop (-) temp sp const 4 move mem temp $fp temp i0 move temp t0 temp i1 move temp t1 temp i2 # Body seq label l3 sxp call name l`print_int' temp t0 call end sxp call name l`print' name l1 call end sxp call name l`print_int' temp t1 call end sxp call name l`print' name l2 call end label l4 seq end # Epilogue move temp sp temp fp move temp fp temp t2 label end # Routine: Main label l`Main' # Prologue # Body seq label l5 move temp t3 const 0 move temp t5 temp $fp move temp t6 temp t3 move temp t3 binop (+) temp t3 const 1 sxp call name l0 temp t5 temp t6 temp t3 call end label l6 seq end # Epilogue label end Example 81: tc -eL preincr-2.tig
As you can see, the output is the same for the HIR and the LIR:
$ tc -eH preincr-2.tig >preincr-2.hir Example 82: tc -eH preincr-2.tig >preincr-2.hir
$ havm preincr-2.hir 0, 1 Example 83: havm preincr-2.hir
$ tc -eL preincr-2.tig >preincr-2.lir Example 84: tc -eL preincr-2.tig >preincr-2.lir
$ havm preincr-2.lir 0, 1 Example 85: havm preincr-2.lir
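The difference between the two translations above can be sketched as follows. This is a simplified illustration in the spirit of the reordering described in the Tiger book, not the code of tc: the types Linear and Reordered and the function reorder are hypothetical, IR fragments are plain strings, and the commutation test is deliberately conservative (an expression commutes with the hoisted statements only if there are none, or if it is a constant).

```cpp
#include <string>
#include <utility>
#include <vector>

// One argument after its own eseq has been pulled apart: the statements
// that must run first, then a side-effect-free expression.
struct Linear {
  std::vector<std::string> stms;
  std::string exp;
  bool is_constant;  // const or name: commutes with any statement
};

struct Reordered {
  std::vector<std::string> stms;  // hoisted statements, in execution order
  std::vector<std::string> exps;  // the arguments left in the call
};

// Hypothetical helper: hoist the statements of all the arguments while
// preserving left-to-right evaluation.
Reordered reorder(const std::vector<Linear>& args, int& next_temp) {
  Reordered res;
  // Walk from the last argument to the first.
  for (auto it = args.rbegin(); it != args.rend(); ++it) {
    std::vector<std::string> stms = it->stms;
    std::string exp = it->exp;
    const bool commutes = res.stms.empty() || it->is_constant;
    if (!commutes) {
      // Save the value before the statements on its right can change it.
      const std::string t = "temp t" + std::to_string(next_temp++);
      stms.push_back("move " + t + " " + exp);
      exp = t;
    }
    stms.insert(stms.end(), res.stms.begin(), res.stms.end());
    res.stms = std::move(stms);
    res.exps.insert(res.exps.begin(), exp);
  }
  return res;
}
```

With the arguments of preincr-1.tig (temp $fp, const 1, and the eseq), only temp $fp is saved into a fresh temporary. With those of preincr-2.tig, temp t3 must be saved as well, which is why one more move appears in the LIR. The numbering of the fresh temporaries in this sketch need not match that of tc.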
Be very careful when dealing with mem. For instance, rewriting something like:
call (foo, eseq (move (temp t, const 51), temp t))
into
move temp t1, temp t
move temp t, const 51
call (foo, temp t)
is dead wrong: temp t is not a subexpression: it is being defined here. You should produce:
move temp t, const 51
call (foo, temp t)
Another danger is the handling of move (mem, ). For instance:
move (mem foo, x)
must be rewritten into:
move (temp t, foo)
move (mem (temp t), x)
not as:
move (temp t, mem (foo))
move (temp t, x)
In other words, the first subexpression of move (mem (foo), ) is foo, not mem (foo). The following example is a good crash test against this problem:
let type int_array = array of int
var tab := int_array [2] of 51
in
tab[0] := 100;
tab[1] := 200;
print_int (tab[0]); print ("\n");
print_int (tab[1]); print ("\n")
end
File 86: move-mem.tig
$ tc -eL move-mem.tig >move-mem.lir Example 87: tc -eL move-mem.tig >move-mem.lir
$ havm move-mem.lir 100 200 Example 88: havm move-mem.lir
You also ought to get rid of nested calls:
print (chr (ord ("\n")))
File 89: nested-calls.tig
$ tc -L nested-calls.tig /* == Low Level Intermediate representation. == */ label l0 "\n" # Routine: Main label l`Main' # Prologue # Body seq label l1 move temp t1 call name l`ord' name l0 call end move temp t2 call name l`chr' temp t1 call end sxp call name l`print' temp t2 call end label l2 seq end # Epilogue label end Example 90: tc -L nested-calls.tig
In fact there are only two valid call forms: sxp (call (...)), and move (temp (...), call (...)).
Note that, contrary to C, the HIR and LIR always denote the same value. For instance the following Tiger code:
let
var a := 1
function a (t: int) : int =
(a := a + 1;
print_int (t); print (" -> "); print_int (a); print ("\n");
a)
var b := a (1) + a (2) * a (3)
in
print_int (b); print ("\n")
end
File 91: seq-point.tig
should always produce:
$ tc -L seq-point.tig >seq-point.lir Example 92: tc -L seq-point.tig >seq-point.lir
$ havm seq-point.lir 1 -> 2 2 -> 3 3 -> 4 14 Example 93: havm seq-point.lir
independently of which IR you ran. Note that it has nothing to do with the precedence of the operators!
In C, you have no such guarantee: the following program can give different results with different compilers and/or on different architectures.
#include <stdio.h>

int _a = 1;

int
a (int t)
{
  ++_a;
  printf ("%d -> %d\n", t, _a);
  return _a;
}

int
main (void)
{
  int b = a (1) + a (2) * a (3);
  printf ("%d\n", b);
  return 0;
}
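To obtain the Tiger result whatever the compiler, the calls have to be sequenced explicitly, much like the LIR does when it gives each call its own move (temp (...), call (...)) statement. A minimal sketch follows; the names counter and ordered are hypothetical.

```cpp
#include <cstdio>

static int counter = 1;  // plays the role of the Tiger variable a

static int a(int t) {
  ++counter;
  std::printf("%d -> %d\n", t, counter);
  return counter;
}

// Each call gets its own statement, so the calls run from left to right
// whatever the compiler decides for the operands of + and *.
int ordered() {
  counter = 1;
  const int t1 = a(1);  // returns 2
  const int t2 = a(2);  // returns 3
  const int t3 = a(3);  // returns 4
  return t1 + t2 * t3;
}
```

Here the result is 2 + 3 * 4, i.e., the 14 displayed by havm. The order of the calls is fixed by the statements, while the precedence of the operators only decides how the three results are combined.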
Once you have canonicalized your eseq and call, you have to canonicalize cjumps: they must always be followed by their "false" label. This goes in two steps:
A basic block is a sequence of code starting with a label, ending with a jump (conditional or not), and with no jumps, no labels inside.
Now put all the basic blocks into a single sequence.
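The first step, cutting a flat list of statements into basic blocks, can be sketched as follows. It is a simplified illustration, not the code of tc: the type Stm and the function basic_blocks are hypothetical, and statements are reduced to a kind plus some text. Note how labels and jumps are invented where they are missing, so that every block starts with a label and ends with a jump.

```cpp
#include <string>
#include <vector>

struct Stm {
  enum Kind { Label, Jump, CJump, Other } kind;
  std::string text;  // label name, jump target, or free text
};

using Block = std::vector<Stm>;

// Hypothetical helper: split `stms` into basic blocks. `exit_label` is the
// label the last block jumps to.
std::vector<Block> basic_blocks(const std::vector<Stm>& stms,
                                const std::string& exit_label) {
  std::vector<Block> blocks;
  Block cur;
  int fresh = 0;
  auto flush_block = [&]() { blocks.push_back(cur); cur.clear(); };
  for (const Stm& s : stms) {
    if (s.kind == Stm::Label) {
      if (!cur.empty()) {
        // Falling through into a label: make the jump explicit.
        cur.push_back({Stm::Jump, s.text});
        flush_block();
      }
      cur.push_back(s);
    } else {
      if (cur.empty())
        // Code that is not preceded by a label: invent one.
        cur.push_back({Stm::Label, "fresh" + std::to_string(fresh++)});
      cur.push_back(s);
      if (s.kind == Stm::Jump || s.kind == Stm::CJump)
        flush_block();
    }
  }
  if (!cur.empty()) {
    // The last block leaves through the exit label.
    cur.push_back({Stm::Jump, exit_label});
    flush_block();
  }
  return blocks;
}
```

The second step then chooses an order for the blocks such that each cjump is immediately followed by the block of its "false" label, which is not shown here.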
In the following, the result of the whole conversion is visible.
The following example highlights the need for new labels: at least one for the entry point, and one for the exit point:
1 & 2
File 94: 1-and-2.tig
$ tc -L 1-and-2.tig /* == Low Level Intermediate representation. == */ # Routine: Main label l`Main' # Prologue # Body seq label l3 cjump ne const 1 const 0 name l0 name l1 label l1 label l2 jump name l4 label l0 jump name l2 label l4 seq end # Epilogue label end Example 95: tc -L 1-and-2.tig
The following example contains many jumps. Compare the HIR to the LIR:
while 10 | 20 do if 30 | 40 then break else break
File 96: broken-while.tig
$ tc -H broken-while.tig /* == High Level Intermediate representation. == */ # Routine: Main label l`Main' # Prologue # Body seq label l1 seq cjump ne const 10 const 0 name l8 name l9 label l8 cjump ne const 1 const 0 name l2 name l0 label l9 cjump ne const 20 const 0 name l2 name l0 seq end label l2 seq seq cjump ne const 30 const 0 name l6 name l7 label l6 cjump ne const 1 const 0 name l3 name l4 label l7 cjump ne const 40 const 0 name l3 name l4 seq end label l3 jump name l0 jump name l5 label l4 jump name l0 label l5 seq end jump name l1 label l0 seq end # Epilogue label end Example 97: tc -H broken-while.tig
$ tc -L broken-while.tig /* == Low Level Intermediate representation. == */ # Routine: Main label l`Main' # Prologue # Body seq label l10 label l1 cjump ne const 10 const 0 name l8 name l9 label l9 cjump ne const 20 const 0 name l2 name l0 label l0 jump name l11 label l2 cjump ne const 30 const 0 name l6 name l7 label l7 cjump ne const 40 const 0 name l3 name l4 label l4 jump name l0 label l3 jump name l0 label l6 cjump ne const 1 const 0 name l3 name l13 label l13 jump name l4 label l8 cjump ne const 1 const 0 name l2 name l14 label l14 jump name l0 label l11 seq end # Epilogue label end Example 98: tc -L broken-while.tig
Some code is provided: 2005-tc-6.1.tar.bz2. The transition from the previous versions can be done thanks to the following diffs: 2005-tc-4.3-6.0.diff, 2005-tc-6.0-6.1.diff.
It includes most of the canonicalization.
Everything you need.
Please note that the 2005-T7 delivery is optional: there will be no grade, and a single upload will be accepted. The tests from T0 to T7 will be run on the tarball. The goal is to help you see your mistakes and how your T7 is running, so that you can proceed in peace onto T8. There will be no penalty if you don't take advantage of this possibility.
At the end of this stage, the compiler produces the very low level intermediate representation: ASSEM. This output is target dependent, and we aim at MIPS, as we use Mipsy to run it.
Things to learn during this stage that you should remember:
The goal of T7 is straightforward: starting from LIR, generate the MIPS instructions, except that you don't have actual registers: we still heavily use Temps. Register allocation will be done in a later stage, T9.
1 + 2 * 3
File 99: seven.tig
$ tc --inst-display seven.tig # == Final assembler ouput. == # # Routine: Main t_main: move t5, $s0 move t6, $s1 move t7, $s2 move t8, $s3 move t9, $s4 move t10, $s5 move t11, $s6 move t12, $s7 l0: li t3, 2 mul t2, t3, 3 li t4, 1 add t1, t4, t2 l1: move $s0, t5 move $s1, t6 move $s2, t7 move $s3, t8 move $s4, t9 move $s5, t10 move $s6, t11 move $s7, t12 Example 100: tc --inst-display seven.tig
Please, note that at this stage, the control flow analysis and the
liveness analysis are not performed yet, therefore the compiler cannot
know what registers are really to be saved. That's why in the previous
output it saves "uselessly" all the callee-save registers on main
entry. The next stage, which combines control flow analysis, liveness
analysis, and register allocation, will make it useless. For your
information, it results in:
$ tc -sI seven.tig # == Final assembler ouput. == # # Routine: Main t_main: sw $fp, ($sp) move $fp, $sp sub $sp, $sp, 8 sw $ra, -4 ($fp) l0: li $t0, 2 mul $t1, $t0, 3 li $t0, 1 add $t0, $t0, $t1 l1: lw $ra, -4 ($fp) move $sp, $fp lw $fp, ($fp) jr $ra Example 101: tc -sI seven.tig
A delicate part of this exercise is handling the function calls:
let function add (x: int, y: int) : int = x + y
in
print_int (add (1, (add (2, 3)))); print ("\n")
end
File 102: add.tig
$ tc -e --mipsy-display add.tig # == Final assembler ouput. == # # Routine: add l0: sw $fp, -4 ($sp) move $fp, $sp sub $sp, $sp, 12 sw $ra, -8 ($fp) sw $a0, ($fp) move t0, $a1 move t1, $a2 move t7, $s0 move t8, $s1 move t9, $s2 move t10, $s3 move t11, $s4 move t12, $s5 move t13, $s6 move t14, $s7 l2: add t6, t0, t1 move $v0, t6 l3: move $s0, t7 move $s1, t8 move $s2, t9 move $s3, t10 move $s4, t11 move $s5, t12 move $s6, t13 move $s7, t14 lw $ra, -8 ($fp) move $sp, $fp lw $fp, -4 ($fp) jr $ra .data l1: .word 1 .asciiz "\n" .text # Routine: Main t_main: sw $fp, ($sp) move $fp, $sp sub $sp, $sp, 8 sw $ra, -4 ($fp) move t19, $s0 move t20, $s1 move t21, $s2 move t22, $s3 move t23, $s4 move t24, $s5 move t25, $s6 move t26, $s7 l4: move $a0, $fp li t15, 2 move $a1, t15 li t16, 3 move $a2, t16 jal l0 move t4, $v0 move $a0, $fp li t17, 1 move $a1, t17 move $a2, t4 jal l0 move t5, $v0 move $a0, t5 jal print_int la t18, l1 move $a0, t18 jal print l5: move $s0, t19 move $s1, t20 move $s2, t21 move $s3, t22 move $s4, t23 move $s5, t24 move $s6, t25 move $s7, t26 lw $ra, -4 ($fp) move $sp, $fp lw $fp, ($fp) jr $ra Example 103: tc -e --mipsy-display add.tig
Once your function calls work properly, you can start using mipsy to check the behavior of your compiler.
$ tc -eH add.tig >add.hir Example 104: tc -eH add.tig >add.hir
$ havm add.hir 6 Example 105: havm add.hir
Unfortunately, you need to adjust the output of tc, using t123, to mipsy conventions: $x123.
$ tc -eR --mipsy-display add.tig >add.instr Example 106: tc -eR --mipsy-display add.tig >add.instr
$ sed -e's/\([^$a-z]\)t\([0-9][0-9]*\)/\1$x\2/g' add.instr >add.mipsy Example 107: sed -e's/\([^$a-z]\)t\([0-9][0-9]*\)/\1$x\2/g' add.instr >add.mipsy
$ mipsy --unlimited-regs --execute add.mipsy 6 Example 108: mipsy --unlimited-regs --execute add.mipsy
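For readers who prefer to see what the sed command does, here is an equivalent sketch using the standard std::regex library. The function name to_mipsy is hypothetical. Like the sed expression, it requires a character that is neither $ nor a lowercase letter before the t, so that actual registers such as $t0 and labels such as t_main are left alone.

```cpp
#include <regex>
#include <string>

// Rename the temporaries tN of one line into the $xN that mipsy expects.
std::string to_mipsy(const std::string& line) {
  // Group 1: the character before the temporary; group 2: its number.
  static const std::regex temp(R"(([^$a-z])t([0-9]+))");
  // In the replacement, "$$" stands for a literal dollar sign.
  return std::regex_replace(line, temp, "$1$$x$2");
}
```

As with the sed version, a temporary located at the very beginning of a line would not be renamed, since there is no character before it. This does not matter for the output shown above, where instructions are always preceded by their mnemonic.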
You must also complete the runtime. No difference must be observable between a run with havm and another with mipsy:
substring ("", 1, 1)
File 109: substring-0-1-1.tig
$ tc -eH substring-0-1-1.tig >substring-0-1-1.hir Example 110: tc -eH substring-0-1-1.tig >substring-0-1-1.hir
$ havm substring-0-1-1.hir substring: arguments out of bounds =>120 Example 111: havm substring-0-1-1.hir
$ tc -e --mipsy-display substring-0-1-1.tig # == Final assembler ouput. == # .data l0: .word 0 .asciiz "" .text # Routine: Main t_main: sw $fp, ($sp) move $fp, $sp sub $sp, $sp, 8 sw $ra, -4 ($fp) move t4, $s0 move t5, $s1 move t6, $s2 move t7, $s3 move t8, $s4 move t9, $s5 move t10, $s6 move t11, $s7 l1: la t1, l0 move $a0, t1 li t2, 1 move $a1, t2 li t3, 1 move $a2, t3 jal substring l2: move $s0, t4 move $s1, t5 move $s2, t6 move $s3, t7 move $s4, t8 move $s5, t9 move $s6, t10 move $s7, t11 lw $ra, -4 ($fp) move $sp, $fp lw $fp, ($fp) jr $ra Example 112: tc -e --mipsy-display substring-0-1-1.tig
$ tc -eR --mipsy-display substring-0-1-1.tig >substring-0-1-1.instr Example 113: tc -eR --mipsy-display substring-0-1-1.tig >substring-0-1-1.instr
$ sed -e's/\([^$a-z]\)t\([0-9][0-9]*\)/\1$x\2/g' substring-0-1-1.instr >substring-0-1-1.mipsy Example 114: sed -e's/\([^$a-z]\)t\([0-9][0-9]*\)/\1$x\2/g' substring-0-1-1.instr >substring-0-1-1.mipsy
$ mipsy --unlimited-regs --execute substring-0-1-1.mipsy substring: arguments out of bounds =>120 Example 115: mipsy --unlimited-regs --execute substring-0-1-1.mipsy
Below is listed where to find the tarball depending on your class. For more information about the T7 code delivered see src/target, src/assem, src/codegen.
The additional code is provided as:
2004-tc-7.6.tar.bz2, the whole tarball.
2004-tc-5.3-7.0.diff, 2004-tc-5.3-7.3.diff, 2004-tc-5.3-7.4.diff, 2004-tc-5.3-7.5.diff, the differences with the latest tarball that was delivered.
2004-tc-7.0-7.1.diff, 2004-tc-7.1-7.2.diff, 2004-tc-7.2-7.3.diff, 2004-tc-7.3-7.4.diff, 2004-tc-7.4-7.5.diff, 2004-tc-7.5-7.6.diff, the differences with previous versions of the 2004-tc-7 tarball.
There are two ways to continue the projects:
--inst-compute and --inst-display be recognized. Of course, --inst-compute implies --lir-compute.
# Be in the new tarball before running this.
for i in $(find .)
do
  if test ! -f ../my-old-working-directory/$i; then
    cp $i ../my-old-working-directory/$i
  fi
done
And then, build it step by step.
2005-tc-7.3.tar.bz2, the whole tarball.
2005-tc-6.1-7.0.diff, 2005-tc-6.1-7.1.diff, the differences with the latest tarball that was delivered.
2005-tc-7.0-7.1.diff, 2005-tc-7.1-7.2.diff, 2005-tc-7.2-7.3.diff, the differences with previous versions of the 2005-tc-7 tarball.
There is not much code to write:
Codegen::munchMove (src/codegen/mips/codegen.cc)
SpimAssembly::move_build (src/codegen/mips/spim-assembly.cc): build a move instruction using the MIPS 2000 standard instruction set.
SpimAssembly::binop_build (src/codegen/mips/spim-assembly.cc): build arithmetic binary operations (addition, multiplication, etc.) using the MIPS 2000 standard instruction set.
SpimAssembly::load_build, SpimAssembly::store_build (src/codegen/mips/spim-assembly.cc): build a load (respectively a store) instruction using the MIPS 2000 standard instruction set. Here, the indirect addressing mode is used.
SpimAssembly::cjump_build (src/codegen/mips/spim-assembly.cc): translate conditional branch instructions (branch if equal, if lower than, etc.) into MIPS 2000 assembly.
src/codegen/mips/runtime.s: strcmp, print_int, substring, concat
Information on MIPS 2000 assembly instructions may be found in the SPIM manual.
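To give an idea of the kind of mapping these builders perform, here is a sketch that turns a Tree operator into the text of an instruction. It is not the SpimAssembly interface: the functions binop_text and cjump_text are hypothetical, they produce plain strings instead of assem instructions, and they only cover a few operators. The mnemonics are those accepted by SPIM, some of which are pseudo-instructions.

```cpp
#include <stdexcept>
#include <string>

// Text of the instruction for `dst := lhs op rhs`.
std::string binop_text(const std::string& op, const std::string& dst,
                       const std::string& lhs, const std::string& rhs) {
  std::string mnemonic;
  if (op == "+") mnemonic = "add";
  else if (op == "-") mnemonic = "sub";
  else if (op == "*") mnemonic = "mul";
  else if (op == "/") mnemonic = "div";
  else throw std::invalid_argument("unsupported binop: " + op);
  return mnemonic + " " + dst + ", " + lhs + ", " + rhs;
}

// Text of the branch taken when `lhs relop rhs` holds. Nothing is emitted
// for the "false" label: after canonicalization it follows the cjump.
std::string cjump_text(const std::string& relop, const std::string& lhs,
                       const std::string& rhs, const std::string& label) {
  std::string mnemonic;
  if (relop == "eq") mnemonic = "beq";
  else if (relop == "ne") mnemonic = "bne";
  else if (relop == "lt") mnemonic = "blt";
  else if (relop == "le") mnemonic = "ble";
  else if (relop == "gt") mnemonic = "bgt";
  else if (relop == "ge") mnemonic = "bge";
  else throw std::invalid_argument("unsupported relop: " + relop);
  return mnemonic + " " + lhs + ", " + rhs + ", " + label;
}
```

For instance, binop_text ("*", "t2", "t3", "3") yields the mul t2, t3, 3 of seven.tig.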
Be aware that you are not required to fill in the blanks in the following places, as they are needed during register allocation only:
Codegen::allocate_frame (src/codegen/mips/codegen.cc)
Codegen::rewrite_program (src/codegen/mips/codegen.cc)
Things to learn during this stage that you should remember:
Branching is of course a most interesting feature to exercise:
1 | 2 | 3
File 116: ors.tig
$ tc -I ors.tig # == Final assembler ouput. == # # Routine: Main t_main: move t4, $s0 move t5, $s1 move t6, $s2 move t7, $s3 move t8, $s4 move t9, $s5 move t10, $s6 move t11, $s7 l5: li t1, 1 bne t1, 0, l3 l4: li t2, 2 bne t2, 0, l0 l1: l2: j l6 l0: j l2 l3: li t3, 1 bne t3, 0, l0 l7: j l1 l6: move $s0, t4 move $s1, t5 move $s2, t6 move $s3, t7 move $s4, t8 move $s5, t9 move $s6, t10 move $s7, t11 Example 117: tc -I ors.tig
$ tc -F ors.tig Example 118: tc -F ors.tig
File 119: Main-Main-flow.dot
$ tc -V ors.tig Example 120: tc -V ors.tig
File 121: Main-Main-liveness.dot
$ tc -N ors.tig Example 122: tc -N ors.tig
File 123: Main-Main-interference.dot
2004-tc-8.0.tar.bz2, 2004-tc-8.1.tar.bz2, 2004-tc-8.2.tar.bz2, the whole tarball.
2004-tc-7.5-8.0.diff, the differences with the latest tarball that was delivered.
2004-tc-8.0-8.1.diff, 2004-tc-8.1-8.2.diff, the differences with previous versions of the 2004-tc-8 tarball.
2005-tc-8.0.tar.bz2, the whole tarball.
2005-tc-7.2-8.0.diff, 2005-tc-7.3-8.0.diff, the differences with the latest tarball that was delivered.
To read the description of the new modules, see src/graph, src/liveness.
src/graph/graph.hh
src/graph/graph.hxx
src/liveness/flowgraph.hh
FlowGraph is actually constructed from the assembly fragments.
src/liveness/liveness.cc
Liveness (a decorated FlowGraph) is built from assembly instructions.
src/liveness/interference-graph.cc
InterferenceGraph::compute_liveness, build the graph.
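As a reminder of what the liveness analysis computes, here is a sketch of the usual backward dataflow iteration: out[n] is the union of in[s] over the successors s of n, and in[n] is use[n] plus whatever is in out[n] but not in def[n]. The type Node and the function liveness are hypothetical and unrelated to the classes of src/liveness; temporaries are plain strings and successors are indices in the vector.

```cpp
#include <set>
#include <string>
#include <vector>

struct Node {
  std::set<std::string> def, use;  // temporaries written / read
  std::vector<int> succ;           // indices of the successors
  std::set<std::string> in, out;   // results of the analysis
};

// Iterate until a fixed point is reached. Visiting the nodes backwards
// makes the iteration converge faster on straight-line code.
void liveness(std::vector<Node>& g) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (int i = static_cast<int>(g.size()) - 1; i >= 0; --i) {
      Node& n = g[i];
      std::set<std::string> out;
      for (int s : n.succ)
        out.insert(g[s].in.begin(), g[s].in.end());
      std::set<std::string> in = n.use;
      for (const std::string& t : out)
        if (n.def.count(t) == 0)
          in.insert(t);
      if (in != n.in || out != n.out) {
        n.in = in;
        n.out = out;
        changed = true;
      }
    }
  }
}
```

On the body of seven.tig (li t3, 2; mul t2, t3, 3; li t4, 1; add t1, t4, t2), t2 and t4 are both live after the third instruction, so they interfere, whereas t3 is dead by then.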
At the end of this stage, the compiler produces code that is runnable using Mipsy.
Things to learn during this stage that you should remember:
This section will not demonstrate the output of the option -S, --asm-display, since it includes the Tiger runtime, which is quite long. We simply use -I, --inst-display, which has the same effect once the registers are allocated, i.e., once -s, --asm-compute has been executed. In short: we use -sI instead of -S to save space.
Allocating registers in the main function when there is no register pressure is easy since, in particular, there are no spills. A direct consequence is that many moves are now useless, and have disappeared. For instance, the file of the example 99:
$ tc -sI seven.tig # == Final assembler ouput. == # # Routine: Main t_main: sw $fp, ($sp) move $fp, $sp sub $sp, $sp, 8 sw $ra, -4 ($fp) l0: li $t0, 2 mul $t1, $t0, 3 li $t0, 1 add $t0, $t0, $t1 l1: lw $ra, -4 ($fp) move $sp, $fp lw $fp, ($fp) jr $ra Example 124: tc -sI seven.tig
$ tc -S seven.tig >seven.s Example 125: tc -S seven.tig >seven.s
$ mipsy --execute seven.s Example 126: mipsy --execute seven.s
Another means to display the result of register allocation consists in reporting the mapping from temps to actual registers:
$ tc -s --tempmap-display seven.tig /* Temporary map. */ t1 -> $t0 t2 -> $t1 t3 -> $t0 t4 -> $t0 t5 -> $s0 t6 -> $s1 t7 -> $s2 t8 -> $s3 t9 -> $s4 t10 -> $s5 t11 -> $s6 t12 -> $s7 Example 127: tc -s --tempmap-display seven.tig
Of course it is much better to see what is going on:
(print_int (1 + 2 * 3); print ("\n"))
File 128: print-seven.tig
$ tc -sI print-seven.tig # == Final assembler ouput. == # .data l0: .word 1 .asciiz "\n" .text # Routine: Main t_main: sw $fp, ($sp) move $fp, $sp sub $sp, $sp, 8 sw $ra, -4 ($fp) l1: li $t0, 2 mul $t1, $t0, 3 li $t0, 1 add $a0, $t0, $t1 jal print_int la $a0, l0 jal print l2: lw $ra, -4 ($fp) move $sp, $fp lw $fp, ($fp) jr $ra Example 129: tc -sI print-seven.tig
$ tc -S print-seven.tig >print-seven.s Example 130: tc -S print-seven.tig >print-seven.s
$ mipsy --execute print-seven.s 7 Example 131: mipsy --execute print-seven.s
To torture your compiler, you ought to use many temporaries. To be honest, ours is quite slow, it spends way too much time in register allocation.
let
var a00 := 00 var a55 := 55
var a11 := 11 var a66 := 66
var a22 := 22 var a77 := 77
var a33 := 33 var a88 := 88
var a44 := 44 var a99 := 99
in
print_int (0
+ a00 + a00 + a55 + a55
+ a11 + a11 + a66 + a66
+ a22 + a22 + a77 + a77
+ a33 + a33 + a88 + a88
+ a44 + a44 + a99 + a99);
print ("\n")
end
File 132: print-many.tig
$ tc -eIs --tempmap-display -I --time-report print-many.tig error-->Execution times (seconds) error--> 6: canon-compute : 0 ( 0%) 0 ( 0%) 0.01 ( 20%) error--> 7: inst-display : 0.01 ( 20%) 0 ( 0%) 0 ( 0%) error--> 8: liveness analysis : 0.01 ( 20%) 0 ( 0%) 0.01 ( 20%) error--> 8: liveness edges : 0.01 ( 20%) 0 ( 0%) 0 ( 0%) error--> 9: assign_colors : 0.01 ( 20%) 0 ( 0%) 0 ( 0%) error--> 9: coalesce : 0 ( 0%) 0 ( 0%) 0.01 ( 20%) error--> 9: register allocation : 0 ( 0%) 0 ( 0%) 0.01 ( 20%) error--> 9: simplify : 0.01 ( 20%) 0 ( 0%) 0 ( 0%) error-->Cumulated times (seconds) error--> 6: canon-compute : 0 ( 0%) 0 ( 0%) 0.01 ( 20%) error--> 7: inst-display : 0.05 ( 100%) 0 ( 0%) 0.04 ( 80%) error--> 8: liveness analysis : 0.01 ( 20%) 0 ( 0%) 0.01 ( 20%) error--> 9: asm-compute : 0.04 ( 80%) 0 ( 0%) 0.04 ( 80%) error--> 9: coalesce : 0 ( 0%) 0 ( 0%) 0.01 ( 20%) error--> 9: register allocation : 0.04 ( 80%) 0 ( 0%) 0.04 ( 80%) error--> rest : 0.05 ( 100%) 0 ( 0%) 0.05 ( 100%) error--> TOTAL (seconds) : 0.05 user, 0 system, 0.05 wall # == Final assembler ouput. == # .data l0: .word 1 .asciiz "\n" .text # Routine: Main t_main: move t33, $s0 move t34, $s1 move t35, $s2 move t36, $s3 move t37, $s4 move t38, $s5 move t39, $s6 move t40, $s7 l1: li t0, 0 li t1, 55 li t2, 11 li t3, 66 li t4, 22 li t5, 77 li t6, 33 li t7, 88 li t8, 44 li t9, 99 li t31, 0 add t30, t31, t0 add t29, t30, t0 add t28, t29, t1 add t27, t28, t1 add t26, t27, t2 add t25, t26, t2 add t24, t25, t3 add t23, t24, t3 add t22, t23, t4 add t21, t22, t4 add t20, t21, t5 add t19, t20, t5 add t18, t19, t6 add t17, t18, t6 add t16, t17, t7 add t15, t16, t7 add t14, t15, t8 add t13, t14, t8 add t12, t13, t9 add t11, t12, t9 move $a0, t11 jal print_int la t32, l0 move $a0, t32 jal print l2: move $s0, t33 move $s1, t34 move $s2, t35 move $s3, t36 move $s4, t37 move $s5, t38 move $s6, t39 move $s7, t40 /* Temporary map. 
*/ t0 -> $a0 t1 -> $t9 t2 -> $t8 t3 -> $t7 t4 -> $t6 t5 -> $t5 t6 -> $t4 t7 -> $t3 t8 -> $t2 t9 -> $t1 t11 -> $a0 t12 -> $t0 t13 -> $t0 t14 -> $t0 t15 -> $t0 t16 -> $t0 t17 -> $t0 t18 -> $t0 t19 -> $t0 t20 -> $t0 t21 -> $t0 t22 -> $t0 t23 -> $t0 t24 -> $t0 t25 -> $t0 t26 -> $t0 t27 -> $t0 t28 -> $t0 t29 -> $t0 t30 -> $t0 t31 -> $t0 t32 -> $a0 t33 -> $s0 t34 -> $s1 t35 -> $s2 t36 -> $s3 t37 -> $s4 t38 -> $s5 t39 -> $s6 t40 -> $s7 # == Final assembler ouput. == # .data l0: .word 1 .asciiz "\n" .text # Routine: Main t_main: sw $fp, ($sp) move $fp, $sp sub $sp, $sp, 8 sw $ra, -4 ($fp) l1: li $a0, 0 li $t9, 55 li $t8, 11 li $t7, 66 li $t6, 22 li $t5, 77 li $t4, 33 li $t3, 88 li $t2, 44 li $t1, 99 li $t0, 0 add $t0, $t0, $a0 add $t0, $t0, $a0 add $t0, $t0, $t9 add $t0, $t0, $t9 add $t0, $t0, $t8 add $t0, $t0, $t8 add $t0, $t0, $t7 add $t0, $t0, $t7 add $t0, $t0, $t6 add $t0, $t0, $t6 add $t0, $t0, $t5 add $t0, $t0, $t5 add $t0, $t0, $t4 add $t0, $t0, $t4 add $t0, $t0, $t3 add $t0, $t0, $t3 add $t0, $t0, $t2 add $t0, $t0, $t2 add $t0, $t0, $t1 add $a0, $t0, $t1 jal print_int la $a0, l0 jal print l2: lw $ra, -4 ($fp) move $sp, $fp lw $fp, ($fp) jr $ra Example 133: tc -eIs --tempmap-display -I --time-report print-many.tig
The code is provided under the following forms:
2005-tc-9.0.tar.bz2, 2005-tc-9.1.tar.bz2, 2005-tc-9.2.tar.bz2, 2005-tc-9.3.tar.bz2, the whole tarball.
2005-tc-8.0-9.0.diff, 2005-tc-8.0-9.1.diff, 2005-tc-8.0-9.2.diff, 2005-tc-8.0-9.3.diff, the differences with the latest tarball that was delivered.
2005-tc-9.0-9.1.diff, 2005-tc-9.1-9.2.diff, 2005-tc-9.1-9.3.diff, 2005-tc-9.2-9.3.diff, the differences with previous versions of the tc-9 tarball.
The most significant differences are that we no longer use the color_register attribute for Cpu, that the runtime properly sets the exit status, and that its error messages are standardized.
To read the description of the new module, see src/regalloc.
src/liveness/interference-graph.hh
src/liveness/interference-graph.cc
InterferenceGraph was upgraded, which will require some modifications in your existing code. Rest assured that little work will actually be needed: the main modification is that moves are now encoded as a list of pairs, whereas before we had a map from each node to the set of nodes it is move-related to.
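To make the change of encoding concrete, here is a minimal sketch of how the old map-based view and the new list-of-pairs view relate. All the names below (node_t, move_t, move_map_t, move_list_t, moves_as_pairs, move_related) are hypothetical stand-ins and not the actual InterferenceGraph interface; the sketch also assumes that moves are treated as unordered pairs, which may not match the real class.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-ins: the real InterferenceGraph types may differ.
typedef std::size_t node_t;
typedef std::pair<node_t, node_t> move_t;
typedef std::map<node_t, std::set<node_t> > move_map_t; // old encoding
typedef std::vector<move_t> move_list_t;                // new encoding

// Turn the old map into a list of pairs.  Each pair is normalized as
// (smaller, larger) so that a move recorded in both directions in the
// map is emitted only once.
move_list_t moves_as_pairs(const move_map_t& old_moves)
{
  std::set<move_t> seen;
  for (move_map_t::const_iterator i = old_moves.begin();
       i != old_moves.end(); ++i)
    for (std::set<node_t>::const_iterator j = i->second.begin();
         j != i->second.end(); ++j)
      seen.insert(move_t(std::min(i->first, *j), std::max(i->first, *j)));
  return move_list_t(seen.begin(), seen.end());
}

// Recover what the old map gave directly: the nodes move-related to n.
std::set<node_t> move_related(const move_list_t& moves, node_t n)
{
  std::set<node_t> res;
  for (move_list_t::const_iterator i = moves.begin(); i != moves.end(); ++i)
    {
      if (i->first == n)
        res.insert(i->second);
      if (i->second == n)
        res.insert(i->first);
    }
  return res;
}
```

Code that used to look up a node in the map can, under these assumptions, call a helper such as move_related on the list instead, at the price of a linear scan.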
src/regalloc/color.hh
Pay attention to misc::set: there is a lot of syntactic sugar provided to implement set operations. The code of Color can range from ugly and obfuscated to readable and very close to its specification.
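As an illustration of what such syntactic sugar buys, the sketch below defines two operators over plain std::set and uses them to write the Adjacent(n) function of the book's coloring algorithm almost as it is specified: adjList[n] \ (selectStack U coalescedNodes). The operators and the adjacent function are assumptions made for this example; the actual interface of misc::set may well differ.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Hypothetical sugar over std::set; misc::set may offer something else.

// Set union.
template <typename T>
std::set<T> operator+(const std::set<T>& l, const std::set<T>& r)
{
  std::set<T> res;
  std::set_union(l.begin(), l.end(), r.begin(), r.end(),
                 std::inserter(res, res.begin()));
  return res;
}

// Set difference.
template <typename T>
std::set<T> operator-(const std::set<T>& l, const std::set<T>& r)
{
  std::set<T> res;
  std::set_difference(l.begin(), l.end(), r.begin(), r.end(),
                      std::inserter(res, res.begin()));
  return res;
}

// Adjacent(n) = adjList[n] \ (selectStack U coalescedNodes).
template <typename T>
std::set<T> adjacent(const std::set<T>& adj_list_n,
                     const std::set<T>& select_stack,
                     const std::set<T>& coalesced_nodes)
{
  return adj_list_n - (select_stack + coalesced_nodes);
}
```

Without such operators, the body of adjacent would be two explicit calls to the standard algorithms with temporaries and inserters, which is where the "ugly and obfuscated" end of the range comes from.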
src/regalloc/libregalloc.cc
src/codegen/mips/codegen.cc
Codegen::rewrite_program.
rv vs. $v0
Since rv and $v0 designate the same thing, we decided to change the implementation of rv and fp in the frame module to use those of the current target: $v0 and $fp for MIPS. This has a strong influence on havm, of course. It was modified to support these changes, so make sure to use 0.18 or higher.
This chapter aims at providing some helpful information about the various tools that you are likely to use to implement tc. It does not replace reading the genuine documentation; nevertheless, helpful tips are given. Feel free to contribute additional information.
The single most important tool for implementing the Tiger Project is the original book, Modern Compiler Implementation in C/Java/ML, by Andrew W. Appel, published by Cambridge University Press (New York, Cambridge). ISBN 0-521-58388-8.
It is not possible to finish this project without having at least one copy per group. We provide a convenient mini Tiger Compiler Reference Manual that contains some information about the language but it does not cover all the details, and sometimes digging into the original book is required. This is on purpose, by virtue of due respect to the author of this valuable book.
Several copies are available at the EPITA library.
There are three flavors of this book:
This book addresses many more issues than the sole Tiger Project as we implement it. In other words, it is an extremely interesting book which provides insights on garbage collection, object oriented and functional languages etc.
There are a dozen copies at the EPITA library, but buying it is a good idea.
Pay extra attention: there are several errors in the books, some of which are reported on Andrew Appel's pages (C, Java, and ML), some of which are not.
This book is the bible in compiler design. It has extensive insight on the whole architecture of compilers, provides a rigorous treatment for theoretical material etc. Nevertheless I would not recommend this book to EPITA students, because
Nevertheless, curious readers will find valuable information about historically important compilers, people, papers etc. Reading the last section of each chapter (Bibliographical Notes) is a real pleasure for anyone who is interested.
It should be noted that the French edition, "Compilateurs: Principes,
techniques et outils", was brilliantly translated by Pierre Boullier,
Philippe Deschamp, Martin Jourdan, Bernard Lorho and Monique Lazaud: the
pleasure is as good in French as it is in English.
A remarkable book that provides deep insight on the best practice with
STL. Not only does it teach what's to be done, but it clearly
shows why. A book that any C++ programmer should have read. See the
Effective STL Addison-Wesley Page.
Because the book aims at a complete treatment of Lex and Yacc on a wide range of platforms, it provides too many details on material with little interest for us (e.g., we don't care about portability to other Lexes and Yacces), and too few details on material with big interest for us (more about exclusive start conditions (Flex only), more about Bison-only stuff, interaction with C++ etc.).
See Modern Compiler Implementation. In my humble opinion, most books
give way too much emphasis to scanning and parsing, leaving little
material to the rest of the compiler, or even nothing for advanced
material. This book does not suffer from this flaw.
A remarkable review of all the parsing techniques. Because the book is
out of print, its authors made it freely available:
Parsing Techniques - A Practical Guide.
This book is not very interesting for us: the compiler material is not very advanced (no real ast, not a single line on optimization, register allocation is naive as the translation is stack based etc.), and the C++ material is not convincing (for a start, it is not standard C++ as it still uses #include <iostream.h> and the like, there is no use of STL etc.).
Automake is used to facilitate the writing of powerful Makefiles. Autoconf is required by Automake: we do not address portability issues for this project.
You may read the Autoconf documentation, and the Automake documentation. Using info is pleasant: run info autoconf on any properly set up system. The Goat Book covers the whole GNU Build System: Autoconf, Automake and Libtool.
To set the name and version of your package, change the AC_INIT invocation. For instance, T4 for the bardec_f group gives AC_INIT(bardec_f-tiger, 4). Warning: Autoconf 2.53 smashes the underscores into dashes. To work around this misfeature, use:
AC_INIT([bardec_f-tiger], 4, [bardec_f@epita.fr], [bardec_f-tiger])
If something goes wrong, or if it is simply the first time you create configure.ac or a Makefile.am, you need to set up the GNU Build System. The simplest invocation is:
$ autoreconf -fvi
The various files (configure, Makefile.in, etc.) are created. There is no need to run make distclean, or aclocal or whatever, before running autoreconf: it knows what to do.
Then invoke configure and make (see GCC):
$ ./configure CC=gcc-3.2 CXX=g++-3.2
$ make
Once the package is autotool'ed (see Bootstrapping the Package), and once you can run a simple make, you should be able to run make distcheck to set up the package.
The mission of make distcheck is to make sure everything will work properly. In particular it:
make dist)
configure with some options (e.g., ./configure CC=gcc-3.2 CXX=g++-3.2), then these options will not be taken into account here. This means that running export CC=gcc-3.2; export CXX=g++-3.2 is a better way to make sure that these compilers will be used.
make (and following targets) in paranoid mode. This mode consists in forbidding any change in the source tree, because if, when you run make, something must be changed in the sources, then it means something is broken in the tarball. If, for instance, for some reason it wants to run autoconf to recreate configure, or if it complains that autom4te.cache cannot be created, then it means the tarball is broken! So track down the reason of the failure.
make check
make dist again.
If you just run make dist instead of make distcheck, then you might forget to include some files in the distribution. If you don't even run make dist, then not only some files might be missing, but you have no guarantee that the tarball will compile elsewhere (not to mention that we don't care about object files etc.).
Running make distcheck is the only means for you to check that the project will properly compile on our side. Not running distcheck is like turning off the type checking of your compiler: you hide the errors, you avoid them, instead of actually getting rid of them.
At this stage, if running make distcheck does not create bardec_f-tc-4.tar.bz2, then something is wrong in your package. Do not rename it, do not create the tarball by hand: something is rotten, and you can be sure it will break on the examiner's machine.
We use GCC 3.2, which includes both gcc-3.2 and g++-3.2: the C and C++ compilers. Do not use older versions as they have poor compliance with the C++ standard. You are welcome to use more recent versions of GCC if you can use one, but the tests will be done with 3.2. Using a more recent version is often a good means to get better error messages if you can't understand what 3.2 is trying to say.
There are good patches floating around to improve GCC. In particular, you might want to use the bounds checking extension available on Herman ten Brugge Home Page.
We use Bison 1.875a which is able to produce a C++ parser. This Bison is unpublished, as the maintainers still have issues to fix. Nevertheless, it is usable, and perfectly functional for Tiger. It is installed in ~akim/bin, under the name bison. Be aware that Bison 1.875 produces buggy C++ parsers.
If you don't use this Bison, you will be in trouble. If you are willing to work at home, use bison-1.875a.tar.bz2.
The original papers on Lex and Yacc are:
The following introduction guides can help beginners:
An introduction to Lex and Yacc.
Contains information about Autoconf, Automake, Gperf, Flex, Bison, and GCC.
The Bison documentation, and the Flex documentation are available for browsing.
HAVM is an interpreter for Tree (hir or lir) programs. It was written by Robert Anisko so that EPITA students could exercise their compiler projects before the final jump to assembly code. It is implemented in Haskell, a pure non-strict functional language very well suited for this kind of symbolic processing. The name HAVM was coined from Haskell and VM, standing for Virtual Machine.
Information about HAVM can be found on HAVM Home Page, and feedback can be sent to LRDE's Projects Address.
FIXME: Ben, some words about it please.
The following is taken from the SPIM documentation itself.
SPIM S20 is a simulator that runs programs for the MIPS R2000/R3000 RISC computers. SPIM can read and immediately execute files containing assembly language. SPIM is a self-contained system for running these programs and contains a debugger and interface to a few operating system services.
The architecture of the MIPS computers is simple and regular, which makes it easy to learn and understand. The processor contains 32 general-purpose 32-bit registers and a well-designed instruction set that make it a propitious target for generating code in a compiler.
However, the obvious question is: why use a simulator when many people have workstations that contain a hardware, and hence significantly faster, implementation of this computer? One reason is that these workstations are not generally available. Another reason is that these machine will not persist for many years because of the rapid progress leading to new and faster computers. Unfortunately, the trend is to make computers faster by executing several instructions concurrently, which makes their architecture more difficult to understand and program. The MIPS architecture may be the epitome of a simple, clean RISC machine.
In addition, simulators can provide a better environment for low-level programming than an actual machine because they can detect more errors and provide more features than an actual computer. For example, SPIM has a X-window interface that is better than most debuggers for the actual machines.
Finally, simulators are an useful tool for studying computers and the programs that run on them. Because they are implemented in software, not silicon, they can be easily modified to add new instructions, build new systems such as multiprocessors, or simply to collect data.
SPIM is written and maintained by James R. Larus.
We use Doxygen as the standard tool for producing the developer's documentation of the project. Its features must be used to produce good documentation, with an explanation of the role of the arguments etc. The quality of the documentation will be part of the grade. Details on how to use proper comments are given in the Doxygen Manual.
The documentation produced by Doxygen must not be included, but the target html must produce the HTML documentation in the doc/html directory.
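As a rough illustration of the kind of comment expected, here is a small function documented with the standard Doxygen commands \brief, \param and \return. The function count_char is invented for this example and is not part of tc; the exact comment style used in the tarballs may differ.

```cpp
#include <string>

/** \brief Count the occurrences of a character in a string.
 *
 * \param text  the string to scan; it is left unmodified.
 * \param c     the character to look for.
 * \return      the number of positions in \a text that hold \a c.
 */
unsigned count_char(const std::string& text, char c)
{
  unsigned res = 0;
  for (std::string::size_type i = 0; i < text.size(); ++i)
    if (text[i] == c)
      ++res;
  return res;
}
```

The point is that each argument gets its own \param line stating its role, rather than a single sentence paraphrasing the function name.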
Contributions to this section (as for the rest of this documentation) will be greatly appreciated.
frame::Frame (see T5).
Tree (hir or lir) programs interpreter. See HAVM.
Copyright © 2000 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
The purpose of this License is to make a manual, textbook, or other written document free in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you".
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
If you publish printed copies of the Document numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgments", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements."
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate.
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail.
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.
To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:
Copyright (C) year your name. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being list their titles, with the Front-Cover Texts being list, and with the Back-Cover Texts being list. A copy of the license is included in the section entitled ``GNU Free Documentation License''.
If you have no Invariant Sections, write "with no Invariant Sections" instead of saying which ones are invariant. If you have no Front-Cover Texts, write "no Front-Cover Texts" instead of "Front-Cover Texts being list"; likewise for Back-Cover Texts.
If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.
### cheater: Reevaluation
### reevaluation: Reevaluation
--escapes-compute: T3
--escapes-display: T3
--types-check: T4 Samples
-T: T4 Samples
access.cc: src/translate, src/frame
access.hh: src/translate, src/frame
assembly.hh: src/codegen
AUTHORS: The Top Level
codegen-tasks.cc: src/codegen
codegen-tasks.hh: src/codegen
codegen.cc: src/codegen/ia32, src/codegen/mips
codegen.hh: src/codegen/ia32, src/codegen/mips, src/codegen
color.hh: src/regalloc
common.hh: src
contract.hh: src/misc
cpu.hh: src/target
default-visitor.hh: src/ast
depth_get: T3 Code To Write
distcheck: Making a Tarball
dynamic_cast: Use of C++ Features
escape: src/misc
escape.hh: src/misc
escape_set: T3 Code To Write
escapes::EscapesVisitor: T3 Code To Write
EscapesVisitor: T3 Code To Write
exp.hh: src/translate
flowgraph.hh: src/liveness
foo_get: Matters of Style
foo_set: Matters of Style
fragment.cc: src/assem
fragment.hh: src/assem, src/translate
frame.cc: src/frame
frame.hh: src/frame
gas-assembly.cc: src/codegen/ia32
gas-assembly.hh: src/codegen/ia32
gas-layout.cc: src/codegen/ia32
gas-layout.hh: src/codegen/ia32
get: T4 Code to Write
graph.hh: src/graph
graph.hxx: src/graph
handler.hh: src/graph
handler.hxx: src/graph
havm: T5 Primitive Samples
ia32: src/codegen
ia32-cpu.hh: src/target
ia32-target.hh: src/target
instr.hh: src/assem
interference-graph.cc: src/liveness
interference-graph.hh: src/liveness
iterator.hh: src/graph
iterator.hxx: src/graph
label.hh: src/assem, src/temp
layout.hh: src/assem
level-entry.hh: src/translate
level-env.hh: src/translate
level.cc: src/translate
level.hh: src/translate
libassem.cc: src/assem
libassem.hh: src/assem
libcodegen.cc: src/codegen
libcodegen.hh: src/codegen
libparse.hh: src/parse
libregalloc.cc: src/regalloc
libregalloc.hh: src/regalloc
libtranslate.cc: src/translate
libtranslate.hh: src/translate
libtype.hh: src/type
liveness.cc: src/liveness
liveness.hh: src/liveness
location.hh: src/parse, src/ast
malloc: T5 Builtin Calls Samples
mips: src/codegen
mips-cpu.hh: src/target
mips-target.hh: src/target
move.hh: src/assem
oper.hh: src/assem
parsetiger.yy: src/parse
patch: Given Tarballs
position.hh: src/parse, src/ast
print: T4 Code to Write
print-visitor.hh: src/ast
put: T4 Code to Write
regalloc-tasks.cc: src/regalloc
regalloc-tasks.hh: src/regalloc
regallocator.hh: src/regalloc
runtime.cc: src/codegen/ia32, src/codegen/mips
runtime.s: src/codegen/ia32, src/codegen/mips
scantiger.ll: src/parse
scope_begin: T4 Code to Write
scope_end: T4 Code to Write
set.hh: src/misc
spim-assembly.cc: src/codegen/mips
spim-assembly.hh: src/codegen/mips
spim-layout.cc: src/codegen/mips
spim-layout.hh: src/codegen/mips
symbol.hh: src/symbol
symbol::Table< class Entry_T >: T4 Code to Write
table.hh: src/symbol
target-tasks.cc: src/target
target-tasks.hh: src/target
target.hh: src/target
tc: src
tc.cc: src
temp.hh: src/temp
test-flowgraph.cc: src/liveness
test-graph.cc: src/graph
test-regalloc.cc: src/regalloc
tiger-runtime.c: src/codegen
timer.cc: src/misc
timer.hh: src/misc
translate-visitor.hh: src/translate
translation.hh: src/translate
type-entry.hh: src/type
type-env.hh: src/type
type::Error: T4 Options
types.hh: src/type
visitor.hh: src/assem, src/ast
yaka@epita.fr: Automated Evaluation