The phases of compilation warrant a further logical division: splitting
them into a front end and a back end. The front end is concerned with
the source language; its purpose is to preprocess the source file,
perform lexical, syntactic and semantic analysis on the code, and
derive a parse tree. This parse tree is then converted into the
intermediate language, which is passed to the back end. The back end,
on the other hand, takes the (language-independent) intermediate code
from the front end and prepares it so that it can be converted into
machine-dependent code. This is represented in diagram FIXME:
Example 1: Front and back ends of a compiler
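To make the division concrete, here is a deliberately tiny,
hypothetical sketch in C. The three-instruction "intermediate
language", the front_end() and back_end() functions and the pretend
assembly syntax are all invented for illustration and have nothing to
do with GCC's internals; the point is only that the two halves
communicate through nothing but the intermediate instructions.

    #include <stdio.h>

    /* The intermediate language: the only thing both halves share. */
    typedef enum { IR_LOAD_CONST, IR_ADD } ir_op;

    typedef struct {
        ir_op op;
        int   dest;   /* virtual register number */
        int   src1;   /* register or constant, depending on op */
        int   src2;
    } ir_insn;

    /* Front end: "parses" the fixed source expression 1 + 2 and emits
       intermediate code.  A real front end would perform lexical,
       syntactic and semantic analysis here. */
    static int front_end(ir_insn *code)
    {
        int n = 0;
        code[n++] = (ir_insn){ IR_LOAD_CONST, 0, 1, 0 };  /* r0 <- 1       */
        code[n++] = (ir_insn){ IR_LOAD_CONST, 1, 2, 0 };  /* r1 <- 2       */
        code[n++] = (ir_insn){ IR_ADD,        2, 0, 1 };  /* r2 <- r0 + r1 */
        return n;
    }

    /* Back end: knows nothing about the source language, only about
       the intermediate instructions, which it lowers to pretend
       assembly. */
    static void back_end(const ir_insn *code, int n)
    {
        for (int i = 0; i < n; i++) {
            switch (code[i].op) {
            case IR_LOAD_CONST:
                printf("\tmov\tr%d, #%d\n", code[i].dest, code[i].src1);
                break;
            case IR_ADD:
                printf("\tadd\tr%d, r%d, r%d\n",
                       code[i].dest, code[i].src1, code[i].src2);
                break;
            }
        }
    }

    int main(void)
    {
        ir_insn code[16];
        int n = front_end(code);
        back_end(code, n);   /* optimizations on the IR would go here */
        return 0;
    }

Changing which expressions the front end can handle would not touch
back_end() at all, and changing the pretend assembly would not touch
front_end(): the two halves meet only at the ir_insn array.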
Splitting the compiler into two such sections is an efficient way of
designing it. Because the front end passes intermediate code to the
back end, the back end needs no language dependence of any kind: all
it sees is the intermediate language. Optimizations can also be
applied to this intermediate language. So what advantages do we gain
from this? To begin with, applying optimization to the compiler's
language-independent code means that for any language you add to the
compiler, all you have to do is parse the source and produce the
intermediate code; optimization does not have to be implemented
separately for each language. It also becomes easier to add new
languages to the compiler: just write a suitable front end that
converts the source into an intermediate language understood by the
back end (in the case of gcc, RTL) and passes it on. The result is
that you do not have to rewrite the compiler for each new language;
opting for an intermediate language saves you greatly in the long run.
For each platform you write a back end, and all you need to know about
is the intermediate language and how to perform machine-specific
optimizations on it for that architecture.
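The same idea explains why a new platform costs only a new back end.
Continuing the hypothetical sketch above (again, the intermediate
language and both invented targets are illustrative only and bear no
relation to GCC's real RTL or machine descriptions), retargeting means
writing one more routine that maps the shared intermediate
instructions onto a different assembly syntax, while the front end and
the intermediate code are left untouched:

    #include <stdio.h>

    /* The shared intermediate language, as in the earlier sketch. */
    typedef enum { IR_LOAD_CONST, IR_ADD } ir_op;
    typedef struct { ir_op op; int dest, src1, src2; } ir_insn;

    /* Each target-specific back end is just a mapping from the IR to
       its own pretend assembly; this is the only per-platform code. */
    static void emit_target_a(const ir_insn *i)   /* register machine */
    {
        if (i->op == IR_LOAD_CONST)
            printf("\tmov\tr%d, #%d\n", i->dest, i->src1);
        else
            printf("\tadd\tr%d, r%d, r%d\n", i->dest, i->src1, i->src2);
    }

    static void emit_target_b(const ir_insn *i)   /* stack machine */
    {
        if (i->op == IR_LOAD_CONST)
            printf("\tpush\t%d\n", i->src1);
        else
            printf("\tadd\n");
    }

    int main(void)
    {
        /* The same intermediate code, whichever front end produced it. */
        const ir_insn code[] = {
            { IR_LOAD_CONST, 0, 1, 0 },
            { IR_LOAD_CONST, 1, 2, 0 },
            { IR_ADD,        2, 0, 1 },
        };
        void (*targets[])(const ir_insn *) = { emit_target_a, emit_target_b };

        for (int t = 0; t < 2; t++) {
            printf("; target %c\n", 'A' + t);
            for (size_t i = 0; i < sizeof code / sizeof code[0]; i++)
                targets[t](&code[i]);
        }
        return 0;
    }

Adding a third architecture would mean writing one more emit routine
and nothing else; adding another source language would mean writing
one more producer of ir_insn records and nothing else.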
This is the approach that GCC uses. RTL is generated as the parse tree
is produced, so regardless of whether we are compiling C, C++, Java,
or any other source language that GCC is aware of, all the back end
sees is RTL. The result is that optimizations can be applied to RTL
itself, so adding a language only means writing the language-to-RTL
front end, and by providing a machine description for each
architecture, object code can be produced from RTL without much
bother. Architecture-specific optimizations are then applied to the
RTL for the target we are using. The end result is that for j
languages and k architectures, we only need to put in j+k effort
rather than j×k, because optimization is performed on the intermediate
language and each architecture need only supply a mapping from RTL to
machine code[1]. For example, supporting five languages on four
architectures takes nine front and back ends rather than twenty
separate compilers.
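If you want to see RTL for yourself, recent GCC releases can write the
back end's view of a translation unit to a dump file. Assuming a
reasonably current GCC (the exact dump file names vary between
releases), something along these lines works:

    gcc -fdump-rtl-expand -c hello.c
    g++ -fdump-rtl-expand -c hello.cc

Both commands leave a dump file next to the object file containing the
RTL generated for the program, and the option is the same whatever the
source language, because by that stage the front end has already
finished its work.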
The result is that GCC provides a versatile collection of compilers
for many different languages and architectures, neatly modularized and
packaged. The next section will deal pragmatically with the GNU
Compiler Collection, and emphasis will be placed on how the material
here relates to GCC.