The phases of compilation warrant a further logical division: splitting
them into a front end and a back end. The front end is concerned with
the source language; its purpose is to preprocess the source file,
perform lexical, syntactic and semantic analysis on the code, and
derive a parse tree. This parse tree is then converted into the
intermediate language, which is passed to the back end. The back end,
on the other hand, takes the (language-independent) intermediate code
from the front end and prepares it so that it can be converted into
machine-dependent code. This is represented in diagram FIXME:
Example 1: Front and back ends of a compiler
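To make the division concrete, here is a deliberately tiny,
hypothetical sketch in C. The three-instruction "intermediate
language", the front_end() and back_end() functions and the pretend
assembly syntax are all invented for illustration and have nothing to
do with GCC's internals; the point is only that the two halves
communicate through nothing but the intermediate instructions.

    #include <stdio.h>

    /* The intermediate language: the only thing both halves share. */
    typedef enum { IR_LOAD_CONST, IR_ADD } ir_op;

    typedef struct {
        ir_op op;
        int   dest;   /* virtual register number */
        int   src1;   /* register or constant, depending on op */
        int   src2;
    } ir_insn;

    /* Front end: "parses" the fixed source expression 1 + 2 and emits
       intermediate code.  A real front end would perform lexical,
       syntactic and semantic analysis here. */
    static int front_end(ir_insn *code)
    {
        int n = 0;
        code[n++] = (ir_insn){ IR_LOAD_CONST, 0, 1, 0 };  /* r0 <- 1       */
        code[n++] = (ir_insn){ IR_LOAD_CONST, 1, 2, 0 };  /* r1 <- 2       */
        code[n++] = (ir_insn){ IR_ADD,        2, 0, 1 };  /* r2 <- r0 + r1 */
        return n;
    }

    /* Back end: knows nothing about the source language, only about
       the intermediate instructions, which it lowers to pretend
       assembly. */
    static void back_end(const ir_insn *code, int n)
    {
        for (int i = 0; i < n; i++) {
            switch (code[i].op) {
            case IR_LOAD_CONST:
                printf("\tmov\tr%d, #%d\n", code[i].dest, code[i].src1);
                break;
            case IR_ADD:
                printf("\tadd\tr%d, r%d, r%d\n",
                       code[i].dest, code[i].src1, code[i].src2);
                break;
            }
        }
    }

    int main(void)
    {
        ir_insn code[16];
        int n = front_end(code);
        back_end(code, n);   /* optimizations on the IR would go here */
        return 0;
    }

Changing which expressions the front end can handle would not touch
back_end() at all, and changing the pretend assembly would not touch
front_end(): the two halves meet only at the ir_insn array.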
Splitting the compiler into two such sections is an efficient way of
designing it. Because the front end passes intermediate code to the
back end, the back end needs no language dependence of any kind: all
it sees is the intermediate language. Optimizations can also be
applied to this intermediate language. So what advantages do we gain
from this? To begin with, applying optimization to the compiler's
language-independent code means that for any language you add to the
compiler, all you have to do is parse the source and produce the
intermediate code; optimization does not have to be implemented
separately for each language. It also becomes easier to add new
languages to the compiler: just write a suitable front end that
converts the source into an intermediate language understood by the
back end (in the case of gcc, RTL) and passes it on. The result is
that you do not have to rewrite the compiler for each new language;
opting for an intermediate language saves you greatly in the long run.
For each platform you write a back end, and all you need to know about
is the intermediate language and how to perform machine-specific
optimizations on it for that architecture.
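The same idea explains why a new platform costs only a new back end.
Continuing the hypothetical sketch above (again, the intermediate
language and both invented targets are illustrative only and bear no
relation to GCC's real RTL or machine descriptions), retargeting means
writing one more routine that maps the shared intermediate
instructions onto a different assembly syntax, while the front end and
the intermediate code are left untouched:

    #include <stdio.h>

    /* The shared intermediate language, as in the earlier sketch. */
    typedef enum { IR_LOAD_CONST, IR_ADD } ir_op;
    typedef struct { ir_op op; int dest, src1, src2; } ir_insn;

    /* Each target-specific back end is just a mapping from the IR to
       its own pretend assembly; this is the only per-platform code. */
    static void emit_target_a(const ir_insn *i)   /* register machine */
    {
        if (i->op == IR_LOAD_CONST)
            printf("\tmov\tr%d, #%d\n", i->dest, i->src1);
        else
            printf("\tadd\tr%d, r%d, r%d\n", i->dest, i->src1, i->src2);
    }

    static void emit_target_b(const ir_insn *i)   /* stack machine */
    {
        if (i->op == IR_LOAD_CONST)
            printf("\tpush\t%d\n", i->src1);
        else
            printf("\tadd\n");
    }

    int main(void)
    {
        /* The same intermediate code, whichever front end produced it. */
        const ir_insn code[] = {
            { IR_LOAD_CONST, 0, 1, 0 },
            { IR_LOAD_CONST, 1, 2, 0 },
            { IR_ADD,        2, 0, 1 },
        };
        void (*targets[])(const ir_insn *) = { emit_target_a, emit_target_b };

        for (int t = 0; t < 2; t++) {
            printf("; target %c\n", 'A' + t);
            for (size_t i = 0; i < sizeof code / sizeof code[0]; i++)
                targets[t](&code[i]);
        }
        return 0;
    }

Adding a third architecture would mean writing one more emit routine
and nothing else; adding another source language would mean writing
one more producer of ir_insn records and nothing else.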
This is the approach that GCC uses. RTL is generated as the parse tree
is produced, so regardless of whether we are compiling C, C++, Java,
or any other source language that GCC is aware of, all the back end
sees is RTL. The result is that optimizations can be applied to RTL
itself, so adding a language only means writing the language-to-RTL
front end, and by providing a machine description for each
architecture, object code can be produced from RTL without much
bother. Architecture-specific optimizations are then applied to the
RTL for the target we are using. The end result is that for j
languages and k architectures, we only need to put in j+k effort
rather than j×k, because optimization is performed on the intermediate
language and each architecture need only supply a mapping from RTL to
machine code[1]. For example, supporting five languages on four
architectures takes nine front and back ends rather than twenty
separate compilers.
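If you want to see RTL for yourself, recent GCC releases can write the
back end's view of a translation unit to a dump file. Assuming a
reasonably current GCC (the exact dump file names vary between
releases), something along these lines works:

    gcc -fdump-rtl-expand -c hello.c
    g++ -fdump-rtl-expand -c hello.cc

Both commands leave a dump file next to the object file containing the
RTL generated for the program, and the option is the same whatever the
source language, because by that stage the front end has already
finished its work.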
The result is that GCC provides a versatile collection of compilers
for many different languages and architectures, neatly modularized and
packaged. The next section will deal pragmatically with the GNU
Compiler Collection, and emphasis will be placed on how the material
here relates to GCC.