Writing a small compiler with SableCC and Java
Introduction
This article covers the implementation of a compiler using the SableCC framework and the Java programming language. The language we are going to implement is a subset of Pascal that we will call SmallPascal. It is a very small language that can be used as a calculator: it supports the integer data type, reading from standard input and writing to standard output. For readers who come from a Microsoft Windows background, standard input on Unix-based operating systems is normally the keyboard, and standard output is normally the screen. In other words, our compiler will run from the command line, reading data from the keyboard and displaying text on the screen.
The SableCC framework is a piece of software that simplifies compiler writing. It includes support for writing lexical analysers (or lexers) and syntactic analysers (or parsers). The generated parser builds a syntax tree for the input program, with node types that mirror your grammar. This tree can then be traversed by tree walkers known as visitors (for more information, see Gagnon (1998)) in order to implement a semantic analyser, a code generator and a code optimizer. In this article we will only cover the semantic analyser and the code generator. In future articles I will demonstrate more advanced compiler writing techniques.
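To make the idea of a visitor concrete, here is a minimal sketch of a tree walker that evaluates a tiny expression tree. The node classes, the Visitor interface and the Evaluator class are hand-written stand-ins invented for this illustration; they are not the classes SableCC generates, which have their own names and structure.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hand-written stand-ins for illustration only; SableCC generates
// its own node and analysis classes from the specification file.
class VisitorSketch {

    // The visitor declares one method per node type.
    interface Visitor {
        void visitNumber(NumberNode node);
        void visitPlus(PlusNode node);
    }

    // Every node accepts a visitor and dispatches to the matching method.
    interface Node {
        void apply(Visitor visitor);
    }

    static final class NumberNode implements Node {
        final int value;
        NumberNode(int value) { this.value = value; }
        public void apply(Visitor visitor) { visitor.visitNumber(this); }
    }

    static final class PlusNode implements Node {
        final Node left;
        final Node right;
        PlusNode(Node left, Node right) { this.left = left; this.right = right; }
        public void apply(Visitor visitor) { visitor.visitPlus(this); }
    }

    // A tree walker that evaluates the expression with an explicit stack.
    static final class Evaluator implements Visitor {
        private final Deque<Integer> stack = new ArrayDeque<>();

        public void visitNumber(NumberNode node) {
            stack.push(node.value);
        }

        public void visitPlus(PlusNode node) {
            node.left.apply(this);   // leaves the left value on the stack
            node.right.apply(this);  // leaves the right value on the stack
            int right = stack.pop();
            int left = stack.pop();
            stack.push(left + right);
        }

        int result() { return stack.peek(); }
    }

    // Walks the tree with an Evaluator and returns the computed value.
    static int evaluate(Node tree) {
        Evaluator evaluator = new Evaluator();
        tree.apply(evaluator);
        return evaluator.result();
    }

    public static void main(String[] args) {
        // Tree for 1 + (2 + 3)
        Node tree = new PlusNode(new NumberNode(1),
                new PlusNode(new NumberNode(2), new NumberNode(3)));
        System.out.println(evaluate(tree)); // prints 6
    }
}
```

The point of the design is that the tree classes stay unchanged while each compiler phase is written as a separate visitor: a semantic analyser and a code generator would be two further classes implementing the same interface.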
Now that we know what this article is about, let's define the steps required to implement our compiler. Gagnon (1998) defines the following five steps for implementing a compiler with SableCC:
- We create a SableCC specification file containing the lexical definitions and the grammar for the language being designed.
- After creating the SableCC specification file, we generate the framework by launching SableCC on the specification file.
- Once the framework is generated, we write the working classes. This is the only code we have to write in Java. It is in this step that we write the semantic analyser, the code generator, and possibly the code optimizer. In the case of an interpreter, we write a single class. These working classes may be subclasses of one of the classes from the analysis subfolder generated by SableCC in the previous step.
- After writing our tree walkers, we create the main class, known as the driver class of the compiler. It is used to activate the lexer, parser and working classes.
- Finally, after implementing the driver class, we compile the compiler with a Java compiler.
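The way these pieces fit together can be sketched as a toy pipeline. Everything in the sketch below is an assumption made for illustration: the lex, parse and generate methods are hand-written stand-ins for the lexer and parser that SableCC would generate and for a working class, the input language is limited to sums of integers, and the push/add instruction names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: with SableCC, the lexer and parser are
// generated from the specification file rather than written by hand.
class DriverSketch {

    // Stand-in for the lexer: split the source text into tokens.
    static List<String> lex(String source) {
        List<String> tokens = new ArrayList<>();
        for (String token : source.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Stand-in for the parser: accept "number (+ number)*" and keep the operands.
    static List<Integer> parse(List<String> tokens) {
        if (tokens.size() % 2 == 0) {
            throw new IllegalArgumentException("incomplete expression");
        }
        List<Integer> operands = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (i % 2 == 0) {
                operands.add(Integer.parseInt(token)); // NumberFormatException on a bad number
            } else if (!token.equals("+")) {
                throw new IllegalArgumentException("expected '+', found '" + token + "'");
            }
        }
        return operands;
    }

    // Stand-in for a working class: emit instructions for a stack machine.
    static List<String> generate(List<Integer> operands) {
        List<String> code = new ArrayList<>();
        for (int i = 0; i < operands.size(); i++) {
            code.add("push " + operands.get(i));
            if (i > 0) {
                code.add("add");
            }
        }
        return code;
    }

    // The driver activates the lexer, the parser and the working class in turn.
    static List<String> compile(String source) {
        return generate(parse(lex(source)));
    }

    public static void main(String[] args) {
        for (String instruction : compile("1 + 2 + 3")) {
            System.out.println(instruction);
        }
    }
}
```

In a real SableCC project the driver plays the same role as compile here, except that it instantiates the generated lexer and parser and applies the working classes to the tree the parser returns.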
With the steps outlined, the following sections work through them one at a time.
