If a purely mechanical translation from Kenya to Java is carried out, and the resulting Java code presented to the Java compiler, any mistakes in the Kenya code will be echoed or amplified in the Java code and cause compilation errors. The errors reported by the Java compiler will be in relation to the Java code only and so will be useless, and probably confusing, to the novice programmer who is working with Kenya. They may not be interested in the Java stage at all. Errors during compilation will therefore signify nothing more useful than "There is a bug in your program." To aid this situation, the system should make checks for the correctness of the Kenya program before carrying out the translation.
The Dragon book[8] says "A compiler must check that the source program follows both the syntactic and semantic conventions of the source language. This checking, called static checking (to distinguish it from dynamic checking during execution of the target program), ensures that certain kinds of programming errors will be detected and reported." In the Kenya system two different static checks are considered: uniqueness checks and type checks.
Variables, constants, functions and types must have unique names in order that the compiler can tell which the programmer is referring to. For instance, the following Kenya code should not be considered to be a valid program.
int a; a = 4; int a; a = 6; print a;
Here two variables with the name "a" have been declared. This error should be reported. Generating Java from this code blindly, without checking for the uniqueness of identifiers, would produce Java code that will not compile properly.
It is possible to have variables with the same name in different scopes. A local variable inside a function may have the same name as a variable in the main body of the program. Any use of the variable name inside the function body will refer to the local version, leaving the previously declared version unaltered. Once the program's execution has left the scope of the function, references to that variable name will refer to the global version again.
It is also possible to give a field of a class the same name as a global variable. When referring to the field the programmer will add a qualifier which will distinguish it from the global variable, as in the following example:
class Dog { int age; } Dog rover; int age; age = 22; rover.age = 6;
The possibility of having the same name refer to different variables in these situations must be taken into account when performing uniqueness checks.
Type checking is the process of analysing a program to ensure that the types of expressions are consistent. For instance if a variable is declared as being of type int then it should not be assigned a real value (or a string or any other type). To perform these checks the verifier needs to keep a record of the type associated with each name and check this type each time the name is referenced in the program.
Another check which should be made when a variable is declared is that it is assigned a type which exists. If the programmer declares a variable of a user defined type, then they must also have defined the type. This check will also pick up simple typographical errors in type names.
There are two possible approaches which can be taken to verify that the types in a program are correct. One is known as type checking and the other type inference. These are best illustrated in terms of an example. Say we are checking the following code fragment:
int a; a = 4 + 5.7;
Let us firstly take a type checking approach. Here, the variable a
has been declared
as having type int
. We consequently examine the assignment statement. We know that
a
has type int
so we check that the assignment which is being assigned
to a
also has type int
. For an addition to have type int
both of the addends must also be of type int
. First
we check 4, which is an integer as we expect, but then we check 5.7, which is a real number, and
therefore we have a type error.
A type inference approach works the other way around. We expand the expression as far as possible
and find the types of the end nodes, in this case 4 and 5.7, int
and real
respectively. These two are added, and so we infer that the result must have type real
.
This real value is assigned to a variable, and so this variable should also have type real
,
but we find that it does not, and so again we have a type error.
I have chosen to take a type checking approach in implementing the Kenya type checker.
The parser creates an abstract syntax tree representing the program as described in a previous section. The code which SableCC produces promotes the use of the Visitor pattern. This means that a TypeChecker object can be passed to the nodes of the abstract syntax tree in order to process it without writing any code inside the classes from which the tree is built. This offers the advantage of being able to alter the implementation of the type checker independently of the tree. If the grammar of the language is changed and the parser and AST classes regenerated, then only code in the TypeChecker class needs to be altered to take account of this change, rather than having to change code distributed throughout a number of different classes.
Each subclass of Node generated by SableCC has a method apply()
. This method is void, (i.e. it
does not return a value). Initially this seems to present a problem when trying to write a type checker as an
obvious approach would be to process the nodes of the tree recursively by calling a method which returned
a type. This is especially true if a type inference approach is taken. However, it is possible to overcome this
when using a type checking approach by including an attribute in the TypeChecker class in which the expected
type is stored. The type checker object is passed to the node to be checked, its type is compared to that
which is expected, and another boolean attribute is set, depending on whether or not the types match.
A record needs to be kept of the type associated with each name referenced in the program. This is done by constructing a symbol table, and adding an entry to this table each time that a variable declaration is encountered while traversing the tree. The symbol table is implemented using a hashtable, and each entry is indexed by the name. The symbol table contains information about the type associated with each name, and also whether it was declared as a variable or a constant. It should be ensured that assignments are not made to constant values. Uniqueness checks are performed by checking that the key does not already appear in the hashtable before adding a new entry.
When a variable of a type which is a class is defined, as well as noting the type of this variable in the symbol table, the types of all of the attributes inside the class must also be recorded. This could be done by including the fully qualified names of all accessible attributes in the symbol table, but this is an inefficient implementation. It should not be necessary to repeat the information about the types of attributes in the table every time a variable of a user defined type is declared, as each of these will have the same set of attributes with the same types, even if they are assigned different values.
To avoid this, the approach which I have taken
is, whenever a new class is defined, to create a separate symbol table for the scope of that class, and whenever
a variable of that type is declared, to insert a reference to that local symbol table into the global symbol table.
When a lookup is required, the qualified name can be split where the dots appear (as a qualified name will
be of the form varname.attribute
), the left hand side used to index the global symbol table,
and the right hand side used to index the class specific one to find the attribute information. This indirection
can be applied recursively so that classes may be nested to any depth desired by the programmer.
A set of known types is also maintained, so that whenever a declaration is encountered a check can be made to ensure that the type to be associated with the name is valid.