Language Design


The main concerns in the design of the language are:

Data Types

It makes sense to share the concrete data types of Java so that no type conversion has to be done at the translation or compilation stage. The basic types supplied by Java are boolean, char, byte, short, int, long, float and double.

Are all of these needed? It is probably not necessary to have all of these when writing introductory programs. The difference between a float and a double, or a short and a long is unlikely to be something that an introductory programmer should need to understand.

It would be useful to have a string type. Strings in Java are not basic types but objects. This can probably be made transparent to the introductory programmer.

A more suitable set of basic types might be:

The term real has been used instead of double. It is necessary to have a floating-point data type, but overcomplicated to have more than one at different precisions. A double is more flexible than a float, so it makes sense to use a double if only one of the two types is going to be included. It would be confusing to use the word float to represent this, as it would get translated to a Java double rather than a Java float. However, it does not seem sensible to call it double if there are no other floating-point precisions available. This will only lead to the question "What is a single?". Calling the type real seems a good solution. An introductory programmer is much more likely to be familiar with the concept of a real number than with that of a double-precision floating-point number.

Contentious point

Should string have a capital letter? In the introductory language a string is presented as a basic type. None of the other basic types have capital letters. In Java, String is a class, not a basic type, and classes start with capital letters (String). If string is left uncapitalised then it may be confusing when moving to Java. If it is capitalised, it will be incongruous, and require a complex explanation as to why it is that way. At the current time, I am of the opinion that string should not be capitalised.

Variables and Constants

In Java variables are declared like this:

int a;
int b = 100;

This seems good and simple, and will be used in the introductory language. The use of the semicolon as a statement separator will be considered later.

In Java constants are declared like this:

{ public } static final int a = 100;

This seems complicated and difficult to explain. In the introductory language a const keyword will be provided, which will map to static final in the translation to Java.

const int a = 100;
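A minimal sketch of the Java that the translator might emit for the const declaration above. The class name ConstDemo is hypothetical, and placing the constant as a package-access field of the class containing main is an assumption; the translator's actual output is not shown in this section.

```java
// Hypothetical translator output for the declaration: const int a = 100
class ConstDemo {
    // const maps to static final; package access is assumed here
    static final int a = 100;

    public static void main(String[] args) {
        System.out.println(a);   // reading the constant is allowed
        // a = 200;              // would be rejected by the Java compiler: a is final
    }
}
```

The programmer only ever writes const; the two Java modifiers appear in the generated code alone.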

Records and User Defined Types

Records, that is structures containing a set of other types (e.g. a Point containing an x and a y co-ordinate, both of a basic type), are useful. In Java such a record would be a class.

A large part of programming and software engineering is about finding a good abstraction model. Writing a program to fit this model is helped a lot by the provision for user defined Abstract Data Types. These are again implemented using classes in Java.

A point class in Java:

class Point
{
   int x;
   int y;
}

In Java, the contents of the record (object) are accessed in the following way, assuming p is a Point:

p.x = 4;
p.y = 3;

This method of defining a class (using a class keyword), and the use of the dot operator to access member variables, is fairly simple and widely used across a variety of languages, so it seems a good idea to keep these for the teaching language. The use of curly brackets to delimit the class definition seems an acceptable approach. Taking this approach here suggests using curly brackets for delimiting all blocks of statements (loops ...) as they are used in Java.

In C or C++ there is provision to provide a new name for a type. For instance the name "age" could be assigned as a synonym for "int". This is done using a typedef. Although this helps with modelling a problem, there is no provision for using typedefs in Java. It would be difficult to write code in Java which a typedef would map to, so unfortunately it will not be included in the introductory language at the moment.

Declaring Variables of User Defined Types

In Java, objects (instances of classes) are created using the new operator, in the following way.

Point p = new Point();
p.x = 2;

This creates an object on the heap, assigning memory dynamically at runtime. In C++ objects can be created on the heap in the same way, or they can be created on the stack in the following way:

Point p;
p.x = 2;

This uses the same syntax as declaring a variable of a basic type. As the introductory programmer should not need to know the difference between creating an object on the heap and creating it on the stack, it seems sensible to use the C++ stack creation syntax for the creation of variables of all types. The differences between basic and non-basic types are thus made as transparent as possible to the programmer, who can use them in the same way. As objects cannot be created on the stack in Java, a declaration such as Point p would have to be converted to Point p = new Point() by the translator.
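The conversion described above can be sketched as follows. The class DeclarationDemo and the helper makePoint() are hypothetical names used only to make the fragment runnable; the comments show the teaching-language source that is assumed to give rise to each Java line.

```java
class DeclarationDemo {
    // Hypothetical helper so that the resulting object can be inspected
    static Point makePoint() {
        // Teaching language:  Point p;
        Point p = new Point();   // assumed output: the translator adds the allocation
        // Teaching language:  p.x = 2;
        p.x = 2;                 // member access is unchanged
        return p;
    }

    public static void main(String[] args) {
        Point p = makePoint();
        // y was never assigned, so it holds Java's default value for an int field
        System.out.println(p.x + ", " + p.y);
    }
}

// The record type, written the same way in both languages
class Point {
    int x;
    int y;
}
```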

Arrays

Arrays are fairly fundamental data structures. They are usually referenced by an identifier and an index. For instance in C or Java, a number (or expression) is used in square brackets after the identifier, e.g.

numbers[4] = 12;

This seems a fairly straightforward notation. In Turing, a similar syntax is used, but with round brackets instead of square. To preserve consistency with Java, square brackets will be used.

In Java, arrays are objects, and need to be created using the new operator, e.g.

int numbers[] = new int[12];

String names[];
names = new String[12];

In the introductory language, arrays will be declared without this, in a C style:

int numbers[12];

and converted to the correct Java by the translator.
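As a sketch of that conversion, the C-style declaration is assumed to become a Java declaration with an explicit new. ArrayDemo and makeNumbers() are hypothetical names introduced only for illustration.

```java
class ArrayDemo {
    // Hypothetical helper so that the resulting array can be inspected
    static int[] makeNumbers() {
        // Teaching language:  int numbers[12];
        int[] numbers = new int[12];   // assumed translator output
        // Indexing is the same in both languages
        numbers[4] = 12;
        return numbers;
    }

    public static void main(String[] args) {
        int[] numbers = makeNumbers();
        System.out.println(numbers.length);   // size given in the declaration
        System.out.println(numbers[4]);       // the element assigned above
    }
}
```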

Statement Separators

To determine where one statement ends and another begins, some sort of delimiter is needed. It would be possible to use a newline for this, putting each statement on a different line. However, it is quite often desirable to change the layout of the program code to make it easier to read and show its structure more clearly. Using newlines to define new statements makes this difficult. Adding white space may cause a working program not to function. An alternative, as used in C, C++ and Java, is to use a semicolon (or some other special character) to separate statements. This allows arbitrary newlines to be inserted without affecting the function of the program, although it does mean that an extra character has to be added after each statement, and this does not look as tidy as it would without. The fact that Java uses a semicolon as its statement separator swings the decision to use a semicolon in the teaching language also. It is often the case that a lot of compiler errors in Java, and even more in C/C++, are caused by the programmer forgetting a semicolon at the end of a line. If the programmer gets into the habit of putting semicolons in right from the start then this may be reduced.

Conditionals

Conditionals are a fundamental part of any programming language. The most useful and generic construct is if .. then .. else. The Java syntax for this seems straightforward enough to include in a language for teaching, so the following syntax will be used:

if ( condition - a boolean expression )
{
   statements ...
}
else
{
   statements ... 
}

Case

Case is a useful construct to prevent programmers having to write:

if ( a == 1 ) { statements }
if ( a == 2 ) { statements }
if ( a == 3 ) { statements }
if ( a == 4 ) { statements }

In Java the case construct is called switch and has the following syntax:

switch ( a )
{
  case 1 : 
           text = "first case";
           break;
  case 2 :
           text = "second case";
           break;
  default :
           text = "no other cases match";
           break;
}

If a does not match any of the cases, the default case is selected. The break statements are used to stop execution at the end of each case and jump to the closing curly bracket. If the break statement was not included at the end of case 1, and case 1 was matched, after case 1's statements had been executed, case 2's statements would also be executed, and so on until a break statement was reached. Some programmers find it an annoyance to have to include a break at the end of each of their cases, but it does allow for the possibility of leaving them out on purpose in order to let the execution drop through to the next case. In order not to prevent programmers from using this technique if they want to, it will be left to the programmer to put in the break statements rather than having the translator put them in automatically.
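The drop-through behaviour can be seen in a small sketch. SwitchDemo and describe() are hypothetical names; the break after case 1 is deliberately left out so that its statements run on into case 2.

```java
class SwitchDemo {
    // Builds a string recording which cases were executed for a given value
    static String describe(int a) {
        String text = "";
        switch (a) {
            case 1:
                text = text + "one ";
                // no break here: execution drops through to case 2
            case 2:
                text = text + "two ";
                break;
            default:
                text = text + "other ";
                break;
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describe(1));   // runs case 1, then case 2
        System.out.println(describe(2));   // runs case 2 only
        System.out.println(describe(3));   // runs the default case
    }
}
```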

Loops

There are four common types of loops: while loops, repeat .. until loops, generalised loops and for loops. The first two are quite similar. With while the test for exiting the loop is done at the top (so the body of the loop may not be executed) and with repeat .. until the test is done at the end, so the body is always executed at least once. The Turing programming language provides a generalised loop in which the programmer can put the exit condition at the top or the bottom (or anywhere in the middle!) to determine the way in which the loop works. Most repeat .. until loops can be rewritten as while loops, so to aid consistency only a while will be provided.

while ( condition - boolean expression )
{
    statements
}
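One way of rewriting a repeat .. until loop as a while loop is sketched below, using a flag to force the first pass through the body. This is only one possible rewriting, and the names RepeatUntilDemo, halvings and done are hypothetical.

```java
class RepeatUntilDemo {
    // Intended logic:  repeat { n = n / 2; steps++; } until ( n == 0 )
    static int halvings(int n) {
        int steps = 0;
        boolean done = false;      // false at first, so the body always runs once
        while (!done) {
            n = n / 2;
            steps++;
            done = (n == 0);       // the "until" test, evaluated at the bottom
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(halvings(8));
        System.out.println(halvings(0));   // the body still executes once
    }
}
```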

The other sort of loop to be considered is the for loop. This gives a number of iterations using an index variable, for example "for i from 1 to 10" or "for i from 10 to 1". The syntax for this in Java is:

for ( i = 1 ; i <= 10 ; i++ )
{
    statements
} 

This gives maximum flexibility from a single construct: more sophisticated behaviour (e.g. incrementing in different steps, or complex loop termination conditions) can be expressed without any additional syntax.

Turing takes a slightly less sophisticated approach, using:

for i : 0 .. 9
    put i
end for

The Java approach provides much more flexibility, but the Turing syntax is far simpler. It is a difficult decision which of these is the more important consideration. I would argue for simplicity, consistent with all of the features of this new language, but at the same time it would be easy to make the language too restrictive, therefore not allowing more sophisticated programs to be written. While the language should be kept simple, it is also important that a large number of problems can be solved and techniques applied using it. My proposed syntax for the Kenya for loop is as follows:

for i = 0 to 9
{
    print i;
}

or

for decreasing i = 9 to 0 step 2
{
    print i;
}

The second case gives a loop counter which is decremented by 2 at each iteration of the loop.
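A sketch of the Java loops that these two forms might translate to. It assumes that both bounds are inclusive and that the loop variable is declared by the loop itself; neither point is settled in the text above. ForDemo, countUp and countDown are hypothetical names, and the values are collected into a string rather than printed so that the result can be checked.

```java
class ForDemo {
    // Kenya:  for i = 0 to 9 { print i; }
    static String countUp() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i <= 9; i++) {       // assumed: upper bound is inclusive
            out.append(i).append(' ');
        }
        return out.toString().trim();
    }

    // Kenya:  for decreasing i = 9 to 0 step 2 { print i; }
    static String countDown() {
        StringBuilder out = new StringBuilder();
        for (int i = 9; i >= 0; i -= 2) {    // counter decremented by the step
            out.append(i).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(countUp());
        System.out.println(countDown());
    }
}
```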

It may be possible to allow both this format and the Java style format of the loop in order to let the programmer (or the teacher) choose which they feel is more useful or applicable in a certain case.

Procedures and Functions

Java calls procedures and functions "methods". Procedures are just functions which do not return a value (they return "void"). All methods are members of classes. Unless a method is declared static, it can only be called if an object of the class of which it is a member has been created. As at the current time the introductory language will not support object-oriented programming, all methods should be members of the class in which main is defined, and be declared static with package access, so that they can be called from any part of the program.
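The arrangement described above might look like the following in the generated Java. The class and method names are hypothetical; the point is only that every method is static, has package access, and sits alongside main.

```java
class ProgramDemo {
    // A function: returns a value
    static int square(int n) {
        return n * n;
    }

    // A procedure: a method whose return type is void
    static void greet(String name) {
        System.out.println("Hello " + name);
    }

    public static void main(String[] args) {
        // Static methods can be called without creating any object
        greet("world");
        System.out.println(square(7));
    }
}
```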

Input and Output

Textual output in Java, on the console at least, is most easily achieved by using the library functions System.out.println() and System.out.print() to print text (with a newline at the end in the case of println()). To hide the library, the teaching language will provide functions print() and println(), which will translate to calls of System.out.print() and System.out.println() respectively in the generated Java.

Console input in Java is not simple. The cleanest way found to read, say, an integer from the keyboard into an integer variable is something like the following.

try {
    java.io.BufferedReader stdin =
        new java.io.BufferedReader(new java.io.InputStreamReader(System.in));
    String line = stdin.readLine();
    int i = Integer.parseInt(line);
}
catch ( java.io.IOException e ) { System.out.println(e); }
catch ( NumberFormatException e ) { System.out.println(e); }

At the moment I think that the best way to deal with input is to provide functions called things like readInt() and readString(). If the user includes these in their program, a function wrapping code similar to the above will be included in the Java source code, and called at the relevant point. Another option would be to have a generic read() function which would translate to different Java functions depending on the type of the variable to which the result of the read() is being assigned. This is more complex to implement.
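A sketch of what a generated readInt() wrapper might look like. Two things here are assumptions of mine rather than decisions made above: the reader is passed in as a parameter (so that the example can be run without a keyboard), and 0 is returned when the input cannot be read or parsed. A generated program would presumably wrap System.in instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ReadDemo {
    // Hypothetical wrapper: reads one line and converts it to an int
    static int readInt(BufferedReader in) {
        try {
            String line = in.readLine();
            return Integer.parseInt(line);
        } catch (IOException e) {
            System.out.println(e);
        } catch (NumberFormatException e) {
            System.out.println(e);
        }
        return 0;   // assumed fallback value when reading or parsing fails
    }

    public static void main(String[] args) {
        // A StringReader stands in for the keyboard in this sketch
        BufferedReader in = new BufferedReader(new StringReader("42\n"));
        int i = readInt(in);
        System.out.println(i);
    }
}
```

With a wrapper of this kind the exception handling stays in the generated code, and the introductory programmer writes only a single call.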

Operators

The following operators will be provided:

Generics

Generics[14] are a sophisticated concept. Java is an object-oriented language and (almost) everything is an object (i.e. it extends the class Object). When programs deal with large numbers of objects, they tend to hold them in various kinds of containers (like Vectors, HashMaps etc). Any sort of object can be put into a container and got out again later. However, if you put in say a Dog, where Dog is a class that you or someone else has defined, and try to get it out again, you get out an Object. This is because containers hold Objects. They do not remember the more explicit type of each Object put into the container, and so they can only give an Object back. It is up to the programmer to remember the type of the objects they put into the container, and convert them back to this type using a cast.

In Java this looks like:

Vector v = new Vector();
Dog d = new Dog();
v.add(d); 

Dog e;
e = (Dog)v.elementAt(0);

The cast is the bracketed (Dog) after the assignment operator on the last line. This coerces the Object which comes out of the vector to a Dog, so that it can be assigned to e.

It is somewhat annoying for the programmer to have to remember the type of the objects in a container and cast them whenever they are extracted. A solution to this problem would be to have a class called DogVector which only contained Dogs. We could then be sure that any object extracted from a DogVector would be a Dog and therefore no cast would be necessary. However, there will be other types of objects that programmers will want to store in vectors as well as Dogs (in fact anything that is an Object) and using this approach a different class would have to be written for each container for each type of object to be contained. Every time a programmer defined a new type they would have to define a new set of containers to put them in.
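A minimal sketch of such a DogVector, to show how much code the approach demands for a single element type. The Dog class and its name field are hypothetical, and only three methods are wrapped; a usable container would need many more.

```java
import java.util.Vector;

class DogVectorDemo {
    public static void main(String[] args) {
        DogVector v = new DogVector();
        v.add(new Dog("Rex"));
        Dog e = v.elementAt(0);   // no cast needed by the caller
        System.out.println(e.name);
    }
}

// Hypothetical element type
class Dog {
    String name;
    Dog(String name) { this.name = name; }
}

// A container that can only hold Dogs; the cast is hidden inside it
@SuppressWarnings({"rawtypes", "unchecked"})
class DogVector {
    private final Vector dogs = new Vector();   // still a container of Objects

    void add(Dog d) { dogs.add(d); }

    Dog elementAt(int i) { return (Dog) dogs.elementAt(i); }

    int size() { return dogs.size(); }
}
```

Every further element type (Date, String and so on) would need its own copy of this class.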

Generics offer a solution to this problem by providing the possibility of having containers that are parameterised by type. That is, we can say we want a Vector<A>. This means we want a Vector, but that everything it contains will be of type A (where A could be Dog, Date, String ...). The parameterised container then deals with any type coercion necessary.

C++ offers generics in the form of templates. At the moment[15] Java does not have generics, but compilers are available which will compile a superset of Java, including parameterised types. GJ (Generic Java)[16] is such a compiler. It would not be difficult to produce GJ code as the translation from Kenya rather than Java (GJ is a superset of Java, so the code would be pure Java if generics were not used in the Kenya code). This would allow programmers to use the feature, removing much of the need for casting, one of the less elegant features of Java.
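For comparison with the cast-based example earlier, the same code using a parameterised Vector might look like this. The sketch is written in the angle-bracket form accepted by later versions of Java (5 onwards), which did add generics; that it matches GJ's notation in every detail is an assumption. Dog is again a hypothetical class, and GenericDemo and firstDog() exist only to make the fragment runnable.

```java
import java.util.Vector;

class GenericDemo {
    // Hypothetical helper so that the extracted element can be inspected
    static Dog firstDog() {
        Vector<Dog> v = new Vector<Dog>();   // a Vector that holds only Dogs
        Dog d = new Dog("Rex");
        v.add(d);

        Dog e;
        e = v.elementAt(0);                  // no (Dog) cast is required
        return e;
    }

    public static void main(String[] args) {
        System.out.println(firstDog().name);
    }
}

// Hypothetical element type
class Dog {
    String name;
    Dog(String name) { this.name = name; }
}
```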

The use of parameterised types is quite an advanced concept, and it is questionable whether they should be included in a teaching language. However, I think that they should be included as the novice can choose not to use them. When they do come to work with containers, the concept of the parameterised type can be explained just as easily as the need to cast objects when they are extracted from containers.