The One Address Machine Programming Language


Higher-Level Programming Languages

Previously, we managed to write a fairly sophisticated program in OAM assembly language. While better than pure machine language (i.e., no labels or mnemonic instructions), writing assembly language is still tedious at best. Moreover, our OAM programs are still strictly tied to the OAM architecture, and can only run on the OAM machine. What we'd really like is a way to write our programs at an even more abstract level, and then automatically "translate" these programs into the appropriate machine instructions (just as the assembler translated assembly language into machine language). If our new language were suitably structured, our resulting programs would be able (by providing alternative translators) to run on a variety of machines, and not just on the OAM. This important feature is usually called "portability," and implies that your programs can run on just about any machine without having to specialize or tune them to a particular architecture.

In this section, we'll be working with a simple translator for an equally simple higher-level programming language called the "one address machine programming language," or OAMPL (pronounced "Oh, Ample!"). OAMPL is a fairly silly programming language, but it does give you a feel for how a translator for a more realistic programming language -- like Python -- might work. The neat thing about OAMPL is that it ties together how programs are created in higher-level languages with what you already know about executing low-level language programs such as OAM machine code. So, at least to this admittedly simplistic level, you can see how your programs written in Fortran, C, C++, Java, Lisp, Python or Perl (all higher-level programming languages) end up running on a real machine.

The OAMPL Programming Language

An OAMPL program consists of a series of OAMPL instructions. In some sense, OAMPL programs look a lot like OAM assembly code programs; one difference is in how complex the instructions are: a single OAMPL instruction may, in fact, correspond to many OAM assembly code instructions.

In our discussion of OAMPL, we're not really interested in creating a practically useful language; we just want to develop enough of OAMPL so that it is possible to understand the whole coding process, from writing the program in a higher-level language to executing the program on the OAM.

The easiest way to introduce OAMPL is with an example. Consider the following simple OAMPL program:

1. PRINT "Enter your age:"
2. READ AGE
3. PRINT "Years left until retirement:"
4. PRINT (- 65 AGE)
5. EXIT
As before, we'll number each line with a consecutive line number, which isn't really part of the program, but is there to aid our discussion. And, as before, we'll use the "#" character to indicate a comment meant for the reader, and not the machine.

This program begins by prompting the user to provide a value, which is then stored in a "variable" (or designated memory location) called AGE. After reading the value, the program prints a banner announcing the result, followed by the difference between 65 (the standard retirement age in the United States) and the value just read.

Even just by looking at this somewhat silly program, we can tell that writing OAMPL programs is going to be a lot easier than writing OAM programs. Of course, the real trick is to get the program written in OAMPL to run on our standard OAM hardware.

No real machine executes higher-level programs directly; a machine can only execute low-level programs written in its own instruction set (in this case, OAM machine language). So once we write our OAMPL program, we need an appropriate means of automatically translating the program -- indeed, ANY program in OAMPL -- into OAM machine instructions without compromising the integrity of the algorithm the OAMPL program embodies.

The most commonly used automated translator is called a compiler. A compiler is itself a computer program that takes as input a program in one language (here, OAMPL) and produces as output an equivalent program in another language (here, OAM). Thus when we feed our OAMPL program as input to the compiler, the output is the equivalent OAM program, that is, the translation from OAMPL to OAM. My own simple OAMPL compiler produces the following 16 OAM machine language instructions when compiling these 5 OAMPL instructions.

# Emitted by the OAMPL compiler
1.      BR  3   # branch to program block
# Variable storage allocation block
2. AGE, NOOP    # variable AGE
# Begin OAM program block
# OAMPL: PRINT "Enter your age:"
3.      SET "Enter your age:"
4.      STA stdout
# OAMPL: READ AGE
5.      LDA stdin
6.      STA AGE
# OAMPL: PRINT "Years left until retirement:"
7.      SET "Years left until retirement:"
8.      STA stdout
# OAMPL: PRINT (- 65 AGE)
9.      LDA AGE
10.     STA 12
11.     BR 13
12.     NOOP    # intermediate value
13.     SET 65
14.     SUB 12
15.     STA stdout
# OAMPL: EXIT
16.     HLT
Without worrying too much for now about how these instructions were actually generated, note that many different, equally valid, translations exist. All things being equal, we'd prefer a shorter translation, since fewer OAM instructions that accomplish the same thing imply (this is only approximately correct; we'll return to this topic later) that the resulting program will require less time to run. On the other hand, since the compiler must be able to compile arbitrary programs written in OAMPL, the OAM code actually emitted may seem quirky or opaque to a human reader.

But we're getting ahead of ourselves. Let's describe a bit more of the OAMPL language, then get back to how the compiler is actually working.

OAMPL Instructions

OAMPL instructions fall into three basic categories: I/O statements, assignment statements, and control statements. In the program just above, READ and PRINT are I/O statements. An '=' sign signals an assignment statement (in fact, '=' is the only assignment statement in the OAMPL language), although the simple program above doesn't happen to use one. There aren't any control statements in it either; we'll get to those later.

OAMPL I/O Statements

The OAMPL I/O statements are READ and PRINT. The READ statement takes a single argument, which must be a variable name. Variable names can be made up from any number of upper and/or lower case characters (programmers can therefore feel free to use descriptive names to indicate what the value of the variable means in their program). The READ statement causes a value to be read from the keyboard (using LDA) and then stored in an appropriately-selected memory location. We can then use the same variable name elsewhere in the program to refer to the value stored in that memory location, much like we used labels when writing assembly language code.

Just like the use of address labels in the OAM assembler, the introduction of named variables makes OAMPL a more convenient language to use, but it also adds some complication. One of the primary tasks of any compiler is to recognize the use of variables, decide where each variable should be stored in memory (usually called "memory allocation") and then make sure every reference to each variable refers explicitly to the correct memory location.

Memory allocation is actually not as simple as it may sound. In our first example of OAM assembly code (above), we allocated variable storage space at an arbitrary location (location 100). Unfortunately, we can't reliably get away with that strategy anymore, since we would have to know how long the OAM program generated by the OAMPL compiler will be! If our OAM program were more than 99 instructions long, storing a variable's value might destroy part of the loaded program.

A compiler generally makes more than one pass through the program; in the first pass, the compiler figures out how many variables are used in the program, and then decides where to allocate space for the variables in such a way so as not to clash with the OAM program. This is the approach we take in our OAMPL compiler as well: the compiler scans the whole OAMPL program first, detects the variables used, and then reserves enough memory locations to store the variable values.
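The first pass just described can be sketched in a few lines of Python (an illustrative sketch only; the actual OAMPL compiler's source isn't shown here). We assume variables enter a program either through a READ statement or on the left side of an '=' assignment, and that storage starts at location 2, right after the initial branch:

```python
def allocate_variables(program_lines):
    """First pass: find every variable and assign it a memory location."""
    variables = {}
    next_location = 2          # location 1 holds the BR around the storage block
    for line in program_lines:
        tokens = line.replace("(", " ").replace(")", " ").split()
        if tokens and tokens[0] == "READ":
            name = tokens[1]                       # READ introduces a variable
        elif len(tokens) > 1 and tokens[1] == "=":
            name = tokens[0]                       # so does an assignment
        else:
            continue
        if name not in variables:                  # allocate each variable once
            variables[name] = next_location
            next_location += 1
    return variables

program = [
    'PRINT "Enter your age:"',
    "READ AGE",
    'PRINT "Years left until retirement:"',
    "PRINT (- 65 AGE)",
    "EXIT",
]
print(allocate_variables(program))    # {'AGE': 2}
```

For our sample program, the pass discovers the single variable AGE and reserves location 2 for it, which is exactly where the compiler's output places it.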

Our compiler elects to allocate variable storage at the very beginning of the program. Thus the first instruction we will emit will always be an explicit branch around the memory reserved for variable storage. Here, since there is only one variable (AGE), there is only one memory location (location 2) assigned to contain the value of the variable (note the NOOP placeholder and the use of a label, as well as an automatically-generated comment reminding us what is to be stored there).

# Emitted by the OAMPL compiler
1.      BR  3   # branch to program block
# Variable storage allocation block
2. AGE, NOOP    # variable AGE
If the AGE variable is encountered later in the OAMPL code, the compiler can choose to directly map the variable name to its storage address, or it might opt to instead leave the variable name and let the assembler perform the mapping. Here, we'll elect to leave the variable name alone in the interest of making the OAM code easier for a human to read.

Continuing to examine the output of the compiler on the sample program above, we can see that the OAM instructions corresponding to the READ statement are (note that the compiler emits a comment line containing each OAMPL statement as it emits OAM code):

# OAMPL: READ AGE
5.      LDA stdin
6.      STA AGE
Just as the OAMPL READ statement is associated with an OAM LDA from stdin, each OAMPL PRINT statement always has a corresponding OAM STA statement to stdout. This can be easily ascertained by looking at the OAM program produced by the OAMPL compiler above; the 3 OAMPL PRINT statements map onto exactly 3 OAM STA statements. Yet these three PRINT statements differ in the "kind" of value they actually print: the first two print some text, while the last one prints the result of a calculation.

The OAMPL PRINT statement takes a single argument, which might be (1) an explicit number, (2) a bit of text, (3) a variable name, or (4) a more complex expression composed of numbers, variable names, and arithmetic operations.

In the previous material on the OAM, we assumed the only data manipulated by OAM were integers, since the STA command copies the contents of the ACC to output, and since the ACC can only handle integers, the only values ever printed would also be integers. Here, we'll loosen that assumption just a bit, and assume that the OAM architecture allows us to store data other than just an integer in a given register or memory location. More precisely, we'll now assume that an OAM register or memory location can hold either a number or a string of characters, denoted by enclosing the string in double quote marks.

With this in mind, the following OAMPL PRINT statements:

PRINT 5
PRINT "Foo!"
PRINT A
PRINT (+ A A)
are all valid: in each case, the compiler's strategy is to ensure the appropriate value is placed in the ACC, followed by the appropriate OAM STA instruction to print that value to the screen. But how we get these values into the ACC depends on the types of the PRINT arguments.
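The compiler's dispatch on the type of the PRINT argument can be sketched as follows (a Python illustration under the simplifying assumption that `variables` holds the set of names discovered in the first pass; the expression case requires the full expression machinery described later, so the sketch defers it):

```python
def compile_print(arg, variables):
    """Emit OAM code that leaves arg's value in the ACC, then prints it."""
    if arg.startswith('"'):             # (2) a string literal: SET it directly
        code = ["SET " + arg]
    elif arg.lstrip("-").isdigit():     # (1) a number: SET it directly
        code = ["SET " + arg]
    elif arg in variables:              # (3) a variable: load its stored value
        code = ["LDA " + arg]
    else:                               # (4) an expression: deferred for now
        raise NotImplementedError("expression arguments handled separately")
    return code + ["STA stdout"]

print(compile_print("5", set()))          # ['SET 5', 'STA stdout']
print(compile_print('"Foo!"', set()))     # ['SET "Foo!"', 'STA stdout']
print(compile_print("A", {"A"}))          # ['LDA A', 'STA stdout']
```

Whatever the argument's type, the code ends with the same STA to stdout; only the instructions that place the value in the ACC differ.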

In the first and second examples, the argument directly represents what it is to print (e.g., the number 5, or the brief nonsensical bit of text "Foo!", respectively). These are just like the first PRINT statement in our sample OAMPL program above:

1. PRINT "Enter your age:"
where the corresponding OAM code emitted is:
# OAMPL: PRINT "Enter your age:"
3.      SET "Enter your age:"
4.      STA stdout
Notice how the character string is simply SET into the ACC, and then written to the screen with the usual STA statement. The same procedure would work even if the thing you are printing is, e.g., the number 5 rather than a bit of text.

The second print statement in our sample OAMPL program:

3. PRINT "Years left until retirement:"
works exactly the same way, producing a similar set of OAM instructions:
# OAMPL: PRINT "Years left until retirement:"
7.      SET "Years left until retirement:"
8.      STA stdout
But the last of the three PRINT statements is quite different:
4. PRINT (- 65 AGE)
When the argument to an OAMPL PRINT statement is either a variable name or an arithmetic expression (as in this case), what we really want to print is the resulting value -- and not the variable name or the expression. So the job of the compiler in these cases is to parse the argument and emit appropriate OAM instructions that access the appropriate memory location (in the case of a variable) or that perform the appropriate arithmetic (in the case of an expression), leaving the result in the ACC.

Handling OAMPL Expressions

An OAMPL expression can be a number, a variable, or an operator with one or two operands (themselves expressions) enclosed in parentheses. OAMPL supports four arithmetic operators (+, -, *, and /); operators generally require two operands (the exception is -, which may optionally take only one operand) and are enclosed in parentheses. The following are all legal OAMPL expressions:

13
AGE
(+ 13 B)
(- 22)
(* (+ 2 C) (/ (- A) 3))
Note that OAMPL expressions feature prefix operators, that is, the operator is written before the operands. All binary operators (those taking two operands) have the form
(operator operand1 operand2)
while unary operators have the form
(operator operand)
Of course, the operands may themselves be OAMPL expressions, so things can get nested pretty deeply.

To produce the appropriate set of OAM instructions, it is necessary to parse the expression given as the argument of the PRINT statement. Writing a parser for a compiler is quite a bit easier than writing a parser for a natural language such as English or Italian, but it can still be quite difficult. Fortunately, it's not so hard here in part because of our choice of prefix operators.
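To see why prefix notation helps, consider this minimal Python sketch of an evaluator for OAMPL expressions (illustrative only; variable values are supplied in a dictionary standing in for memory, and `/` is taken to be integer division, an assumption the text doesn't settle). Because the operator comes first, the evaluator simply reads it and then recurses once per operand:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}

def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def evaluate(tokens, env):
    """Consume one prefix expression from the token list and return its value."""
    token = tokens.pop(0)
    if token != "(":
        return int(token) if token.lstrip("-").isdigit() else env[token]
    op = tokens.pop(0)
    first = evaluate(tokens, env)
    if tokens[0] == ")":               # unary minus: (- operand)
        tokens.pop(0)
        return -first
    second = evaluate(tokens, env)
    tokens.pop(0)                      # the closing parenthesis
    return OPS[op](first, second)

print(evaluate(tokenize("(- 65 AGE)"), {"AGE": 30}))     # 35
print(evaluate(tokenize("(* (+ 2 3) (- 22))"), {}))      # -110
```

Note how nesting comes for free: each recursive call consumes exactly one complete sub-expression, however deeply nested.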

Here's how we go about this parsing process; let's consider the OAMPL statement from our sample program above:

4. PRINT (- 65 AGE)
The idea is that as the compiler ``reads'' this expression, it should produce the appropriate OAM code that, when executed, computes the value of the expression and leaves the result in the accumulator. The value is then written to the screen with an STA to stdout. Our general strategy is to emit the OAM instructions to compute the value of the second operand, store this value (called the intermediate result) in a temporary memory location (as if it were a variable), compute the value of the first operand, and then subtract the stored intermediate result.

Whew! In other words, the OAM code emitted follows a template that looks like:

...instructions that leave the value of the second operand in the ACC...

n.   STA n+2       # Place intermediate result in location n+2
n+1. BR  n+3       # Skip over intermediate result
n+2. NOOP          # Placeholder for intermediate result
...instructions that leave the value of the first operand in the ACC...
m.   SUB n+2       # Subtract intermediate result stored at n+2
m+1. STA stdout    # Write the result to the screen
Which leaves the value of the overall expression stored in the accumulator, suitable for writing out to the screen with the final STA instruction.

In our sample program, the OAMPL PRINT instruction above would expand to the following OAM instructions, assuming the AGE variable had been seen before and its value stored in memory location 2:

# OAMPL: PRINT (- 65 AGE)
9.      LDA AGE
10.     STA 12
11.     BR 13
12.     NOOP    # intermediate value
13.     SET 65
14.     SUB 12
15.     STA stdout
The emitted code seems pretty inefficient. We could just as easily have accomplished the same thing using the following (much shorter) sequence of OAM statements:
# OAMPL: PRINT (- 65 AGE)
9.      SET 65
10.     SUB AGE
11.     STA stdout
given that the second operand, here AGE, already occupies a memory location. But when writing a compiler, it is essential that the compiler work for any legal input program, and not just a few examples. The solution we use is completely general; even if the second operand were itself a complicated expression (rather than a simple variable name), the code emitted would always work, even if it isn't quite as parsimonious as that which one would craft by hand.
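The intermediate-result template can itself be sketched in Python (illustrative only; the mnemonics ADD, MUL, and DIV for the +, *, and / operators are assumptions here, since only SUB appears in the examples, and the sketch handles binary operators only). The sketch first parses the expression into a tree, then emits code recursively; handed the expression from our sample program starting at address 9, it reproduces the compiler's output exactly:

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Turn a token list into a nested (op, operand, operand) tree."""
    token = tokens.pop(0)
    if token != "(":
        return token
    op = tokens.pop(0)
    operands = []
    while tokens[0] != ")":
        operands.append(parse(tokens))
    tokens.pop(0)                                   # the closing parenthesis
    return (op, *operands)

def emit(tree, addr, variables):
    """Return (code, next_addr): OAM code leaving the tree's value in the ACC."""
    if not isinstance(tree, tuple):                 # a leaf: variable or literal
        load = "LDA " if tree in variables else "SET "
        return [f"{addr}. {load}{tree}"], addr + 1
    op, first, second = tree                        # binary operators only
    code, addr = emit(second, addr, variables)      # second operand's value in ACC
    temp = addr + 2                                 # address of the placeholder NOOP
    code += [f"{addr}. STA {temp}",                 # park the intermediate result
             f"{addr + 1}. BR {temp + 1}",          # skip over the placeholder
             f"{temp}. NOOP"]                       # intermediate result lives here
    more, addr = emit(first, temp + 1, variables)   # first operand's value in ACC
    code += more + [f"{addr}. {OPCODES[op]} {temp}"]
    return code, addr + 1

code, addr = emit(parse(tokenize("(- 65 AGE)")), 9, {"AGE"})
code.append(f"{addr}. STA stdout")
for line in code:
    print(line)
```

Note how the second operand is compiled first, exactly as the template requires; the recursion then handles arbitrarily nested expressions automatically.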

Assignment Statements

The assignment statement takes two arguments, where the first argument, to the left of the equal sign, is always a variable name, and the second argument, to the right of the equal sign, is an OAMPL expression whose value is computed and stored in the named variable. For example, the statement

YEARS = (- 65 AGE)

computes the difference between 65 and the value of AGE, and stores the result in the variable YEARS.

Control Statements

Aside from the previously mentioned OAMPL EXIT command, which simply emits the OAM HLT instruction, OAMPL provides LOOP/END and IF/ELSE/ENDIF control constructs.

The OAMPL control statements LOOP and END support the notion of iteration. For example:

1. READ A
2. LOOP A
3. PRINT "Foo!"
4. END
5. EXIT
will echo the string "Foo!" to the screen a number of times equivalent to the value read for A.

There can be any number of OAMPL instructions between LOOP and END; we'll call those instructions the body of the LOOP. As you might have guessed, the OAMPL LOOP statement will expand into machine code that relies on the OAM BRP instruction. Our strategy is to evaluate the expression given as the argument to the LOOP statement and store its value in a memory location we'll informally call the loop counter. We then repeat the body of the loop, decrementing the loop counter each time we complete a pass through the body of the loop. When the loop counter gets to zero, we quit the loop and proceed.

There are two complications. First, the LOOP should never perform the instructions in the body of the LOOP if the LOOP expression evaluates to zero or less. Second, the compiler should allow LOOP/END pairs to be nested, one loop inside another.

The compilation proceeds as follows. First, compute the value of the expression that is the argument to LOOP, leaving the result in the accumulator. Now things get a little complicated. While we might like to store this value in memory and then proceed through the body of the loop, we must allow for the situation where the input value A is 0 or less. In this case, we don't want to execute the instructions in the loop body at all.

To allow for this possibility, we're going to branch directly ahead to the OAM code that corresponds to the loop's matching END statement, where we'll take care of storing and checking the value of the counter. If the value is zero or less, we won't come back and execute the loop body at all.

This will all be clearer with an example. Here is the OAM code emitted for the OAMPL program just shown:

# Emitted by the OAMPL compiler
1.      BR  3   # branch to program block
# Variable storage allocation block
2. A,   NOOP    # variable A
# Begin OAM program block
# OAMPL: READ A
3.      LDA stdin
4.      STA A
# OAMPL: LOOP A
5.      LDA A
6.      BR L7
7.      NOOP    # loop counter
# OAMPL: PRINT "Foo!"
8.      SET "Foo!"
9.      STA stdout
# OAMPL: END
10.     LDA 7
11.     DEC
12. L7, STA 7
13.     BRP 8
# OAMPL: EXIT
14.     HLT
In statements 5-7, we compute the initial value of the loop counter (here, just the value of variable A), leaving the value in the ACC, and branch ahead to automatically-generated label L7. We also insert a NOOP statement in memory location 7 to contain the loop counter.

In statements 12-13 (where 12 is label L7), we store the value of the ACC into memory location 7, and, if the value is greater than zero, branch back to the first statement of the loop body (statement 8; the body comprises statements 8 and 9).

Once we execute the loop body the first time, we reload the loop counter into the ACC, reduce its value by one using DEC, and then store the new value back into the loop counter. After checking the termination condition, we branch back and repeat the loop body again if need be.
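The control flow of the compiled loop can be mirrored in Python (a sketch of the loop's semantics only, not of how the compiler emits the code); the comments map each step back to the numbered OAM statements above:

```python
def run_loop(counter_value, body):
    """Mirror the compiled LOOP: branch straight to the END check first."""
    acc = counter_value          # statements 5-6: counter in ACC, BR to the check
    while True:
        counter = acc            # statement 12 (L7): STA the counter
        if counter <= 0:         # statement 13: BRP falls through when not positive
            return
        body()                   # statements 8-9: the loop body
        acc = counter - 1        # statements 10-11: LDA the counter, DEC

printed = []
run_loop(3, lambda: printed.append("Foo!"))
print(len(printed))              # 3
run_loop(0, lambda: printed.append("Foo!"))
print(len(printed))              # still 3: a zero counter skips the body entirely
```

Because the check comes before the body, a counter of zero or less never executes the body at all, which is exactly the first complication the compiler had to address.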

IF/ELSE/ENDIF and, more simply, IF/ENDIF, work in a similar fashion. The expression part of the IF statement is evaluated, and, if the value is not zero, the OAMPL instructions following the IF are executed until the ELSE or ENDIF are encountered. If the expression instead evaluates to zero, the instructions following the ELSE (if present) are executed instead. Execution then continues starting with the OAMPL instruction immediately following the ENDIF. Like the LOOP, IF constructs can be nested one within the other.

Of Optimizing Compilers, Interpreters, and Errors

There's much more to the OAMPL language and the OAMPL compiler than what we've described: writing a compiler is well beyond the scope of this book. But what we've seen so far should suffice to illustrate the challenge of compilation in the general sense, and the examples we've explored should serve to illustrate the entire process from higher-level language program to execution on a particular machine. Indeed, the OAMPL compiler we've seen is actually pretty simple as far as compilers go. Things get even more complicated as the language we're compiling gets more complex.

But well beyond the complexity of the source language (OAMPL in this case), most compilers must also deal with performance issues. Assuming that each OAM instruction takes roughly the same amount of time to execute, a shorter OAM program will run faster than an equivalent but longer OAM program. We've seen an example of this before; recall how the OAMPL instruction

4. PRINT (- 65 AGE)
generated 7 instructions, when 3 would instead suffice:
# OAMPL: PRINT (- 65 AGE)
9.      SET 65
10.     SUB AGE
11.     STA stdout
How is it possible to shorten the OAM code produced and still get the same effect? As noted previously, in the second version, we never store the value of AGE anywhere except in the accumulator. In order to use the shorter OAM program, we would have to ascertain that (1) we have no further use for the AGE variable's value anywhere else in the program, and (2) we aren't doing anything between the READ and the PRINT that compromises the integrity of AGE's value which was temporarily parked in the ACC.

In this example, the second condition is trivially true; there are no other OAMPL statements between the READ and the PRINT. The first condition is also true, and it is relatively easy to verify in this short a program. It gets much harder to verify in a longer OAMPL program. Normally, optimizing compilers produce a first draft of the compiled code, and then reexamine the code they produce in search of simplifying transformations. Some transformations simply take advantage of special instructions in the machine architecture (using INC, for example, instead of ADD to increment by 1), while other simplifying transformations can be quite complex, even involving what amounts to a reordering of the original OAMPL instructions (where such reorderings don't compromise the program's behavior). Yet other transformations are very machine-dependent; in other words, the optimizing compiler needs to know a lot about the exact structure of the target computer in order to make effective changes. While good optimizing compilers are hard to write, they offer enormous benefits in terms of speed of programs produced. The tradeoff is that they usually need to make multiple passes through the source program and therefore tend to be themselves slower (i.e., take longer to produce compiled code) than normal compilers. But we are generally happy to wait a bit longer on the compiler in exchange for a compiled program that runs more quickly, as we are likely to run the compiled code more than once.
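One such simplifying transformation can be sketched in Python (illustrative only: this peephole pass works on bare mnemonics and ignores the instruction renumbering a real optimizer would also have to perform, along with the check that the variable's value isn't needed elsewhere):

```python
def peephole(code):
    """Collapse the intermediate-result pattern when the second operand
    is a plain variable already living in memory."""
    out, i = [], 0
    while i < len(code):
        w = code[i:i + 6]
        if (len(w) == 6
                and w[0].startswith("LDA ")        # load the variable...
                and w[1].startswith("STA ")        # ...park it in a temporary
                and w[2].startswith("BR ")
                and w[3].startswith("NOOP")
                and w[4].startswith("SET ")
                and w[5].startswith("SUB ")
                and w[1].split()[1] == w[5].split()[1]):   # same temporary
            variable = w[0].split()[1]
            out += [w[4], "SUB " + variable]       # subtract the variable directly
            i += 6
        else:
            out.append(code[i])
            i += 1
    return out

draft = ["LDA AGE", "STA 12", "BR 13", "NOOP",
         "SET 65", "SUB 12", "STA stdout"]
print(peephole(draft))    # ['SET 65', 'SUB AGE', 'STA stdout']
```

Applied to the compiler's rough draft for PRINT (- 65 AGE), the pass recovers the 3-instruction version we wrote by hand above.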

Just the opposite is true of another type of language translation system called an interpreter. Interpreters, like compilers, translate your source program into machine instructions so that they can be executed by the underlying hardware. The difference is that interpreters do so on the fly; they don't make multiple passes through the source program, since they translate each statement into machine code and execute the resulting machine instructions at the same time!

Among the advantages of interpreters is the fact that you don't have to wait to try your program until after you've compiled it. Remember, if an optimizing compiler is making multiple passes through your program you may have to wait quite a long time to try it; interpreters give you instant gratification and lead to a more interactive type of programming experience. The disadvantage is that machine code produced by an interpreter tends to be much less efficient (no optimizations!) and always slower in the long term; you only need compile your program once, but you must continue to interpret it each time you want to run it.
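To make the contrast concrete, here is a toy interpreter for the fragment of OAMPL we've covered (a Python sketch under a few assumptions: input values arrive in a list standing in for stdin, output is collected in a list standing in for stdout, and `/` is integer division). Each statement is executed the moment it is read; there is no second pass and no stored machine code:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}

def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def evaluate(tokens, env):
    """Consume one prefix expression from the token list and return its value."""
    token = tokens.pop(0)
    if token != "(":
        return int(token) if token.lstrip("-").isdigit() else env[token]
    op = tokens.pop(0)
    first = evaluate(tokens, env)
    if tokens[0] == ")":               # unary minus
        tokens.pop(0)
        return -first
    second = evaluate(tokens, env)
    tokens.pop(0)
    return OPS[op](first, second)

def interpret(program, inputs, output):
    """Execute each OAMPL statement as soon as it is read."""
    env = {}                           # variable storage
    for line in program:
        head, _, rest = line.partition(" ")
        if head == "READ":
            env[rest] = inputs.pop(0)
        elif head == "PRINT":
            if rest.startswith('"'):
                output.append(rest.strip('"'))
            else:
                output.append(evaluate(tokenize(rest), env))
        elif head == "EXIT":
            break

out = []
interpret(['PRINT "Enter your age:"', "READ AGE",
           'PRINT "Years left until retirement:"',
           "PRINT (- 65 AGE)", "EXIT"], [30], out)
print(out)    # ['Enter your age:', 'Years left until retirement:', 35]
```

Notice that if a statement contained an error -- say, a division by zero -- it would surface exactly at the statement being executed, which is the diagnostic advantage discussed next.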

Another advantage of an interpreter is evident only when you write poor OAMPL code. Let's say you make a mistake, and attempt to divide a number by zero. Ideally, your computer shouldn't shut itself off; instead, it should issue an error and fail in a graceful fashion. If you get an error running interpreted code, you know exactly where the error was; it occurred in the last instruction you translated! If you're running code produced by an optimizing compiler, you may not even be able to tell what OAMPL statement caused the error due to the many optimizing transformations applied to the compiler's original ``rough draft.''

There are other tricks we can play with compiler technology. A retargetable compiler is a compiler that is constructed in a modular fashion, where the ``front end'' handles reading the program (including checking the program for legal syntax) and the ``back end'' handles generating code for the underlying machine. By building more than one back end, you can use the same front end to build more than one compiler. In a similar fashion, a cross compiler uses a back end that is for a machine other than the one running the compiler. With a cross compiler, you can develop code on your PC that is meant to run on a Cray supercomputer. The cost benefits are obvious!

Finally, a note about the popular higher-level language Java. One of the reasons behind the success of Java is its portability: it is available on multiple platforms (OK, so I'm ignoring other aspects, such as its network operability and so on, but our topic is, after all, compilation). For Java, this portability is achieved using an interesting combination of compilers and interpreters. When compiling Java, the compiler is actually producing an idealized machine language for a hypothetical Java Virtual Machine, or JVM. This JVM code, which is quite a bit more limited than the Java language, is then interpreted by a simple platform-dependent JVM interpreter. So in this case, as long as a JVM interpreter exists for your target machine, you can execute compiled Java code (i.e., JVM code) without further compilation.


Last modified: Fri Nov 5 2010
The University of Iowa / Department of Computer Science / alberto-segre@uiowa.edu