Implementing a Language

Scanning

Search for tokens: the expression sum=a+b contains 5 tokens - sum, =, a, +, and b.

Tokens identification algorithm:

repeat until the end of the line:
skip over blanks in the source string

if a letter is seen then
   save it and all following characters, up to a blank or end of line
else if a digit or "-" or "+" is seen then
   save it and all following characters, up to a blank or end of line
else if a special symbol (like =) is seen then
   save that symbol
else
   something wrong is happened - send an error message

send what has been saved to token storage

After the tokens have been identified, they are encoded by some internal code.

Parsing

The translator scant the list of tokens and tries to make is a sense. It constructs so-called parse-tree.

Examples:

A parse tree for x*(2+y) A parse tree for sum=a+b
       2   y
        \ /
     x   +
      \ /
       *
       a   b
        \ /
    sum  +
      \ /
       =

Here for each operation we have a box and one or two dependent tokens, that are the arguments for this operation.

Code Generation

The translator uses the parse trees and the following algorithm to generate code (for sum=a+b):

  1. Using the table of symbols produces during the scanning phase, recognize that a, b, and sum are variables and assign a memory location to each of them.

  2. The top box correspionds to STO. The right side is a memory location, say 6. Generate
       STO 6
    
  3. The right side is an instruction. Its right side corresponds to b, which we have assigned memory cell 5.
       ADD 5
    
  4. The left side is a. The variable a has been assigned cell 4, so we generate
       LOD 4
    
    to put it in the accumulator.

  5. Output the generated instructions in the backwards order.
       LOD 4   ;place a into ACC
       ADD 5   ;ACC=a+b
       STO 6   ;save the result
    
  6. Encode the obtained instructions in binary.
       00000100 00000100
       00000000 00000101
       00000101 00000110