# README
compiler
The compiler
package is used to compile text in the APP language into
bytecode that can be executed using the bytecode
package. This allows for
compiled scripts to be integrated into the application, and run repeatedly
without incurring the overhead of parsing and semantic analysis each time.
The APP language is loosely based on Go but with some important differences. Some important attributes of APP programs are:
- There are no pointer types, and no dynamic memory allocation.
- All objects are passed by value in function calls.
- Variables are untyped, but can be cast explicitly or will be type converted automatically when possible.
The program stream executes at the topmost scope. You can define one or more functions in that topmost scope, or execute commands directly. Each function runs in its own scope; it can access variables from outer scopes but cannot set them. Functions defined within another function only exist as long as that function is running.
Example
Here is a trivial example of compiling and running some APP code in your Go program.
// String containing arbitrary _APP_ statements.
src := "..."
bc, err := compiler.CompileString(src)
if err != nil {
// Handle compile-time errors
}
syms := symbols.NewSymbolTable("test program")
ctx := bytecode.NewContext(syms, bc)
err := ctx.Run()
if err != nil {
// Handle run-time errors
}
The general pattern is to pass a string containing the program text to the compiler. The compiler generates a bytecode object containing the pseudocode for the program and any predefined symbols (constants or functions) from the compilation.
The caller then creates a new symbol table (or can re-use an existing one if symbols are meant to be persistent between compilation units). A new runtime context is created (which contains the program counter, stack, error handling stack, etc.) and uses the existing symbol table and bytecode. This allows bytecode to be persisted or re-used and can be executed multiple times on multiple threads, each with it's own context.
Finally, the context is run, which executes the bytecode instructions.
If the instructions are meant to return a value, that value is left on
the stack for the context, and you can use ctx.Pop()
to remove items
from the stack. The return values are opaque interface{}
objects, and
you can use the util.Get*() functions to extract the integer, float,
string, or bool object.
Data types
APP support six data types, plus limited support for function pointers as values.
type | description |
---|---|
int | 64-bit integer value |
float | 64-bit floating point value |
string | Unicode string |
bool | Boolean value of true or false |
[...] | Array of values |
{...} | Structure of fields |
An array is a list of zero or more values. The array values can be of any
type, including other arrays. The array elements are always indexed starting
at a value of zero. You can also reference a range (slice) of an array by using
the notation a[b:e]
which returns an array containing elements from b
to
e
from array a
. You cannot reference a subscript that has not been allocated.
You can use the array
statement to initialize an array, and the array()
function to change the size of an existing array.
You can also define an array constant, and assign it to a value, where the initial array is defined by the constant.
names := [ "Tim", "Sue", "Bob", "Robin" ]
This results in names
being assigned an array containing the four string
values given.
Structures contain zero or more labeled fields. Each field label must be unique, and can reference a value of any type. You can create a struct using a literal:
employee := { name: "Susan", rate: 23.50, active:true }
This creates a structure with three members (name
, rate
, and active
).
Once this is stored in a variable, you can use dot-notation to reference
a field directly,
pay := 40 * employee.rate
If the member does not exist, an error is generated. You can however add new fields to a structure simply by naming them in an assignment, such as
employee.weekly = pay
If there isn't already a field named weekly
in the structure, it is
created automatically. The field is then set to the value of pay
.
Note that structures and arrays are always passed around by reference.
That is, if you create a structure a
, assign it to b
, and then change
a field in b
, it will also be changed in a
. To make an exact duplicate
of an existing structure, use the new()
function which creates a new
instance of the existing type. For example,
a := { age: 55, name: "Timmy"}
b := a
b.age = 4
fmt.Println( a.age ) // Prints the value 4
c := { age: 55, name: "Timmy"}
d := new(c)
d.age = 4
fmt.Println( c.age ) // Prints the value 55
Scope
The console input (when ego
is run with no arguments or parameters) or
the source file named when ego run
is used creates the main symbol table,
available to any statement entered by the user or in the source file.
The first time a symbol is created, you must use the := notation to create a variable as well as set its value. All subsequent sets of that variable should use the = notation to store in an existing symbol.
Whenever a { }
block is used, a new symbol table is created for use during
that block. Any symbols created within the block are deleted when the block
exits. The block can of course reference symbols in outer blocks using
standard "=" notation. For example,
x := 55
{
y := 66
x = 42
}
After this code runs, the value of x
will be 42 (because it was changed
within the block) and the symbol y
will not be defined (because it went
out of scope at the end of the basic block.)
array
The array
statement is used to allocate an array. An array can also be
created as an array constant and stored in a variable. The array statement
identifies the name of the array and the size, and optionally an initial
value for each member of the array.
array x[5]
array y[2] = 10
The first example creates an array of 5 elements, but the elements are
<nil>
which means they do not have a usable value yet. The array elements
must have a value stored in them before they can be used in an expression.
The second example assigns an initial value to each element of the array,
so the second statement is really identical to y := [10,10]
.
const
The const
statement can define constant values in the current scope. These
values are always readonly values and you cannot use a constant name as a
variable name. You can specify a single constant or a group of them; to specify
more than one in a single statement enclose the list in parenthesis:
const answer = 42
const (
first = "a"
last = "z"
)
This defines three constant values. Note that the value is set using an
=
character since a symbols is not actually being created.
if
The if
statement provides conditional execution. The statement must start
with a expression which can be cast as a boolean value. That value is
tested; if it is true then the following statement (or statement block)
is execued. By convention, even if the conditional code is a single
statement, it is enclosed in a statement block. For example,
if age >= 50 {
call aarp(name)
}
This tests the variable age to determine if it is greater than or
equal to the integer value 50, and if so, it calls the function
named aarp
with the value of the name
symbol.
You can optionally include an "else" clause to execute if the condition is false, as in
if flag == "-d" {
call debug()
} else {
call regular()
}
If the value of flag
does not equal the string "-d" then the
code will call the function regular()
instead of debug()
.
func
The func
statement defines a function. This must have a name
which is a valid symbol, followed by an argument list.
The argument list is a list of names which become local variables in the running
function, set to the value of the arguments from the caller. After each argument
name in the func
statement, you can specify a type of int
, string
, float
,
bool
, struct
, or array
, in which case the value is
coerced to that type regardless of the value passed into the function.
After the (possibly empty) argument list you must specify the type of the
function's return value. This can be one of the base
types (int
, float
, string
, or bool
). It can also be []
which
denotes a return of an array type, or {}
which denotes the return
of a struct
type. Finally, the type can be any
which means any
type can be returned, or void
which means no value is returned from
this function (it is intended to be invoked as a call
statement).
The type declaration is then followed by
a statement or block defining the code to execute when the function
is used in an expression or in a call
statement. For example,
func double(x) float {
return x * 2
}
This accepts a single value, named x
when the function is running.
The function returns that value multiplied by 2. The type of the result
is coerced to be a float value. Note that the braces are not required
in the above example since the function consists of a single return
statement, but by convention braces are always used to indicate the
body fo the function. The function just created can
then be used in an expression, such as:
fun := 2
moreFun := double(fun)
After this code executes, moreFun
will contain the value 4.0 as a
float value.
return
The return
statement contains an expression that is identified as
the result of the function value. The generated code adds the value
to the runtime stack, and then exits the function. The caller can
then retrieve the value from the stack to use in an expression or
statement.
return salary/12.0
This statement returns the value of the expression salary/12.0
as
the result of the function.
If you use the return
statement with no value, then the function
simply stops without leaving a value on the arithmetic stack. This is
the appropriate behavior for a function that is meant to be invoked
with a call
statement.
for
The for
statement defines a looping construct. A single statement
or a statement block is executed based on the definition of the
loop. There are two kinds of loops.
x := [101, 232,363]
for n:=0; n < len(x); n = n + 1 {
fmt.Printf("element %d is %d\n", n, x[n])
}
This example creates an array, and then uses a loop to read all
the values of the array. The for
statement is followed by three
clauses, each separated by a ";" character. The first clause must
be a valid assignment that initializes the loop value. The second
clause is a condition which is tested at the start of each loop;
when the condition results in a false value, the loop stop
executing. The third clause must be a statement that updates the
loop value. This is followed by a block containing the statement(s)
to execute each time through the loop.
When using a loop to index over an array, you can use a short hand version of this.
x := [ 101, 232, 363 ]
for n := range x {
fmt.Println( "The value is ", n)
}
In this example, the value of n
will take on each element of
the array in turn as the body of the loop executes. You can
have the range
option give you both the index number and
the value.
x := [ 101, 232, 363 ]
for i, n := range x {
fmt.Println( "Element ", i, " is ", n )
}
Here, the array index is stored in i
and the value of
the array index is stored in n
. This is symantically
identical to the following more explicit loop structure:
for i := 1; i <= len(x); i = i + 1 {
n := x[i]
fmt.Println( "Element ", i, " is ", n )
}
break
The break
statement exits from the currently running loop, as if the
loop had terminated normally.
for i := 0; i < 10; i = i + 1 {
if i == 5 {
break
}
fmt.Println( i )
}
This loop run run only five times, printing the values 0..4. On the next
iteration, because the index i
is equal to 5, the loop is terminated.
Note that a break
will only exit the current loop; if there are nested
loops the break
only exits the loop in which it occurred and all outer
loops continue to run.
The break
statement cannot be used outside of a for
loop.
continue
The continue
statement exits from the current iteration of the loop,
as if the loop had restarted with the next iteration.
for i := 0; i < 10; i = i + 1 {
if i == 5 {
continue
}
fmt.Println( i )
}
This loop run run only all ten times, but will only output the values
0..4 and 6..9. When the index i
is equal to 5, the loop starts again
at the top of the loop with the next index value.
The continue
statement cannot be used outside of a for
loop.
Error handling
You can use the try
statement to run a block of code (in the same
scope as the enclosing statement) and catch any runtime errors that
occur during the execution of that block. The error causes the code
to execute the code in the catch
block of the statement.
If there are no errors, execution continues after the catch block.
x := 0
try {
x = pay / hours
} catch {
fmt.Println( "Hours were zero!" )
}
fmt.Println( "The result is ", x )
If the value of hours
is non-zero, the assignment statement will assign
the dividend to x
. However, if hours is zero it will trigger a runtime
divide-by-zero error. When this happens, the remainder of the statements
(if any) in the try
block are skipped, and the catch
block is executed.
Within this block, there is a variable _error_
that is set to the
value of the error that was signalled. This can be used in the catch
block if it needs handle more than one possible error, for example.
package
Use the package
statement to define a set of related functions in
a package in the current source file. A give source file can only
contain one package statement and it must be the first statement.
package factor
This defines that all the functions and constants in this module will
be defined in the factor
package, and must be referenced with the
factor
prefix, as in
y := factor.intfact(55)
This calls the function intfact()
defined in the factor
package.
import
Use the import
statement to include other files in the compilation
of this program. The import
statement cannot appear within any other
block or function definition. Logically, the statement stops the
current compilation, compiles the named object (adding any function
and constant definitions to the named package) and then resuming the
in-progress compilation.
import factor
import "factor"
import "factor.ego"
All three of these have the same effect. The first assumes a file named "factor.ego" is found in the current directory. The second and third examples assume the quoted string contains a file path. If the suffix ".ego" is not included it is assumed.
If the import name cannot be found in the current directory, then the
compiler uses the environment variables APP_PATH to form a directory
path, and adds the "lib" directory to that path to locate the import.
So the above statement could resolve to /Users/cole/ego/lib/factor.ego
if the APP_PATH was set to "~/ego".
Finally, the import
statement can read an entire directory of source
files that all contribute to the same package. If the target of the
import is a directory in the $APP_PATH/lib location, then all the
source files within that directory area read and processed as part
of one package.
@error
You can generate a runtime error by adding in a @error
directive,
which is followed by a string expression that is used to formulate
the error message text.
v = "unknonwn"
@error "unrecognized value: " + v
This will result in a runtime error being generated with the error text "unrecognized value: unknown". This error can be intercepted in a try/catch block if desired.
@global
You can store a value in the Root symbol table (the table that is the ultimate parent of all other symbols). You cannot modify an existing readonly value, but you can create new readonly values, or values that can be changed by the user.
@global base "http://localhost:8080"
This creates a variable named base
that is in the root symbol table,
with the value of the given expression. If you do not specify an expression,
the variable is created as an empty-string.
@template
You can store away a named Go template as inline code. The template can reference any other templates defined.
@template hello "Greetings, {{.Name}}"
The resulting templates are available to the template() function, whose first parameter is the template name and the second optional parameter is a record containing all the named values that might be substituted into the template. For example,
fmt.Println( strings.template(hello, { Name: "Tom"}))
This results in the string "Greetings, Tom" being printed on the
stdout console. Note that hello
becomes a global variable in the program, and
is a pointer to the template that was previously compiled. This
global value can only be used with template functions.