Heftza

From Erights

Revision as of 18:26, 18 June 2008 by Dglibicki

---+!! Heftza

  * *Status:* Draft (as of 18 June 2008)
  * *Authors:* Main.DanielLibicki

---++++!! Contents

%TOC{depth="5"}%

---++ Objective

Build the Great Object-Oriented Programming Language. This language should be more powerful, more abstract, more reliable, more secure, and simpler than Java and C++.

Then, take over the world.

---++ Overview

---+++ Wonko the Sane

Heftza is the realization of the dream of the Great Synthesis of object-oriented programming and type systems. It is powerful, abstract, aggressively simple, and aggressively writeable. Industry standard languages provide developers a way to implement object-oriented components, but force those components to interoperate with a non-object environment. Like [[1][Wonko]], who put the universe into an asylum, Heftza undertakes the ambitious project of wrapping the environment, so that applications deal in objects and deal with objects.

---+++ Object Persistence

The current industry standard of "classes in memory, tables on disk" creates an impedance mismatch. For instance, Hibernate configuration files are tedious, brittle, and often fail in confusing ways. In Heftza, objects live forever, so application code need not concern itself with the memory/disk distinction.

---+++ Concurrent Programming, Easy and Safe

Reasoning about multithreaded systems is easy in theory. "Execute statements a and b at the same time" should be as easy to understand as "execute statement a n times." However, the industry standard languages do not have sufficient abstraction to express multithreaded designs in a simple fashion. Here at Google, we have trouble running multiple services on a single machine for fear that they will interfere with each other. Heftza offers transactional memory, object capabilities, side-effect management, and parallel control structures to make concurrency easy and safe. At Google, the need for aggressively concurrent, aggressively isolated applications is particularly acute, and it will only become more so as servers add more cores.

---+++ Static Safety

Heftza catches almost all language exceptions at compile time.

---+++ Distributed Capabilities

See [[2][Object-capability model]].

---++ Detailed Design

---+++ A Few Notes on Syntax

As Richard Gabriel once said, "Usually the flavor of the language's syntax can be gotten across more briefly than you expect. A good rule of thumb is that no one cares about syntax." You might want to skip this section and return to it if you have trouble following one of the code examples.

Comments are enclosed in curly braces.

<verbatim> {This is a Heftza comment.} </verbatim>

Many characters that are special characters in most programming languages, such as +, -, *, and /, are regular characters in Heftza.

<verbatim> my-integer : 0; {ok} </verbatim>

The assignment operator is the colon (:).

<verbatim> i : 0; {assign i to refer to a new integer with the value of 0} </verbatim>

In Heftza, you must dereference a reference with $ in order to use the object it refers to, for instance, to call a method or pass a parameter.

<verbatim>
Integer i : 0;   {declaration}
i : $i.+(1);     {method call}
</verbatim>

Method calls with empty parameter lists have no parentheses.

<verbatim> $program.run; </verbatim>

Object creation on the heap looks like creating an object on the stack in C++.

<verbatim> list : List(0, 1, 2); </verbatim>

(The keyword "new" in Heftza is used for schema evolution. It is unrelated to object creation.) Type parameters are passed in brackets (aka "square brackets").

<verbatim> class List[T] end List; </verbatim>

---+++ Transactional Memory

---++++ Isolated, Likwid, and Durable Methods

By default, a Heftza method executes inside a transaction. A method that always executes in a transaction may be called "isolated". "isolated" is not a keyword in Heftza because it is the default.

Transactions are not started and committed explicitly in Heftza. Rather, if an isolated method is called and there is no running transaction, a transaction is implicitly started before entering the method and implicitly committed after returning from it. There are two kinds of methods in Heftza that are not transactional: "likwid" and "durable" methods. Likwid methods are indifferent to transactions; if there is no running transaction when a likwid method is called, the method neither starts nor commits a transaction. Durable methods may not be called during a running transaction. This is a compile error, not a runtime error, so a durable method may not be called unless it is statically known that there is no running transaction. Durable methods may only be called from other durable methods, from completions, from scripts, or from the interpreter.

Transactions in Heftza are serializable, which means that semantically, they are executed serially. While a transaction is running in some thread, all other threads are semantically blocked. The implementation must do something that is equivalent to a serial execution, and a good implementation will do it as fast as possible. Nowadays, it is reasonable to expect of an implementation that it will use a transactional protocol to produce a parallel execution that is semantically equivalent to a serial execution.
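To make the calling rules concrete, here is a minimal Java sketch of isolated, likwid, and durable calls. It assumes the crudest correct way to get serializable semantics: one global lock held for the whole transaction, so transactions really do run one at a time. The class `TxRuntime` and its methods are hypothetical, rollback and crash recovery are not modeled, and the durable check happens at run time, whereas Heftza rejects such a call at compile time.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical runtime helper illustrating Heftza's implicit transactions.
// Serializability is obtained the crude way: a single global lock, so at most one
// transaction runs at a time. A real implementation would run transactions in
// parallel and only guarantee an equivalent serial order.
final class TxRuntime {
    private static final ReentrantLock GLOBAL = new ReentrantLock();
    // Nesting depth of isolated calls on this thread; 0 means "no running transaction".
    private static final ThreadLocal<Integer> DEPTH = ThreadLocal.withInitial(() -> 0);
    static int commits = 0; // counts outermost commits, for illustration only

    static boolean inTransaction() {
        return DEPTH.get() > 0;
    }

    // Isolated (the default): begin a transaction if none is running;
    // only the outermost isolated call commits.
    static <T> T isolated(Supplier<T> body) {
        boolean outermost = !inTransaction();
        if (outermost) GLOBAL.lock();                 // implicit begin
        DEPTH.set(DEPTH.get() + 1);
        try {
            T result = body.get();
            if (outermost) commits++;                 // implicit commit on normal return
            return result;
        } finally {
            DEPTH.set(DEPTH.get() - 1);
            if (outermost) GLOBAL.unlock();
        }
    }

    // Likwid: indifferent to transactions; never begins or commits one.
    static <T> T likwid(Supplier<T> body) {
        return body.get();
    }

    // Durable: must not run inside a transaction (a compile error in Heftza,
    // only a run-time check in this sketch).
    static <T> T durable(Supplier<T> body) {
        if (inTransaction()) {
            throw new IllegalStateException("durable method called inside a transaction");
        }
        return body.get();
    }
}
```

Nested isolated calls join the caller's transaction instead of starting their own, which is why a whole call tree commits once.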

Transactions are crash-resistant in Heftza like they are in databases. Heftza ensures that, on restart after a crash, all updates to persistent objects by non-committed transactions are rolled back; similarly, all committed updates are preserved.

It goes without saying that when the implementation reorders statements in a durable or likwid method, it must take into account the fact that other threads may be executing concurrently. That limits statement reordering but does not rule out the possibility. The smarter the implementation, the more reordering it can do even given this restriction. If this goes without saying, why did I say it? Because Java does reorder statements as if no other threads are concurrently executing, even though in Java, other threads can always be concurrently executing. And if you look at the Java spec on this subject, they basically say to the programmer, "If this bites you, it's your own fault." Thanks, Java. (Sorry, I had to vent a little about that.)

---++++ Completions

But what if you are writing a non-durable method and you want to call a durable method? What if you want to output a message to the console, for instance, to log the status of the method? In Heftza, the answer to such questions is often, "you can't", but in this case, Heftza does give you a way out, with the keyword "complete". Consider the following code:

<verbatim> A.do-something(B obj, OutputStream out):

 $obj.do-something-else;
 complete
   $out.put("something interesting just happened");
 end complete;

end do-something; </verbatim>

The above method is not modified by the keyword "likwid" or the keyword "durable", so it is an isolated method, and it always executes in a transaction. That transaction will commit when the lowest isolated method on the call stack (the one whose call started the transaction) is ready to be popped off. A "complete" block defines a completion, that is, a series of statements that will be executed after the current transaction commits.

A completion may not appear in a durable method. That is because it is statically known that a durable method is not executing inside a transaction, so there is no commit to wait for. However, a completion may appear in a likwid method. Recall that a likwid method may or may not be running inside a transaction. If a completion in a likwid method is reached, and there is no current transaction, the statements in the block are executed in place; if there is a running transaction, the statements in the block are put on hold until after the transaction commits.

The body of a completion is like the body of a durable method. Durable methods may be called in a completion, and completions may not appear in a completion. The statements in a completion, like the statements in a durable method, do not execute in a transaction and are not isolated from other threads.

After a transaction commits, all completions that were reached during the course of the transaction are executed in the order they were reached. While likwid and durable methods provide ways to avoid starting a transaction, completions finally provide a way to escape from a running transaction.

Without completions, all methods below a durable method on the call stack would necessarily be durable. Because of completions, it is possible for a durable method to be above a likwid method (it is still impossible for a durable method to be above an isolated method), for instance, if a durable method is called inside a completion which itself is inside a likwid method.

If a transaction never commits due to an exception, the completions will never be executed. For atomicity reasons, if a transaction never commits, Heftza pretends that it never happened at all, which means that its completions were never reached. Information from the running transaction can flow into its completions. If a transaction never commits, completions must die, because they know too much.
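The completion rules above can be sketched in Java as a per-thread queue of deferred blocks. This is a hypothetical model (`Completions`, `transaction`, and `complete` are illustrative names), and it assumes a single, non-nested transaction per thread: completions reached during the transaction run after commit in the order they were reached, they are dropped if the transaction rolls back, and outside any transaction (the likwid case) a completion simply runs in place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of Heftza completions for one non-nested transaction per thread.
final class Completions {
    // Completions reached so far in the running transaction; null means "no transaction".
    private static final ThreadLocal<List<Runnable>> PENDING = new ThreadLocal<>();

    // Runs body as a transaction and then runs its completions.
    static void transaction(Runnable body) {
        List<Runnable> pending = new ArrayList<>();
        PENDING.set(pending);
        try {
            body.run();
        } catch (RuntimeException e) {
            PENDING.remove();   // rollback: the completions were "never reached"
            throw e;
        }
        PENDING.remove();       // commit; from here on we are outside the transaction
        for (Runnable completion : pending) {
            completion.run();   // run in the order they were reached
        }
    }

    // The "complete ... end complete" block.
    static void complete(Runnable block) {
        List<Runnable> pending = PENDING.get();
        if (pending == null) {
            block.run();        // no running transaction (e.g. in a likwid method): run in place
        } else {
            pending.add(block); // otherwise hold it until after commit
        }
    }
}
```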

---++++ Transactions and Exceptions

When an exception is thrown in Heftza, the current transaction is rolled back to the beginning of the try block or the beginning of the transaction, whichever comes first. I realize the phrase "whichever comes first" is ambiguous, so consider the following code:

<verbatim> A.isolated-method(B obj):

 $obj.do-something-safe;
 try
   $obj.do-something-dangerous;
   FeelingGrumpyException.throw; {throw a new FeelingGrumpyException}
 catch Exception x
   $obj.lick-wounds;
 end try;

end isolated-method; </verbatim>

Note that A.isolated-method is not marked likwid or durable, so it always executes in a transaction. Here, the beginning of the try block is higher on the call stack than the beginning of the transaction, so the current transaction is rolled back only up to the beginning of the try block. Thus, all the updates of obj.do-something-dangerous are reversed, but the updates of obj.do-something-safe are not. Now consider another method:

<verbatim> likwid A.likwid-method(B obj):

 $obj.do-something-safe;
 try
   $obj.do-something-dangerous;
   FeelingGrumpyException.throw; {throw a new FeelingGrumpyException}
 catch Exception x
   $obj.lick-wounds;
 end try;

end likwid-method; </verbatim>

When the FeelingGrumpyException is thrown, what gets rolled back? Well, that depends on whether there is an isolated method on the call stack under the call to A.likwid-method. If there is such an isolated method, the entire try block executes within a transaction, and Heftza can roll back to the beginning of the try block. If there is no isolated method under A.likwid-method on the call stack, the try block executes outside of a transaction, and thus nothing can be rolled back at all; by the time the exception is thrown, obj.do-something-dangerous has already committed, whether it is likwid or isolated.

You might object that this creates ambiguity, because the programmer must keep two kinds of behavior in mind when writing the method. Here's the way I look at it. When you catch an exception, you never know where the exception was thrown from; you should certainly program as if you don't know. If a try block runs inside a transaction and an exception is thrown, it is as if the exception was thrown before the first statement in the block was executed, presumably because the runtime environment presciently knew that the block would not complete. (Of course, the environment figured that out by speculatively trying to execute the block and then pretending that it hadn't done anything at all.)

Thus, try blocks in Heftza offer a way for a programmer to specify that an exception should only roll back part of a transaction, not the whole thing.
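One way an implementation might get this partial rollback is an undo log with savepoints. The Java sketch below is an assumption, not Heftza's specified mechanism: `UndoStore` is a hypothetical store where every write records how to undo itself, and a try block remembers the log length on entry and, on an exception, undoes newer writes before running the handler. It assumes the try block runs inside a transaction, and it never truncates the log on commit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical undo-log store: each write records how to reverse itself, so a
// Heftza-style try block can roll back exactly to the point where it was entered.
final class UndoStore {
    private final Map<String, Integer> cells = new HashMap<>();
    private final Deque<Runnable> undoLog = new ArrayDeque<>();

    int get(String key) {
        return cells.getOrDefault(key, 0); // unset cells read as 0 in this sketch
    }

    void set(String key, int value) {
        boolean existed = cells.containsKey(key);
        Integer old = cells.get(key);
        undoLog.push(() -> {
            if (existed) cells.put(key, old); else cells.remove(key);
        });
        cells.put(key, value);
    }

    // try { body } catch (RuntimeException) { handler }, with the body's writes undone first.
    void tryBlock(Runnable body, Runnable handler) {
        int savepoint = undoLog.size();
        try {
            body.run();
        } catch (RuntimeException e) {
            while (undoLog.size() > savepoint) {
                undoLog.pop().run(); // undo newest writes first, back to the savepoint
            }
            handler.run();
        }
    }
}
```

Mirroring A.isolated-method: the write made before the try block survives, the write inside it is undone, and the handler's own write stands.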

One more code sample:

<verbatim> A.another-isolated-method(B obj):

 $obj.do-something-dangerous;
 FeelingGrumpyException.throw; {throw a new FeelingGrumpyException}

end another-isolated-method;

durable A.durable-method(B obj):

 $obj.do-something-safe;
 try
   $obj.do-something-dangerous;
   $me.another-isolated-method($obj);
 catch Exception x
   $obj.lick-wounds;
 end try;

end durable-method; </verbatim>

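When me.another-isolated-method is called from within A.durable-method, it begins a transaction. When the FeelingGrumpyException is thrown there, the beginning of the transaction is higher on the call stack than the nearest try block, so only that transaction is rolled back. The updates made by obj.do-something-safe and obj.do-something-dangerous in A.durable-method ran outside any transaction, so they stand, and the exception is then caught by the catch clause in A.durable-method.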

The integration of exception and rollback makes Heftza's exception handling very powerful, and may lead to less work for the programmer. In pre-Heftza programs, a good amount of exception handling is dedicated to ensuring that the updates in a try block occur atomically by manually rolling back incomplete changes. Heftza will do that for you without you having to think a single thought. Exception-rollback integration is also necessary for true transactional memory semantics. The ACID principle states that transactions are Consistent, meaning that they never transition the store to an illegal state. In Heftza, the store is defined as everything, and illegal states are defined by exceptions.

When an exception is thrown in a subtransaction, and it is not caught in the subtransaction, all of the subtransactions in the thread block are rolled back and the exception propagates to the supertransaction.

When an exception is thrown inside a transaction and it propagates outside the transaction, thus rolling back the entire transaction, the transaction never commits, and any completions reached in the course of the transaction never execute.

The class Exception is a constant class, meaning that Exception objects are semantically, if not formally, immutable. Therefore, an Exception object escapes, in a sense, from a rolled-back transaction, but the state of the Exception object does not reflect changes to active objects that were involved in the transaction.


---+++ Nullability

By default, references are definite; in other words, the type system guarantees that they never refer to null. (The value null is represented in Heftza, as in Visual Basic, by the keyword "nothing".) The type system generates a compile error if a nullable expression is assigned to a definite reference. Methods are also definite by default: the type system will generate a compile error if a nullable expression is returned by a definite method.

The following expressions are considered nullable:

  * null itself, that is, the expression "nothing"
  * a dereference of a nullable reference
  * a method call to a nullable method

All other expressions are considered definite, including:

  * a dereference of a definite reference
  * a method call to a definite method
  * a "new" expression
  * a literal expression

To declare a nullable reference or method, the reference or method must be prefixed with the keyword "maybe".

<verbatim> class C

 maybe Integer i : nothing;

end C; </verbatim>

Nullable references may be dereferenced or cast to definite. Dereferencing a nullable reference evaluates to a nullable expression. The only thing you can do with a nullable expression is assign it to a nullable reference (or pass it as a nullable parameter or return it from a method with a nullable return type).

<verbatim> maybe Integer C.ok(maybe Integer p)

 maybe Integer result : nothing; {a nullable reference}
 result : $p; {assign a nullable expression to a nullable reference}

return $result; {return a nullable expression from a nullable method} </verbatim>

To do anything else with a nullable reference (e.g. call a method, assign to a definite reference, iterate over), you must cast the nullable reference. This is how you cast:

<verbatim>
maybe Integer i : 1;
Integer j : 2;
Integer k : 3;
if i
  j : $i; {ok, in this scope i is definite}
else
  j : $k;
end cast;
</verbatim>

Between if and else, i is a definite reference; in the else clause, i is nothing, so it cannot be used there.
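For readers coming from Java, the closest everyday analogue is `java.util.Optional` standing in for `maybe Integer`, with `isPresent()` playing the role of the cast. This is only an approximation: Java's type system does not stop a plain `Integer` reference from being null, so it cannot give the definite-by-default guarantee Heftza does. The class `NullableCast` and method `pick` are hypothetical and simply mirror the example above.

```java
import java.util.Optional;

// Rough Java analogue of the Heftza cast above, with Optional<Integer> for "maybe Integer".
final class NullableCast {
    static int pick(Optional<Integer> maybeI, int k) {
        int j;
        if (maybeI.isPresent()) {
            int i = maybeI.get(); // inside this branch, i is definite
            j = i;
        } else {
            j = k;                // no definite i exists in this branch
        }
        return j;
    }
}
```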

For convenience, you can cast more than one reference at a time:

<verbatim> String Example.give(maybe Integer i, maybe Integer j)

 if i, j
   sum : $i.+($j);
   result : $sum.to-string;
 else
   result : "i or j is nothing";
 end cast;

return $result;
</verbatim>

---+++ Side-effects and Read-only Interfaces

A method without side effects is called a "pure function". Since Heftza methods have no side effects by default, there is no keyword for a pure function; it is simply assumed when no other keyword is used.

In a pure function, the following statements will generate a compiler error:

  * Reassignment of a field
  * A call to a method that is not a pure function

Heftza has no notion of global or Java-static references, so these conditions are sufficient to guarantee the lack of side effects.

There are two kinds of methods that are allowed to have side effects: the "function" and the "command" (these are both keywords). A command is simply a method with side effects. A function is a method that formally has side effects but semantically has no side effects. That is, the compiler allows a function to have side effects, but the contract with the caller states that it does not make any changes of semantic significance. The classic example is the reordering of elements in a set. For whatever reason, the set implementation may want to reorder the elements, but with respect to the caller, the set has not changed in any way. Note that the distinction between a function and a command is not only documentational. The implementation of Heftza is allowed to implicitly call a function whenever it wants (for instance, to maintain an index), or to not call a function at all if it determines that the result is not being used. In terms of method overriding, a function may override a pure function and a pure function may override a function, but only a command may override a command, and vice versa.
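As a Java sketch of the set-reordering example, consider a hypothetical move-to-front set: `contains` physically moves a found element to the front so repeated lookups are cheaper, which makes it a "function" in the sense above (formal side effect, no semantic one), while `add` is a "command". The class and its `internalOrder` accessor are illustrative assumptions; Java itself has no way to declare or check the distinction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical self-organizing set illustrating function vs. command.
final class MoveToFrontSet<T> {
    private final List<T> items = new ArrayList<>();

    // A "command": changes what the set contains.
    void add(T item) {
        if (!items.contains(item)) items.add(item);
    }

    // A "function": formally mutates the list, but the set's contents,
    // which is all a caller can observe through the set interface, are unchanged.
    boolean contains(T item) {
        int index = items.indexOf(item);
        if (index < 0) return false;
        items.add(0, items.remove(index)); // reorder only
        return true;
    }

    int size() {
        return items.size();
    }

    // Exposes the internal order for illustration only; not part of the set's semantics.
    List<T> internalOrder() {
        return List.copyOf(items);
    }
}
```

Because `contains` changes nothing a set client can observe, an implementation would be free to call it speculatively or skip it, which is exactly the latitude the function/command distinction is meant to grant.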

By default, classes are read-only, which means that they don't declare commands (only pure functions or functions). Declaring a command on a read-only class is a compile error. A writable class is called a "handle" class. The class declaration is prefixed by the keyword "handle". A handle class can declare commands, functions, and pure functions.

Given a class C that is not a handle class, an object created by calling new[C] is considered a value object. ("value" is not a keyword.) The Heftza implementation is allowed to implicitly copy value objects, for instance over a network connection. It is worthwhile to note that, for a class C which is not a handle class, a reference of type C may or may not point to a value object. Why is that? Because class C may have some subclass D which is a handle class. In such a case, an object created by calling new[D] is not a value object. So a reference of type C may point to a new[D], which is not a value object. (If the Heftza compiler can prove that copying an object will have the same semantics as not copying it, the implementation is always allowed to copy. However, in the case of a value object, the compiler does not need to prove equivalence.)

---+++ Parallelism

When a method is called with multiple parameters, the implementation may evaluate the parameters in any order or in parallel.

---++++ Threads

Note that when the Heftza environment does things in parallel, it is under no obligation as to how to parallelize: it can interleave statements, start an OS thread, put a task on a queue, pull a thread out of a pool, send work to another server, or do whatever seems like a good idea at the time. Heftza code specifies semantics; the runtime environment chooses the implementation.

Consider the following Heftza code:

<verbatim> thread

 {statements A1...AN}

thread

 {statements B1...BN}

thread

 {statements C1...CN}

in parallel; </verbatim>

This means: "do A, B, and C at the same time". Imagine how you would write something like that in Java: you'd have to create three inner classes and join them all. I'm not even going to bother writing it out.
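For comparison, here is roughly what that Java would look like, as a minimal sketch: one `Thread` per block, all started, then all joined so the parent waits for every child. The class `ParallelBlock` is hypothetical, and this shows only one of the many strategies the Heftza runtime is free to choose.

```java
import java.util.ArrayList;
import java.util.List;

// Java spelling of "thread A thread B thread C in parallel;": start one thread per
// block, then join them all so the parent is suspended until every child completes.
final class ParallelBlock {
    static void inParallel(Runnable... children) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (Runnable child : children) {
            Thread t = new Thread(child);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for each child in turn
        }
    }
}
```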

Each such thread can be considered a "child" thread, because the "parent" thread is suspended until all children complete. These children do not run in isolation from one another, that is, they see each other's changes. If an isolated method is called from one of the children, that child becomes isolated from its "siblings" until that isolated method returns.

break statements are allowed inside parallel thread blocks. When a break statement is reached in a thread, the execution of all sibling threads is terminated and the parent thread resumes.
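Java cannot forcibly stop a running thread, so a Java model of `break` has to be cooperative. In the hypothetical sketch below, each child receives a shared flag, a child that "breaks" sets it, and siblings are expected to poll it between steps and stop; the parent resumes once every child has returned. Whether Heftza terminates siblings this way is an assumption of the sketch, not something the design above specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical model of "break" in a parallel thread block, using cooperative
// cancellation: children poll a shared flag, and a child that breaks sets it.
final class BreakableParallel {
    // Returns true if some child executed a break.
    static boolean inParallel(List<Consumer<AtomicBoolean>> children) throws InterruptedException {
        AtomicBoolean broken = new AtomicBoolean(false);
        List<Thread> threads = new ArrayList<>();
        for (Consumer<AtomicBoolean> child : children) {
            Thread t = new Thread(() -> child.accept(broken));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // the parent resumes once every child has stopped
        }
        return broken.get();
    }
}
```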

Now consider the following code:

<verbatim> thread

 {statements A1...AN}

thread

 {statements B1...BN}

thread

 {statements C1...CN}

in serial; </verbatim>

This means: "do A, B, and C at the same time, isolated from one another." break statements are not allowed in serial thread blocks, because they would create ambiguity. If a break statement were reached in one thread of a serial thread block, what should happen to the sibling threads? If their execution were halted with no rollback, this would violate atomicity. Should the sibling threads be rolled back? Should they run to completion? None of these options seems satisfying or clearly correct; the right thing is not to allow breaks at all, since a break statement in a thread block is essentially a form of interference between sibling threads.

When a thread that is in the middle of an executing transaction has child threads that are isolated from one another, the child threads enter transactions that are considered child transactions of the parent thread's transaction. Child transactions are isolated from one another, but they are not isolated from parent or ancestor transactions, because the parent/ancestor transactions are suspended until all children/descendants complete. Thereupon, all changes made by the child transactions are visible to the parent transaction.

---++++ For Blocks

The ordered "for" block in Heftza is similar to the "for" block in Python. Consider the following Heftza code:

<verbatim>
letters : List("x", "y", "z");
List reverse : List;
for c in $letters
  reverse : $reverse.+($c);
in order;
</verbatim>

The body of the for block executes once for each item in "letters", in order, just like in Python. However, Heftza for blocks can also execute in parallel and in serial, as in the following code:

<verbatim> for c in $letters

 $me.send($c.+("@google.com"), "hi");

in parallel;

Set letter-set : Set;

{the following code is plagued by a race condition}
for c in $letters

 letter-set : $letter-set.+($c);

in parallel;

{but this constructs a set correctly}
for c in $letters

 letter-set : $letter-set.+($c);

in serial; </verbatim>

A parallel for loop is analogous to a parallel thread block, and a serial for loop is analogous to a serial thread block. Similarly, break statements are allowed in parallel for blocks but not serial for blocks.
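Here is a Java sketch of the serial letter-set loop, assuming a parallel stream as a stand-in for the runtime's choice of parallelism and a lock as the isolation mechanism. Each iteration performs the same read-modify-write as `letter-set : $letter-set.+($c)` (build a new set from the old one and rebind the reference), and the `synchronized` block makes each iteration isolated. Without that block, two iterations could read the same old set and one update could be lost, which is the race in the "in parallel" version. The class `SerialFor` is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of "for c in $letters ... in serial;": iterations may run concurrently,
// but each iteration's read-modify-write of letterSet is isolated by a lock.
final class SerialFor {
    private Set<String> letterSet = Set.of(); // the reference the loop body rebinds

    Set<String> build(List<String> letters) {
        letters.parallelStream().forEach(c -> {
            synchronized (this) {               // "in serial": one iteration at a time in effect
                Set<String> next = new HashSet<>(letterSet);
                next.add(c);
                letterSet = next;               // letter-set : $letter-set.+($c)
            }
        });
        synchronized (this) {
            return letterSet;
        }
    }
}
```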

---++++ Processes

Consider the following Heftza code:

<verbatim> process

 {statements}

end process; </verbatim>

This is equivalent to the following Java code:

<verbatim> new Thread() {

 public void run() {
   // statements
 }

}.start(); </verbatim>

Note that Heftza has no notion of a "process" in the traditional sense. All Heftza code semantically runs in a single, shared memory space. In Heftza, unlike in Java (which requires captured local variables to be final or effectively final), any local variable may be referenced inside a Heftza process. (Reassigning a local variable both inside and outside a process does introduce a kind of race condition, but so does calling a method both inside and outside a process, which would, of course, be entirely legal in Java. If the programmer writes such a thing, it is an indication that the race condition is acceptable.) A process has the same statement type as a durable method, and may only occur where durable methods may be called: in a durable method, a script, or a completion.

---+++ Folders and Persistence

---++++ Folders

One of the fundamental classes of the Heftza standard library is the Folder class. Here is a simplified declaration of the Folder class:

<verbatim> class Folder

 maybe T get-object[T](maybe List[String] path, String name);
 maybe Folder get-child(maybe List[String] path, String name);

end Folder; </verbatim>

Every directory on the file system corresponds to a single object of type Folder. Given that f represents some directory on the file system, calling $f.get-child(List("hello"), "world") returns a folder with the relative path hello/world (or nothing if no such subfolder/file exists). In Heftza, there is no such thing as an absolute path. It's all relative. $f.get-object[StringHandle]("hello.txt") returns a character text file with the relative path "hello.txt" (or nothing if no such file exists), and $f.get-object[ListHandle[Bit]]("world.bin") returns a binary file at the relative path "world.bin" (or nothing). In general, for some arbitrary class C, $f.get-object[C]("hello.c") returns an object of class C at relative path "hello.c", or nothing if no such object exists. As we shall see in the next section, any Heftza object of any type can be stored in the file system.

---++++ FolderHandles

Directories on the file system are actually represented by instances of the FolderHandle class, which is a subclass of Folder. Here is a simplified declaration of the FolderHandle class:

<verbatim> handle class FolderHandle is Folder

 command set-object[T](maybe List[String] path, String name, maybe T object);
 command set-child-handle(maybe List[String] path, String name, maybe FolderHandle child);
 maybe FolderHandle get-child-handle(maybe List[String] path, String name);

end FolderHandle; </verbatim>

set-object puts an object of any type into the file system. Consider the following Heftza script:

<verbatim> $objects.set-object("zero.integer", 0); </verbatim>

"objects" refers to the current directory, so this script inserts an Integer object into the current directory with the filename "zero.integer". Now consider the following script:

<verbatim>
i : $objects.get-object[Integer]("zero.integer");
if i
  $console.print($i.to-string);
else
  $console.print("zero.integer not found");
end cast;
</verbatim>

If both scripts are run from the same directory, and the above script runs immediately after the previous one with nothing interfering in between, the output will be "0". Consider another script:

<verbatim>
s : $objects.get-object[String]("zero.integer");
if s
  $console.print($s);
else
  $console.print("zero.integer not found");
end cast;
</verbatim>

If this script is run in the same directory, where the file zero.integer has type Integer, the output will be "zero.integer not found".
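The lookup rule behind these two scripts can be sketched in Java as a map from name to (type, value): each entry remembers the type it was stored under, and a lookup succeeds only if the requested type matches. This is a hypothetical in-memory model (`TypedFolder`, `setObject`, `getObject`); it ignores the path parameter, and `Class` equality stands in for Heftza's much more involved structural type-equivalence check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory model of FolderHandle.set-object / Folder.get-object:
// the effective key is the name plus the type the object was stored under.
final class TypedFolder {
    private static final class Entry {
        final Class<?> type;
        final Object value;

        Entry(Class<?> type, Object value) {
            this.type = type;
            this.value = value;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    <T> void setObject(String name, Class<T> type, T value) {
        entries.put(name, new Entry(type, value));
    }

    <T> Optional<T> getObject(String name, Class<T> type) {
        Entry e = entries.get(name);
        if (e == null || !e.type.equals(type)) {
            return Optional.empty(); // "nothing": no such name, or stored under another type
        }
        return Optional.of(type.cast(e.value));
    }
}
```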

The method set-child-handle inserts a child FolderHandle into a FolderHandle. This child can be retrieved as type FolderHandle with get-child-handle, or as type Folder with get-child. In Heftza, there is no distinction between a file and a symlink to a file. When a file is in a Folder, it simply participates in a containment relationship, like an object in a list. Just as an object can be in one list, two lists, or no lists, a file can be in one folder, two folders, or no folders. The same is true with FolderHandles: a FolderHandle can be in one FolderHandle, two FolderHandles, or no FolderHandles. Consider the following Heftza script:

<verbatim>
f : FolderHandle;
i : $objects.get-object[Integer]("zero.integer");
if i
  $f.set-object("i.integer", $i);
else
  $console.print("zero.integer not found");
end cast;
</verbatim>

After this script exits, the new FolderHandle created by the script is not reachable by anything, so it is garbage collected, whether or not it contains a file of type Integer. Now consider the following script:

<verbatim>
f : FolderHandle;
g : FolderHandle;
h : FolderHandle;
$f.set-child-handle("g", $g);
$objects.set-child-handle("h", $h);
</verbatim>

After this script exits, f is not reachable by anything, so f is garbage collected. g is not reachable by anything but f, and f has been collected, so g is collected. However, from the file system's point of view, a symlink called "h" has been created in the current directory to an empty directory located in some arbitrary location such as /home/heftza; from Heftza's point of view, h is persistent.

---++++ Type Equivalence

Heftza has no cast operator, so if Folder.get-object were to return an Object, the client would not be able to do anything with it. Instead, FolderHandle.set-object and Folder.get-object are generic. FolderHandle.set-object takes a type parameter T that becomes the type of the object parameter (so the expression passed as the object parameter must be of type T or a subtype). The key to the object really has two parts: the name string and the type. Folder.get-object finds the object by name, and then compares the type key of the stored object with the type passed to Folder.get-object for equivalence. If the types are not equivalent, Folder.get-object returns nothing.

A type equivalence check is a bit harder for Heftza than it is for Java, though. Because Objects in Heftza are persistent, they can travel between processes, between code bases, between physical machines, even between networks. So the object in the Folder at the given key may be from a different code base, and type equivalence is serious work, not just a name comparison or a pointer comparison.

Folders essentially check for name and structural equivalence of the two types, with no subtyping. The definition of type structural equivalence is easy: two types are equivalent iff their classes are equivalent, and each type parameter to the class is equivalent. B1[C1] = B2[C2] iff B1 = B2 and C1 = C2. Defining structural equivalence of classes is more involved.

Let structure(C) be a function that takes the class declaration of C, erases the abstract and parallel keywords (but not durable), erases field initialization code and the parameters (type and object parameters) to the class, and sorts the parent classes section by the name of the parent class.

Let depends(C, D) be true of two classes iff D’s name appears as an identifier in structure(C), and, when C was compiled, the name referred to D.

Let dependencies(C) be the transitive closure of the depends relationship taking C as the point of departure.

Let version(C, N) be the class with the name N obtained by starting from C and following dependency relationships.

Let match(D1, D2) be true iff the string interpretation of structure(D1) is equivalent to the string interpretation of structure(D2).

Then, a class C1 is structurally equivalent to a class C2 iff, for every class name N of every class D in dependencies(C1), match(version(C1, N), version(C2, N)) is true.
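The definitions above translate fairly directly into code. In this hypothetical Java sketch, a code base is a map from class name to declaration, each declaration carries structure(C) as an already-erased string plus the set of names it depends on, and, assuming each code base resolves a name to exactly one class, version(C, N) becomes a lookup of N in C's code base. It further assumes every dependency is present in both maps; a missing version is treated as a failed match.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the structural-equivalence check between two code bases.
final class StructuralEquivalence {
    static final class ClassDecl {
        final String structure;       // structure(C), already erased and with parents sorted
        final Set<String> dependsOn;  // names D such that depends(C, D)

        ClassDecl(String structure, Set<String> dependsOn) {
            this.structure = structure;
            this.dependsOn = dependsOn;
        }
    }

    // dependencies(C): transitive closure of depends, starting from C (C itself included).
    static Set<String> dependencies(Map<String, ClassDecl> codeBase, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String name = work.pop();
            if (!seen.add(name)) continue; // already visited (handles cycles)
            ClassDecl decl = codeBase.get(name);
            if (decl != null) decl.dependsOn.forEach(work::push);
        }
        return seen;
    }

    // C1 ~ C2 iff match(version(C1, N), version(C2, N)) for every N in dependencies(C1).
    static boolean equivalent(Map<String, ClassDecl> base1, Map<String, ClassDecl> base2, String name) {
        for (String n : dependencies(base1, name)) {
            ClassDecl d1 = base1.get(n);
            ClassDecl d2 = base2.get(n);
            if (d1 == null || d2 == null || !d1.structure.equals(d2.structure)) {
                return false; // match() fails, or N has no version in one code base
            }
        }
        return true;
    }
}
```

Note that two classes with identical text can still be inequivalent if some class they depend on differs, which is the point of checking the whole dependency closure rather than the class alone.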

An informal proof of the type safety of this strategy: say that a given compilation unit B1 takes a reference and binds it to an object from a Folder, like so: "r1 : $m.get-object[T1]($k)". The object at key k was created in the context of a different compilation unit, with some type T2 defined in that compilation unit. But Heftza runs a type equivalence check and allows the binding. What can the program do with this reference? It can dereference it and bind its value to a reference r2 of the same type. The program could do about the same things with r1 and r2. The program can call methods on r1 or r2, which is safe, because T1 and T2 support the same methods, with the same names, with the same number of parameters, and all the parameters have the same names and the same type modifiers. The program could bind a reference to an object returned by a method call on r1 or r2. The names of the types returned by methods of T1 are the same as the names of the types returned by methods of T2. The code which extracted the object was compiled against some class U1 which is version(T1, N), and the object referred to by r1 was compiled against some class U2 which is version(T2, N). So U1 and U2 are just as equivalent as T1 and T2, and code compiled against U1 will be just as safe executing against an object of type U2 as code compiled against T1 will be executing against an object of type T2, so extracting a return object does not jeopardize type safety. The argument regarding sending parameters to r1 or r2 is very similar, so it will be omitted.

What this basically says is that, in order to typecheck two different code bases that use the same type, they must be compiled against the same header files. (If you don’t know what header files are, that’s okay; skip to the next paragraph.) Heftza doesn’t have header files, but it has class declarations, which are pretty close. After all, it is really the header file that is used for type checking, not the code for the method bodies. In order for two code bases to use the same type, they must be compiled against the same “header file” for the common type, as well as for all types reachable from that type.

---++++ Persistence

All objects reachable from the file system are persistent. When an object becomes unreachable from all running processes and unreachable from the file system, it is garbage collected.
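The persistence rule amounts to a mark phase with two root sets. Here is a minimal Python sketch, assuming the heap is modeled as a dictionary from object id to the ids it references; `reachable` and `classify` are hypothetical names, and a real collector would also have to cope with concurrency and incremental marking.

```python
from typing import Dict, Iterable, List, Set, Tuple


def reachable(graph: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """All objects reachable from the given roots (depth-first marking)."""
    marked: Set[str] = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(graph.get(obj, []))
    return marked


def classify(graph: Dict[str, List[str]], fs_roots: Iterable[str],
             process_roots: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (persistent objects, garbage)."""
    persistent = reachable(graph, fs_roots)               # reachable from the file system
    live = persistent | reachable(graph, process_roots)   # or from some running process
    garbage = set(graph) - live                           # unreachable from both: collected
    return persistent, garbage
```

Objects reachable only from a running process are live but not persistent; they become garbage once the process lets go of them.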

---+++ Machines

---++++ Invoking Heftza

Assuming that the Heftza runtime environment is in the system path, invoking heftza with no parameters opens a Heftza interactive interpreter.

<verbatim>
dglibicki@libicki:~$ heftza
heftza$
</verbatim>

In an interpreter generated by a no-parameter invocation of heftza, the standard library classes are available, but no user-defined classes are available.

If heftza is invoked with a single parameter, the parameter is assumed to be a file name/path. If no file can be found at the specified path, Heftza will complain and exit. If a file can be found, Heftza checks if the file is really a Heftza object of type Program or Machine. The class Program has an abstract method called run (no parameters, no return value). If Heftza is invoked on a Program object, heftza will simply call the object's run method.

A Heftza Machine represents a mapping from identifiers to class definitions. To start out, you can think of a Heftza Machine as kind of like a Java classpath. Although the two are extremely different, a classpath acts as a function that takes an identifier and returns a class definition. So invoking heftza on a Machine opens up an interpreter just like invoking heftza with no parameters, but the interpreter will look up class names in the given Machine and compile/execute the commands entered into the interpreter based on the class definitions in the Machine.

<verbatim>
dglibicki@libicki:~$ heftza centurion
centurion$
</verbatim>

If heftza is invoked with a single parameter, and the parameter names a text file, Heftza assumes that it contains a script (that is, a series of commands) in the Heftza language. If the script does not contain compile-time errors, Heftza runs it. First, Heftza checks whether it already has an up-to-date, compiled version of the script. If not, Heftza attempts to compile the script (presumably to binary, though the implementation is allowed to define an intermediate bytecode if it wants to), and then runs it in the heftza process.

heftza may also be invoked with two parameters. In that case, the first is assumed to be the path to a Machine file, and the second is assumed to be the path to a script file. If these assumptions are correct, the script is compiled and run against the identifier-to-definition mapping in the Machine. If the first parameter is not the path to a Machine, or the second parameter is not found or does not compile, heftza will complain and exit.

heftza may not be invoked with three or more parameters. There are no other parameters to the Heftza runtime environment.
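The invocation rules above can be summarized as a dispatch on the arguments that follow the command name. Here is a minimal Python sketch; the in-memory "file system", the `Program` and `Machine` stand-ins, and the returned action strings are all hypothetical, and plain strings stand for script text files.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Program:  # hypothetical stand-in for a stored Heftza Program object
    name: str


@dataclass
class Machine:  # hypothetical stand-in for a stored Heftza Machine object
    name: str


FileSystem = Dict[str, Union[Program, Machine, str]]  # str = text of a script file


def invoke(fs: FileSystem, args: List[str]) -> str:
    """Describe what heftza would do for the arguments after the command name."""
    if len(args) == 0:
        return "interpreter (standard library only)"
    if len(args) == 1:
        target = fs.get(args[0])
        if target is None:
            return f"error: {args[0]} not found"
        if isinstance(target, Program):
            return f"call {target.name}.run"
        if isinstance(target, Machine):
            return f"interpreter against machine {target.name}"
        return f"compile (if stale) and run script {args[0]}"
    if len(args) == 2:  # a Machine path followed by a script path
        machine, script = fs.get(args[0]), fs.get(args[1])
        if not isinstance(machine, Machine):
            return f"error: {args[0]} is not a Machine"
        if not isinstance(script, str):
            return f"error: {args[1]} is not a script"
        return f"compile and run script {args[1]} against machine {machine.name}"
    return "error: too many arguments"
```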

---++++ <NOP>MachineHandles</NOP>

One of the fundamental classes of the Heftza standard library is the <NOP>MachineHandle</NOP> class, a read-write interface to the Machine class. Here is a simplified declaration of the <NOP>MachineHandle</NOP> class:

<verbatim> handle class MachineHandle is Machine

 command set-source-file(String name, maybe String file);
 command set-parent(String name, maybe Machine parent);
 command set-dependency(String name, maybe Machine dependency);
 command set-class(Machine source, String old-name, maybe String new-name);
 command set-value(Machine source, String class-name, String old-value-name, String new-value-name);
 command compile;

end MachineHandle; </verbatim>

Machine objects contain a string-to-string mapping of source file name to source file text. (When I say source file "name", I mean the "arbitrary" name that it has with respect to a given Machine. This "name" has no necessary relationship to the "name" of the file in your favorite file system. The "file" does not have to actually appear in the file system at all; also, two different Machines could refer to a single String, with two different names or with the same name.) When you concatenate the files together, you get the text of the Machine. To explain this in Heftza code:

<verbatim> class Machine

 Map[String, String] files : Map;
 String get-text;

end Machine;

String Machine.get-text

 String result : "";
 for file in $files.values
   result : $result.+($file);
 in serial;

return $result; </verbatim>

<NOP>MachineHandle</NOP>.set-class is kind of like a Java import statement, except that it applies to the entire Machine and not just a single "class file". However, notice that you can declare a new name for the class, so that the class is visible in the Machine by a different name than in the source machine. Let's face it: "namespaces" and "package names" are ugly. One of the ugliest phenomena in source code is the long list of imports in the header of a .java file. In Heftza, all this is unnecessary, because the binding of an identifier to a class definition is local to a given Machine; in a different Machine, the same class can be bound to a different identifier.

Calling <NOP>MachineHandle</NOP>.set-parent creates an inheritance relationship between Machines. All classes defined by all ancestors of a Machine are in scope in a Machine, bound to their "original" names, that is, the name of the class as it is defined in the source code defining it. Calling <NOP>MachineHandle</NOP>.set-dependency "imports" all classes that are defined in all ancestors of the dependency, but does not create an inheritance relationship. To give an example:

<verbatim> Example.give(MachineHandle a, MachineHandle b, MachineHandle c, MachineHandle d)

 $b.set-parent("a", $a); {a is now a parent of b}
 $c.set-dependency("b", $b); {all the classes defined in a are visible in c}
 $d.set-parent("c", $c); 
 {all the classes defined in c are visible in d, but the classes defined in a and b are not visible in d, because a and b are not ancestors of c}

end give; </verbatim>

Inheritance is transitive visibility; dependency is non-transitive visibility. Dependency is analogous to a wildcard (".*") import in Java. Inheritance is analogous to #include in C++.
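The visibility rules can be modeled in a few lines. Here is a minimal Python sketch; the `Machine` class and its fields are hypothetical, and it assumes that a dependency's own classes are visible along with those of its ancestors. Inheritance walks the whole parent chain, while a dependency contributes only the classes defined by the dependency and its ancestors, never the dependency's own dependencies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(eq=False)
class Machine:
    name: str
    defined: Set[str] = field(default_factory=set)                 # classes in this Machine's source text
    parents: List["Machine"] = field(default_factory=list)         # set-parent
    dependencies: List["Machine"] = field(default_factory=list)    # set-dependency
    imports: Dict[str, str] = field(default_factory=dict)          # set-class: local name -> original name

    def self_and_ancestors(self) -> List["Machine"]:
        result: List["Machine"] = []
        stack = [self]
        while stack:
            m = stack.pop()
            if any(m is r for r in result):
                continue
            result.append(m)
            stack.extend(m.parents)
        return result

    def visible(self) -> Set[str]:
        names: Set[str] = set()
        for m in self.self_and_ancestors():        # inheritance: transitive
            names |= m.defined
        for dep in self.dependencies:              # dependency: one level only
            for m in dep.self_and_ancestors():
                names |= m.defined
        names |= set(self.imports)                 # classes imported under local names
        return names
```

Replaying the example from the `give` method: c sees the classes of a and b, but d, whose parent is c, sees only the classes defined in c and d.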

<NOP>MachineHandle</NOP>.compile attempts to compile the text of the <NOP>MachineHandle</NOP> against the ancestors, dependencies, and "imports" of the <NOP>MachineHandle</NOP>. Unsuccessful compilation throws an exception, whereas successful compilation creates a new set of bindings wherein the classes defined in the source text of the <NOP>MachineHandle</NOP> are bound to their "original" names. <NOP>MachineHandle</NOP>.set-value changes the name of a member of an enumeration. The given enumeration class must already be an imported class or be defined in a machine dependency.

---+++ Schema Evolution

---++++ Method Bodies

Consider the following interaction:

<verbatim>
dglibicki@libicki:~$ heftza centurion
centurion$ $objects.put("object.c", C); {put a new object of class C into the current directory}
</verbatim>

This creates an instance of class C and puts it in the file system. Let's say that class C is defined in the Machine file "centurion". Now, I edit the definition of C, recompile centurion, and enter the following:

<verbatim>
dglibicki@libicki:~$ heftza centurion
centurion$ c : $objects.get[C]("object.c");
centurion$ if c
centurion$   $console.print($c.main);
centurion$ else
centurion$   $console.print("c is nothing");
centurion$ end cast;
new implementation
centurion$
</verbatim>

What gets called here is the new implementation of C.main. In general, all updates to method bodies of classes are "pushed out" to extant instances of those classes. That's because, as far as type safety is concerned, a change to a method body is a backwards compatible change. Some changes to a class are not backwards compatible, and that's where schema evolution comes in. To explain schema evolution, I will go through every possible change to a class and explain how schema evolution works for each change.

---++++ Renaming

To rename an identifier (a field, a method, a type parameter, or the class itself), simply suffix the new name with "was old-name". For instance, take the following class declaration:

<verbatim> class C

 Integer i : 0;

end C; </verbatim>

Say you want to rename i to j, and you have extant objects of class C:

<verbatim> class C

 Integer j was i : 0;

end C; </verbatim>

The nice thing about this is that you only have to compile once with the "was" syntax. That will give the compiler the chance to change the names on all the extant objects. The next time you compile, you can omit the "was" clause.
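Here is a minimal Python sketch of what the compiler does with a "was" clause, assuming extant objects can be modeled as dictionaries from field name to value; `FieldDecl` and `migrate` are hypothetical names. The value survives the rename, and running the migration again without the clause changes nothing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldDecl:
    name: str
    was: Optional[str] = None  # old name from a "was" clause, if any


def migrate(extant: List[Dict[str, object]], decls: List[FieldDecl]) -> None:
    """Rename fields on every extant object according to the "was" clauses, keeping values."""
    renames = {f.was: f.name for f in decls if f.was is not None}
    for obj in extant:
        # Build the renamed mapping in one step so that swaps (a was b, b was a) also work.
        renamed = {renames.get(key, key): value for key, value in obj.items()}
        obj.clear()
        obj.update(renamed)
```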

---++++ Method Signatures

To add a new method is a backwards compatible change.

To remove a method is not backwards compatible. If you remove a method of a class with extant objects, you will get a compile error. Instead of removing the undesired method, you need to prefix it with the keyword "old". An "old" method is considered undefined in the sense that code that refers to an old method will generate a compile error. However, old methods can be called on extant objects that were created before the method was old.

This may sound a bit confusing at first; after all, how can you call a method on an old object if code that calls that method won't even compile? Well, suppose you had the following two files, in two different Machines:

(file #1) <verbatim> handle class C

 command ++;

end C;

command C.++

 {do something}

end ++; </verbatim>

(file #2) <verbatim> handle class D(C value)

 C c : $value;
 command endless;

end D;

command D.endless

 loop
   $c.++;
 end loop;

end endless; </verbatim>

(Note that the second Machine must be a descendant or a dependent of the first Machine.) If I try to simply remove C.++, and I try to recompile the first Machine while there are extant D objects, I get an error:

<verbatim> handle class C

 {error; the D objects need C.++}

end C; </verbatim>

So instead of removing C.++, I need to declare it old:

<verbatim> handle class C

 old command ++;

end C;

old command C.++

 {do something; could be something different than before}

end ++; </verbatim>

Now, all D objects out there can still call C.++. But if I try to recompile the second Machine without changing the code, I get an error, because from the compiler's point of view, C.++ is undefined.

Changing the type signature of a method of a class with extant objects is illegal. This is because you'd have to define an old method and a new method, and they'd both have the same name, and overloading is illegal in Heftza. Fortunately, if you want to change the type signature of a method, all you need to do is change the name of the old method, declare it old, and create a new method.
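The rules for old methods involve two separate checks plus a runtime rule. Here is a minimal Python sketch; `ClassVersion`, `Method`, and `CompileError` are hypothetical, and "methods needed by extant objects" is simplified to a set of names that already-compiled code calls. Removing `++` outright fails, while marking it old passes recompilation, keeps it callable from existing code, and hides it from new code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class CompileError(Exception):
    pass


@dataclass
class Method:
    body: Callable[[], str]
    old: bool = False


@dataclass
class ClassVersion:
    methods: Dict[str, Method] = field(default_factory=dict)

    def check_evolution(self, needed_by_extant: Set[str]) -> None:
        """Recompiling fails if a method that extant callers still use has been removed."""
        missing = needed_by_extant - set(self.methods)
        if missing:
            raise CompileError(f"extant objects need {sorted(missing)}")

    def check_new_code(self, referenced: Set[str]) -> None:
        """Newly compiled code may not refer to missing or old methods."""
        for name in sorted(referenced):
            m = self.methods.get(name)
            if m is None or m.old:
                raise CompileError(f"undefined method {name}")

    def dispatch(self, name: str) -> str:
        """Already-compiled callers on extant objects can still reach old methods."""
        return self.methods[name].body()
```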

---++++ Fields

Adding a "maybe" field is backwards compatible. Such a field will be added to extant objects and set to "nothing". Adding a definite (non-nullable) field is not backwards compatible, so if you add a new field and there are extant objects that do not have that field, you must declare it "maybe".

Removing a field is backwards compatible, kind of. You might want to keep the field around to be used in old methods. To do that, you can declare the field "old". old fields may only be accessed in old methods. You can never update (reassign) them, and, as with maybe and new references, you must test them before using them.

If you have a "maybe" field that happens to be non-null on all extant objects, you can make it into a definite field. Adding maybe to a field that was definite is backwards compatible.

For a field of type T, changing the type to some supertype of T is backwards compatible. To change the type to any type that is not a supertype of T, you must rename the old field and create a new field.
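The field rules above can be written as a single compatibility check. Here is a minimal Python sketch; the `SUPERTYPES` table and the names `Field` and `compatible` are hypothetical, single inheritance is assumed for the type lookup, and `None` stands for "nothing".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical type hierarchy, child -> parent, used only for this sketch.
SUPERTYPES: Dict[str, Optional[str]] = {
    "Integer": "Number", "Number": "Object", "String": "Object", "Object": None,
}


def is_same_or_supertype(candidate: str, t: Optional[str]) -> bool:
    """True if `candidate` is t itself or one of t's ancestors."""
    while t is not None:
        if t == candidate:
            return True
        t = SUPERTYPES.get(t)
    return False


@dataclass(frozen=True)
class Field:
    type_name: str
    maybe: bool = False


def compatible(old: Optional[Field], new: Optional[Field], extant_values: List[object]) -> bool:
    """Can the field change from `old` to `new` without breaking extant objects?

    old/new: None means the field is absent (being added or removed).
    extant_values: the field's value on each extant object (None = nothing);
    for a field being added, pass one None per extant object.
    """
    if new is None:                       # removing a field (or marking it old)
        return True
    if old is None:                       # adding a field: only maybe fields are safe
        return new.maybe or not extant_values
    if not is_same_or_supertype(new.type_name, old.type_name):
        return not extant_values          # otherwise: rename the old field, add a new one
    if old.maybe and not new.maybe:       # maybe -> definite needs every value present
        return all(v is not None for v in extant_values)
    return True                           # same type or supertype, or definite -> maybe
```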

---++++ Type Parameters

Adding a type parameter is backwards compatible.

Removing a type parameter is not backwards compatible. If there are extant objects, you must mark the type parameter "old" instead of removing it. old type parameters are only in scope in old methods.

---++++ Parent Classes

Adding a parent class is equivalent to adding a set of new methods (at least, as far as schema evolution is concerned). Thus, adding a parent class is backwards compatible.

Removing a parent class is not backwards compatible. Removing a parent class is like removing a set of methods, and you could deal with it as such, by defining old methods for the methods that were supported by the parent class and are no longer supported. However, that's somewhat limiting, because you may want to use the old parent class's implementation. If so, you can mark the old parent class as "old". An old parent class is not recognized as a supertype by the compiler, but if you need an old method and the old parent class defines such a method, you don't have to override the method. Sadly, you can't parameterize "me" with an old parent class (even in an old method, you can't count on any given extant object to actually have that class as a parent). That's limiting, but it's the only approach that is type safe.

---+++ Remote Method Invocation

---++++ Capabilities

Each system resource is represented in Heftza as an object. Heftza is a pure local language; there is no such thing as a global or "static" reference. Therefore, in order to access any resource or data, code needs a reference to the object representing it. This is what is called object-capability security. For instance, a text file is represented in Heftza as a StringHandle that has been mapped in a Folder. In order for object "a" to access StringHandle "b", "a" must have a reference to some object "c" which has a reference to "b", and "c" must give that reference to "a". In other words, in order for you to access something, someone who has access to it must give you access to it. Also, access can be restricted in an arbitrary, Turing-complete fashion. Let's say you have some object and you want to give someone else limited access to it. You create a new class, make the private object a field of the new class, define methods on the new class to access the private object, and give an instance of the new class to the principal to whom you want to grant limited access.
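Here is a minimal Python sketch of that attenuation pattern. `StringHandle` and `ReadOnlyView` are hypothetical stand-ins, and Python does not actually enforce private fields (the leading underscore is only a convention), so this shows the shape of the pattern, not the guarantee Heftza provides. The holder of the view can read at most the first `limit` characters and has no way to write.

```python
class StringHandle:
    """Hypothetical stand-in for a Heftza StringHandle (a mutable text file)."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get(self) -> str:
        return self._text

    def set(self, text: str) -> None:
        self._text = text


class ReadOnlyView:
    """Attenuated capability: read-only access to a prefix of the wrapped handle."""

    def __init__(self, target: StringHandle, limit: int) -> None:
        self._target = target  # the private object becomes a field of the new class
        self._limit = limit

    def get(self) -> str:
        return self._target.get()[: self._limit]
```

The owner keeps the `StringHandle` and hands out only the `ReadOnlyView`; because the view forwards to the live handle, later writes by the owner are visible through it.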

In order to preserve object-capability guarantees, Heftza does not allow access to private fields of any object other than "me", even an object of the same class as "me". (Back in 2003, when I was writing Visual Basic, this was true of Visual Basic; I haven't looked at the current version of VB, so things may have changed.) In principle, the private state of an object should only be accessible to the code of the object itself and not to the code of any other object, even an object of the same class. Furthermore, because of Heftza's support for type equivalence, if access to private fields were allowed within the defining class, an attacker could define a type-equivalent but malicious class with the same name, and then use that class to take a "victim" object out of a Folder and break its encapsulation.

Remote method invocation in Heftza works under the assumption that you can't trust the remote machine to be running a legal Heftza runtime environment. Heftza's remote method invocation preserves the object-capability security properties under this assumption.

---++++ Services

A method that may be invoked remotely must be prefixed by the keyword "request". Only classes prefixed by the keyword "service" may declare "request" methods. ("service" and "handle" are mutually exclusive.) A subclass of a service class that is not itself a service class may override request methods declared by its superclass, but it may not declare new request methods. request methods are like durable methods in that they may not be called in the middle of a transaction (unless they are inside a completion), but "request" and "durable" are not mutually exclusive; by default, a request defines a new transaction.

The reason that request methods can't be called in the middle of a transaction is that, in case of a rollback, the local heftza runtime environment can't trust the remote heftza runtime environment to roll back (as a legal environment would). As Mark Miller put it, that would be kind of like an ATM dispensing money, reaching an error condition, and then asking for the money back.

Every parameter and return type of a request must be either a service type or a serializable type. A subtype of a service type is not considered a service type; a subtype of a service type must be serializable. As you might expect, when a request is invoked, the serializable parameters are serialized, and the services are passed by reference. This is true even in a "round-trip"; that is, even if the call to a request method turns out to be a "local call", the serializable parameters are still serialized.
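Here is a minimal Python sketch of that parameter-passing rule. It uses `copy.deepcopy` as a stand-in for serializing and deserializing; `SERVICE_TYPES`, the `service` decorator, and `prepare_request_args` are hypothetical names. The decision is made on each parameter's declared type, and only exactly-marked classes count, which mirrors the rule that a subtype of a service type is not itself a service type.

```python
import copy
from typing import Any, List, Set

SERVICE_TYPES: Set[type] = set()


def service(cls: type) -> type:
    """Class decorator: marks exactly this class, not its subclasses, as a service type."""
    SERVICE_TYPES.add(cls)
    return cls


def prepare_request_args(declared: List[type], args: List[Any]) -> List[Any]:
    """Apply the parameter-passing rule according to each parameter's declared type."""
    out: List[Any] = []
    for declared_type, arg in zip(declared, args):
        if declared_type in SERVICE_TYPES:
            out.append(arg)                 # service: passed by reference
        else:
            out.append(copy.deepcopy(arg))  # serializable: copied, even on a local round trip
    return out
```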

If a request is invoked and the remote system is unreachable, a LostConnectionException is thrown. This is one of the few exceptions thrown by the Heftza language or the Heftza standard library.

---++++ Serialization in Heftza

Serialization works differently in Heftza than in (say) Java. In Java, the class itself defines how it is serialized. In Heftza, it is the service that defines how types are serialized; thus, a given type may be "serializable" by some service and not "serializable" by another service.

In order for a type T to be serializable by a service S, S must have exactly one "marshal" method for type T and exactly one "unmarshal" method for type T. "marshal" and "unmarshal" are keywords that prefix a method, and each special method has a strict signature that must be adhered to. (marshal and unmarshal both operate on a Stack, so they always have side effects. Thus, the keyword "command" and the keywords "marshal/unmarshal" are mutually exclusive. In a sense, marshal/unmarshal is a subtype of command.)

Here is a service that marshals and unmarshals Points:

<verbatim> service class PointService is Service

 marshal marshal-point(Point input, Stack[String] medium);
 unmarshal Point unmarshal-point(Stack[String] medium);
 abstract Point what-is-the-point;

end PointService;

marshal PointService.marshal-point(Point input, Stack[String] medium)

 $me.marshal-integer($input.get-x, $medium);
 $me.marshal-integer($input.get-y, $medium);

end marshal-point;

unmarshal Point PointService.unmarshal-point(Stack[String] medium)

 y : $me.unmarshal-integer($medium);
 x : $me.unmarshal-integer($medium);

return Point(x, y); </verbatim>

In case you were wondering where the methods marshal-integer and unmarshal-integer came from, they are methods of the standard library class Service, which also defines marshal-string, marshal-bit, etc.
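Because the medium is a stack, unmarshal-point pops the fields in the reverse of the order marshal-point pushed them. Here is a minimal Python sketch of that round trip, using a list as the stack. The decimal-string encoding in `marshal_integer` is an assumption, since the real standard-library encoding is not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Point:
    x: int
    y: int


class Service:
    # Hypothetical stand-ins for the standard-library helpers.
    def marshal_integer(self, value: int, medium: List[str]) -> None:
        medium.append(str(value))

    def unmarshal_integer(self, medium: List[str]) -> int:
        return int(medium.pop())


class PointService(Service):
    def marshal_point(self, p: Point, medium: List[str]) -> None:
        # Only public state (x and y) is written to the medium.
        self.marshal_integer(p.x, medium)
        self.marshal_integer(p.y, medium)

    def unmarshal_point(self, medium: List[str]) -> Point:
        y = self.unmarshal_integer(medium)  # last pushed, first popped
        x = self.unmarshal_integer(medium)
        return Point(x, y)
```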

In addition to all parameters and return types of a service being services or serializable, all fields of a service must be services or serializable as well. If a marshal or unmarshal method refers to a non-service field of a service, the field is first serialized and then accessed.

In Heftza, objects are serialized by an outside class, not by their own class. The consequence of this is that the serialization can only access the public state of the object being serialized. Therefore, an object cannot pass more capabilities to a service than it actually has. (If private state could be serialized, a malicious remote runtime environment could request an object and then break its encapsulation.)

---++++ Local and Remote Interfaces

Another property of Heftza's service types is that local clients of an object can see a local interface that remote clients cannot see. Recall that all methods of a service class must be requests, marshal methods, or unmarshal methods. That ensures that all remote calls to an object that is an instance of a service class will be outside of a transaction. However, subclasses of service classes may define methods that are not requests (in fact, if the subclasses declare new methods, those methods must not be requests). Consider the following example:

<verbatim> service class C

 request ++;

end C;

handle class D is C

 Integer i : 0;
 command set-value(Integer value);

end D;

Main.run(FolderHandle f)

 d : D; {create a new object of class D}
 $d.set-value(1);
 $f.set-object("d", d);

end run; </verbatim>

Local code can call local methods on "d", for instance, D.set-value. However, if the local code wants to pass a reference to "d" to a remote system, it can't pass a reference of type D; an object of type D would be passed by serialization, not by reference. It must pass a reference of type C. The remote system now has a reference of type C, which does not support the method set-value. Of course, the remote system can't cast the reference to type D, since you can't cast in Heftza (except for casting maybe to definite).

Non-service subclasses of a service class are not allowed to override marshal/unmarshal methods. If they could do that, two bad things could happen: first of all, non-serializable non-service fields could be referred to in the marshal method, and secondly, the subclass could provide a local interface to itself in an unmarshal method, potentially giving a local interface to itself to an object that is actually on a remote system.

---++++ <NOP>FolderServices</NOP>

The standard library includes a special Service class called <NOP>FolderService</NOP>, along with a local read-write interface to the <NOP>FolderService</NOP> called <NOP>FolderServiceHandle</NOP>. Here are simplified definitions of <NOP>FolderService</NOP> and <NOP>FolderServiceHandle</NOP>:

<verbatim> service class FolderService

 request T get-object[T](String name);

end FolderService;

handle class FolderServiceHandle

 command set-object[T](String name, T value);
 command set-encrypted(Bit b);
 command set-signed(Bit b);

end FolderServiceHandle; </verbatim>

A <NOP>FolderService</NOP> is similar to a <NOP>Folder</NOP> except that get-object is a request method. So if you get a service object from a <NOP>FolderService</NOP>, you get a reference to the service object itself. If you get any object that is not a service from a <NOP>FolderService</NOP>, the object is serialized. A <NOP>FolderService</NOP> can serialize anything, but it does so naively, by copying the entire object graph reachable from the serialized object. So passing non-service objects through a <NOP>FolderService</NOP> should be done with caution.

<NOP>FolderServiceHandle</NOP> also has methods for setting the communication protocol. By default, communication with a <NOP>FolderService</NOP> is encrypted and signed, but you can relax the security guarantees by calling the <NOP>FolderServiceHandle</NOP> methods.

Each remote capability to a service object has a communication protocol associated with it, but you can't set it directly. The communication protocol on the capability is "strengthened" every time it passes across the wire. In other words, if an unencrypted capability passes along an encrypted connection, that capability becomes "encrypted", in the sense that connections using that capability will themselves be encrypted. The same is true with respect to signing.
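The strengthening rule is a component-wise "or" of the capability's protocol and the connection's protocol, so a capability's protocol can only ever get stronger. Here is a minimal Python sketch; `Protocol` and `after_transfer` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    encrypted: bool
    signed: bool


def after_transfer(capability: Protocol, connection: Protocol) -> Protocol:
    """Protocol of a capability after it crosses a connection: never weaker than before."""
    return Protocol(capability.encrypted or connection.encrypted,
                    capability.signed or connection.signed)
```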

---+++ Other Features of Heftza

---++++ Type Inference

Heftza does not do type inference on parameters and fields. However, Heftza does do type inference on local references (that is, method-scoped references), unless the type is provided explicitly.

By default, local references are "final" in the Java sense that they may not be reassigned. final references don't need to be declared; they just need to be initialized before they are used. To make a non-final local reference, the reference must be declared by explicitly providing the type. (Parameters and for-block iterators are always "final", and fields are never "final".)

---++++ Generics

Heftza generics are best defined as an improvement on Java generics. Here are the differences between Heftza generics and Java generics:

Needless to say, Heftza does not have "raw types". The only reason that Java allowed "raw types" was for backwards compatibility.

Heftza will infer type parameters within method bodies (and scripts). Here is a simple example:

<verbatim>
List list : List;
list : $list.+("hello world");
for str in $list

 $console.print($str);

in parallel; </verbatim>

By looking at the code, you can tell List "list" is a List[String]. A String is appended to the list, and Strings are read from the list. The type parameter "String" can be inferred, so you don't need to write List[String]; just write List and the compiler will figure it out. If the compiler can infer the type parameter, it is legal to specify it explicitly. If the compiler cannot infer a type, it will fail and ask you to pass the type parameter explicitly.

Finally, there is no "extends", "super", or ?. That means that, if you have a generic reference, you cannot call methods on it. All you can do is assign it, pass it as a parameter, or return it. That sounds rather limiting, but it's actually not; if you need to call methods on your generic reference, define a non-generic subclass. The canonical example is the comparator, which can be defined like this:

<verbatim> class Comparator[T]

 Integer compare-to(T left, T right);

end Comparator; </verbatim>

Then you can write two subclasses, one which compares Strings, one which compares Integers:

<verbatim> class StringComparator is Comparator[String]

 Integer compare-to(String left, String right);

end StringComparator;

Integer StringComparator.compare-to(String left, String right) return $left.compare-to($right);

class IntegerComparator is Comparator[Integer]

 Integer compare-to(Integer left, Integer right);

end IntegerComparator;

Integer IntegerComparator.compare-to(Integer left, Integer right) return $left.-($right); </verbatim>

This does introduce some code duplication, but I think that it greatly reduces the complexity of the generics feature, and makes the code much more understandable.

---++++ Multiple Inheritance

There is no such thing as a Java "interface" in Heftza. All classes can have fields and implementations etc. Similarly, all classes can have multiple inheritance. This is subject to certain restrictions. First of all, Heftza does not support method overloading. So, if multiple inheritance would result in method overloading, the inheritance is illegal, as in the following code:

<verbatim> class C

 String main(Integer i);

end C;

class D

 String main(Bit b);

end D;

class E is C, D

 {illegal; the inheritance creates two methods with the same name and different signatures}

end E; </verbatim>

When more than one parent of a class defines a method with the same name and the same signature, the inheriting class must override the method.
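The two multiple-inheritance rules can be checked mechanically. Here is a minimal Python sketch that reports the errors for a class with the given parents; signatures are modeled as tuples of type names, and `check_inheritance` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Signature = Tuple[str, ...]  # (return type, parameter type, ...)


def check_inheritance(parents: List[Dict[str, Signature]],
                      overrides: Dict[str, Signature]) -> List[str]:
    """Return the errors a compiler would report for `class X is P1, P2, ...`."""
    inherited: Dict[str, List[Signature]] = {}
    for parent in parents:
        for name, sig in parent.items():
            inherited.setdefault(name, []).append(sig)
    errors: List[str] = []
    for name, sigs in inherited.items():
        if len(set(sigs)) > 1:
            errors.append(f"{name}: conflicting signatures (overloading is illegal)")
        elif len(sigs) > 1 and name not in overrides:
            errors.append(f"{name}: inherited from several parents; must be overridden")
    return errors
```

With C and D from the example above, `main(Integer)` and `main(Bit)` collide and the class is rejected. If both parents declared `main(Integer)`, the class would compile only once it overrides `main`.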

There is no special keyword like Java's "super" to call a superclass implementation of a method. Instead, calling a superclass implementation is done by passing a type parameter to "me". Thus, when there are multiple parent classes, it is easy to call a method on a specific parent class.

<verbatim> class C is D, E

 Integer get;

end C;

Integer C.get return $me[D].get.+($me[E].get); </verbatim>

---++++ Enumerations

An enumeration class is any class that inherits from the class Enumeration. The members of the enumeration are passed as if they are type parameters to the Enumeration class. For example, here is a simplified definition of the Bit class:

<verbatim> class Bit(Bit prototype) is Enumeration[False, True]

 Bit and(Bit right);
 Bit or(Bit right);
 Bit not;

end Bit; </verbatim>

Given this definition, "True" is an expression that evaluates to an object of type Bit, as is "False":

<verbatim>
a : True;
b : False;
c : $a.and($b);
</verbatim>

The class Enumeration supports a single method, Enumeration.to-integer. As you might expect, True.to-integer.=(1), and False.to-integer.=(0), because of the position of the values in the class declaration. So if I define a class

<verbatim> class Day(Day prototype) is Enumeration[Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday] end Day; </verbatim>

then Sunday.to-integer.=(0), Monday.to-integer.=(1), etc.

Given enumeration class E, it is required that E declare a constructor that takes a single object of type E. This is so that an object created by calling the constructor will take on one of the defined enumeration values.
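Here is a minimal Python sketch of both enumeration rules: to-integer is the member's position in the declaration, and the single-argument constructor can only copy an existing value. The `Enumeration` base class and its `member` factory are hypothetical; `member` bypasses the constructor only to create the declared members themselves.

```python
from typing import Tuple


class Enumeration:
    MEMBERS: Tuple[str, ...] = ()  # declaration order, as in Enumeration[Sunday, Monday, ...]

    def __init__(self, prototype: "Enumeration") -> None:
        # The required constructor takes an existing value of the same enumeration,
        # so every constructed object is one of the declared members.
        if type(prototype) is not type(self):
            raise TypeError("prototype must belong to the same enumeration")
        self.name = prototype.name

    @classmethod
    def member(cls, name: str) -> "Enumeration":
        if name not in cls.MEMBERS:
            raise ValueError(f"{name} is not a member of {cls.__name__}")
        value = object.__new__(cls)  # bypass __init__ only to create the declared members
        value.name = name
        return value

    def to_integer(self) -> int:
        return type(self).MEMBERS.index(self.name)  # position in the declaration


class Day(Enumeration):
    MEMBERS = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
```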

---++++ Parameter Association

When you call a method in Heftza and pass parameters to the method, the order in which you pass the parameters does not matter. You are allowed to name parameters explicitly. If you do not provide the names, the compiler will try to associate the parameters by type, but if there is any ambiguity, you will get an error and be asked to provide the names. Example:

<verbatim> $objects.set-object(name : "hello.bit", object : True, path : List("google", "heftza")); </verbatim>

If you look at the above script, you can see that I didn't really need to explicitly name the parameters, because each was a different type.

This default behavior of Heftza can prevent a lot of errors and make code more readable and writable. However, sometimes it's very cumbersome to have to name parameters every time. For instance, do you really want to write "comparator.compare-to(right : hello, left : world);"? You can override this default behavior by using special parameter names. For any method with up to 16 parameters, you can name the parameters "first", "second", etc., in that order. When a method has these ordinally named parameters, you get the kind of association-by-order that we're used to in traditional languages, and the caller is not allowed to provide the names or pass the parameters out of order. (Note that "first", "second", "third", etc. are not keywords; rather, they are regular identifiers that cause special behavior when used as parameter names.)
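Here is a minimal Python sketch of the association rules. `associate` and `AssociationError` are hypothetical names, and type matching is simplified to exact name equality (a real compiler would presumably consider subtypes). Named arguments bind first; each unnamed argument must then match exactly one remaining parameter by type; and ordinally named parameters switch to plain positional binding.

```python
from typing import Dict, List, Optional, Tuple

ORDINALS = ("first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth",
            "ninth", "tenth", "eleventh", "twelfth", "thirteenth", "fourteenth",
            "fifteenth", "sixteenth")


class AssociationError(Exception):
    pass


def associate(formals: List[Tuple[str, str]],
              actuals: List[Tuple[Optional[str], str]]) -> Dict[str, int]:
    """Map each formal parameter name to the index of the argument bound to it.

    formals: (name, type) pairs from the method declaration.
    actuals: (explicit name or None, type) pairs from the call site.
    """
    names = [n for n, _ in formals]
    if names and names == list(ORDINALS[: len(names)]):      # association by order
        if any(n is not None for n, _ in actuals):
            raise AssociationError("ordinally named parameters cannot be named by the caller")
        if len(actuals) != len(formals):
            raise AssociationError("wrong number of arguments")
        return {name: i for i, name in enumerate(names)}

    bound: Dict[str, int] = {}
    for i, (name, _) in enumerate(actuals):                  # explicit names first
        if name is not None:
            if name not in names or name in bound:
                raise AssociationError(f"bad parameter name {name}")
            bound[name] = i
    for i, (name, typ) in enumerate(actuals):                # then association by type
        if name is None:
            candidates = [n for n, t in formals if t == typ and n not in bound]
            if len(candidates) != 1:
                raise AssociationError(f"argument {i} is ambiguous or unmatched; name it")
            bound[candidates[0]] = i
    if len(bound) != len(formals):
        raise AssociationError("missing arguments")
    return bound
```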

---++ State of the Project

A proof-of-concept/prototype of Heftza does exist, but not in a form that would be usable to build a real application (first of all, it only runs on Windows).

Heftza is currently a Google "20% project". My current thinking is to build a Google internal release first (that would be dependent on Google's infrastructure) and then to build a general purpose release. If a lot of interest develops outside Google I might rethink my plans, but I don't see how that would happen, as I'm not doing much marketing outside of Google at this point.
