Walnut/Distributed Computing
Distributed Computing
Multiple threads form the basis of conventional models of concurrent programming. Remarkably, human beings engage in concurrent distributed computing every day even though people are generally single-threaded. (There are exceptions: Walter Cronkite routinely listened to one broadcast with one ear and a different broadcast with the other, while simultaneously taking notes on another topic and speaking authoritatively to an audience of millions. But most people, including this author, live single-threaded lives.)
How do we simple creatures pull off the feat of distributed computing without multiple threads? Why do we not see groups of people deadlocked in a frozen tableau around the coffeepot, one person holding the sugar waiting for milk, the other holding milk waiting for sugar?
The answer is, we use a sophisticated computational mechanism known as a nonblocking promise.
Let us look at a conventional human distributed computation. Alice, the CEO of Evernet Computing, needs a new version of the budget including R&D numbers from the VP of Engineering, Bob. Alice calls Bob: "Could you get me those numbers?"
Bob jots Alice's request on his to-do list. "Sure thing, Alice, I promise I'll get them for you after I solve this engineering problem."
Bob has handed Alice a promise for the answer. He has not handed her the answer. But neither Bob nor Alice sits on their hands, blocked, waiting for the resolution.
Rather, Bob continues to work on his current problem. And Alice goes to Carol, the CFO: "Carol, when Bob gets those numbers, plug 'em into the spreadsheet and give me the new budget, okay?"
Carol: "No problem." Carol writes Alice's request on her own to-do list, but does not put it either first or last in the list. Rather, she puts it in the conditional part of the list, to be done when the condition is met--in this case, when Bob fulfills his promise.
Conceptually, Alice has handed to Carol a copy of Bob's promise for numbers, and Carol has handed to Alice a promise for a new integrated spreadsheet. Once again, no one waits around, blocked. Carol ambles down the hall for a contract negotiation, Alice goes back to preparing for the IPO.
When Bob finishes his calculations, he signals that his promise has been fulfilled; when Carol receives the signal, she uses Bob's fulfilled promise to fulfill her own promise; when Carol fulfills her promise, Alice gets her spreadsheet. A sophisticated distributed computation has been completed so simply that no one realizes an advanced degree in computer science should have been required.
In this simple example, we see why human beings never get trapped in the thread-based deadlock situations described endlessly in books on concurrent programming. People don't deadlock because they live in a concurrent world managed with a promise-based architecture. And so it is with E.
This chapter on distributed computing is the longest and most challenging part of the book. The Promise Architecture of E is very different from the threads, synchronized methods, guarded objects, and RMI of Java. One might expect the Security chapter and its Capability paradigm to be equally new and challenging. But in fact, the implementation of security is embedded so deeply in E's infrastructure that much of the effort merely involves realizing the power inherent in constructs we have already learned. Security with E is more a matter of architecture than of coding.
Distributed computation, however, still requires gritty effort. We start simply, with the eventually operator.
Eventually Operator
All distributed computing in E starts with the eventually operator, "<-":
    # E syntax
    pragma.syntax("0.9")
    car <- moveTo(2,3)
    println("car will eventually move to 2,3. But not yet.")
The first statement can be read as, "car, eventually, please moveTo(2,3)". As soon as this eventual send has been made to the car, the program immediately moves on to the next sequential statement. The program does not wait for the car to move. Indeed, as discussed further under Game Turns below, it is guaranteed that the following statement (println in this case) will execute before the car moves, even if the car is in fact running on the same computer as this code. The request to move the car has been entered on a to-do list (actually sent to the appropriate event queue, for those already familiar with event loops).
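As a minimal sketch of that ordering guarantee, the following defines a hypothetical local car (the object and its printed messages are illustrative, not part of any E library) and sends it one message. If the guarantee described above holds, the "request queued" line prints before the car reports its move, even though the car lives in the same program.

```e
# E sample (illustrative sketch; this car object is hypothetical)
pragma.syntax("0.9")

def car {
    # An ordinary method; it neither knows nor cares how it is invoked.
    to moveTo(x, y) { println(`car moved to $x,$y`) }
}

# Eventual send: the request goes onto the to-do list and we move on.
car <- moveTo(2,3)

# This statement runs before the queued moveTo is delivered.
println("request queued; the car has not moved yet")
```

Replacing the eventual send with the immediate call car.moveTo(2,3) would reverse the order of the two printed lines, which is the difference between a send and a call in miniature.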
Since the program does not wait around for the eventual send, an eventual send is very different from a traditional object-oriented method call (referred to here as an immediate call to distinguish it from an eventual send; the statement "car.moveTo(2,3)" is an immediate call). In general, you do not know, and cannot know, exactly when the car will move; indeed, if the car is on a remote computer, and the communication link is lost, the car may never move at all (and creates a broken promise that you can catch and process, described later).
This brings us to the most interesting feature of an eventual send: just as you do not know when the operation will complete, you also do not know, and do not need to know, which computer in your distributed system is executing the car's code. The car could be a local object running on the same machine with the program... or it could be on a computer a thousand miles away across the Internet. Regardless, E keeps track of the car's Universal Resource Identifier (URI) and delivers the message for you. Moreover, if the car is indeed remote, E sets up a secure communication link between the program and the car; and since the URI for the car includes an unguessable random string of characters, no one can send a message to the car or extract information from the car except someone who has explicitly and intentionally received a reference to it from someone who already has the reference. Thus a computation running on five computers scattered across five continents, all publicly accessible by the whole world of the Web, can be as secure as a computation running on a box locked in your basement.
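Another very interesting feature of the eventual send: because the program continues on to the next statement immediately, without waiting for the eventual send to finish, it never sits blocked while holding something another party needs, so the thread-style deadlock described at the start of this chapter cannot occur.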
Finally, note that the eventual send invoked an ordinary object method ("moveTo(x,y)") in the car. The programmer who created the makeCar constructor defines the cars to have ordinary methods, with ordinary returns of values. He does not know and does not care whether objects that invoke those methods use calls or sends, and neither knows nor cares whether those invoking objects are local or remote.
Promises
When you make an eventual send to an object (referred to hereafter simply as a send, which contrasts with a call to a local object that waits for the action to complete), even though the action may not occur for a long time, you immediately get back a promise for the result of the action:
    # E syntax
    def carVow := makeCar <- ("Mercedes")
    carVow <- moveTo(2,3)
In this example, we have sent the makeCar function the default "run" message. Eventually the carMaker will create the new car on the same computer where the carMaker resides; in the meantime, we get back a promise for the car.
Once we've got a promise, pipelining comes into the picture: We can immediately start making eventual sends to the promise just as if it were indeed the car. Enormous queues of operations can be shipped across the poor-latency comm network while waiting for the car to be created, confident that those operations will be executed as soon as possible.
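Pipelining can be tried out without a network. The sketch below assumes a hypothetical local makeCar function (its methods and messages are invented for illustration); the point is only that messages are sent to carVow before any car exists, and are delivered once the promise resolves.

```e
# E sample (illustrative sketch; makeCar and its methods are hypothetical)
pragma.syntax("0.9")

def makeCar(name) {
    def car {
        to moveTo(x, y) { println(`$name moved to $x,$y`) }
        to getName() { return name }
    }
    return car
}

# Eventually send the default "run" message; we get a promise back at once.
def carVow := makeCar <- ("Mercedes")

# Pipelined sends: queued on the promise, delivered when the car exists.
carVow <- moveTo(2,3)
def nameVow := carVow <- getName()

when (nameVow) -> {
    println(`created and moved: $nameVow`)
} catch prob {
    println(`could not create the car: $prob`)
}
```

Had makeCar been on another machine, the same three sends could travel together instead of waiting one round trip per answer, which is where pipelining pays off.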
But don't try to make an immediate call on the promise. Don't try it even if the car maker (and therefore the car it creates) actually live on the same computer with the program. To make immediate calls on the promised object, you must set up an action to occur when the promise resolves, using a when-catch construct:
    # E syntax
    def temperatureVow := carVow <- getEngineTemperature()
    when (temperatureVow) -> {
        println(`The temperature of the car engine is: $temperatureVow`)
    } catch prob {
        println(`Could not get engine temperature: $prob`)
    }
    println("execution of the when-catch waits for resolution of the promise,")
    println("but the program moves on immediately to these printlns")
We can read the when-catch statement as, "when the promise for a temperature is resolved, and therefore the temperature is locally available, perform the main action block... but if something goes wrong, catch the problem in variable prob and perform the problem block". In this example, we have requested the engine temperature from the carVow. Eventually the carVow resolves to a car and receives the request for engine temperature; then eventually the temperatureVow resolves into an actual temperature. The when-catch construct waits for the temperature to resolve into an actual (integer) temperature, but only the when-catch construct waits (i.e., the when-catch body will execute later, out of order, when the promise resolves). The program itself does not wait: rather, it proceeds on, with methodical determination, to the next statement following the when-catch.
Inside the when-catch statement, we say that the promise has been resolved. A resolved promise is either fulfilled or broken. If the promise is fulfilled, the main body of the when-catch is activated. If the promise is broken, the catch clause is activated.
Notice that after temperatureVow resolves to a temperature, the when clause treats the temperature as a local object. This is always safe to do when using vows. A vow is a reference that always resolves to a local object. In the example, variable names suffixed with "Vow" remind us and warn others that this reference is a vow. We can reliably make eventual sends on the vow and immediate calls on what the vow resolves to, but we cannot reliably make immediate calls to the vow itself. A more detailed discussion of these types of warnings can be found in the Naming Conventions section later.
Not all references can guarantee to resolve to a local object. A receiver is a reference that can resolve to a local or remote object. Since we cannot ascertain beforehand whether it is local or remote, we must treat it as a remote object, i.e., we must interact with it only using eventual sends. We can use either the suffix Rcvr on the variable name as a reminder, or we can use the rcvr guard when defining the variable as a reminder. If you use Rcvr as a suffix, you will be more reliably reminded not to use immediate calls, but it does produce long variable names.
    # E syntax
    def car :rcvr := makeCarRcvr <- ("Mercedes")
    car <- moveTo(2,3)
In this example, makeCarRcvr is a reference to a makeCar function that we have reason to believe may be remote, and therefore can only be spoken to reliably with eventual sends. We could name the vehicle a carRcvr, since it's a reference to a car local to the makeCar function, which therefore is also probably remote to us (if makeCar is remote, so is the car). Instead, we have chosen to use the rcvr guard to remind us to only use eventual sends when interacting with this car.
When-catch, like try-catch, has an optional finally clause that always executes:
    # E syntax
    when (tempVow) -> {
        #...use tempVow
    } catch prob {
        #.... report problem
    } finally {
        #....log event
    }
Multiway when-catch
If you need to resolve several references before making a computation, use a multiway when-catch, in which you specify several arguments in the structure:
    # E syntax
    def maxTempVow := vehicleSpecsRcvr <- getMaxEngineTemperature(carType)
    def engineTempVow := carRcvr <- getEngineTemperature()
    when (engineTempVow,maxTempVow) -> {
        if (engineTempVow > maxTempVow) {println("Car overheating!")}
    } catch e {println(`Lost car or vehicleSpecs: $e`)}
In this when-catch, since all the arguments are vows, inside the body (to the right of the "->") the names engineTempVow and maxTempVow refer to resolved local objects (integers in this case). How do we know that these integers will be local, not remote, even though we retrieved them from Rcvrs? Because integers are passed by construction, as discussed next.
Pass By Construction and Pass By Proxy, References Far and Near
In the above example, the temperature is an integer and is guaranteed to resolve to a local integer object upon which you can make immediate calls, even if the car is remote. Transparent immutable objects, such as integers, floating point numbers, booleans, strings, and the E data structures ConstLists and ConstMaps, are always passed by copy (which is the most common form of Pass By Construction), so you always get a local copy of the object on resolution if one of these is returned by a method call or send. This local copy can of course accept immediate calls.
Mutable objects, like the car in this example, reside on the machine upon which they were constructed. Such an object is passed by proxy. If such an object is on a different machine from the one on which your piece of the system is running, even when the promise resolves it will only resolve into a far reference (as opposed to a near reference for a local object). Far references accept eventual sends, but cannot accept immediate calls.
So in general, if you created the car using an eventual send to a possibly remote object (a receiver), you would probably have to interact with the car using eventual sends forever, since only such sends are guaranteed to work with remote objects.
    # E syntax
    def carRcvr := carMakerRcvr <- ("Mercedes")
    carRcvr <- moveTo(2,3)
    carRcvr <- moveTo(5,6)
    carRcvr <- moveTo(7,3)
    def fuelVow := carRcvr <- fuelRemaining()
Partial Ordering Of Sent Messages
The above example moves the carRcvr around repeatedly and then asks for the amount of fuel remaining. This displays another important property of eventual sends: if object A sends several messages via a single reference to object B, it is guaranteed that those messages will arrive and be processed in the order of sending. This is only a partial ordering, however. From B's point of view, there are no guarantees that the messages from A will not be interspersed with messages from C, D, etc. Also, from A's point of view, there are no guarantees that, if A sends messages to both B and C, the message to B will arrive before (or after) the message to C, regardless of the order in which A initiates the messages. Furthermore, if A happens to have 2 different references to B (such as 2 promises that both will eventually resolve to B), the guarantee only applies to messages being sent down one reference.
Despite those uncertainties, however, in the example the partial ordering is sufficient to guarantee that fuelVow will resolve to the quantity of fuel remaining after all three of the moveTo() operations have occurred. It also guarantees that those moveTo()s will have been performed in the specified sequence.
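The single-reference ordering guarantee can be observed with a local stand-in for the car. This is a sketch under assumptions: the car object and its log are hypothetical, invented here so that the order of arrival can be inspected afterwards.

```e
# E sample (illustrative sketch; this car and its log are hypothetical)
pragma.syntax("0.9")

def log := [].diverge()    # a mutable list private to this sketch

def car {
    to moveTo(x, y) { log.push([x, y]) }      # record each move as it arrives
    to getLog() { return log.snapshot() }     # immutable copy of the record
}

# Four sends down one reference: they are processed in the order sent.
car <- moveTo(2,3)
car <- moveTo(5,6)
car <- moveTo(7,3)
def logVow := car <- getLog()

when (logVow) -> {
    println(`moves, in arrival order: $logVow`)
} catch prob {
    println(`lost the car: $prob`)
}
```

Because getLog was sent last on the same reference, the list it returns should already contain all three moves, in the sequence in which they were sent.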
On first introduction, it is hard to fully appreciate the relationship between partial ordering and when-catch delayed activation. Here is a more sophisticated example.
    # E syntax
    def cars := [car1,car2,car3]
    for each in cars {
        def nameVow := each <- getName()
        def moveVow := each <- moveTo(3,4)
        when (nameVow, moveVow) -> {
            println(`Car $nameVow has arrived`)
        } catch e {println("A car did not make it")}
    }
    println ("Cars have been told to move, but no print of any arrivals has yet occurred")
In this example, getName and moveTo messages are sent to all the cars in the list in rapid-fire succession, never waiting for any answers to get back. The "Cars have been told to move" message at the bottom is guaranteed to print before any "has arrived" messages print. We cannot predict which of the 3 cars will arrive first, second, or third. It all depends on where they are running, how slow the connections are, how loaded the processors are. We cannot even predict which will receive the moveTo message first: though the program is guaranteed to fire off the message to car1 first, the processor upon which car1 executes might be several thousand miles away across the Internet, across a dozen bottlenecked routers. And car2 could be running on a computer two feet away, a short hop down an Ethernet connection. In that case car2 would probably be first to receive its message, and be first to actually arrive, and be the first to return its resolution so the arrival message could print--all three of those events being independent, and the first car to do one is not necessarily the first car to do the next.
Meanwhile, it is guaranteed that each car will get the getName message before it gets the moveTo message. It is not guaranteed that the nameVow will resolve to an answer before the moveVow has resolved, so you must use a multi-vow when-catch to use the name inside the when body with confidence. And if the moveTo promise is broken, which activates the catch clause, we do not know the resolution of the name--it could have fulfilled before the moveTo promise broke, or the name promise could be waiting (and on the verge of being broken or fulfilled) as well. We could use the E isBroken() function described later if we wanted to try to print the name of the car in the catch clause in those situations where the name was fulfilled though the moveTo was broken.
Also it is worth noting that the moveTo method, when used in an eventual send, returns a promise even though the moveTo method is of type "void" (i.e., "does not return a value"). This promise, if fulfilled, will always become the uninteresting value of null...but as shown in this example, sometimes it is valuable to know that fulfillment has occurred, or that fulfillment has failed, even if you do not care what the fulfilled value may be.
Sometimes you want to ensure that car1 arrives before car2, and car2 before car3. In a simple problem you could use nested when-catches, described later. In this example, where you don't necessarily know how many cars there are, you would use the recursive when-while pattern or the sendValve utility, both described at the end of the chapter.
Naming conventions
In the examples up to this point we have been using a number of naming conventions. Here are some common conventions.
car -- an object that you can treat as a near object, using immediate calls or eventual sends as you wish.
carVow -- a car that will be near, available for immediate calls, once you resolve the vow inside a when-catch clause.
carRcvr -- a car that must be treated as a remote object, spoken to only with eventual sends, never with immediate calls, because although it might sometimes be a near car, it can also be remote, and only eventual sends are safe. As noted earlier, another way of denoting that a car may be remote is to use the ":rcvr" guard.
getCar(licensePlate) -- a function or a method that returns a local car (local to the object implementing getCar)
getCarVow(licensePlate) -- a function or a method that returns a vow for a car, not the car itself. If you are calling this method on a local object, once you resolve the vow, the car is local.
getCarRcvr(licensePlate) -- a function or a method that returns a car which the author of the method has reason to believe may be a remote car. Consequently, it is a warning that you should only interact with the returned object with eventual sends.
getCarVow(licensePlateVow, titlePapersRcvr) -- this method explicitly names the parameters licensePlateVow and titlePapersRcvr. Whereas the suffixes Vow and Rcvr are warnings when used in a method or variable name (warnings to avoid the use of immediate calls), when used as parameter suffixes they are commitments. In this example, the developer of the getCarVow method is making the commitment to users of the method that he will not make immediate calls on the licensePlate until he has resolved the plate in a when-catch block. And he is making the commitment that he will only interact with the titlePapers using eventual sends. In general, if at least one of the parameters for a method is either a Vow or a Rcvr, the method itself cannot return a near object: if the method must wait on eventual sends to gather the info to return an object, it cannot reasonably return the object immediately. No doubt, however, someone reading this will be the first person to invent a situation that breaks this general rule. We congratulate you in absentia.
makeCar(name) -- a carMaker that is local, can be immediately called, and makes cars that can be immediately called.
makeCarVow(name) -- an object maker like a carMaker, but which is dependent on construction that may only resolve eventually, so it can only send you a promise for the car.
makeCarRcvr(name) -- a local carMaker that may make the car on a remote system, and therefore the cars should be treated as Rcvrs.
carMakerRcvr(name) -- a carMaker that may itself be on a remote system, that should only be spoken to with eventual sends. Rather than having a very wordy term for remote Makers that make remote cars (which are local to the Maker), since such a Maker is almost always going to create cars that you must treat as Rcvrs, we use carMakerRcvr to also designate makers that we might be tempted to name "carRcvrMakerRcvr". This name, carMakerRcvr, is a first and only example you will see here of the more generalized naming convention used when constructors and authorizers and rcvrs are interrelated in a complex way. Rather than depending on a prefix that might refer to any chunk of the name, we stack up the descriptions as suffixes. Multiple levels of constructors and authorizers play an important role in achieving E's security goals, so while these complicated structures are uncommon in most of the code in a system, there are likely to be a few parts in a secure system where such naming conventions play a role.
Testing for Equality in a Distributed System
As noted above, you can send eventual messages to both promises and far references. So if you plan to treat the fulfilled object as a far reference, using only eventual sends, there is usually no reason to use a when-catch to wait for resolution. Just go ahead and start sending messages, uncaring of whether the promise has yet been fulfilled.
There are four reasons to use a when-catch construct to resolve the promise for a far object.
- To find out whether the original request succeeded and, if it did not, to get the problem
- To flush out all previous messages sent via a reference to an object before taking an action
- To use the result as a key in a hash table (a promise cannot be used as a hash key)
- To test for equality, i.e., to see whether the promised object is the same object that you received from another activity.
Here is a test for equality:
    # E syntax
    def polePositionCarRcvr := raceTrack <- getPolePositionCar()
    when (polePositionCarRcvr) -> {
        if (polePositionCarRcvr == myCar) {
            println("My car is in pole position")
        }
    } catch prob {}
Note that the catch clause is empty. Because remote references across a network are unreliable, you will usually want to handle the problem that triggered the catch clause. However, if you have many pending actions with a single remote object, all the catch clauses for all those actions will start up if the connection fails, so you may want to disregard most of the catch clauses and focus on handling the larger problem (loss of connection) in one place. Often in this situation you will want to set up a whenBroken message, described later.
But in this example, there is no compelling special action to take if you can't find out whether your car is in pole position, so no action is taken.
Making your own Promises
If you write a function or method that must get values from a far object before returning an answer, your function should return a promise for the answer, and fulfill that answer later. You can create your own promises using Ref.promise() in E:
    # E sample
    def calcMilesBeforeEmptyVow(carRcvr) {
        def [milesBeforeEmptyPromise, milesBeforeEmptyResolver] := Ref.promise()
        def fuelRemainingVow := carRcvr <- getFuelRemaining()
        def mpgVow := carRcvr <- getEfficiency()
        when (fuelRemainingVow, mpgVow) -> {
            milesBeforeEmptyResolver.resolve(mpgVow * fuelRemainingVow)
        } catch prob {
            milesBeforeEmptyResolver.smash(`Car Lost: $prob`)
        }
        return milesBeforeEmptyPromise
    }
    #....Meanwhile, somewhere else in the program....
    def myCar {
        to getFuelRemaining() {return 5}
        to getEfficiency() {return 2}
    }
    def milesVow := calcMilesBeforeEmptyVow(myCar)
    when (milesVow) -> {
        println(`miles before empty = $milesVow`)
    } catch e {println(`Error: $e`)}
This example highlights several different features of promises. The basic idea of the example is that the function calcMilesBeforeEmptyVow(carRcvr) will eventually compute the number of miles the car can still travel before it is out of fuel; later in the program, when this computation is resolved, the program will print the value. However, before we can do the computation, we must first get both the fuelRemaining and the milesPerGallon values from the remote car.
The promise and the resolver for that promise are created as a pair using Ref.promise(). They are "normal" variables in the sense that they can be passed as arguments, returned from methods as answers, and sent eventual messages. Resolvers truly are ordinary objects. There are 3 restrictions on the promises: they cannot accept immediate calls, they cannot be tested for equality, and they cannot be used as keys in hash tables.
In this example, the function returns the milesBeforeEmptyPromise to the caller just as it would return any other kind of value. To cause the promise to resolve, the function calls the resolver (or sends to the resolver if the resolver may be remote to your program) with the "resolve(value)" method. To break the promise (which produces a problem that will cause the catch clause of a when-catch to execute), call the resolver with the "smash(problem)" method.
It is possible to chain resolutions: you can resolve a promise by handing the resolver another promise (though if you hand the resolver the promise which it is intended to resolve, E will detect the problem and automatically break the promise--a promise resolved to itself can never be fulfilled). If you resolve a promise with another promise, the original promise is not yet considered resolved, i.e., the when-catch body will not be activated by such a resolution. The when-catch body will only be activated when the entire promise chain is resolved, and there is an actual reference to an actual object available.
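A chained resolution can be sketched with two hand-made promises. The names outerVow and innerVow are hypothetical, and the sketch assumes that Ref.isResolved(ref) is available alongside Ref.promise() to ask whether a reference has resolved yet; treat that call as an assumption to check against your E version.

```e
# E sample (illustrative sketch; names are hypothetical, Ref.isResolved assumed)
pragma.syntax("0.9")

def [outerVow, outerResolver] := Ref.promise()
def [innerVow, innerResolver] := Ref.promise()

when (outerVow) -> {
    println(`outer finally resolved to $outerVow`)
} catch prob {
    println(`outer was broken: $prob`)
}

# Resolve the outer promise with another promise: the chain is formed,
# but per the text above the when-catch body must not be activated yet.
outerResolver.resolve(innerVow)
println(`outer resolved after chaining? ${Ref.isResolved(outerVow)}`)

# Only now is there an actual object at the end of the chain.
innerResolver.resolve(42)
```

If chaining behaves as described, the when-catch body runs only after the final resolve, and it sees the value supplied to the inner resolver.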
Letting When-Catch make a Promise For You
As noted in chapter 1 under the Secret Lives of Flow Control Structures, control structures return values. When used as described up to this point, the return value of the when-catch expression has been ignored. In fact, the when-catch expression returns a promise for the value of whichever clause eventually executes. If that clause does not produce a value, the promise resolves to null; otherwise the promise is fulfilled by the last value in the executed clause. Rewriting the calcMilesBeforeEmptyVow using this feature, we get:
    # E sample
    def calcMilesBeforeEmptyVow(carRcvr) {
        def fuelRemainingVow := carRcvr <- getFuelRemaining()
        def mpgVow := carRcvr <- getEfficiency()
        return when (fuelRemainingVow, mpgVow) -> {
            mpgVow * fuelRemainingVow
        } catch prob {
            throw(`Car Lost: $prob`)
        }
    }
Here the when-catch creates the promise which is returned by the function, and resolution is automatic at the end of the when-catch execution. What does the promise resolve to during a catch and/or a finally?
Broken Promise Contagion
When chaining promises, what happens if one of the promises in the middle or the far end breaks? The problem that breaks the far promise is propagated down all the promises dependent on it. As a consequence, broken promises work their way through the system in a similar way to exceptions being caught in try/catch blocks.
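A small sketch of contagion, using only a hand-made promise: no real car exists here, and the names and the problem text are hypothetical. The temperature promise depends on the car promise, so breaking the car promise should break the temperature promise too.

```e
# E sample (illustrative sketch; no real car exists, names are hypothetical)
pragma.syntax("0.9")

def [carVow, carResolver] := Ref.promise()

# tempVow is a promise for the answer to a message that cannot be
# delivered until carVow resolves, so it depends on carVow.
def tempVow := carVow <- getEngineTemperature()

when (tempVow) -> {
    println(`temperature: $tempVow`)
} catch prob {
    # Reached if the breakage propagates from carVow down to tempVow.
    println(`no temperature: $prob`)
}

# Break the upstream promise instead of fulfilling it.
carResolver.smash("car factory unreachable")
```

Note that the code never breaks tempVow directly; the catch clause fires because the problem travels along the dependency, much as an exception travels up a call stack.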
State Transitions for References
Having worked our way through near, far, broken, and promised references, let us put all the transitions among these together in a single place. The following diagram shows all the kinds of references E distinguishes between, and it shows the two possible transitions:
The 2 transitions are:
- A promise may transition (resolve) into a resolved reference, which can be near, far, or broken.
- A far ref may transition to a broken ref.
That is the complete collection of transitions.
Guards for distributed systems
Very early in the book, we presented a list of E types, including any, integer, float64, etc. In that list were "rcvr", "vow", "pbc" and "near". "pbc" means that the type of the value is required to be "pass by construction". Transparent immutables are generally pbc, and can be passed through this type declaration. Strings, integers, floats, booleans, ConstMaps and ConstLists are all pbc. Objects that are pbc are copied when they are sent, so that even if the object originated on a remote computer, when you get it you can use immediate calls on it. Also, since pbc objects do not embody any code, pbc objects passed across a trust boundary can be used with far fewer security concerns: the pbc object may contain bad data, but it cannot play any tricky games in its execution.
"Near" means that the value must be local. Promises and far references will not pass the near guard. Anything that passes a near guard can be sent messages with immediate calls.
At this time, "rcvr" and "vow" have the same algorithmic meaning as "any", but can be used as hints to the programmer about how to interact with the object (a vow should be referenced only eventually until it is resolved; a rcvr should always be referenced only eventually).
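The guards can be placed on ordinary definitions. The sketch below is hedged: makeCar, brand, and the other names are hypothetical, and it assumes the guard names given above (pbc, near, vow) are available under exactly those spellings in your E version.

```e
# E sample (illustrative sketch; makeCar and all variable names are hypothetical)
pragma.syntax("0.9")

def makeCar(name) {
    def car {
        to getName() { return name }
    }
    return car
}

def brand :pbc := "Mercedes"           # strings are pass by construction
def localCar :near := makeCar(brand)   # built by an immediate call, so near
println(localCar.getName())            # immediate calls are fine on a near ref

def carVow :vow := makeCar <- (brand)  # a promise: acceptable to vow
def nameVow :vow := carVow <- getName()
when (nameVow) -> {
    println(`eventually: $nameVow`)
} catch prob {
    println(`problem: $prob`)
}

# def wrong :near := makeCar <- (brand)
# The line above is left commented out: a promise should not pass the near guard.
```

Since vow and rcvr currently behave like any, they catch no mistakes on their own; the near guard is the one that actually rejects a promise or far reference.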
Nested When-Catch constructs
As discussed earlier, we can guarantee the ordering of messages sent from one object to another object. But when a program sends messages to several objects, on different machines, the order in which answers will be received cannot be predicted. Suppose you need to perform a computation involving two far answers, and the algorithm depends on the result from the first answer. You can nest when-catch clauses to await the outcome of the dependency:
    # E syntax
    def fileRcvr := fileCacheRcvr <- getFile()
    when (def exists := fileRcvr <- exists()) -> {
        if (exists) {
            when (def text := fileRcvr <- getText()) -> {
                println(`File has text: $text`)
            } catch p2 {println(`Problem with existing file: $p2`)}
        } else {println("File does not exist")}
    } catch prob {println(`Problem reaching file: $prob`)}
Here, we want to print the text of a far file, but we want to be sure the file exists first. So we ask the file if it exists, and only after we determine that it does indeed exist do we proceed to get the text.
As noted earlier, in the longer form of the when-catch construct the "done(variable)" clause is actually the declaration of a function and a series of parameters. The reasons for this are beyond the scope of this book, but the operational consequence is that, when you write the "done" functions out explicitly, you must use different function names if you have multiple when-catch clauses in the same scope.
Nested when-catch clauses share a scope. The example here uses the "->" form, which names no "done" functions, so the only names that could collide are the variables: we gave the two results different names, exists and text, and gave the catch clauses different problem variables, prob and p2.
Live Refs, Sturdy Refs, and URIs
Up to this point we have carefully dodged a critical question. How does a program acquire its very first reference to an object on a different computer? In all our examples up to this point, we have always started out with a reference to at least one remote object, and we retrieved other remote objects by asking that object for other objects: for example, we asked a remote carMakerRcvr for a new (remote) carRcvr, and were able to immediately work with the car just like any other remote object. How did we get the reference to the carMakerRcvr in the first place?
In E, the reference to an object can be encoded as a Uniform Resource Identifier string, known as a URI (the familiar URL of the Web is a type of URI). This URI string can be passed around in many ways; one good secure way of passing a URI is to save it as a text file, encrypt and sign it with PGP, and send it by email. If you wish to run a seriously secure distributed E system, encrypting the URIs is crucial: indeed, the passing of URIs from machine to machine is the main security issue that E cannot address for you (it is a problem shared by all security systems, and one that will diminish with time as secure systems like PGP email are deployed). Operational E systems have also passed URIs over an ssh connection and (less securely) by reading the URI aloud over a telephone! If you are using E on a local area network and have no security concerns, but are using E simply because it is simpler, safer, and more maintainable for distributed computing, the URIs can be stored in files on a shared file system and read directly by the programs on different computers as they start up.
The functions makeURI(object) and objFromURI(uri) detailed in the E Quick Reference Card perform the basic transformations you need to hook up objects on multiple computers. These routines use sturdyRefs and liveRefs in their computations. A sturdyRef is an object that contains an enduring description of where an object can be found; sturdyRefs and URIs are simple transformations of one another. LiveRefs are actual connections to objects that disappear any time a connection goes down. LiveRefs carry the actual traffic in E communication. When you request a remote object from another remote object (as in farCarMaker <- ("mercedes")), what you actually get is a liveRef. If you want to continue to connect with that particular car across multiple sessions, you will need to explicitly get a sturdyRef from the car, and the car will have to have an explicit method for granting the sturdyRef: the ability to obtain a sturdyRef is an important capability, not one handed out by default.
Each program that expects to work with remote objects needs to invoke the command
# E sample
introducer.onTheAir()
before starting any remote connections, including the making or using of uris. An example of these functions working together can be found in eChat.
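As a minimal sketch of how these pieces fit together, the following borrows the calls used by the minChat example later in this chapter (identityMgr.makeKnown, introducer.sturdyToURI, and introducer.sturdyFromURI). The names publishToFile and connectFromFile are hypothetical, and the file objects are assumed to be supplied by the caller, as they are in minChat's save and load methods.

# E sample
introducer.onTheAir()

# On the machine that owns the object: write its URI to a file.
# publishToFile is a hypothetical helper, not part of E.
# (minChat additionally strips angle brackets from the URI text;
# that detail is omitted from this sketch.)
def publishToFile(obj, file) {
    def [sr, _, _] := identityMgr.makeKnown(obj)
    file.setText(introducer.sturdyToURI(sr))
}

# On the machine that wants the object: read the URI back and
# turn it into a reference that can receive eventual sends.
def connectFromFile(file) {
    return introducer.sturdyFromURI(file.getText()).getRcvr()
}

The file itself still has to travel between the two machines by one of the routes described above, such as a shared file system or encrypted email.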
blockAtTop and continueAtTop
E programs never block. They never just sit and wait around for results. However, most programs will toss out some windows, or set up some services, and then want to wait for someone or something to use them. Often, the first version of a program written by a new E programmer will set everything up, execute the last line of the program, and then shut down. It is quite disconcerting to watch your windows briefly light up and then disappear, along with the Java virtual machine underlying the entire program.
The command
interp.blockAtTop()
causes the E interpreter to wait for requests to come in. The command
interp.continueAtTop()
causes the E interpreter to stop waiting. These commands are used in the eChat sample at the end of this chapter: the last line of the program initiates blockAtTop, and continueAtTop is invoked when the user closes his chat window.
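The usual shape of a program that must stay alive can be sketched as follows. The shutdownRcvr object and its method name are hypothetical; only the two interp calls come from the text above.

# E sample
# Hypothetical object that some other part of the program (a window
# listener, a remote administrator) calls when it is time to exit.
def shutdownRcvr {
    to shutdown() {
        println("shutting down")
        interp.continueAtTop()
    }
}

# ... open windows or publish services here ...

# Last line of the program: wait for incoming requests instead of exiting.
interp.blockAtTop()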
A bit of Philosophy when using Promises for the first time
If a method accepts a promise or far reference as a parameter, the returned value almost certainly cannot be a local object. Why is this? If the computation of the returned object depends on something eventual, the computation will have to wait for an answer, which means the best the object can do for immediately returning something is to return a promise. So a method like getCar(licensePlateVow) has almost certainly violated the naming convention, one way or the other.
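The convention can be illustrated with a small sketch. The registry object, the lookupCar helper, and the plateRcvr scaffold are all hypothetical; the point is only that a method fed a vow hands back a vow, and says so in its name.

# E sample
# lookupCar stands in for some purely local computation (hypothetical)
def lookupCar(licensePlate) {
    return `car with plate $licensePlate`
}

def registry {
    # Accepts a vow, so the best it can return is another vow:
    # the name ends in Vow to say so.
    to getCarVow(licensePlateVow) {
        return when (licensePlateVow) -> {
            lookupCar(licensePlateVow)
        } catch prob {
            # rethrow so that the returned vow is broken as well
            throw(prob)
        }
    }
}

# scaffold: a real plate source would be a remote Rcvr
def plateRcvr {
    to getPlate() :String {return "ABC123"}
}

when (def car := registry.getCarVow(plateRcvr <- getPlate())) -> {
    println(`Found: $car`)
} catch prob {println(`No car: $prob`)}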
As a consequence of this and other characteristics of promises, beginning E programmers often experience a moment of breathlessness soon after they first try to use them. This moment occurs when the programmer erroneously concludes that, once you let a promise into your system, the program has tumbled down a slippery slope from which there is no recovery. It feels as if you will never be able to have a real value in your hands again; you'll spend the rest of your life making eventual sends, coupled with endless when-catch constructs just to get to the next step.
The breathlessness can metastasize into a feeling of having lost control of the software: everything is off computing somewhere else and all you have is a bunch of promises. Since E never (really, never!) blocks, you can never tell things, "just stop for a moment while I catch my breath and gather up all the values" :-)
In fact, there are several patterns and tools for getting control of things again. The basic multi-promise when-catch, and the resolveAllVow and Summoner patterns at the end of this chapter, are just a few examples of structures specifically designed to reduce the frequency with which you need when-catch. They allow you to get quickly back into local computation after a distributed action. You really can catch your breath.
And once the moment of breathlessness passes, you will feel freedom, for you will never have to deal with thread synchronization again; you will never lie awake at night wondering if an unexplored race condition will cause deadlock a few days after the production load level hits your system.
Additional Promise-Related Operations
- whenBroken
- Ref.isBroken (the form used in the resolveAllVow example later in this chapter)
- whenResolved Use this when your action will be the same once the promise resolves, regardless of whether it resolves broken or not.
- E isResolved
- E send(object,verb,arguments) This function is identical to the E call() function described earlier, but does an eventual send rather than an immediate call.
- E sendOnly
- E join(a,b) Guarantees that sends to the result of the join are delivered only after messages to a and b have been delivered; it acts as a sort of on-the-fly test for equality.
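As a brief sketch of the E send form listed above, the two sends below are intended to be equivalent. The car object is scaffolding invented for the illustration.

# E sample
def carRcvr {
    to moveTo(x, y) {println(`moving to $x,$y`)}
}

# The usual eventual-send syntax ...
carRcvr <- moveTo(3, 4)

# ... and the same request with the verb and arguments held as data,
# which is useful when they are not known until run time.
def verb := "moveTo"
def args := [3, 4]
E.send(carRcvr, verb, args)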
Under the Covers: Vats and Game Turns
Each vat has a separate event loop, and each game turn corresponds to the processing of a single event to conclusion with no interruptions. There are frontend vats, which all share the Swing event queue for queuing messages to their event loops, and there are backend vats, each of which has its own separate event queue and runs on a separate thread. Frontend vats have the advantage that they can interact directly with user interface widgets. The initial vat created by default at the start of an E program is a frontend vat. You might want multiple frontend vats if you want to shut down whole subsystems in one fell swoop: you can tell a vat to shut down, and it will terminate all its connections to the outside world, making it available for garbage collection.
creating a second frontend vat on a single jvm. creating a backend vat on a jvm
Example: Alice, Bob, and Carol as Code
We started this chapter with a description of a human interaction, with Alice as the CEO of Evernet Computing in search of a new spreadsheet for her IPO preparation. What would that distributed problem look like in code? Though we present only a skeleton of this human operation here, it is perhaps nonetheless informative. It might also be informative, after one has read how to do the example in E, to think about how it would be done with threads, in Java or C++. That, however, is left as an exercise for the reader.
# E sample
#---- Bob Code
def bobRcvr {
    to handleExplosionInNetworkServerRoom() {
        # handle the explosion
    }
    to getCostNumbers() :pbc {
        return ["R&D" => 10000]
    }
}
#------------ Carol Code
def carolRcvr {
    to attendMeeting(room, time) {
        # attend the meeting
    }
    to integrateSpreadsheetVow(RDNumbersVow) {
        def sheetVow := when (RDNumbersVow) -> {
            ["Marketing" => 50000] | RDNumbersVow
        } catch prob {
            println("No numbers!")
            prob
        }
        return sheetVow
    }
}
#--------- Alice Code
def aliceRcvr {
    to prepareIPOVow() {
        def RDNumbersVow := bobRcvr <- getCostNumbers()
        def spreadsheet := carolRcvr <- integrateSpreadsheetVow(RDNumbersVow)
        return when (spreadsheet) -> {
            println(`Appendix: Budget Spreadsheet: $\n$spreadsheet`)
        } catch prob {
            aliceRcvr <- fireSomebody([bobRcvr, carolRcvr])
        }
        # do other IPO preparations
    }
    to fireSomebody(candidateRcvrs) {
        # get rid of culprit
    }
}
bobRcvr <- handleExplosionInNetworkServerRoom()
carolRcvr <- attendMeeting(1,2)
aliceRcvr <- prepareIPOVow()
Example: minChat version 1
In order to focus on the distributed computation features of this chat tool, we have gone to extreme lengths to make the user interface machinery a very short piece of code. Hence, this should properly be called "minChat", because the user interface is so brutally minimal. Even with such minimization, however, the user interface still requires more lines of code than the actual distributed computation!
The chatController at the bottom of the example is the heart and soul of the computation.
<nowiki>
#?? in new vat minChatVat.e-swt
## E sample
#!/usr/bin/env rune
#eChat with minimalist user interface
pragma.syntax("0.9")
def <widget> := <swt:widgets.*>
def SWT := <swt:SWT>
introducer.onTheAir()

# return the object represented by the URI
def getObjectFromURI(uri) {return introducer.sturdyFromURI(uri).getRcvr()}

def makeURIFromObject(obj) :String {
    # This implementation assumes a non-persistent single incarnation
    def [sr, _, _] := identityMgr.makeKnown(obj)#XXX not a uri if bracketed, bug, markm?
    def bracketed := introducer.sturdyToURI(sr)
    if (bracketed =~ `<@uri>`) {return uri}
    return bracketed
}

def chatController
def chatArea
def chatUI {
    to show() {
        def frame := <widget:Shell>(currentDisplay)
        frame.setText("eChat"); frame.setBounds(30, 30, 600, 300)
        def winDisposeListener {to widgetDisposed(event) {interp.continueAtTop()}}
        frame.addDisposeListener(winDisposeListener)
        bind chatArea := <widget:Text>(frame,
            (SWT.getMULTI() | SWT.getWRAP()) | (SWT.getBORDER() | SWT.getV_SCROLL()))
        def commandLine := <widget:Text>(frame, SWT.getSINGLE() | SWT.getBORDER())
        def enterKeyListener {
            to keyPressed (event) {
                if (event.getKeyCode() == 27) {
                    if (commandLine.getText() =~ `@command @argument`) {
                        commandLine.setText("")
                        switch (command) {
                            match == "save" {chatController.save(<file: argument>)}
                            match == "load" {chatController.load(<file: argument>)}
                            match == "send" {chatController.send(argument)}
                            match _ {commandLine.setText("Error. Try save, load, or send")}
                        }
                    } else {commandLine.setText("Error. Try save, load, or send")}
                }
            }
            match [verb, args] {}
        }
        commandLine.addKeyListener(enterKeyListener)
        def label := <widget:Label>(frame, SWT.getLEFT())
        label.setText("save, load, send. No quotes for path. Use Escape to start operation. ")
        swtGrid`$frame: $chatArea.X.Y
                        $label.X
                        $commandLine.X`
        frame.open()
    }
    to showMessage(initiator, text) {chatArea.append(`$initiator: $text $\n`)}
}

def friend
bind chatController {
    to send(message) {
        when (friend<-receive(message)) -> {
            chatUI.showMessage("self", message)
        } catch prob {chatUI.showMessage("system", "connection lost")}
    }
    to receive(message) {chatUI.showMessage("friend", message)}
    to receiveFriend(friendRcvr) {
        bind friend := friendRcvr
        chatUI.showMessage("system", "friend has arrived")
    }
    to save(file) {file.setText(makeURIFromObject(chatController))}
    to load(file) {
        bind friend := getObjectFromURI(file.getText())
        friend <- receiveFriend(chatController)
    }
}
chatUI.show()
# In actual code, the following line would not be commented out
# interp.blockAtTop()</nowiki>
Only five methods make chat happen: sending a message to the friend, receiving a message from the friend, being informed that the friend has found you (the receiveFriend method), saving the URI where your friend can find you, and loading the URI that describes where you can find your friend.
A quick run through can be done by putting the source in a file (file extension ".e-swt", this example uses SWT for user interface) and launching the file twice to get 2 chat windows. In one window, type "save ~/me.minchat" and press the Escape key. This creates a file (in your home directory) that contains a cap: URI describing where your chat session can be found (a chat session that can only be found by someone to whom this file has been given--the location cannot be found or guessed by anyone else). In the other window, type "load ~/me.minchat" and press Escape. In the first window you should see a message pop up that the "friend has arrived". Now in either window type "send hi y'all", and you will see the message appear in both windows.
In minChat, every time you start up the program, it comes to life with a new URI that must be sent to the friend with whom you wish to chat; the old URI from the previous session is useless. MinChat would be far more useful if you could create a minChat session with Bob, and continue that session even after you have rebooted your computer. We look at building such a persistent chat in the chapter on Persistent Secure Distributed Computing--after first looking at minChat here through the eyes of a security reviewer (or a cybercracker) in the chapter on Secure Distributed Computing.
Patterns of Promises
resolveAllVow
Suppose you have a computation that cannot be performed until you have received answers from multiple remote objects. Your first thought would be to use a multi-vow when-catch. But further suppose you do not know beforehand how many vows there are (i.e., suppose you have an arbitrary list of promises). In this case neither the multi-vow when-catch construct nor the nested when-catch pattern will work. For this situation, you can use the resolveAllVow utility described here.
The resolveAllVow utility has another distinction compared to a multi-vow when-catch: whereas the multi-vow activates the catch clause as soon as it hits a broken promise, resolveAllVow sends no resolution until everything in the list is resolved one way or the other. So, even if several of the promises were broken, with resolveAllVow you can be sure that the rest have been fulfilled when the catch clause is activated.
In this example, we use resolveAllVow to sum a list of contributions. We do not know beforehand how many contributions there are (perhaps thousands), and we want to sum them up even if some of the requests for donations return broken promises.
# E sample
# Given a List of promises, when all the promises have been resolved,
# resolve the returned vow, either fulfilling it with the list of fulfillments (if
# all the promises were fulfilled) or by smashing the returned promise
# with one of the broken promises (if at least one promise was broken).
def resolveAllVow(promises) {
    def [resultVow, resolver] := Ref.promise()
    var count := promises.size()
    var resolution := promises
    var noneBroken := true
    if (count == 0) {resolver.resolve(promises)}
    for promise in promises {
        when (promise) -> {
            # success processed in finally clause
        } catch prob {
            resolution := prob
            noneBroken := false
        } finally {
            count -= 1
            if (count == 0) {
                if (noneBroken) {
                    resolver.resolve(resolution)
                } else {resolver.smash(resolution)}
            }
        }
    }
    return resultVow
}

#now use resolveAllVow to sum contributions, where the
#number of contributions is unknown prior to execution
def printTotalContributions(amountsList) {
    var total := 0
    for each in amountsList {
        if (!(Ref.isBroken(each))) {total += each}
    }
    println(`Total contributions are: $total`)
}

# scaffold contributors: real contributors would be remote Rcvrs
def makeContributer(donation) {
    return def contributor {
        to getDonation() :int {return donation}
    }
}
def contributorRcvrsList := [makeContributer(5), makeContributer(6)]
def amountsVows := [].diverge()
for each in contributorRcvrsList {
    amountsVows.push(each <- getDonation())
}
when (def amounts := resolveAllVow(amountsVows)) -> {
    printTotalContributions(amounts)
} catch prob {
    #due to the nature of resolveAllVow, in this catch clause
    #we are guaranteed everything in amountsVows is resolved to
    #either an amount or a broken reference
    printTotalContributions(amountsVows)
}
The basic behavior of resolveAllVow is this: when initiated, the function takes a count of how many promises there are, then immediately spins off when-catches for all of them. Each time a promise resolves, the count of outstanding promises is decremented. When the counter drops to zero, implying all the promises have been resolved, the single promise initially returned by resolveAllVow is finally resolved.
dialogVowMaker
Java's standard dialog boxes, found in JOptionPane, are seriously flawed for use in a distributed E system: those convenience dialogs are fully blocking. As a consequence, if your computer is part of a distributed computation, and you go home for the night, if a Java dialog box pops up, your machine is effectively offline until you return in the morning. Included in the E distribution is a dialog box tool, the dialogVowMaker:
code here
Recursion for send sequencing
As discussed earlier, putting a when-catch inside a for loop does not give you any guarantees on which when-catch will activate first: it only guarantees which message is sent first, which is mostly uninteresting. For general-purpose ensured sequencing of resolution, you must use recursion.
Earlier, we had an example in which we told a list of cars to moveTo the same location. Suppose we wished to ensure that the cars arrived in the order in which they appear in the list:
# E sample
def moveAllCarsInSequence(carRcvrs,toX,toY) {
    var carI := 0
    def moveRemainingCars() {
        if (carI < carRcvrs.size()) {
            def nextCarRcvr := carRcvrs[carI]
            carI += 1
            def nameVow := nextCarRcvr <- getName()
            when (nameVow,nextCarRcvr <- moveTo(toX,toY)) -> {
                println(nameVow + " arrived, next car about to start moving")
            } catch e {
                println(`car died: $e`)
            } finally {moveRemainingCars()}
        }
    }
    moveRemainingCars()
}

# scaffold carmaker
def makeCarRcvr(name) {
    def carRcvr {
        to getName() :String {return name}
        to moveTo(x, y) {
            #move the car
        }
    }
    return carRcvr
}
moveAllCarsInSequence([makeCarRcvr("car1"),makeCarRcvr("car2")],3,4)
Inside the moveAllCarsInSequence function, we declare the recursive moveRemainingCars function, then simply invoke moveRemainingCars. The function moveRemainingCars tells the next car to move; once that movement has resolved, moveRemainingCars invokes itself once again.
In this version of moveAllCarsInSequence, if a car is lost (i.e., its promise to move is broken), the recursion continues. If we wanted to abort after a problem occurred, we would simply delete the call to moveRemainingCars() from the finally clause, and place it in the when clause.
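The abort-on-failure variant described above might look like the following sketch. The name moveAllCarsUntilFailure is hypothetical; the body otherwise follows moveAllCarsInSequence, with the name lookup left out for brevity.

# E sample
def moveAllCarsUntilFailure(carRcvrs,toX,toY) {
    var carI := 0
    def moveRemainingCars() {
        if (carI < carRcvrs.size()) {
            def nextCarRcvr := carRcvrs[carI]
            carI += 1
            when (nextCarRcvr <- moveTo(toX,toY)) -> {
                # only a successful move starts the next car
                moveRemainingCars()
            } catch e {
                # no recursive call here, so the sequence stops
                println(`car died, aborting: $e`)
            }
        }
    }
    moveRemainingCars()
}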
sendValve
As we have seen, it is possible to spin off large numbers of eventual sends in the blink of an eye. In fact, tossing off vast numbers of such sends can sometimes consume a lot of resources for no good purpose. In the eDesk program at the end of the book, it is possible for the user to request the transfer of whole directories full of files. It would be possible for eDesk to initiate transfer of all of those files simultaneously. But the processing bottleneck is probably bandwidth, so initiating them all at once won't get the whole transfer done any faster, and meanwhile each individual file transfer consumes buffer space once initiated. Starting all the transfers at once could actually slow down the transfer if enough buffers thrash virtual memory.
What you want is a valve, which can be opened and closed depending on the situation. The program can still set up all the operations in a single swoop, but the valve will constrain the number of such operations that are actually active at a given moment. The sendValve below performs this function.
#sendValve. If you have numerous eventual sends to initiate,
# but initiating them all at once would consume vast resources
# and/or would actually slow down processing, queue the actions
# through a valve.
#An ActionTrio is the list of [obj,"verb",[arguments]]
# that can be used in an E send()
#The actions are guaranteed to be initiated in the sequence in
# which they are placed in the valve, though of course there is
# no guarantee as to which will terminate first (unless you have
# special knowledge of such sequencing outside the valve).
#The numSimultaneousAllowed is the number of actions that can
# be run concurrently through this valve.
# E sample
def makeSendValve (numSimultaneousAllowed) {
    var actionQueue := []
    var numRunning := 0
    def startNext() {
        if (!(actionQueue.size()==0)) {
            def [actionTrio, resolver] := actionQueue[0]
            actionQueue := actionQueue(1, actionQueue.size())
            numRunning += 1
            def completeVow := E.send(actionTrio[0],actionTrio[1],actionTrio[2])
            resolver.resolve(completeVow)
            when (completeVow) -> {
            } catch prob {
            } finally {
                numRunning -= 1
                startNext()
            }
        }
    }
    def valve {
        to makeActionVow(actionTrio) {
            def [completeVow, resolver] := Ref.promise()
            actionQueue := actionQueue.with([actionTrio, resolver])
            if (numRunning < numSimultaneousAllowed) {
                startNext()
            }
            return completeVow
        }
    }
    return valve
}
The sendValve utility, if used with numSimultaneousAllowed==1, can also perform the send sequencing function described earlier. However, the earlier general send sequencing pattern is still useful. For example, if the moveAllCarsInSequence operation should abort upon encountering a broken resolution, this version of sendValve would not give the correct result.
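A caller might use the valve along the following lines. This is only a sketch: transferAll, the copyTo message, and the two kinds of far reference are hypothetical stand-ins for whatever eDesk actually does.

# E sample
# fileRcvrs and destinationRcvr are hypothetical far references
def transferAll(fileRcvrs, destinationRcvr) {
    # allow at most two transfers in flight at once
    def valve := makeSendValve(2)
    for fileRcvr in fileRcvrs {
        # each action is an ActionTrio: [obj, "verb", [arguments]]
        def doneVow := valve.makeActionVow([fileRcvr, "copyTo", [destinationRcvr]])
        when (doneVow) -> {
            println("one transfer finished")
        } catch prob {println(`transfer failed: $prob`)}
    }
}

All the actions are queued in a single pass through the loop, but the valve starts a new one only as an earlier one resolves.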
Summoning
Transparent forwarder
It is possible to have a network of machines in which the machines do not all have direct access to one another. Suppose in the network with machines A, B, and C, A and C cannot form a direct connection because of network topology, but B can reach everything. If A needs a capability on C, we can put a transparent forwarder on B such that A sends its messages to the forwarder on B:
# E sample
def makeTransparentForwarder(representedRcvr) {
    def forwarder {
        match [verb,args] {E.send(representedRcvr,verb,args)}
    }
    return forwarder
}
Create the forwarder on B, and hand the reference to the forwarder to the appropriate object on A. An example can be found in the eDesk example at the end of the book, in which a forwarder is used if the system detects that two of the file servers are unable to directly connect. Note that the "delegate" keyword does not quite work here: delegate generates immediate calls, not eventual sends. So we had to revert to the more general purpose matching process.
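The hand-off on B can be sketched as below. The function introduceThroughB, its two parameters, and the useService message are hypothetical names chosen for the illustration.

# E sample
# Runs on B. cServiceRcvr is B's reference to the object on C, and
# aClientRcvr is B's reference to the interested object on A.
def introduceThroughB(aClientRcvr, cServiceRcvr) {
    def forwarder := makeTransparentForwarder(cServiceRcvr)
    # A receives the forwarder and treats it like the service itself;
    # every message A sends to it is relayed from B to C.
    aClientRcvr <- useService(forwarder)
}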
acceptOnlyMoreRecentVow
Suppose you are periodically requesting a particular piece of information from a far object. Suppose the far object cannot supply the answer in a single game turn, i.e., it must ask other objects for information to get the answer, and so it sends you a promise rather than a result. It is possible for the resolutions of these promises to occur out-of-order, i.e., the resolution on the second request could get to you after the resolution of the third request has already occurred. In this situation, it will appear that the answer to the second request is newer and more up-to-date than the third request.
In this situation, to ensure you are getting only new information, not stale information, use the acceptOnlyMoreRecentVow utility.
def acceptOnlyMoreRecentVow
code here
This situation, while rare, actually arises in the Marketplace example in the Secure Distributed Computing chapter.
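The idea behind such a utility can be sketched as follows. This is an illustration written for this section under stated assumptions, not the acceptOnlyMoreRecentVow code that ships with E: the names makeRecencyFilter, watch, and reactor are hypothetical, and the reactor is assumed to be a one-argument function.

# E sample
# Each request is numbered as it is handed to the filter. An answer is
# passed on to the reactor only if it belongs to a later request than
# the last answer accepted; stale answers are silently dropped.
def makeRecencyFilter(reactor) {
    var lastIssued := 0
    var lastAccepted := 0
    def filter {
        to watch(answerVow) {
            lastIssued += 1
            def mySeq := lastIssued
            when (answerVow) -> {
                if (mySeq > lastAccepted) {
                    lastAccepted := mySeq
                    reactor(answerVow)
                }
            } catch prob {
                println(`request $mySeq failed: $prob`)
            }
        }
    }
    return filter
}

# usage: quoteRcvr and getPrice are hypothetical
def showPrice(price) {println(`latest price: $price`)}
def priceFilter := makeRecencyFilter(showPrice)
def pollPrice(quoteRcvr) {
    priceFilter.watch(quoteRcvr <- getPrice())
}

If the answer to the third request arrives before the answer to the second, the third is shown and the second is discarded when it finally resolves.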
something about time, the race construct, timeout construct
eDesk overwrite copy jewel
The following is not really a pattern so much as it is an example of all the more novel (i.e., not-like-Java) elements of E playing together. The problem, taken from the eDesk program in the Examples at the end of the book, is this:
example
Data Lock
Though E is safe from deadlock, in which several objects compete to acquire a set of resources, there is a related dilemma that E programs can encounter. If two or more "when" clauses wait for each other to resolve before they can resolve themselves, the result is data lock. A complete discussion of data lock and why it is far less of a risk than deadlock is beyond the scope of this book. Briefly, both theory and current experience suggest that data locks are more difficult to create in the wild than deadlocks. Furthermore, when data locks do occur, they are less likely to freeze up important subsystems on a grand scale: unlike deadlocks, they choke only a few "when" clauses, not a set of "critical sections" which have been named "critical" in thread programming for good reasons. And lastly, data locks are consistent, reproducible, and therefore fixable, unlike the deadlocks that appear mysteriously when some incredibly arcane race condition occurs.
Here is a simple though foolish example of data lock. It is the E implementation of the English sentence, "This sentence is false."
def sentence := sentence <- not()
In this construction, the sentence will forever be an unresolved promise. As noted above, the failure of this to resolve will not impede the progress of the rest of the program in the least. And it is very reliable: the unresolvability of the promise will appear the first time you run the program, and the second, and in every debugging pass thereafter.
At the time of this writing, data lock has been seen only once in the wild. In that instance, a widely used object (call it WUO) contained one piece of mutable state and one piece of immutable state. The object that computed new versions of WUO's mutable state needed the immutable part of the state, and requested the current version of WUO from the owner of the object. Unfortunately, the owner knew that a new version was under construction, and was only sending out promises for WUO until the update was resolved. The solution in this case was to refactor WUO. This resulted in an architecture that was cleaner even by normal object design criteria, without regard to the distributed usage behavior. Insufficient data is yet available to assess how commonly this type of solution will suffice.
Next Section: Secure Distributed Computing