Designed by Mike Taylor, beginning Tuesday 24th August 1999
Copyright © Mike Taylor, 1999.
This document may be freely redistributed provided it is not modified in any way. Specifically, the authorship must remain clear. All feedback is extremely welcome, and should be emailed to the author at firstname.lastname@example.org.
This document describes version 1.0 of the ETA language.
Document SCCSID ("@(#)/export/home/staff/mike/src/language/eta/doc/SCCS/s.eta.html 1.6")
The design of the ETA programming language follows from the axiom that just because you have to program on a tiny machine, that doesn't mean your programs have to be unreadable.
The designers of less ambitious languages such as COBOL intended that programs written in their languages should be readable to English-speaking managers as well as trained technical staff. ETA goes further by allowing you to write your programs in such a way that they can easily be read by speakers of whatever language you choose.
ETA achieves this lofty goal despite, or perhaps because of, its tiny size. It has very few instructions, and is thus extremely easy to learn. Admittedly, this extraordinary combination of power and simplicity comes at a cost: there is no guarantee that the natural language description in the program text will bear any relation to what the program actually does.
Why the name ETA? There are several reasons. Firstly, it's named after its own instruction set (see below). Secondly it's named after the seventh letter of the Greek alphabet, both to celebrate its use of the base-7 numbering system, and because it comes a little way before iota, indicating how tiny the language is. And thirdly, ETA stands for Estimated Time of Arrival, which is appropriate since ETA programs tend to run rather slowly. Oh, and fourthly, ETA is an anagram of TEA, which is nice to drink.
Oh, and Harvey Thompson (harveyt@sco.COM) suggests the following alternative interpretations of the ETA acronym:
Extremely Tortuous Algorithms Expressions That Amaze Encoded Tricky Allusions Esteemed Thrick Avuncularisations Elocutionist Throat Aberrations Effervescent Tautological Anagrams
The ETA language has only eight instructions, and these are named by the eight most frequently occurring letters in written English: E, T, A, O, I, N, S and H. Since the same eight letters are used to represent numbers (see below), all other characters in an ETA program are ignored (with the exception of line-breaks - see below.) This flexibility allows additional non-functional characters to be arbitrarily intermingled with the instruction and operand characters.
As a further aid to readability, case is not considered significant: programmers are encouraged to use mixed case in such a way as to elucidate the structure of the program.
For example, the following program fragments are all equivalent (they are no-ops):
NENAET nenaet NeNAEt Wunce upun a ET... Unbend a Pet, ugly!
All the instructions work on a stack of signed 32-bit integers. Implementations must support stack-depth of at least 100. More is better. Infinite is best.
Two of the instructions use addresses. An address in an ETA program is a line-number, counted from one upwards. (This is why line-breaks are significant.) Lines are considered to be terminated by an LF (ASCII 012), a CR (ASCII 015), a CR-LF pair or an LF-CR pair, so that ETA programs run portably on systems with different end-of-record conventions.
The instructions are as mnemonic as possible (which in truth is not really all that mnemonic):
|E||dividE second-top by top; push quotient, then remainder.|
|T||if top-of-stack is true, Transfer control to stacked line.|
|A||push one more than the current Address onto the stack.|
|O||Output a character whose code is popped from the stack.|
|I||Input a character and push its code on the stack.|
|N||push a Number onto the stack.|
|S||Subtract the top of stack from the top-but-one.|
|H||Halibut: duplicate or move an element of the stack.|
The individual instructions are described in more detail below.
Pops two numbers off the stack: first b, then a. Performs the integer division of a by b, and pushes first the quotient and then the remainder onto the stack. For example, if first 19 and then 7 were pushed onto the stack, then the dividE instruction would push 2 (the quotient) and then 5 (the remainder).
(Sorry about the horrible mnemonic.)
Pops two numbers off the stack: first addr, then cond. If cond is non-zero, then control is transferred to the first significant character on line number addr of the program. As a special case, transferring to the address zero terminates the program.
This is the only conditional instruction in ETA: it is used to build loops and to call and return from subroutines.
(It can be hard to remember what order the parameters for this instruction go on the stack: the author finds it helpful to remember the word CAT: Condition, Address, Transfer!)
Pushes one more than the current address onto the stack. This is useful for allowing a Transfer to subroutine followed by a subsequent Transfer back to the point immediately after. It can also be used to construct transfer addresses relative to the current address, yielding position-independent code.
Pops the top number off the stack, and outputs the character whose ASCII code is that number. (ASCII is specified so that it's possible to write ETA programs which will run portably on machines using different character sets. This is obviously a pain to implement on non-ASCII machines. But as one of our poets has written, ``Here's a nickel, kid. Get yourself a better computer.'')
Reads a single character and pushes its ASCII code onto the stack. (See the note on ASCII in the previous section.) If the end of the input file has been reached, then -1 is pushed.
This instruction pushes its argument on the stack. It's unique in that it takes its argument from the program text - all the other instructions that need operands pop them from the stack. The argument is made up of the sequence of significant characters following the N, up to but not including the first occurence of E. Those characters are treated as a big-endian sequence of base-seven digits with the following values:
So, for example, the sequence Ntone pushes the value 75 (1x7^2 + 3x7^1 + 5x7^0).
Note that negative numbers can not be expressed in this way: they must be generated by a Subtract instruction.
The characters making up the numeric argument are skipped; they are not also used as instructions, but execution continues from the first significant character after the terminating E character.
(The usual ETAOINSH order of the letters is perturbed for the mapping to digits so as to yield more Es - which are common in English text - and fewer Hs.)
Pops two numbers off the stack: first b, then a. Subtracts b from a and pushes the result onto the stack.
Pops a number, n, off the stack, and modifies the remaining stack by rolling the n'th from the top to the top. So when n is one, it swaps the top two elements; and when n is two, it rolls the top three elements.
If n is negative, its absolute value is used, but the specified element of the stack is copied to the top rather than moved. So when n is -1, a stack A B C is transformed to A B C B. For the purposes of this instruction, zero is conveniently considered negative, so that n=0 performs a DUP rather than a no-op. (There are already 246 ways to do a no-op in ETA.)
I couldn't find a single synonym for ``roll'' that had an ``h'' in it anywhere, hence the horribly un-mnemonic mnemonic. Sorry.
There is no commenting convention as such in ETA: significant character are always significant, no matter what context. The intention is that the whole of a well-written ETA program is a comment.
However, comments may be formed by using words made up entirely of letters other than the eight significant ones. This is something of an art-form in itself. Good words to use in ETA comments include crumbly, fulcrum, luxury, cruddy, buggy, fuzzy, guru, etc.
A related art is to allow significant characters to occur, but only in sequences which perform no-ops. For example, NeS subtracts zero from the top of the stack (so be careful not to use it when the stack is empty); and NeNanythingeT (where anything is a sequence of characters other than e) performs a conditional transfer, but the condition is false, so nothing happens.
(The author can't use his own first name - Mike - in an ETA program, since its significant sequence is ie, which reads a character and divides the top of stack by its value - so that if a NUL character is read, an illegal division by zero will ensue.)
Of course, significant characters may but used in program text with impunity provided that the afected portion of the program text is never executed. For example, anything on a line following the sequence NteAT (transfer control to the next line) will never be executed. This could be considered as a commenting convention; but use of this mechanism is considered poor form.
Some idioms emerge repeatedly. The first batch of these are so common and useful that they can almost be thought of as additional primitives (DUP, SWAP, ADD, JUMP, EXIT, SKIP, REM, DROP.)
|Ne H||Duplicate the top element of the stack.|
|Nte H||Swap the top two elements of the stack.|
|Ne Nnumber S S||Add number to the top of the stack.|
|Nte Naddress T||Unconditional jump to address.|
|Nte Ne T||Successful completion (jump to address zero).|
|A Ne Nte S S T|
A Ne Nnumber S S T
|If non-zero, skip the next line.|
... or skip number lines.
|Nte A T||Disregard the rest of the current line. (This works for any sequence of non-E characters between the initial N and e except for sequences in which the only significant characters are h's.)|
|Ne Nte H T|
|Discard the top element of the stack.|
Special case: Discard top of stack, if known to be 0.
Function call and return can be done in a variety of ways, but the following conventions seem to work quite well:
The thrust of these conventions, especially the first, is to make it easy for the caller. (Where necessary, this done at the expense of the convenience of the callee, since a typical function will be called many times but defined only once.)
|A Nte Naddress T||Call the function at the specified address, assuming that any necessary arguments have already been pushed onto the stack.|
|No function-call prologue needed when there are no arguments.|
|Nte H||Function-call prologue (1 argument).|
|Nae H Nae H||Function-call prologue (2 arguments).|
|Noe H Noe H Noe H||Function-call prologue (3 arguments).|
|Nte Nte H T||Function-call epilogue (no return-value).|
|Nte Nae H T||Function-call epilogue (1 return-value).|
|Nte Noe H T||Function-call epilogue (2 return-values).|
Functions that wish to use many arguments from the stack, or leave many values on the stack (e.g. functions dealing with NUL-terminated strings) will need to make their own arrangements.
Unappealing though it seems, we may be forced to change the assignment of instructions to letters, so as to allow the most common English letters to occur most commonly in program text. With the current encoding, the letter N (the Number instruction) tends to be most common, followed by H (the Halibut instruction), although both of these letters are towards the lower end of the frequency-in-prose spectrum; while conversely, the letter E (the dividE instruction) tends to occur least frequently.
This problem is at least partially addressed by the assignment of base-seven digits and the number terminator to letters, so re-assigning the instructions themselves may prove unnecessary.
ETA programmers might like to tackle the following challenges:
Best of Show: I guess the ultimate tour de force in ETA programming would probably be a suite of functionally equivalent programs structured as limericks in different languages, in which the meaning of the limericks describes the function performed by the code.
Since numeric constants may be broken across lines, extra style points can be earned by using the same code sequence both as a continuation of a numeric constant and (when Transferred to) code.
Successful attempts at any of the challenges may be emailed to the author (address above) for attributed inclusion in future releases of the ETA distribution.
Due to the constraints of time, and the desire to construct real, working ETA programs as quickly as possible, the code below is not particularly good English. The ``Hello, World!'' and pip programs are poetry rather than prose; the recursive factorial program is not padded at all, consisting only of the significant characters.
More elegant renditions of these programs would be very welcome; please email them to the author (address above).
This program prints everyone's favourite message. The interleaving of the Number and Output instructions is obviously pretty arbitrary. I've chosen to avoid the fenceposts: the selected interleaving is neither n × (Number; Output) nor n × Number; n × Output.
No heat: "hello.eta", written by Mike Taylor ** FUNGICIDE ** -- Fungus calendar -- CURTSEY: Fungal toe! Fungal toe! Fungal hoe! (Burnt programmer nucleus) Ooooooo! CRUDDY 2nd TOE: Nine(!) fungal hyaena toe5! Dungy alfalfa, penalty superlunary -- Oh, blubber! Oo! Oooo! OW!
An alternative approach would be push a marker character on to the stack - NUL would be an obvious choice - followed by all the characters of the message, then to loop, Outputting each one (or to call a well-known function to do this):
NEN toe nine NA hyaena toe N <aaa!> Renault sudden Adolph Enid: ugly M$IE4.0 Cygnus agenda: t'send a toe 2 nato. Bend a ghoul entourage ant! Venice T. NT, Be, Next -- a not-budget list of Operating Systems. Nurture 1 hundred hangmen, uterus 1st. s3ntient L1fe-f0rm? J@bb@ $ Hut?! -- On the Niet (Russian)
This program, a re-implementation of the popular CP/M utility program pip, demonstrates the use of the Transfer instruction for conditionals (line 4), looping (line 8) and termination (line 6).
-- Curriculum -- 1. Bruckner PhD (crud) 2. Pungent Debussy (Burgundy theft) 3. Pungent Czech cult (Nutmeg nugget) Ugly, bulky, buzzwordy blurb puncture 3rd juncture -- TRY!
An alternative, and more compact, rendering of the same program:
IN: (ehnen) Tess Nae TnEnTehtntenet ON: (-----) TE-- N-- T-E-T---------
(Not all the features of pip are implemented!)
The following unpadded program defines and recursively calls a factorial function which calculates the product of an integer with all the smaller positive integers: for example, 5 factorial is 120 (5×4×3×2×1):
antenahet antennet anteniiet ntoeontenet nteh neh ntesanentesst ntentset neh ntes antennet antenshet ntenaeht i nehniies anentesst nentetnahet nenteh nsses nteh ntoeantenshet nenteh ss i nehniiesnoset nteniaet nehntoesnitet nteniaet nehnentessnanet nenteht ntenaeht nenaeh neh anentesst sntennnet ntoee nenssess nteh nteninet antenttaet ntenteht naehnaeh ne naeh naeh neh anentesst ntentthet ntes naeh nenenoesh ss naeh naeh ntensiet nteht ntenaeht nteh neh anentesst sntenteht o ntenttaet
Since this is a relatively large program, some annotation is in order:
|line 1||Call the subroutine at line 14 to read a number, in the form of a sequence of decimal digits, from the standard input. That number is placed on the stack.|
|line 2||Call the factorial function itself at line 5, converting the number on top of the stack into its own factorial.|
|line 3||Call the subroutine at line 32 to output the result as a sequence of decimal digits|
|line 4||Output a trailing newline character, and jump to line zero, terminating execution. This is the end of the main program: the remainder of the code consists of the subroutines called in these first four lines, and the subroutines that they in turn call.|
|line 5||The factorial routine itself: the prologue at line 5 rolls the argument n above the return address; line 11 makes the recursive call with argument n-1 if the argument is not 1; line 12 calls the routine at line 42 to multiply n by fact(n-1); and line 13 is the epilogue, rollin the return address back above the result, and transferring control to it.|
|line 14||The subroutine to read a decimal constant from the standard input: lines 14-17 skip any leading sequence of space characters; lines 18 initialises the accumulator; lines 19-24 deal with an incoming digit; lines 25-29 break on a space or newline or at end of file, and otherwise loop; lines 30 and 31 discard the junk character on top of the stack and return the accumulated value to the caller.|
|line 32||The subroutine to write a decimal constant to the standard output: line 32 is the function-call prologue; lines 33-39 calculate and stack each digit in turn, least-significant first; line 40 calls the output-a-string subroutine at line 58 to unstack the digits to standard output most-significant first; line 41 is the function-return epilogue.|
|line 42||The subroutine to multiply two numbers together, which works by the primitive method of adding the value of the top-but-one value on the stack to an accumulator, initialised to zero, the number of times indicated by the top value on the stack. Prologue on lines 42-45, main loop on lines 46-55, and epilogue on line 56 and 57.|
|line 58||The subroutine to write out stacked characters, down to but not including a terminating NUL character. The function-return epilogue happens to be in the middle of the code, on line 61 (why not?)|
What could possibly be clearer? Note how ETA facilitates top-down program design: this program begins by making high-level calls describing its algorithm in abstract terms, and then proceeds to elaborate on how those high-level calls are to be executed.
The following program prints the words to the classic song 99 Bottles of Beer on the Wall:
*** NASTY ENID BLIETON! *** ("bottles.eta", written by Mike Taylor) Nutmeg. Why? Burnt eats are improved by its characteristic odour Nude. Why? Runty default suggests it's not such a HOT idea. vvvvvvvv Brandenburg tess3r@ct [1-6] -- Sun 3/80: turkey? Dung-TECH CRT ^^^^^^^^ >>> O! <<< Nut grenade CRT Drunken quake chum Uncle handymen: cutesy dust fungus sell-out (null turkey @ Dung-TECH turf) -- puny, ugly, rubbery chum Brandenburg tesst 504 468 sbcdnfgtejklmnapqrieutwxyz *b*d*fg**jklm**pq***u**xyz nbcdtfgoejklme*pqr**uvwxyz *bc**f*****lm***qr**uv*x** nbcdefgnsjklmsepqrssuvwxyz *bcd*fg**jk*m**pqr**uvw*yz (@ Dung-TECH ...) @unt1e @untie T (Auntie? And what's with all the Dung-TECH stuff?) [Antena 4 ET] NT *ex -- Dung-TECH turf! Dung-TECH!! [Anten@ 4 (tt) ET] >> nenaaaenahoenahoenahheniienahienatseniie << NT-eat: No optimisation >>> naaoenahoenatoenaaienaaienatsenahheniie <<< NT-eat: No optimisation Ant grenade, truly. @unt1e Dung-TECH turf!!! Dung-TECH!!!! Ant grenad3 set (2 ugly!) >> nenatoenatoentssenaoheniienahoe << NT-eat: No optimisation >>> nahsenaaieniienatnenatseniie <<< NT-eat: No optimisation Ant grenade: truly ugly & cruddy. Nurture Dung-TECH turf!!!! (it needs watering & occasional feeding) fungal chute, NE [hanten] -> "concept" NS [aeoni] -> "video" NE [hanten] -> "asymmetry" NT [*] -> "Romeo" >> nentoenahaenatnenaanenatse << >>> naaaentsseniienaaienatheniie <<< >>>> naaoenaaoentssenaaheniiensae <<<< >>>>> natnenaohenatsenahaeniienahoe <<<<< >>>>>> natnenatseniienahoenataentssentnhe <<<<<< Antena <- ET, Runty Jules; Unruly Czech lantern concept 2nd toe; 2nd toe; Oo! 2nd belch => ninety *** NEWT!! ***
The code needs no explanation.
There exists a reference interpreter for the ETA language, called simply eta, against which all the example programs in this document, and more, have been tested. It is implemented in Perl for maximum portability.
The reference interpreter is invoked as follows:
eta [ -d number ] [ -O ] [ eta-file [ eta-file2 ... ] ]
It executes the ETA program made by concatenating the eta-files named on the command-line, reading the program from standard input if no files are named. The ETA program runs with its standard input and output connected as the interpreters (which means that programs read from standard input can't do themselves do input.)
The options have the following effects:
The debugging output is controlled by the value supplied to the -d option: it is composed of separate bitwise components, described in the following table, which may be ORred together:
|1||Summary of compiled program's size.|
|2||``Done.'' message at end of run.|
|4||Execution trace: before each instruction is executed, a line is emitted of the form char/line: stack inst, where char is the current program counter, measured in significant characters; line is the current line; stack is a space-separated list of the values on the stack, with the most-recently-pushed to the right; and inst is the instruction about to be executed.|
Other ETA implementations exist, including an interpreter written in C, which is about thirty times as fast as the reference interpreter. However, whenever their behaviours differ in an area concerning which this document is ambiguous, the behaviour of the reference implementation is correct by definition. Any such ambiguities should be emailed to the author (address above).
Source filename: eta.html
Last update: Wed Sep 29 11:37:54 1999