Chapter 15: EXECUTE

Saving The Data Type System

StrongForth's data type system is static, which means that the stack effect of each word is already known at compile time. For example, the stack effect of #> is ( NUMBER-DOUBLE -- CDATA -> CHARACTER UNSIGNED ). When #> is executed or compiled, the interpreter or compiler knows exactly how to keep track of the respective data type heap. As a consequence, words like ?DUP, whose stack effect depends on runtime values, cannot be implemented in StrongForth. The stack effect has to be unique.

One of those ANS Forth words that do not have a unique stack diagram is EXECUTE. Here's the stack diagram of EXECUTE in ANS Forth:

( i*x xt -- j*x )

EXECUTE has an arbitrary number of unspecified input parameters, except for the last, and an arbitrary number of output parameters. From StrongForth's point of view, the situation is much worse than with ?DUP, because there's almost no information about the stack effect. Should EXECUTE be removed as well? Certainly not! It is one of the most powerful and versatile words in ANS Forth. What can we do?

When EXECUTE is being interpreted, the value of the execution token is known. We could easily define EXECUTE in such a way that it manually takes care of the stack effects caused by the token being executed. Like this, for example:

: EXECUTE ( TOKEN -- )
  ?EXECUTE SEARCH-TOKEN 0= ABORT" INVALID TOKEN"
  FALSE DT>DT (EXECUTE) ;
 OK
10033. 20 .S
UNSIGNED-DOUBLE UNSIGNED  OK
'TOKEN UM/MOD EXECUTE .S . .
UNSIGNED UNSIGNED 501 13  OK

It seems to work fine. (EXECUTE) expects an execution token as input parameter. It does exactly what EXECUTE does in ANS Forth, i. e., it executes a word without regard to possible stack effects. However, directly using (EXECUTE) in StrongForth without considering the stack effect of the executed word would corrupt the data type system. That's why EXECUTE additionally performs DT>DT. You might remember that (EXECUTE) was introduced in connection with INTERPRET. Actually, INTERPRET uses exactly the same technique as in this example. For details, please see the explanation of INTERPRET in chapter 11.

But why should anyone use EXECUTE explicitly during interpretation? In the example above, the phrase

'TOKEN UM/MOD EXECUTE

could simply be replaced by

UM/MOD

because the interpreter takes care of calculating the token, applying the stack effect, and performing (EXECUTE). EXECUTE has its real value during compilation. And during compilation, the solution presented in the above example doesn't work, because the runtime value of an execution token is generally not known at compile time, so the stack diagram can't be determined. What is required is a version of EXECUTE that has the runtime semantics of (EXECUTE) and takes care of the stack effects already at compile time. So, how can we tell the compiler what the stack effect of executing a yet unknown token is?

Obviously, there can be different kinds of tokens, each of which is associated with a specific stack effect. For example, if the token is the one of 1+ ( INTEGER -- 1ST ) or CELLS ( INTEGER -- 1ST ) or 2* ( INTEGER -- 1ST ), the stack effect of EXECUTE applied to it is ( INTEGER TOKEN -- 1ST ). If the token is the one of #>, the stack effect of EXECUTE is ( NUMBER-DOUBLE TOKEN -- CDATA -> CHARACTER UNSIGNED ). In order to select the correct stack effect for EXECUTE, all the compiler needs to know is which type of token is actually on the stack. And that's exactly how StrongForth solves the problem. For each kind of token, it defines a subtype of data type TOKEN and an overloaded version of EXECUTE that can be applied to it. For example, in order to execute tokens of words with stack diagram ( INTEGER -- 1ST ), we have to define a subtype of data type TOKEN, which might be called TOKEN(I--1), and an overloaded version of EXECUTE:

EXECUTE ( INTEGER TOKEN(I--1) -- 1ST )

A subtype of data type TOKEN, which is associated with a specific stack effect, and for which an overloaded version of EXECUTE exists, is called a qualified token. Obviously, the number of possible qualified tokens is pretty high. Predefining all qualified tokens for all words in the StrongForth dictionary is not practical, and predefining all qualified tokens for user defined words, including those containing user defined data types, is simply impossible. Qualified tokens need to be defined on demand, using the word )PROCREATES. Here's an example of how )PROCREATES is to be used:

( INTEGER -- 1ST )PROCREATES TOKEN(I--1)
 OK
LATEST PREV .
TOKEN(I--1) ( STACK-DIAGRAM -- 1ST )  OK
LATEST .
EXECUTE ( INTEGER TOKEN(I--1) -- 1ST )  OK
18 ( INTEGER -- 1ST )' 1+ >CODE CAST TOKEN(I--1) .S
UNSIGNED TOKEN(I--1)  OK
EXECUTE .S .
UNSIGNED 19  OK

)PROCREATES defines two words, a new subtype of TOKEN and an overloaded version of EXECUTE that can be applied to it. The phrase ( INTEGER -- 1ST )' 1+ >CODE CAST TOKEN(I--1) calculates the qualified token of the word 1+. This is the definition of )PROCREATES:

: )PROCREATES ( MEMORY-SPACE FLAG STACK-DIAGRAM -- )
  <DIAGRAM DUP ENCLOSE-DIAGRAM
  [DT] TOKEN PROCREATES LATEST >R
  " ' (EXECUTE) >CODE (CREATE) EXECUTE" EVALUATE
  NAME-SPACE OVER OVER
  ?DO I @ DT-OUTPUT ATTRIBUTE? IF LEAVE THEN 1+ I @ , LOOP
  ROT R> ?DATA-TYPE DT-INPUT OR PARAM, ROT ROT
  ?DO I @ , LOOP
  END-DIAGRAM DIAGRAM> END-CODE ;

)PROCREATES terminates the specification of the stack diagram with which the qualified token it creates shall be associated. It creates a subtype of TOKEN with the name supplied by the input source, and an overloaded version of EXECUTE. The stack diagram of the new version of EXECUTE is composed from the input parameters of the stack diagram provided to )PROCREATES, the just created qualified token as additional input parameter, and the output parameters of the given stack diagram. The execution semantics of EXECUTE are identical to those of (EXECUTE).

At compile time, the qualified token determines the specific overloaded version of EXECUTE that is to be compiled. Compiling this version of EXECUTE automatically keeps track of the stack effect of the executed qualified token.
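
For readers more at home in mainstream statically typed languages, a qualified token is roughly a typed function value: the type of the token, not its runtime value, fixes the stack effect that the compiler must account for. Here is a rough Python sketch of that idea (all names are mine, not StrongForth's, and the 4-byte cell size for CELLS is an assumption):

```python
from typing import Callable

# Analogue of the qualified token TOKEN(I--1): a callable with the
# fixed signature ( INTEGER -- 1ST ).
TokenI1 = Callable[[int], int]

def execute_i1(x: int, xt: TokenI1) -> int:
    # Analogue of EXECUTE ( INTEGER TOKEN(I--1) -- 1ST ): the token's
    # type alone tells the "compiler" the exact stack effect of the call.
    return xt(x)

one_plus: TokenI1 = lambda n: n + 1   # analogue of 1+ ( INTEGER -- 1ST )
cells: TokenI1 = lambda n: n * 4      # analogue of CELLS, assuming 4-byte cells

print(execute_i1(18, one_plus))       # 19, mirroring the session in the text
```

Any callable with the same signature can stand in for the token, which is precisely why one subtype of TOKEN per stack diagram suffices.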

Qualified Tokens

In the example of the previous section, we used CAST to convert the token of 1+ into a qualified token:

( INTEGER -- 1ST )' 1+ >CODE CAST TOKEN(I--1)

This works perfectly well. But it might create a feeling of uneasiness. What if you make a mistake by choosing the wrong qualified token? What if you, for example, mistakenly cast the token of + ( INTEGER INTEGER -- 1ST ) to TOKEN(I--1)? Nothing ... unless you execute the token of + with the wrong overloaded version of EXECUTE. Executing + will consume one more cell from the data stack than the compiler expected it would. As a result, the data type system will become corrupted, because the data type heap and the data stack will get out of sync. Anything can happen, from wrong data up to a system crash. And the danger is real. For example, let's assume we had simply forgotten that StrongForth provides multiple overloaded versions of 1+, and had written

' 1+ >CODE CAST TOKEN(I--1)

We would have chosen the wrong version of 1+:

' 1+ .
1+ ( INTEGER-DOUBLE -- 1ST )  OK

This is actually a very subtle case, because you won't find out immediately that you did something wrong. Errors will show up only in very few cases:

' 1+ >CODE CAST TOKEN(I--1) CONSTANT DANGER!
 OK
DANGER! SEARCH-TOKEN DROP .
1+ ( INTEGER-DOUBLE -- 1ST )  OK
18 DANGER! EXECUTE .
19  OK
-5 +7 DANGER! EXECUTE . .
8 -5  OK
-5 -1 DANGER! EXECUTE . .
0 -4  OK

The value -5 shouldn't be affected at all by executing 1+. But actually, it is affected if the value on top of the stack is -1. The reason is that incrementing -1 produces a carry, which is added to the next cell on the stack, because 1+ assumes it is dealing with a double-cell value. Such kinds of bugs can be extremely difficult to find!
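
The carry effect can be reproduced with a small simulation. The sketch below models the wrongly selected 1+ ( INTEGER-DOUBLE -- 1ST ) on a two-cell stack; the 32-bit cell size is my assumption for illustration:

```python
MASK = 0xFFFFFFFF                     # assume 32-bit cells for illustration

def signed(x):                        # interpret a cell as a signed number
    return x - (1 << 32) if x & 0x80000000 else x

def double_one_plus(stack):
    """The wrong 1+ ( INTEGER-DOUBLE -- 1ST ): it increments the top
    cell and propagates the carry into the cell below it."""
    low = (stack[-1] + 1) & MASK
    stack[-1] = low
    if low == 0:                      # carry out of the incremented cell
        stack[-2] = (stack[-2] + 1) & MASK

stack = [-5 & MASK, -1 & MASK]        # the compiler thinks only the top cell is used
double_one_plus(stack)
print(signed(stack[-1]), signed(stack[-2]))   # 0 -4, as in the session above
```

With any top-of-stack value other than -1 there is no carry, the cell below survives unchanged, and the bug stays invisible.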

Therefore, using >CODE (or 'TOKEN) together with a simple type cast for creating qualified tokens is not sufficient. To ensure that a suitable word is selected, it is recommended to use the dedicated word ?TOKEN:

DT TOKEN(I--1) ?TOKEN 1+ CAST TOKEN(I--1)
 OK

?TOKEN expects a qualified token on the stack. It parses the input source for the name of a word and tries to find a word with this name in the dictionary whose stack diagram matches the qualified token. Let's see if it worked all right:

SEARCH-TOKEN DROP .
1+ ( CFAR-ADDRESS -- 1ST )  OK

Oops ... that was not what we expected! We wanted to get 1+ ( INTEGER -- 1ST ). Does this mean ?TOKEN made a mistake? No, of course not. SEARCH-TOKEN is playing tricks with us here. Since 1+ ( CFAR-ADDRESS -- 1ST ) and 1+ ( INTEGER -- 1ST ) share the same token, SEARCH-TOKEN is not able to distinguish between those two overloaded versions, and just returns the first one it finds:

DT TOKEN(I--1) ?TOKEN 1+ .
508  OK
( INTEGER -- 1ST )' 1+ >CODE .
508  OK

So everything is fine. ?TOKEN actually contains quite a number of safety checks in order to prevent the creation of a qualified token if it doesn't find a suitable word:

NULL DATA-TYPE ?TOKEN ? is no subtype of TOKEN
TOKEN
DT TOKEN(I--1) ?TOKEN 1- DROP
 OK
DT TOKEN(I--1) ?TOKEN +

DT TOKEN(I--1) ?TOKEN + ? data types not congruent
TOKEN

Now, here's the definition of ?TOKEN:

: ?TOKEN ( DATA-TYPE -- TOKEN )
  DUP ?SUBTOKEN DUP ?DEFINITION
  NEXT DUP ?IS-EXECUTE
  STATE @ >R ] FREEZE >R DTP! DUP #PARAMS ALL-PARAMS>DT
  SWAP ?SAME-DATA-TYPE
  SPLIT DROP PARSE-WORD ROT MATCH SEARCH-ALL
  DTP| R> THAW DROP R> STATE !
  IF >CODE CAST TOKEN
  ELSE DROP -258 THROW NULL TOKEN
  THEN ;

?TOKEN first checks whether the given data type really is a qualified token. ?SUBTOKEN throws an exception if a data type is not a direct subtype of TOKEN:

: ?SUBTOKEN ( DATA-TYPE -- )
  PARENT [DT] TOKEN <> IF -265 THROW THEN ;

But this test is not sufficient, because you can always create subtypes of TOKEN which are not necessarily qualified tokens:

DT TOKEN(I--1) ?SUBTOKEN
 OK
DT INTEGER-DOUBLE ?SUBTOKEN

DT INTEGER-DOUBLE ?SUBTOKEN ? is no subtype of TOKEN

DT TOKEN PROCREATES CHEAT!
 OK
DT CHEAT! ?SUBTOKEN
 OK

Therefore, another test is performed. In the dictionary, the definition that identifies the qualified token must be succeeded by an overloaded version of EXECUTE. ?IS-EXECUTE throws an exception if the name of a given definition is not equal to EXECUTE:

: ?IS-EXECUTE ( DEFINITION -- )
  " EXECUTE" TRANSIENT ROT NAME COMPARE IF -265 THROW THEN ;

Of course, it is still possible to cheat ?TOKEN, but the probability of this happening by accident is pretty low. ?TOKEN is now sufficiently convinced that it really deals with a qualified token. The final thing to do is to find a word with the given name in the dictionary, whose stack diagram matches the one of the overloaded version of EXECUTE, except for the last input parameter (which is the token to be executed), but including the output parameters. ?TOKEN uses the compiler data type heap for storing a temporary copy of the input parameters. But what if the compiler data type heap is already in use? If ?TOKEN is executed during the compilation of a word, it is necessary to save and restore the contents of the compiler data type heap. For this purpose, the compiler data type heap is frozen. Since the compiler data type heap might be locked, it has to be unlocked before it can be used.

The temporary copy of EXECUTE's input parameters on the compiler data type heap includes the qualified token. This has to be removed before searching the dictionary for a suitable word. ?SAME-DATA-TYPE pops the top of the compiler data type heap and asserts that this data type is identical to the qualified token. This is another confirmation that the data type that is provided to ?TOKEN really is a qualified token:

: ?SAME-DATA-TYPE ( DATA-TYPE -- )
  DT> ROT ROT <> OR IF -265 THROW THEN ;

The dictionary search is executed by SEARCH-ALL. With MATCH specifying the additional search criteria, SEARCH-ALL tries to find a word in the dictionary that has the given name and a matching input parameter list.

More details on the matching rules are outlined in the next section. The pointer to the word that delivers the output parameter list is provided to SEARCH-ALL as the low word of the definition of EXECUTE.

After SEARCH-ALL returns, the compiler data type heap is being locked, and THAW restores its original contents. The output parameter of THAW can be ignored, because ?TOKEN uses FREEZE and THAW only for saving and restoring the compiler data type heap. If the search was successful, ?TOKEN returns the token of the found definition. Otherwise, ?TOKEN throws an exception and returns a null token.

Using ?TOKEN, qualified tokens can be obtained in a safe way. The example from the previous section can actually be modified as follows:

( INTEGER -- 1ST )PROCREATES TOKEN(I--1)
 OK
18 DT TOKEN(I--1) ?TOKEN 1+ CAST TOKEN(I--1) .S
UNSIGNED TOKEN(I--1)  OK
EXECUTE .S .
UNSIGNED 19  OK

And finally, here's the proof that ?TOKEN really selects the correct overloaded version of 1+:

DT TOKEN(I--1) ?TOKEN 1+ SEARCH-TOKEN DROP .
1+ ( CFAR-ADDRESS -- 1ST )  OK

Some Details On Matching

The general matching rules of SEARCH-ALL in combination with MATCH are documented in chapter 8. Matching a word with a qualified token goes one step further, because the output parameters have to be compared as well. But let's start with the input parameters. The input parameter list of a word that is to be converted into a qualified token is matched against the input parameter list of the corresponding version of EXECUTE (except for its last input parameter, which is the token itself) instead of the contents of the interpreter or compiler data type heap. Otherwise, the matching rules are identical. ALL-PARAMS>DT copies the input parameter list of EXECUTE to the compiler data type heap, resolving all data type references on the way. For example, in the following case,

( CDATA -> CHARACTER 2ND 1ST UNSIGNED -- 1ST 5 TH )PROCREATES QTOKEN
 OK
LATEST .
EXECUTE ( CDATA -> CHARACTER 2ND 1ST UNSIGNED QTOKEN -- 1ST 5 TH )  OK

ALL-PARAMS>DT copies

CDATA -> CHARACTER CHARACTER CDATA -> CHARACTER UNSIGNED QTOKEN

onto the compiler data type heap. ?TOKEN then removes QTOKEN and executes SEARCH-ALL with MATCH as the additional search criterion. SEARCH-ALL now tries to find a word with the given name and an input parameter list that matches

CDATA -> CHARACTER CHARACTER CDATA -> CHARACTER UNSIGNED

Let's assume SEARCH-ALL really finds a word with the given name and matching input parameter list. In order for the data type system to remain consistent, the output parameters of this word must match the output parameters of EXECUTE. In our example, the output parameters of EXECUTE, after resolving the data type references, are

CDATA -> CHARACTER UNSIGNED

The word's output parameter list has to be identical to this list. Anything else, like

CDATA UNSIGNED
CADDRESS -> CHARACTER UNSIGNED
CDATA -> CHARACTER INTEGER
CDATA CHARACTER UNSIGNED

will not match. This means that the matching rules for the output parameters are much stricter than the matching rules for the input parameters, where a data type in the word's input parameter list generally matches any subtype on the data type heap.
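
The asymmetry between the two rules can be summarized in a few lines of Python. The type lattice below is a toy fragment invented for illustration, not StrongForth's real hierarchy (the text only tells us that UNSIGNED is a subtype of INTEGER):

```python
# Toy parent links of a data type lattice; invented for illustration.
PARENT = {"SINGLE": None, "INTEGER": "SINGLE", "UNSIGNED": "INTEGER"}

def is_subtype(t, expected):
    # Walk up the parent chain until we hit the expected type or the root.
    while t is not None:
        if t == expected:
            return True
        t = PARENT.get(t)
    return False

def inputs_match(word_in, heap):
    # Input rule: a declared input matches any subtype on the heap.
    return len(word_in) == len(heap) and all(
        is_subtype(h, w) for w, h in zip(word_in, heap))

def outputs_match(word_out, expected):
    # Output rule: the lists must be identical; subtypes do not suffice.
    return word_out == expected

print(inputs_match(["INTEGER"], ["UNSIGNED"]))    # True: subtype is fine
print(outputs_match(["UNSIGNED"], ["INTEGER"]))   # False: identity required
```

Outputs must be identical because EXECUTE's declared output list is all the compiler has to go on after the token is executed.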

If the output parameter list of the word contains data type references, these are resolved based on EXECUTE's input parameter list, i. e.,

CDATA -> CHARACTER CHARACTER CDATA -> CHARACTER UNSIGNED

For example, the output parameter lists of the following words will be resolved to the correct list of data types,

MATCH1 ( CDATA -> CHARACTER 2ND 1ST UNSIGNED -- 1ST 5 TH )
MATCH2 ( CADDRESS -> INTEGER 2ND 1ST INTEGER -- 1ST UNSIGNED )
MATCH3 ( CDATA CHARACTER 1ST SINGLE -- CDATA -> 2ND 4 TH )

while those words will not:

NOMATCH1 ( CDATA -> CHARACTER 2ND 1ST UNSIGNED -- 1ST INTEGER )
NOMATCH2 ( CADDRESS -> INTEGER 2ND 1ST INTEGER -- 1ST 2ND )
NOMATCH3 ( CDATA CHARACTER 1ST SINGLE -- CDATA -> 4 TH UNSIGNED )

The most interesting one of these examples is probably MATCH3. 2ND references CHARACTER and 4 TH references SINGLE. However, during the input parameter match, CHARACTER is associated with the third basic data type on the data type heap,

CDATA -> CHARACTER CHARACTER CDATA -> CHARACTER UNSIGNED

and SINGLE is associated with the last one:

CDATA -> CHARACTER CHARACTER CDATA -> CHARACTER UNSIGNED

As a result, the data type references resolve to

CDATA -> CHARACTER UNSIGNED

and that is exactly what is needed for a successful output parameter match.

But there's still something left you should know about the matching rules of ?TOKEN. The input parameters of the word must match the input parameters of EXECUTE, after resolving data type references and removing the qualified token. Now, what if the word has fewer input parameters than required? Consider the following example:

: MUTILATED ( CHARACTER CDATA -> 1ST UNSIGNED -- 4 TH )
  NIP NIP ;
 OK
DT QTOKEN ?TOKEN MUTILATED
 OK

It works! And why shouldn't it? Executing MUTILATED while the data type heap contains

CDATA -> CHARACTER CHARACTER CDATA -> CHARACTER UNSIGNED

will result in

CDATA -> CHARACTER UNSIGNED

on the data type heap, which exactly matches EXECUTE's output parameters. Actually, if the input parameter match does not consume all of EXECUTE's input parameters, the remaining input parameters simply stay on the data type heap and become output parameters. Of course, this only works under the condition that EXECUTE's output parameter list starts with the same data types as its input parameter list. For example, the following qualified token won't allow any mutilated word to match:

( INTEGER FLAG -- SIGNED )PROCREATES TOKEN(IF--S)
 OK
WORDS SIGN
SIGN ( FLAG -- )
 OK
DT TOKEN(IF--S) ?TOKEN SIGN

DT TOKEN(IF--S) ?TOKEN SIGN ? data types not congruent
TOKEN
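
The leftover-inputs-become-outputs behavior can be modeled with a toy stack machine. In the Python sketch below (all names are mine), a word pops only the cells it consumes, and the untouched bottom cell of EXECUTE's promised inputs survives as an output:

```python
def execute(stack, word, n_in):
    """Toy model: a word pops its n_in inputs off the top of the stack
    and pushes its outputs."""
    args = stack[len(stack) - n_in:]
    del stack[len(stack) - n_in:]
    stack.extend(word(*args))

# Heap contents named after the data types in the text:
stack = ["CDATA->CHARACTER", "CHARACTER", "CDATA->CHARACTER", "UNSIGNED"]

# MUTILATED ( CHARACTER CDATA -> 1ST UNSIGNED -- 4 TH ) is NIP NIP:
mutilated = lambda ch, cd, u: [u]   # consumes 3 cells, keeps the top one

execute(stack, mutilated, 3)
print(stack)   # ['CDATA->CHARACTER', 'UNSIGNED'] -- EXECUTE's promised outputs
```

The net effect equals EXECUTE's declared stack diagram, which is all the data type system cares about.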

Example: A Simple Jump Table

At the end of this chapter, we'll have a look at a small example that demonstrates how a simple jump table can be implemented in StrongForth. Here it is:

999 LIST
 0 \ EXAMPLE: JUMP TABLE
 1
 2 ( UNSIGNED 1ST -- 1ST )PROCREATES TOKEN(U1--1)
 3
 4 DT TOKEN(U1--1) ?TOKEN + CAST TOKEN(U1--1) VARIABLE JUMP-TABLE
 5 DT TOKEN(U1--1) ?TOKEN - DATA-SPACE ,
 6 DT TOKEN(U1--1) ?TOKEN *            ,
 7 DT TOKEN(U1--1) ?TOKEN /            ,
 8 DT TOKEN(U1--1) ?TOKEN MOD          ,
 9 DT TOKEN(U1--1) ?TOKEN MIN          ,
10 DT TOKEN(U1--1) ?TOKEN MAX          ,
11 DT TOKEN(U1--1) ?TOKEN DROP         ,
12
13 : TEST ( UNSIGNED 1ST -- )
14   8 0 DO OVER OVER JUMP-TABLE I + @ EXECUTE . LOOP DROP DROP ;
15
 OK
999 LOAD
 OK
16 7 TEST
23 9 112 2 2 7 16 16  OK

The jump table contains tokens of various arithmetic operations to be performed on two items of data type UNSIGNED, and returning a result of the same data type. TEST is just a loop that traverses the jump table, executing all qualified tokens with the same parameters. You can view what is stored in the jump table, for example:

JUMP-TABLE 2 + @ SEARCH-TOKEN . .
-1 * ( INTEGER UNSIGNED -- 1ST )  OK

This version of * is more general than required in order to match the qualified token, because it allows an item of data type INTEGER as its first input parameter. But of course, it still matches if the first input parameter is of data type UNSIGNED. For MIN and MAX, ?TOKEN apparently found the wrong version:

JUMP-TABLE 5 + @ SEARCH-TOKEN . .
-1 MIN ( ADDRESS 1ST -- 1ST )  OK

Again, this is just a trick played by SEARCH-TOKEN. Because MIN ( ADDRESS 1ST -- 1ST ) and MIN ( INTEGER 1ST -- 1ST ) share the same token, SEARCH-TOKEN is unable to distinguish these two words.

The last entry in the jump table has a mutilated input parameter list:

JUMP-TABLE 7 + @ SEARCH-TOKEN . .
-1 DROP ( SINGLE -- )  OK

However, its token can still be converted into TOKEN(U1--1), because the first input parameter of EXECUTE takes the place of DROP's missing output parameter. Applying DROP to UNSIGNED UNSIGNED yields UNSIGNED, so its resulting stack effect is identical to that of the arithmetic operations, and satisfies the requirements of the qualified token.


Dr. Stephan Becher - June 6th, 2007