Chapter 14: Defining Words

The Basics

Defining words are words that define other words. Here's a simple example in ANS Forth syntax:

: CONST+ ( n -- ) CREATE , DOES> @ + ; OK
3 CONST+ 3+ OK
4 3+ . 7 OK

You probably see the difficulties stongforth's data type system will encounter. In ANS Forth, the runtime code of DOES> leaves the data field address on the stack, which is then used by @ for fetching the constant 3. In StrongForth, the data field address needs to be more specific, because @ needs to know the data type of the item stored at the data field address. But even if this problem was solved by a type cast, the next problem is already waiting for us: + expects two operands on the stack.

The solution is obvious. Since the runtime code of the defining word may be considered as a completely separate word, we may provide it with it's own stack diagram:

CONST-SPACE
 OK
: CONST+ ( INTEGER -- )
  CREATE ,
  DOES> ( INTEGER CONST -> INTEGER -- 1ST )
  @ + ;
 OK
3 CONST+ 3+
 OK
4 3+ .
7  OK

And it works! The only differences to the ANS Forth version are the stack diagrams. There's only one question: Where does the stack diagram of 3+ come from?

LATEST .
3+ ( INTEGER -- 1ST )  OK
3+ obviously has the correct stack diagram, because the data field address with data type CONST -> INTEGER is not visible outside 3+. Again, the solution is pretty obvious. When the defining word CONST+ defines 3+, it simply copies the stack diagram of its own runtime part, except for the last input parameter, which is CONST -> INTEGER in this example. Quite simple, isn't it?

You might wonder why we have to switch to the constant data space at the beginning of the example. The reason is that CONST+ uses , for compiling the constant 3 into the data field of 3+. Since , applies to the current memory space, and a word's data field is always in the constant data space, we have to make the constant data space the current memory space before executing CONST+. If the parameters are accidentally compiled into a different memory space, like for example the code space, the data field remains empty and @ inside 3+ will fetch an undefined value.

The Implementation

The implementation of DOES> is not trivial. Generally, we have to distinguish between three different kinds of words:

The compiling part of the defining word executes the immediate word DOES>, which performs 4 tasks:

Here's the definition of DOES>, plus the memory image of the virtual code generated by it:

: DOES> ( COLON-DEFINITION -- 1ST )
  ?COMPILE CODE-HERE -> DEFINITION [LITERAL]
  POSTPONE (DOES) POSTPONE ;
  SPACE@ CODE-SPACE :NONAME SWAP , CONST-HERE DOES,
  SWAP SPACE! ; IMMEDIATE
LIT
code address parameter
(DOES)
(EXIT)

The value of the code address parameter is calculated from the current code space pointer, because the data structure mentioned in task 4 starts at this location. The first component of this set is a pointer to the noname definition created by :NONAME, which will become the runtime part of the defining word. The second component is the machine code shared by all words defined by the defining word. It is completely compiled by DOES,. DOES, expects the data field address of the runtime part as a parameter, because this address is part of the machine code:

DOES, ( CONST -- )

Here's the memory image of the data structure that is compiled into the code space:


DEFINITION


machine code

We need dedicated machine code for the words created by a defining word, because each defining word has its specific runtime part. The machine code initiates the inner interpreter to execute the runtime part. This requires that a pointer to the data field of the runtime part is compiled into the machine code.

Note that the input parameter of DOES> is an item of data type COLON-DEFINITION that belongs the to compiling part of the defining word. It is consumed by ;. The output parameter of DOES> is also an item of data type COLON-DEFINITION, but this one is created by :NONAME and thus belongs to the runtime part of the defining word.

Just like any other word in the StrongForth dictionary, the memory image of the defined words is distributed over the name space, the constant data space, and the code space. Because the part in the code space is shared among all words defined by one specific defining word, it is compiled only once during compilation of the defining word. While executing the defining word, CREATE generates the name field, the link field, the attribute field, and the token field. The words between CREATE and DOES> compile the data field. What's still missing are the input and output parameter fields, and the code field. These fields are filled by (DOES):

: (DOES) ( CODE -> DEFINITION -- )
  LATEST #PARAMS 0=
  IF SPACE@ NAME-SPACE OVER @ DUP ?HAS-INPUT-PARAMS
     0 ENCLOSE-PARAMS ?CHECK-REFERENCES
     >R NULL STACK-DIAGRAM SWAP 0 PARAM@,
     OVER #PARAMS R> PARAM@, NIP END-DIAGRAM SPACE!
  THEN 1+ CAST CODE LATEST >CODE ! ;

The input parameter of (DOES) is the address of the data structure that has been compiled into the code space. From this structure, (DOES) obtains the definition of the defining word's runtime part, whose parameter list is the model for the defined words. ENCLOSE-PARAMS returns the parameter indexes of the last (compound) input parameter and of the first output parameter. Starting with a null stack diagram, two instances of PARAM@, compile the input parameter list and the output parameter list of the defined word. END-DIAGRAM compiles the attribute field from the stack diagram built by the two instances of PARAM@,. Finally, the pointer to the common machine code is stored in the new definition's code field.

PARAM@, is a simple loop that compiles all input and output parameters between two index positions from an existing definition into a new definition. The index positions refer to the basic data types in the parameter list, starting with 0:

: PARAM@, ( DEFINITION STACK-DIAGRAM UNSIGNED 3RD -- 1ST 2ND )
  ?DO OVER I PARAM@ PARAM, LOOP ;

The index positions are partly provided by ENCLOSE-PARAMS:

: ENCLOSE-PARAMS ( DEFINITION UNSIGNED -- 1ST 2ND 2ND )
  BEGIN OVER OVER +PARAM OVER #PARAMS OVER >
  WHILE OVER OVER PARAM@ DT-INPUT ATTRIBUTE?
  WHILE NIP NIP
  REPEAT
  THEN NIP ;

Starting at a given parameter index, ENCLOSE-PARAMS scans the compound data types in the parameter list of a definition, until the fist output parameter or the end of the parameter list is encountered. In general, the parameter list of a definition consist of n input parameters and m output parameters:

Data Type Index Parameter
0 Input Parameter 1
si(1) Input Parameter 2

...


...

si(1) + ... + si(n-2) Input Parameter n-1
si(1) + ... + si(n-1) Input Parameter n
si(1) + ... + si(n) Output Parameter 1
si(1) + ... + si(n) +
so(1)
Output Parameter 2

...


...

si(1) + ... + si(n) +
so(1) + ... + so(m-1)
Output Parameter m
si(1) + ... + si(n) +
so(1) + ... + so(m)
 

si(k) and so(k) denote the number of basic data types input parameter k or output parameter k consist of. For example, the index position of the first output parameter is equal to the total number of basic data types in the input parameter list. ENCLOSE-PARAMS steps through the parameter list until reaching the first output parameter or, if the definition has no output parameters, until the end of the parameter list. It then returns the index of the last input parameter (si(1) + ... + si(n-1)), and the index of the position after the last input parameter (si(1) + ... + si(n)). +PARAM is used for moving from one parameter to the next, skipping basic data types until the end of a compound data type:

: +PARAM ( DEFINITION UNSIGNED -- 1ST 2ND )
  BEGIN OVER OVER PARAM@ DT-PREFIX ATTRIBUTE?
  WHILE 1+
  REPEAT 1+ ;

The definition of (DOES) contains two words that perform sanity checks on the stack diagram of the runtime part of the defining word. ?HAS-INPUT-PARAMS ensures that the stack diagram contains at least one input parameter. The last input parameter is reserved for the address of the defined word's data field and must always be present. Nevertheless, since this parameter is excluded from the stack diagram of the defined word, the output parameters may not contain any references to it. Something like

 ... DOES> ( LOGICAL CONST -> SIGNED -- 3RD ) ... 

is not allowed, because the resulting stack diagram of the defined words would be invalid:

( LOGICAL -- 3RD ) \ Invalid!

?CHECK-REFERENCES checks, based on the output of ENCLOSE-PARAMS, for such mistakes. The output parameters must not reference any position within the last compound input parameter. Here are the definitions of ?HAS-INPUT-PARAMS and ?CHECK-REFERENCES:

: ?HAS-INPUT-PARAMS ( DEFINITION -- )
  DUP #PARAMS
  IF 0 PARAM@ DT-OUTPUT ATTRIBUTE? ELSE DROP TRUE THEN
  IF -262 THROW THEN ;

: ?CHECK-REFERENCES ( DEFINITION UNSIGNED 2ND -- 1ST 2ND 2ND )
  LOCALS| FIRST-OUT LAST-IN DEF | DEF #PARAMS FIRST-OUT
  ?DO DEF I PARAM@ OFFSET LAST-IN > IF -261 THROW THEN
  LOOP DEF LAST-IN FIRST-OUT ;

Defining Words With Variable Parameters

But we're not yet done with DOES> and (DOES). You may have noticed that there's a conditional clause in the definition of (DOES), whose purpose has not yet been explained. (DOES) compiles the input and output parameters of the defined word only if it doesn't have input and/or output parameters yet. This makes it possible to compile non-constant input and output parameter lists into the new definition, that deviate from the input and output parameter lists of the runtime part of the defining word. What's that good for? Well, think about defining a word like CONSTANT, for example. CONSTANT is a defining word, but the data type of its one and only output parameter is not always the same:

700 CONSTANT INT
 OK
LATEST .
INT ( -- UNSIGNED )  OK
CHAR B CONSTANT LETTER
 OK
LATEST .
LETTER ( -- CHARACTER )  OK

We could define CONSTANT with CREATE ... DOES> as follows:

: CONSTANT ( SINGLE -- )
  CREATE CONST, DOES> ( CONST -> SINGLE -- SINGLE ) @ ;
 OK
700 CONSTANT INT
 OK
LATEST .
INT ( -- SINGLE )  OK
CHAR B CONSTANT LETTER
 OK
LATEST .
LETTER ( -- SINGLE )  OK

Do you see the problem? Of course. This version of CONSTANT always defines words with an output parameter of data type SINGLE. We'd need separate versions of CONSTANT for each possible compound data type! That's not a good solution. So, let's give it another try:

: CONSTANT ( SINGLE -- )
  CREATE CONST, SPACE@ NAME-SPACE NULL STACK-DIAGRAM (CONSTANT)
  END-DIAGRAM SPACE! DOES> ( CONST -> SINGLE -- SINGLE ) @ ;
 OK
700 CONSTANT INT
 OK
LATEST .
INT ( -- UNSIGNED )  OK
CHAR B CONSTANT LETTER
 OK
LATEST .
LETTER ( -- CHARACTER )  OK

Now, that's the way we want to have it. One version of CONSTANT for all data types. The stack diagram of the defined word is compiled between CREATE and DOES>. (DOES) detects that the stack diagram is already present and skips the compilation of the stack diagram from the sample given by the stack diagram of the runtime part of the defining word.

More Examples

The example of CONST+ at the beginning of this chapter is actually a very simple defining word. Another example is PROCREATES, which was already presented in chapter 7. A more unusual application of CREATE ... DOES> is the definition of MARKER:

: MARKER ( -- )
  NULL DATA -> SINGLE DUP 11 + SWAP
  DO I @ CONST, LOOP CREATE
  DOES> ( CONST -> SINGLE -- )
  12 - NULL DATA -> SINGLE 11 MOVE ;

The compiling part of MARKER copies the first 11 cells of the CONST memory area. These 11 cells contain StrongForth's memory space pointers, the current memory space, the word lists, and the content of LATEST. The runtime part simply restores these pointers and system variables. As a result, all memory allocations, including those for new definitions, are being undone.

What's different with respect to other defining words is the fact that MARKER compiles the data field before creating a new word. Strictly speaking, the 11 cells are located outside of the new word instead of within its data field. The data field itself is empty. The reason for doing it this way is that the pointers and system variables shall be saved before they are modified by the creation of the new word. If the new word is being executed later, it restores the pointers and system variables to the state before its own creation, i. e., the new word deletes its own definition. Even the block of 8 cells in the CONST memory area is de-allocated, because the CONST memory space pointer is contained in the first one of the 8 cells.


Dr. Stephan Becher - May 16th, 2007