Chapter 8: The Dictionary

A Definition's Memory Image

Just like in ANS Forth, the StrongForth dictionary is a list of definitions. Each definition consists of several components, e. g., the definition's name and attributes, its parameters, and a pointer to the previous definition. In StrongForth, a definition additionally contains its stack diagram. Without it, the interpreter and the compiler wouldn't be able to perform type checking or to distinguish overloaded words.

As you might remember from chapter 4, StrongForth provides several distinct memory spaces:

name space
code space
data space
constant data space
local data space

Now, the data that constitutes a definition is actually distributed over three memory spaces, using pointers to link the various components. The name space contains everything that is required to compile a definition. Virtual machine code is stored in the constant data space and real machine code is stored in the code space. While each definition has a unique set of data in the name space, several definitions may share the same virtual machine code, and different blocks of virtual machine code might share the same real machine code. For example, all colon definitions use the same machine code to interpret their virtual machine code.

Let's begin with those components of a definition that are stored in the name space:

name field

link field

attribute field

token field

input parameter field

output parameter field

Unless a definition is created by :NONAME, it has a name and a pointer to the previous definition in the dictionary. The name is just a counted string, i. e., a byte indicating the length, followed by the characters of the string. The link field contains the address of the name field of the previous definition that has a name field within the same word list. The link field of the first definition in a word list contains a null pointer.

An item of data type DEFINITION may be cast to an item of data type FAR-ADDRESS to obtain a pointer to the attribute field of the definition. It's not a pointer to the name field, because some definitions have no name field and no link field. The attribute field is always present. It has the size of a single cell and consists of various bit fields whose contents provide additional information about the definition:

Unused | I | U | N | Size

Bit field	Contents
Unused	reserved for implementation defined attributes
I	immediate attribute
U	reserved
N	noname attribute
Size	size of the stack diagram

For normal words like DUP and 1+, the immediate attribute is 1. If it is 0, the word is executed immediately, even in compilation state. Typical words of this category are IF and [CHAR]. Note that the meaning of the immediate attribute is inverted with respect to what one might expect: words that are executed immediately in compilation state have a 0 in this bit field. This should not be a common source of errors, because the immediate attribute is usually set and queried by special words with unambiguous names.

The noname attribute marks all words that have no name field and no link field. Those are the words defined by :NONAME.

Finally, the size attribute is the size of the definition's stack diagram. It contains the number of basic data types that constitute the stack diagram. For example, in

DROP ( SINGLE -- )

the size attribute is 1. The definition

! ( SINGLE DATA -> 1ST -- )

has a stack diagram consisting of 3 basic data types. Its size attribute is 3.
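The bit fields can be illustrated with a small Python model of the attribute cell. This is not StrongForth code, just a sketch: the positions of the immediate attribute (bit 7) and the noname attribute (bit 5) follow the StrongForth code in this chapter, while placing the reserved bit U in bit 6 and the Size field in bits 0 to 4 is an assumption, as are the function names.

```python
# Hypothetical model of a StrongForth attribute field (one cell).
# Bit 7 (I) and bit 5 (N) follow the text; U in bit 6 and Size in
# bits 0-4 are assumptions made for this sketch.
IMMEDIATE_BIT = 1 << 7   # set for normal words, cleared for immediate words
NONAME_BIT = 1 << 5      # set for definitions without name and link fields
SIZE_MASK = 0x1F         # number of basic data types in the stack diagram


def encode_attribute(immediate: bool, noname: bool, size: int) -> int:
    """Build an attribute cell from the individual attributes."""
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("stack diagram does not fit into the Size field")
    cell = size
    if not immediate:
        cell |= IMMEDIATE_BIT   # inverted: a 1 means "not immediate"
    if noname:
        cell |= NONAME_BIT
    return cell


def decode_attribute(cell: int) -> dict:
    """Split an attribute cell into its bit fields."""
    return {
        "immediate": (cell & IMMEDIATE_BIT) == 0,
        "noname": (cell & NONAME_BIT) != 0,
        "size": cell & SIZE_MASK,
    }


drop = encode_attribute(immediate=False, noname=False, size=1)   # DROP ( SINGLE -- )
store = encode_attribute(immediate=False, noname=False, size=3)  # ! ( SINGLE DATA -> 1ST -- )
print(hex(drop), decode_attribute(drop))
print(hex(store), decode_attribute(store))
```

Under these assumptions, the value 0x80 that (CREATE) compiles into a new attribute field decodes as a non-immediate, named definition with an empty stack diagram.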

The next field in the name space is the token field. It contains a pointer of data type CONST to the definition's code field in the constant data space. This pointer is the execution token of the definition. Whenever a word is compiled as part of a new colon definition, its token is added to the virtual machine code. Colon definitions are actually stored as lists of tokens.

After the token field, the stack diagram of the definition is stored in the name space. The stack diagram is just a list of items of data type DATA-TYPE as described in chapter 7. It is divided into two sections, the input parameter field and the output parameter field. Since a definition might have no input parameters, no output parameters, or no parameters at all, one of these sections or the complete stack diagram might be empty. If a definition has no stack diagram, its size attribute is 0.

The parts of the dictionary that are stored in the name space are only required by the interpreter and the compiler. During execution of a word, all these data are irrelevant. In an embedded system that only executes already compiled code and does not compile or interpret any StrongForth source code, it's often possible to remove the name space completely. Usually, this saves a lot of memory.

As mentioned before, a definition's token field contains a pointer to a part of the definition that is located in the constant data space:

code field

data field

This part consists of the code field and the variable-length data field. The code field contains a pointer of data type CODE, which is the address of the definition's machine code. Whenever the definition is executed, this machine code is actually executed. Pure machine code words like SWAP or + do not need anything else to execute. They do not have a data field. On the other hand, colon definitions or definitions generated by other defining words like VARIABLE and PROCREATES need some more information, because the definitions from each of these categories share the same machine code. This additional information is stored in the data field, and is accessed by the machine code. For example, the machine code of colon definitions is an interpreter for the virtual machine code contained in the data field. The data field of a definition created by VARIABLE contains a pointer to the location of the variable's value within the data space. The variable's value is not located directly in the data field, because the data field is in the constant data space, which is assumed to be read-only during runtime.

Finally, the machine code of a definition is stored in code space:

machine code

As a short summary, each definition has an attribute and a token field in the name space. Optionally, the name space might contain a name and link field and, if the definition has a stack diagram, input and/or output parameter fields. The token field contains a pointer to the code and data fields, which are located in the constant data space. Different definitions may share the same code and data fields. The code field contains a pointer to the definition's machine code. The same machine code can be used by multiple definitions.
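The three-level structure can be modelled with Python objects, where object identity stands in for addresses. This is only an illustration, not StrongForth code: SQUARE, CUBE and SQ are hypothetical words, their stack diagrams are made up, lists of word names stand in for lists of tokens, and the attribute values assume the stack diagram size is kept in the low bits next to the immediate attribute (0x80).

```python
# Hypothetical model of how one definition is spread over three memory spaces.
# Object identity stands in for addresses; all names are illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MachineCode:            # real machine code in the code space
    label: str


@dataclass(eq=False)
class CodeAndData:            # code field + data field in the constant data space
    code: MachineCode         # code field: points to the machine code
    data: list = field(default_factory=list)   # variable-length data field


@dataclass(eq=False)
class NameEntry:              # the part only needed by interpreter and compiler
    name: Optional[str]       # None models a :NONAME definition
    attribute: int
    token: CodeAndData        # token field: the execution token
    params: List[str] = field(default_factory=list)


colon_runtime = MachineCode("interpreter for virtual machine code")
drop_code = MachineCode("machine code of DROP")

# Two hypothetical colon definitions share the same machine code.
square = NameEntry("SQUARE", 0x82, CodeAndData(colon_runtime, ["DUP", "*"]),
                   ["SIGNED", "1ST"])
cube = NameEntry("CUBE", 0x82, CodeAndData(colon_runtime, ["DUP", "SQUARE", "*"]),
                 ["SIGNED", "1ST"])
# A hypothetical second name sharing SQUARE's code and data fields.
sq = NameEntry("SQ", 0x82, square.token, ["SIGNED", "1ST"])
# A pure machine code word has an empty data field.
drop = NameEntry("DROP", 0x81, CodeAndData(drop_code), ["SINGLE"])

print(square.token.code is cube.token.code, sq.token is square.token)
```

Only the NameEntry objects would disappear if the name space were removed; the token objects and the machine code remain reachable from compiled code.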

Word Lists

The link field connects a definition with the previous one within the same word list. A word list is identified by an item of data type WID:

DT SINGLE PROCREATES WID

But, given the fact that the Search-Order word set is optional, why does StrongForth bother dealing with word lists at this point? Actually, even without the Search-Order word set being loaded, StrongForth already has three different word lists:

FORTH-WORDLIST ( -- WID )
LOCAL-WORDLIST ( -- WID )
ENVIRONMENT-WORDLIST ( -- WID )

The Forth word list is the usual dictionary, which contains all words that constitute StrongForth's word sets. It is the default word list for defining new words and for searching the dictionary. The Local word list is temporarily created during compilation of a new word. It contains only locals and other temporary definitions that behave like locals. This word list uses the local name space instead of the name space, because the local name space is located in the DATA memory area. You'll learn more about locals in chapter 12. Finally, the Environment word list contains a definition for each environmental query string. Using a word list as a database for environmental query strings simplifies the implementation of ENVIRONMENT? and the definition of new strings.

Without the Search-Order word set, only a few words deal with word lists. GET-CURRENT is a constant that returns the identifier of the Forth word list. This means that all new definitions are linked into the Forth word list.

FORTH-WORDLIST CONSTANT GET-CURRENT

GET-CURRENT is used by CREATE to obtain the word list in which a new definition is created. Since this is almost always the FORTH-WORDLIST word list, GET-CURRENT can be a constant. Almost? Yes, there are cases where a definition is created in a different word list, even without the Search-Order word set. Whenever a new environment query string needs to be defined, for example as part of loading a new word set as a library, the ENVIRONMENT-WORDLIST word list is to be used. That's why SET-CURRENT from the ANS Forth Search-Order word set is included in StrongForth's core library:

: SET-CURRENT ( WID -- )
  [ ' GET-CURRENT >BODY -> WID ] LITERAL ! ;

SET-CURRENT changes the value of the constant GET-CURRENT. But why not simply make GET-CURRENT a VALUE instead of a CONSTANT? A VALUE basically has two drawbacks:

The value is stored in one cell of the DATA memory area, which is rather precious in an embedded system with restricted RAM size.
The value needs to be explicitly initialized within QUIT or ABORT.

The value of a constant, on the other hand, is stored in the CONST memory area, which doesn't need to be initialized. And since this value will only be changed during compilation and never at application runtime, implementing GET-CURRENT as a constant is not a problem. With SET-CURRENT, environment query strings can easily be defined like this:

ENVIRONMENT-WORDLIST SET-CURRENT
TRUE CONSTANT FACILITY
TRUE CONSTANT FACILITY-EXT
FORTH-WORDLIST SET-CURRENT

Each word list is associated with a system variable that contains the address of the name field of its latest definition. More precisely, the variable contains just the address offset without the segment. The segment does not need to be stored in a variable, because it is the same for all words in a specific word list. LINK returns the address of the system variable that belongs to a given word list:

Finally, SEARCH searches a given word list for a word with a given name and, optionally, with additional search criteria, like a stack diagram match or a specific code field. Two parameters, SINGLE and CODE, specify the additional search criteria. Specifying the very general data type SINGLE allows different kinds of parameters to be passed, as long as they fit into a single-cell item. SEARCH returns the definition that matches all search criteria, and a signed number indicating whether a definition has been found and whether it is immediate or not.

SEARCH ( CDATA -> CHARACTER UNSIGNED SINGLE CODE WID -- DEFINITION SIGNED )

Note that SEARCH is overloaded with an ANS Forth word for searching substrings within strings. However, because of the different stack diagrams, the interpreter and the compiler will never fail to distinguish cleanly between searching word lists and searching strings. A more detailed description of SEARCH will be given later in this chapter.

Creating a Definition

Most components of a new definition are created by (CREATE):

: (CREATE) ( CONST -- )
  SPACE@ NAME-SPACE HERE
  PARSE-WORD ", ALIGN  \ name field \
  ,                    \ link field \
  LATEST!
  [ 7 BIT ] LITERAL ,  \ attribute field \
  SWAP ,               \ token field \
  SPACE! ;

(CREATE) expects an address of data type CONST as input parameter. This address becomes the token of the new definition. Since (CREATE) compiles data into the name space, it has to save the current memory space on entry and restore it at the end. SPACE@ and SPACE! were already presented in chapter 4. (CREATE) uses PARSE-WORD ", to parse the definition's name and compile it into the name field. ALIGN is required because ", leaves the memory space pointer unaligned. The link field temporarily receives a pointer to the name field of the same definition. Linking the new definition into the compilation word list, i. e., entering a pointer to the name field of the previous definition into the link field and making the definition the latest one of the compilation word list, is done later when the definition is finished. This technique prevents unfinished words from being found in the dictionary, so an explicit smudge bit, like in other Forth implementations, is not required. LATEST! updates the pointer to the latest definition. This word will be described below in more detail.

The initial value of the attribute field has only the immediate attribute set. This means that the new definition is a non-immediate word by default. The size of the stack diagram cannot yet be determined at this point.
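The following Python sketch replays (CREATE) on a byte array to show where each field ends up. It is a model, not StrongForth code: it assumes 16-bit little-endian cells and zero-filled alignment padding, and the class and method names are hypothetical stand-ins for HERE, C,, ALIGN and ,.

```python
# Hypothetical byte-level model of what (CREATE) compiles into the name space.
# Assumptions: 16-bit little-endian cells, zero-filled alignment padding.
CELL = 2


class NameSpace:
    def __init__(self, size: int = 256) -> None:
        self.mem = bytearray(size)
        self.here = 0

    def c_comma(self, byte: int) -> None:      # C,
        self.mem[self.here] = byte
        self.here += 1

    def comma(self, cell: int) -> None:        # ,
        self.mem[self.here:self.here + CELL] = cell.to_bytes(CELL, "little")
        self.here += CELL

    def align(self) -> None:                   # ALIGN
        self.here += -self.here % CELL

    def fetch(self, addr: int) -> int:         # @
        return int.from_bytes(self.mem[addr:addr + CELL], "little")


def create(ns: NameSpace, name: str, token: int) -> int:
    """Mimic (CREATE); return the definition, i.e. the attribute field address."""
    nfa = ns.here                    # HERE
    ns.c_comma(len(name))            # ", compiles a counted string
    for ch in name.encode("ascii"):
        ns.c_comma(ch)
    ns.align()                       # ALIGN
    ns.comma(nfa)                    # link field points to its own name field
    definition = ns.here             # LATEST! records this address
    ns.comma(1 << 7)                 # attribute field: immediate attribute only
    ns.comma(token)                  # token field
    return definition


ns = NameSpace()
dup = create(ns, "DUP", 0x1234)
swap = create(ns, "SWAP", 0x1240)
print(dup, ns.fetch(dup - CELL), swap, ns.fetch(swap - CELL))
```

In this model, the link field of SWAP (address 16) still holds 10, the address of its own name field, and byte 15 is the alignment padding added after the four characters of SWAP.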

(CREATE) has a sibling called (CREATE-NONAME). (CREATE-NONAME) creates a definition with no name field and no link field. Consequently, it sets the noname attribute in the attribute field. Apart from that, it works just like (CREATE):

: (CREATE-NONAME) ( CONST -- )
  SPACE@ NAME-SPACE LATEST!
  [ 7 BIT 5 BIT OR ] LITERAL ,  \ attribute field \
  SWAP ,                        \ token field \
  SPACE! ;

StrongForth keeps a pointer to the most recently created definition. LATEST is defined as a value containing the latest definition:

NULL DEFINITION VALUE LATEST

Easy access to the currently compiled definition is often very useful. As mentioned before, an item of data type DEFINITION contains the full address of a definition's attribute field. Now, after (CREATE) has compiled the name and link fields, the next available address of the name space is the attribute field address. Casting this address to an item of data type DEFINITION yields the definition itself. LATEST! does nothing but use this address to update LATEST:

: LATEST! ( -- )
  FAR-HERE CAST DEFINITION TO LATEST ;

Why do we need two separate words, LATEST and LINK, to get access to the latest definition? Well, there are several differences between these two words. LINK returns the address of a pointer to the name field of the definition that was most recently linked to a specific word list. LATEST, on the other hand, returns the most recent definition across all word lists as an item of data type DEFINITION. It is important to note that LATEST may also return a definition with no name and link fields, whereas LINK can never return a pointer to such a definition.

Now let's have a look at a couple of typical defining words. A very simple defining word is CODE:

: CODE ( -- )
  ?EXECUTE CONST-HERE (CREATE) CODE-HERE CONST, ;

Note that CODE is overloaded with both a defining word and a data type identifier. As expected, CODE uses (CREATE) to create a new definition in the dictionary. It compiles a pointer to the first unused location of the code space into the code field. This is the start address of the definition's machine code.

Now, what's still missing in order to make a complete definition? Well, it's the stack diagram and the machine code. Pure machine code words do not have a data field. Chapter 7 already explained how a stack diagram is generated. Starting with the null stack diagram, the input and output parameters are collected and finally compiled into the name space. Remember that the last cell (CREATE) compiles into the name space is the token field, so the next available address in the name space actually is the address of the input parameter field. If no stack diagram is specified for the new definition, nothing more is compiled into the name space, and the null stack diagram remains unchanged. Here's a simple example of creating a machine code word:

CODE DROP

Now, the stack diagram has to be specified:

( SINGLE -- )

Next, the machine code is compiled with assembler words:

AX POP, NEXT,

AX POP, compiles machine code to pop one cell off the data stack. This is the sole semantics of DROP. NEXT, is a macro that compiles StrongForth's inner interpreter. Almost all machine code words end with this macro, just like machine code subroutines must end with a return instruction.

Finally, END-CODE links the code definition into the current compilation word list:

END-CODE

Remember that (CREATE) initializes the link field of a new definition with a pointer to its own name field instead of a pointer to the name field of the previous definition. The final link is not established before the new definition is complete. This technique prevents unfinished definitions from being found in the dictionary. This is the definition of END-CODE:

: END-CODE ( -- )
  LATEST NONAME? INVERT
  IF LATEST CAST FAR-ADDRESS -> ADDRESS 1- >R  \ LFA \
     GET-CURRENT LINK DUP @ R@ @ ROT ! R> !
  THEN ;

Applying END-CODE to a definition with no name and link fields obviously makes no sense, so END-CODE does nothing in this case. If a link field is present, which is true if the definition is created by CODE, its address can be determined by moving one cell back from the address of the attribute field. After the link field address has been calculated, END-CODE simply swaps the contents of the link field with the contents of the system variable that contains the pointer to the latest definition of the current compilation word list. As a result, the link field receives the pointer to the previous definition, and the system variable receives the pointer to the (new) latest definition.
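The swap performed by END-CODE can be modelled in a few lines of Python. This is a sketch, not StrongForth code: cells are represented by a dictionary from address to value, 16-bit cells are assumed, and all addresses in the example are made up.

```python
# Hypothetical model of END-CODE linking a finished definition.
# Cells are modelled as a dict from address to value; 16-bit cells assumed.
CELL = 2


def end_code(mem: dict, latest: int, noname: bool, wordlist_var: int) -> None:
    """Swap the link field of LATEST with the word list's latest-name variable."""
    if noname:
        return                       # no link field to patch
    lfa = latest - CELL              # one cell below the attribute field
    mem[lfa], mem[wordlist_var] = mem[wordlist_var], mem[lfa]


# The word list variable (address 0x100) points to the previous name field at 40.
# The new definition's name field is at 50, link field at 54, attribute field at 56.
mem = {0x100: 40, 54: 50}
end_code(mem, latest=56, noname=False, wordlist_var=0x100)
print(mem[54], mem[0x100])
```

A single swap is enough because the link field already holds the address that the word list variable must receive.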

Since CODE uses (CREATE) to create a new definition in the dictionary, you might wonder what happened to the ANS Forth word CREATE. So here's the second defining word:

: CREATE ( -- )
  ?EXECUTE CONST-HERE (CREATE) ['CODE] FORTH-WORDLIST CONST,
  END-CODE ;

The definition of CREATE looks quite similar to the definition of CODE. The main difference is that all words defined by CREATE share a common runtime machine code: their code field contains the same value as the one of FORTH-WORDLIST. Another difference is that CREATE executes END-CODE itself, because the definition is actually complete. A code definition, on the other hand, is not complete before the machine code has been compiled.

Words created by CREATE just return the address of their data field. Since CREATE does not compile the new word's stack diagram, you have to specify it explicitly, like in the following example. Failing to provide the proper stack diagram will corrupt StrongForth's data type system the first time the new word is executed. Note that CONST-SPACE is required because the data field is allocated in the constant data space.

CONST-SPACE CREATE DAYS-PER-MONTH ( -- CCONST -> UNSIGNED )
 OK
31 C, 28 C, 31 C, 30 C, 31 C, 30 C, 31 C, 31 C, 30 C, 31 C, 30 C, 31 C,
 OK
DAYS-PER-MONTH 1+ @ .  \ February
28  OK
DAYS-PER-MONTH 8 + @ . \ September
30  OK

The Abstract Definition

Given an item of data type DEFINITION, how can you access the various components, i. e. obtain the information that is stored in them? Let's start with the name field. StrongForth offers the word NAME to access the name of a definition:

: NAME ( DEFINITION -- CFAR-ADDRESS -> CHARACTER UNSIGNED )
  ?NONAME CAST CFAR-ADDRESS -> CHARACTER 4 -
  BEGIN DUP @ BL >
  WHILE 1-
  REPEAT DUP 1+ SWAP @ CAST UNSIGNED ;

NAME throws an exception if the definition does not have a name field. Since an item of data type DEFINITION actually is the address of the attribute field, NAME starts at this address and traverses back through the characters of the name string until it finds the length byte. The length byte is clearly distinguished from any character, because its value is always less than 32, which is the ASCII code of a space character. The first address where the length byte might be is 4 bytes below the attribute field. The address of the length byte is the name field address. This address is then converted into the address of the first character and the string length, which constitute the output parameters of NAME.
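The backward scan of NAME can be replayed in Python on a byte array. This is a model, not StrongForth code: it assumes 16-bit little-endian cells and zero-filled alignment padding, and lay_out is a hypothetical helper. Under these assumptions, starting 4 bytes below the attribute field skips the two bytes of the link field and the byte just below them, which may be an alignment byte, and always lands on a character or on the length byte.

```python
# Hypothetical model of NAME walking backwards from the attribute field.
# Assumptions: 16-bit little-endian cells, zero-filled alignment padding.
CELL = 2
BL = 0x20


def lay_out(word: str, link: int = 0) -> tuple:
    """Build name field, link field and attribute field; return (memory, definition)."""
    mem = bytearray([len(word)]) + word.encode("ascii")   # name field
    if len(mem) % CELL:
        mem.append(0)                                     # ALIGN padding
    mem += link.to_bytes(CELL, "little")                  # link field
    definition = len(mem)
    mem += (1 << 7).to_bytes(CELL, "little")              # attribute field
    return mem, definition


def name(mem: bytes, definition: int) -> str:
    addr = definition - 4          # skip the link field and a possible padding byte
    while mem[addr] > BL:          # characters are above BL, the length byte is not
        addr -= 1
    length = mem[addr]             # addr is now the name field address
    return mem[addr + 1:addr + 1 + length].decode("ascii")


print(name(*lay_out("SWAP")), name(*lay_out("1+")))
```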

The link field contains a pointer to the name field of the previous definition. StrongForth provides the word PREV to access the previous definition in the dictionary:

: PREV ( DEFINITION -- 1ST )
  ?NONAME CAST FAR-ADDRESS -> ADDRESS DUP 1- @ DUP
  IF SWAP SPLIT NIP MERGE
     CAST CFAR-ADDRESS -> UNSIGNED NAME>DEFINITION
  ELSE DROP DROP NULL DEFINITION
  THEN ;

PREV, just like NAME, throws an exception if the definition was created by :NONAME. Starting at the attribute field, PREV steps one cell back to fetch the contents of the definition's link field. Let's first assume that this is not the first definition in the dictionary, where the link field contains zero. Since the link is an address of data type ADDRESS, it has to be combined with the memory segment or bank of the name space to get the full address of the previous definition's name field. But this is still not the desired result. To get from there to the attribute field, PREV has to skip the name field and the link field. This final task is performed by the word NAME>DEFINITION:

: NAME>DEFINITION ( CFAR-ADDRESS -> UNSIGNED -- DEFINITION )
  DUP @ DUP 31 >
  IF DROP CAST DEFINITION
  ELSE + 1+ ALIGNED 1 CELLS + CAST DEFINITION
  THEN ;

NAME>DEFINITION expects the address of the name field on the stack. It adds the length of the string plus one for the length byte itself, and then aligns the address to cell size. The address now points to the link field. After skipping the link field, NAME>DEFINITION returns the definition, i. e., the address of the attribute field.

But NAME>DEFINITION also works if its input parameter is the address of a noname definition's attribute field instead of a name field. Since definitions with no name field have bit 5 set in the attribute field, the false length byte has a value of 32 or greater. In this case, NAME>DEFINITION just needs to cast its input parameter to a definition. What is this feature useful for? PREV does not take advantage of it, because noname definitions are not linked in the dictionary. The answer will be given at the end of this section, where PREV's counterpart NEXT is presented.

Now, what if PREV is applied to the first word in the dictionary? The link field of the first word is zero, and it makes no sense at all to continue. In this case, PREV simply returns a null definition to indicate that there is no previous word.
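The following Python sketch models PREV and uses it to walk a word list from the latest definition back to the first one, as WORDS does. It is a model, not StrongForth code: it assumes 16-bit little-endian cells, keeps address 0 unused so that a zero link can mean "no previous word", and ignores the segment handling that PREV performs with SPLIT and MERGE. All helper names are hypothetical.

```python
# Hypothetical model of PREV and of walking a word list the way WORDS does.
# Assumptions: 16-bit little-endian cells, zero-filled padding, address 0 unused.
CELL = 2


def build(names):
    """Lay out linked named definitions; return (memory, latest definition)."""
    mem = bytearray(CELL)              # keep address 0 free so that 0 means "no link"
    prev_nfa, latest = 0, None
    for word in names:
        nfa = len(mem)
        mem += bytes([len(word)]) + word.encode("ascii")
        if len(mem) % CELL:
            mem.append(0)                              # ALIGN
        mem += prev_nfa.to_bytes(CELL, "little")       # link field
        latest = len(mem)
        mem += (1 << 7).to_bytes(CELL, "little")       # attribute field
        mem += (0).to_bytes(CELL, "little")            # token field (dummy)
        prev_nfa = nfa
    return mem, latest


def name_to_definition(mem, nfa):
    addr = nfa + mem[nfa] + 1          # skip length byte and characters
    addr += -addr % CELL               # ALIGNED
    return addr + CELL                 # skip the link field


def prev(mem, definition):
    link = int.from_bytes(mem[definition - CELL:definition], "little")
    return None if link == 0 else name_to_definition(mem, link)   # None ~ NULL DEFINITION


def name(mem, definition):
    addr = definition - 4
    while mem[addr] > 0x20:
        addr -= 1
    return mem[addr + 1:addr + 1 + mem[addr]].decode("ascii")


mem, d = build(["DUP", "SWAP", "1+"])
words = []
while d is not None:                   # stop at the null definition
    words.append(name(mem, d))
    d = prev(mem, d)
print(words)
```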

PREV is very useful for iterating over the words in the dictionary, as WORDS does. In some cases, PREV can also be applied to get access to a specific overloaded version of a word that is hidden below another overloaded version. This works because ' always returns the most recently defined overloaded version of a word with a given name:

' RSHIFT .
RSHIFT ( LOGICAL UNSIGNED -- 1ST )  OK
' RSHIFT PREV .
RSHIFT ( LOGICAL -- 1ST )  OK

Anyway, a more versatile means to get access to different overloaded versions will be presented later in this chapter.

The attribute field is the next field within the name space. Its contents can easily be queried with the StrongForth words IMMEDIATE?, NONAME? and #PARAMS. IMMEDIATE? and NONAME? query the immediate and noname attributes, respectively. NONAME? returns FALSE if the corresponding bit is 0, and TRUE if the bit is 1. With IMMEDIATE?, it's exactly the other way around. Remember that the immediate attribute is inverted with respect to the immediate property of a definition. #PARAMS returns the content of the Size bit field, which is actually the length of the definition's stack diagram.

: IMMEDIATE? ( DEFINITION -- FLAG )
  CAST FAR-ADDRESS -> LOGICAL @ [ 7 BIT ] LITERAL AND 0= ;

: NONAME? ( DEFINITION -- FLAG )
  CAST FAR-ADDRESS -> LOGICAL @ [ 5 BIT ] LITERAL AND 0<> ;
  
: #PARAMS ( DEFINITION -- UNSIGNED )
  CAST FAR-ADDRESS -> DATA-TYPE @ OFFSET ;

The definition of #PARAMS takes advantage of the fact that the combination of attribute field and token field resembles the format of an item of data type DATA-TYPE. The words ATTRIBUTE? and OFFSET can be applied to items of data type DATA-TYPE. The relation between the bit fields of items of data type DATA-TYPE and the attribute field of a definition is as follows:

`DATA-TYPE`	attribute field
prefix attribute	immediate attribute
input parameter attribute	(reserved)
output parameter attribute	noname attribute
offset attribute	size of the stack diagram

The same programming trick is used in the definition of IMMEDIATE. IMMEDIATE is an ANS Forth word that makes the most recent definition an immediate word. In StrongForth, it just clears the immediate attribute.

: IMMEDIATE ( -- )
  LATEST CAST FAR-ADDRESS -> DATA-TYPE
  DUP @ DT-PREFIX INVERT AND SWAP ! ;

Another word changes the content of the size bit field by adding the offset attribute of an item of data type STACK-DIAGRAM to it. Because this operation usually has to be performed immediately after compiling the input and output parameter fields of a definition, this word is called END-DIAGRAM:

: END-DIAGRAM ( STACK-DIAGRAM -- )
  OFFSET LATEST CAST FAR-ADDRESS -> STACK-DIAGRAM DUP @
  ROT OFFSET+ SWAP ! ;

You've already seen an application of END-DIAGRAM in the definition of ) in chapter 7.

StrongForth provides an additional word that throws an exception if a definition has the noname attribute:

: ?NONAME ( DEFINITION -- 1ST )
  DUP NONAME? IF -267 THROW THEN ;

This word is quite useful if further processing relies on the absence of the noname attribute. You already know ?NONAME from the definitions of NAME and PREV. Any attempt to access the name field or the link field of a definition that does not have these fields can be prevented by including ?NONAME at the beginning of the code.

After the attribute field comes the token field. In order to obtain a pointer to a definition's code field, >CODE simply fetches the execution token and casts it to an item of data type CONST -> CODE. Remember that the pointer to the code field is identical to the definition's token.

: >CODE ( DEFINITION -- CONST -> CODE )
  CAST FAR-ADDRESS -> CONST 1+ @ -> CODE ;

>BODY is very similar to >CODE. It returns a pointer to the definition's data field, which is next to its code field. Since the contents of the data field may vary between different words, >BODY just returns an unspecified pointer of data type CONST. Note that >BODY is an ANS Forth word, while >CODE is not.

: >BODY ( DEFINITION -- CONST )
  >CODE 1+ CAST CONST ;

Finally, PARAMS and PARAM@ grant you access to a definition's input and output parameter fields. PARAMS just calculates the address of the input parameter field by skipping the attribute and token fields. For that purpose, these two fields are again interpreted as one item of data type DATA-TYPE:

: PARAMS ( DEFINITION -- FAR-ADDRESS -> DATA-TYPE )
  CAST FAR-ADDRESS -> DATA-TYPE 1+ ;

Given a definition and a zero-based index into its stack diagram, PARAM@ returns the selected basic data type as an item of data type DATA-TYPE. PARAM@ has already been presented in chapter 7:

: PARAM@ ( DEFINITION UNSIGNED -- DATA-TYPE )
  SWAP PARAMS SWAP + @ ;

As promised at the beginning of this section, here's the definition of NEXT, the counterpart of PREV:

: NEXT ( DEFINITION -- 1ST )
  DUP PARAMS SWAP #PARAMS +
  CAST CFAR-ADDRESS -> UNSIGNED NAME>DEFINITION ;

Starting at the attribute field of a definition, NEXT advances to the next definition by skipping the attribute field, the token field, and the input and output parameter fields, provided that the definition is complete. The result is the address of the next definition's name field, so all that remains is for NAME>DEFINITION to skip the name field and the link field of the next definition.

Now it also becomes clear why NAME>DEFINITION's capability to deal with noname definitions is useful. It returns the correct result even if the next definition has no name field. Note that if a definition has no successor, which means it's the latest definition in the dictionary, the result of NEXT is undefined.
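A Python sketch of NEXT shows how the noname check in NAME>DEFINITION lets it step over both kinds of definitions. It is a model, not StrongForth code. Assumptions: 16-bit little-endian cells, the Size field in bits 0 to 4 of the attribute field, and each stack diagram entry occupying two cells, the same size as the attribute and token fields taken together, which is how PARAMS treats them. The helper names are hypothetical.

```python
# Hypothetical model of NEXT stepping over a named and a :NONAME definition.
# Assumptions: 16-bit little-endian cells, Size in bits 0-4, and one
# stack diagram entry as large as attribute field + token field.
CELL = 2
DATA_TYPE = 2 * CELL
IMMEDIATE_BIT, NONAME_BIT, SIZE_MASK = 1 << 7, 1 << 5, 0x1F


def add_definition(mem, name, params, link=0):
    """Append one definition; name None creates a :NONAME definition."""
    attribute = IMMEDIATE_BIT | len(params)
    if name is None:
        attribute |= NONAME_BIT
    else:
        mem += bytes([len(name)]) + name.encode("ascii")
        if len(mem) % CELL:
            mem.append(0)                          # ALIGN
        mem += link.to_bytes(CELL, "little")       # link field
    definition = len(mem)
    mem += attribute.to_bytes(CELL, "little")      # attribute field
    mem += (0).to_bytes(CELL, "little")            # token field (dummy)
    mem += bytes(DATA_TYPE * len(params))          # parameter fields (dummy)
    return definition


def name_to_definition(mem, addr):
    if mem[addr] > 31:                 # bit 5 set: a noname attribute field
        return addr
    addr += mem[addr] + 1              # skip length byte and characters
    addr += -addr % CELL               # ALIGNED
    return addr + CELL                 # skip the link field


def next_definition(mem, definition):
    size = int.from_bytes(mem[definition:definition + CELL], "little") & SIZE_MASK
    addr = definition + DATA_TYPE * (1 + size)    # PARAMS, then #PARAMS entries
    return name_to_definition(mem, addr)


mem = bytearray(CELL)                             # address 0 unused
drop = add_definition(mem, "DROP", ["SINGLE"])
anon = add_definition(mem, None, [])
nip = add_definition(mem, "NIP", ["SINGLE", "SINGLE", "2ND"], link=2)
print(drop, anon, nip, next_definition(mem, drop), next_definition(mem, anon))
```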

Searching the Dictionary

ANS Forth provides the word FIND for searching the dictionary for the latest occurrence of a word with a given name:

FIND ( c-addr -- c-addr 0  |  xt 1  |  xt -1 )

This word does not fit well into StrongForth. First, StrongForth requires that stack diagrams are unique at compilation time. Alternative stack diagrams are not allowed. In ANS Forth, the stack diagram of FIND depends on the search result, which is unknown at compile time. Secondly, StrongForth has abandoned counted strings, i. e. strings with a leading length byte in memory. Strings should rather be passed as a pointer to the first character and a separate parameter for the character count:

CDATA -> CHARACTER UNSIGNED

But ANS Forth offers an alternative to FIND. It's the word SEARCH-WORDLIST from the Search-Order word set:

SEARCH-WORDLIST ( c-addr u wid -- 0 | xt 1 | xt -1 )

This word does not expect a counted string. The ambiguity of the stack diagram can be fixed by adding a dummy xt as an additional output parameter in the case of an unsuccessful search. A version of SEARCH-WORDLIST that is compatible with StrongForth would look like this:

SEARCH-WORDLIST ( CDATA -> CHARACTER UNSIGNED WID -- TOKEN SIGNED )

This word always returns an object of data type TOKEN and a signed integer to indicate the search result. If SEARCH-WORDLIST is successful, it returns the execution token and +1 or -1, depending on whether the definition is an immediate word or not. If SEARCH-WORDLIST is not successful, it returns a null token and 0.

`SIGNED`	Search result
0	definition not found
+1	definition found, immediate word
-1	definition found, no immediate word
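The fixed-shape result can be sketched in Python: the search always yields a token and a flag, and a null token fills the slot when nothing is found. This is a model, not StrongForth code; the Entry type, the token values and the list order (latest definition first) are assumptions of the sketch.

```python
# Hypothetical model of a StrongForth-style SEARCH-WORDLIST that always
# returns two results, a token and a signed flag, whatever the outcome.
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    name: str
    token: int
    immediate: bool


NULL_TOKEN = 0


def search_wordlist(name: str, wordlist: list) -> Tuple[int, int]:
    """wordlist is ordered from the latest to the oldest definition."""
    for entry in wordlist:
        if entry.name == name:
            return entry.token, (1 if entry.immediate else -1)
    return NULL_TOKEN, 0       # dummy token keeps the stack diagram fixed


forth = [Entry("IF", 0x300, True), Entry("RSHIFT", 0x200, False),
         Entry("RSHIFT", 0x100, False)]
print(search_wordlist("RSHIFT", forth), search_wordlist("FOO", forth))
```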

However, we're not yet done. Since bare execution tokens are not very useful in StrongForth, it makes much more sense to return an item of data type DEFINITION instead of one of data type TOKEN. As explained in the previous section, the token can easily be derived from a definition, because the token of a definition is just the address of its code field.

Furthermore, StrongForth often needs additional search criteria to be applied, because a simple match on the name of the word is insufficient in many cases. For example, the interpreter and the compiler have to find a word whose name matches the parsed name and whose stack diagram fits the contents of the data type heap. This need is satisfied by adding two more parameters to SEARCH-WORDLIST. An item of data type CODE contains the address of a machine code subroutine that checks the additional criteria. An item of data type SINGLE can be used to optionally pass a parameter to this subroutine. It is even possible to search the word list for a word that only matches the additional search criteria, i. e., without considering the name of the word. To initiate such a search, you simply have to specify an empty string (a string with zero length) for the name of the word.
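In the following Python sketch, a callback plays the role of the machine code subroutine passed as CODE, and an arbitrary value plays the role of the SINGLE parameter. It is a model, not StrongForth code. The exact-match criterion is only a toy stand-in for the real stack diagram matching against the data type heap; the two RSHIFT entries reuse the input parameters shown earlier in this chapter, and everything else is hypothetical.

```python
# Hypothetical model of SEARCH with an extra search criterion.
from typing import Callable, NamedTuple, Optional, Tuple


class Definition(NamedTuple):
    name: str
    inputs: tuple          # input parameters only, simplified to type names
    immediate: bool = False


def search(name: str, param, criterion: Callable, wordlist: list
           ) -> Tuple[Optional[Definition], int]:
    """Return the latest matching definition and a flag (+1, -1 or 0)."""
    for d in wordlist:                     # latest definition first
        if name and d.name != name:        # an empty name matches every word
            continue
        if criterion(d, param):
            return d, (1 if d.immediate else -1)
    return None, 0                         # None models a null definition


def any_word(d: Definition, param) -> bool:
    """No additional criterion."""
    return True


def inputs_match(d: Definition, param) -> bool:
    """Toy criterion: the input parameters must equal param exactly."""
    return d.inputs == param


forth = [Definition("RSHIFT", ("LOGICAL", "UNSIGNED")),
         Definition("RSHIFT", ("LOGICAL",)),
         Definition("IF", (), immediate=True)]

print(search("RSHIFT", ("LOGICAL",), inputs_match, forth))
print(search("", None, any_word, forth))
```

The second call shows the empty-name case: with no other criterion, the first definition encountered, i.e. the latest one, is returned.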

And finally, let's take advantage of StrongForth's capability to overload words by renaming SEARCH-WORDLIST to SEARCH. The interpreter and the compiler will always be able to distinguish this version of SEARCH from the one for searching substrings within strings by matching the input parameters. From the context, it is always obvious whether a word list or a substring within a string is to be searched. Now we have the final version of StrongForth's replacement for the ANS Forth word FIND:

SEARCH ( CDATA -> CHARACTER UNSIGNED SINGLE CODE WID -- DEFINITION SIGNED )

In contrast to FIND, SEARCH only searches one specific word list. Therefore, a variant called SEARCH-ALL is provided that searches all word lists in the search order. As long as the Search-Order word set is not loaded, the search order simply consists of the Forth word list. Loading the Search-Order word set redefines the deferred word SEARCH-ALL to search all word lists in the search order.

DEFER SEARCH-ALL ( CDATA -> CHARACTER UNSIGNED SINGLE CODE -- DEFINITION SIGNED )
:NONAME ( CDATA -> CHARACTER UNSIGNED SINGLE CODE -- DEFINITION SIGNED )
  FORTH-WORDLIST SEARCH ; IS SEARCH-ALL

StrongForth provides six machine code subroutines that implement additional search criteria. Their addresses are available as constants of data type CODE:

CODE-FIELD ( -- CODE )
TOKEN-FIELD ( -- CODE )
IDENTITY ( -- CODE )
DEFERRED ( -- CODE )
VIRTUAL ( -- CODE )
MATCH ( -- CODE )

If CODE-FIELD or TOKEN-FIELD is used together with zero as the value of SINGLE, no additional search criteria are applied, i. e., SEARCH and SEARCH-ALL find the latest word with the given name. This comes closest to the semantics of FIND in ANS Forth. A simple application of SEARCH-ALL without additional search criteria is the word ', which is defined as follows:

: ' ( -- DEFINITION )
  PARSE-WORD NULL CODE CODE-FIELD SEARCH-ALL
  0= IF -13 THROW THEN ;

' always returns the latest definition whose name is identical to the one parsed from the input source. Unlike the ANS Forth version, StrongForth's ' does not return the execution token of a definition. To obtain the token, you have to use 'TOKEN:

: 'TOKEN ( -- TOKEN )
  ' >CODE CAST TOKEN ;

A very similar word, 'CODE, returns the content of the code field of a definition with a given name:

: 'CODE ( -- CODE )
  ' >CODE @ ;

Note that both 'TOKEN and 'CODE only work correctly if the definition name is unique, i. e., if the definition is not overloaded, or if you know for sure that you want the latest definition with the given name.

At this point, it should be clear how the phrase

NULL CDATA -> CHARACTER 0 0 CODE-FIELD SEARCH-ALL DROP

works, which has been used in the definition of WORDS. Since SEARCH-ALL is executed with no search criteria at all, it returns the first definition of the first word list. And this is exactly the one WORDS has to start with.

If CODE-FIELD is combined with a non-zero parameter, SEARCH and SEARCH-ALL consider the content of the code field as an additional search criterion. They find the latest word in the dictionary with the given name whose code field contains the value specified by SINGLE. This is useful if only words with a specific code field are to be taken into account. For example, DT tries to find a word that has been defined by PROCREATES, and TO may only find words that have been defined by VALUE. The definition of DT is presented in chapter 9.

TOKEN-FIELD specifies an additional search criterion that is quite similar to the one of CODE-FIELD. It allows only words to be found whose token field contains the value given by the parameter SINGLE, provided SINGLE has a non-zero value. This search criterion can be used to find the definition that belongs to a specific execution token. Although an execution token is a nearly unique identifier, there's no link back from the token to the definition. You have to search the complete dictionary in order to find out which definition a specific token belongs to. For this task, it is usually not necessary to provide the name of the definition to be searched. StrongForth actually provides the word SEARCH-TOKEN:

: SEARCH-TOKEN ( TOKEN -- DEFINITION SIGNED )
  NULL CDATA -> CHARACTER 0 ROT TOKEN-FIELD SEARCH-ALL ;

And here's an example of how SEARCH-TOKEN can be applied:

'TOKEN KEY .S
TOKEN  OK
SEARCH-TOKEN . .
-1 KEY ( -- CHARACTER )  OK
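Since SEARCH-TOKEN is nothing but a search with an empty name and the TOKEN-FIELD criterion, it can be modelled in a few lines of Python. The record layout and the token values below are hypothetical; the last line mirrors the interactive example above.

```python
# Python model of SEARCH-TOKEN: an empty name plus the TOKEN-FIELD
# criterion, applied to every word list in the search order.
# The Definition record and the token values are hypothetical.
from typing import List, NamedTuple, Optional, Tuple


class Definition(NamedTuple):
    name: str
    diagram: str
    token: int


def token_field(d: Definition, single: int) -> bool:
    # Zero disables the criterion; otherwise the token field must match.
    return single == 0 or d.token == single


def search_token(token: int, search_order: List[List[Definition]]
                 ) -> Tuple[Optional[Definition], int]:
    # Tokens have no link back to their definitions, so every word list
    # is scanned, latest definition first.
    for wordlist in search_order:
        for d in reversed(wordlist):
            if token_field(d, token):
                return d, -1
    return None, 0


forth = [Definition("EMIT", "( CHARACTER -- )", 0x1230),
         Definition("KEY", "( -- CHARACTER )", 0x1234)]
found, flag = search_token(0x1234, [forth])
print(flag, found.name, found.diagram)  # -1 KEY ( -- CHARACTER )
```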

Another machine code subroutine that implements an additional search criterion is the one whose address is returned by IDENTITY. This criterion requires that the stack diagram of the word to be found is identical to a given sample. The sample has to be provided at the top of the local name space, just below the first unused address. SINGLE contains the length of the stack diagram as an unsigned number. Manually creating a stack diagram can be quite a tedious task, as the following sample code shows:

LOCAL-SPACE
 OK
DT LOGICAL DT-INPUT OR ,
 OK
DT UNSIGNED DT-INPUT OR ,
 OK
DT LOGICAL DT-OUTPUT OR 1 OFFSET+ ,
 OK
PARSE-WORD LSHIFT 3 IDENTITY SEARCH-ALL . .
-1 LSHIFT ( LOGICAL UNSIGNED -- 1ST )  OK

It's much more convenient to take advantage of StrongForth's special tools for creating stack diagrams. In the definition of the word )', <DIAGRAM and DIAGRAM> are used to process a stack diagram in the common format:

: )' ( MEMORY-SPACE FLAG STACK-DIAGRAM -- DEFINITION )
  <DIAGRAM DUP OFFSET PARSE-WORD ROT IDENTITY SEARCH-ALL
  0= IF -13 THROW THEN ROT ROT DIAGRAM> ;

)' is similar to ', but it additionally lets you specify the stack diagram of the word:

' LSHIFT .
LSHIFT ( LOGICAL UNSIGNED -- 1ST )  OK
( LOGICAL UNSIGNED -- 1ST )' LSHIFT .
LSHIFT ( LOGICAL UNSIGNED -- 1ST )  OK
( LOGICAL -- 1ST )' LSHIFT .
LSHIFT ( LOGICAL -- 1ST )  OK
( UNSIGNED -- 1ST )' LSHIFT .

( UNSIGNED -- 1ST )' LSHIFT ? undefined word
DEFINITION

However, an exact match of a given stack diagram still does not implement the search criteria of StrongForth's interpreter and compiler. Their search criteria are more complex, because they allow a data type to match any of its subtypes, or a basic data type to match compound data types. Even references to other data types can be provided within stack diagrams and are considered by the interpreter and the compiler. The details of this matching algorithm are described in the next section.

DEFERRED as an additional matching rule is sort of a combination of the matching rules CODE-FIELD and IDENTITY. SEARCH and SEARCH-ALL will only find a definition whose code field is the one of a deferred definition, and whose stack diagram is exactly identical to the one of a sample definition, whose address is provided as the additional parameter SINGLE. Actually, SINGLE contains only the lower part of an item of data type DEFINITION that identifies the sample word. We'll dig deeper into the usage of DEFERRED in connection with the definition of IS in the last section of this chapter.

VIRTUAL also implements a combination of matching rules. The code field of the definition to be found has to be identical with a value embedded in the machine code at address VIRTUAL 2 +. Furthermore, the stack diagram of the definition to be found by SEARCH or SEARCH-ALL must be identical to the one of the sample definition whose address offset is provided as the parameter SINGLE, except for the last input parameter. The last input parameter of the definition may be the same as the one of the sample definition, but it may also be its parent or any other data type the last input parameter of the sample definition is derived from. These complex search criteria are used in StrongForth's OOP word set when assigning class member functions to virtual member functions.

If MATCH is specified as the address of the machine code subroutine, and the value of SINGLE is zero, SEARCH and SEARCH-ALL perform the interpreter's and compiler's matching rules by comparing the stack diagram of each word with the contents of the interpreter or compiler data type heap. In interpretation state, the interpreter data type heap is used. In compilation state, the selection of the data type heap depends on whether the word is immediate or not. The stack diagrams of immediate words are matched against the interpreter data type heap, while those of non-immediate words are matched against the compiler data type heap.

These rules work nicely for the interpreter and the compiler, but not for words like [COMPILE] and POSTPONE. These two words can only be executed in compilation state, because they always compile a word. They require a special rule to always match a stack diagram against the compiler data type heap, no matter whether the word is immediate or not. This special rule is applied if MATCH is combined with -1 as the value of SINGLE.

In both cases, the chosen data type heap is updated to carry out the data type transformations determined by the stack diagram of the found word. If, for example, the data type heap contains CODE FLAG UNSIGNED at its top and SEARCH or SEARCH-ALL finds ROT, these three data types are replaced by FLAG UNSIGNED CODE. Note that the word is not yet executed or compiled. Only the data type heap is updated.
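The heap update can be pictured with a small Python model. It assumes that ROT has the stack diagram ( SINGLE SINGLE SINGLE -- 2ND 3RD 1ST ), and it simplifies references to whole items, which only coincides with StrongForth's numbering of basic data types when all input parameters are basic data types.

```python
# Python model of the data type heap update performed after a match.
# Simplification: every input parameter is one item, and a reference n in
# the output list copies the type of the n-th consumed item. That holds
# for ROT, whose diagram is assumed here to be
# ( SINGLE SINGLE SINGLE -- 2ND 3RD 1ST ).
from typing import List, Tuple, Union

DataType = Tuple[str, ...]          # e.g. ("DATA", "UNSIGNED")
Output = Union[int, DataType]       # reference index or literal type


def apply_diagram(heap: List[DataType], n_inputs: int,
                  outputs: List[Output]) -> List[DataType]:
    """Replace the top n_inputs heap items by the resolved outputs."""
    if len(heap) < n_inputs:
        raise ValueError("data type heap underflow")
    base = len(heap) - n_inputs
    consumed = heap[base:]
    # A reference copies the actual type found on the heap, not the
    # (possibly more general) type written in the stack diagram.
    produced = [consumed[o - 1] if isinstance(o, int) else o
                for o in outputs]
    return heap[:base] + produced


heap = [("CODE",), ("FLAG",), ("UNSIGNED",)]
print(apply_diagram(heap, 3, [2, 3, 1]))
# [('FLAG',), ('UNSIGNED',), ('CODE',)]
```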

Are we now done with all those matching criteria? No, one is still missing. If MATCH is combined with a value of SINGLE that is neither zero nor minus one, a special variant of MATCH is applied. The matching rules for the input parameters are the same as for SINGLE = -1. However, an additional matching rule is applied to the output parameters. After the data type transformations for a word with matching input parameters have been carried out, the output parameters are compared with those of a sample word whose address offset is provided as the parameter SINGLE. This address offset is assumed to be different from 0 and -1. If the output parameters of the word, as calculated by the data type transformations, are not exactly the same as the output parameters of the sample word, the word is rejected and the search continues. An application of this rather complex matching rule will be presented in chapter 15 in connection with EXECUTE.

Before starting with a detailed description of the interpreter's and compiler's matching rules, let's summarise the matching criteria of SEARCH and SEARCH-ALL as specified by the two parameters SINGLE and CODE:

`SINGLE`	`CODE`	Matching criteria
`0`	`CODE-FIELD`	The latest word in the word list with the given name is found.
code field	`CODE-FIELD`	Only words whose code field contains the value of `SINGLE` can be found.
`0`	`TOKEN-FIELD`	The latest word in the word list with the given name is found.
token	`TOKEN-FIELD`	Only words whose token is equal to the value of `SINGLE` can be found.
length	`IDENTITY`	Only words whose stack diagram is exactly the same as the temporary stack diagram stored at the top of the local name space can be found. `SINGLE` is the length of the temporary stack diagram.
`0`	`MATCH`	In interpretation state, only words whose input parameters match the data types on the interpreter data type heap can be found. In compilation state, only non-immediate words whose input parameters match the data types on the compiler data type heap, and immediate words whose input parameters match the data types on the interpreter data type heap can be found.
`-1`	`MATCH`	Only words whose input parameters match the data types on the compiler data type heap can be found. An ambiguous condition exists if executed in interpretation state.
sample definition (lower part)	`MATCH`	Only words whose input parameters match the data types on the compiler data type heap can be found. An ambiguous condition exists if executed in interpretation state. Additionally, the output parameter list has to be identical to the output parameter list of the definition specified by `SINGLE`, after resolving all data type references to the input parameters. `SINGLE` contains the lower part (address offset) of that definition.
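The heap selection part of the MATCH rows can be condensed into one Python function. This is an illustrative model with made-up names; it raises an exception where StrongForth merely has an ambiguous condition, and it does not model the output parameter check of the last row.

```python
# Python model of how MATCH chooses the data type heap for a candidate
# word. The function and its return strings are illustrative only.
def select_heap(single: int, compiling: bool, immediate: bool) -> str:
    if single == 0:
        # Interpreter/compiler rule: immediate words and all words in
        # interpretation state are matched against the interpreter heap.
        if compiling and not immediate:
            return "compiler"
        return "interpreter"
    # SINGLE = -1 ([COMPILE], POSTPONE) and a sample definition both
    # force the compiler heap, which requires compilation state.
    if not compiling:
        raise RuntimeError("ambiguous condition: interpretation state")
    return "compiler"


for single, compiling, immediate in [(0, False, False), (0, True, False),
                                     (0, True, True), (-1, True, True)]:
    print(single, compiling, immediate,
          select_heap(single, compiling, immediate))
```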

Matching Rules

When trying to find a word in the dictionary, the interpreter and the compiler do not only consider the name of the word. Since operator overloading allows equally named words to be distinguished by the data types they are applied to, the stack diagrams become part of the matching rules. Note that ANS Forth allows redefining words as well, but once a new version has been defined, the old versions can no longer be found in the dictionary. In StrongForth, previously defined versions stay alive, as long as they can be distinguished from later versions by their stack diagrams.

Generally, the interpreter and the compiler try to find a word with the given name that can be applied to the items on the data stack. If, for example, the item on top of the stack has the data type UNSIGNED, a word whose stack diagram has an item of data type FLAG as its last input parameter won't match. But what about a word whose stack diagram has only one input parameter of data type INTEGER? This word matches, because it can be applied to data type INTEGER and all its subtypes.

So what are the exact matching rules? First of all, SEARCH and SEARCH-ALL check whether the data stack contains at least as many items as the word has input parameters. Note that each item on the data stack corresponds to either a basic or a compound data type on the data type heap and in the input parameter field. Thus, the length of the input parameter field is not necessarily identical to the number of input parameters. For example, the word

! ( SINGLE CONST -> 1ST -- )

has only 2 input parameters, although the length of its input parameter field is 3.
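A short Python model shows how prefix attributes make the two numbers differ. The cell layout, one (type, prefix flag) pair per basic data type, is an assumption made for illustration.

```python
# Python model of an input parameter field: one cell per basic data type,
# where a prefix flag marks a basic type that continues into the next one
# (the "->" in a stack diagram). The cell layout is an assumption.
from typing import List, Tuple

Cell = Tuple[str, bool]  # (basic data type, has prefix attribute)


def parse_params(diagram_inputs: str) -> List[Cell]:
    """Turn e.g. 'SINGLE CONST -> 1ST' into cells."""
    words = diagram_inputs.split()
    cells: List[Cell] = []
    i = 0
    while i < len(words):
        prefix = i + 1 < len(words) and words[i + 1] == "->"
        cells.append((words[i], prefix))
        i += 2 if prefix else 1   # skip the arrow after a prefix type
    return cells


def parameter_count(cells: List[Cell]) -> int:
    # Each parameter ends with exactly one cell lacking the prefix flag.
    return sum(1 for _, prefix in cells if not prefix)


cells = parse_params("SINGLE CONST -> 1ST")
print(len(cells), parameter_count(cells))  # 3 2
```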

Next, SEARCH and SEARCH-ALL compare the data type of each parameter with the data type of the corresponding item on the data stack. Four different cases have to be considered.

Case	Input parameter	Data stack item
1	Basic data type	Basic data type
2	Basic data type	Compound data type
3	Compound data type	Basic data type
4	Compound data type	Compound data type

Case 1

If both the input parameter and the corresponding item on the data stack are basic data types, they match if their data types are identical or if the item on the data stack is a subtype of the input parameter. For example,

ALIGNED ( ADDRESS -- 1ST )

matches if the item on top of the data stack is of data type ADDRESS, or any of its direct or indirect subtypes, like DATA, CONST or CDATA.

Case 2

Now, let's assume that the item on the data stack is a compound data type, while the corresponding input parameter is still a basic data type. A compound data type is always more specific than a basic data type. The input parameter matches the item on the data stack if the head of the compound data type, which is the leftmost of the basic data types that constitute the compound data type, is identical to or a direct or indirect subtype of the basic data type of the input parameter. In the previous example, ALIGNED matches even if the item on top of the data stack is one of these:

DATA -> UNSIGNED
CONST -> CODE -> SINGLE
CDATA -> CHARACTER

Case 3

On the other hand, if the input parameter has a more specific data type than the corresponding item on the data stack, they don't match. A word that requires a compound data type as an input parameter can't be satisfied by a basic data type. For example,

@ ( CODE -> SINGLE -- 2ND )

won't match if the item on top of the data stack has the basic data type CODE.

Case 4

What if both data types are compound data types? The matching rule for this case can be described by a simple recursion. Let

head1 -> tail1

be the compound data type of the input parameter and

head2 -> tail2

be the compound data type of the corresponding item on the data stack. head1 and head2 are basic data types, while tail1 and tail2 may be either basic or compound data types. The input parameter matches the item on the data stack if both of the following two conditions are met:

head2 is identical to head1, or head2 is a direct or indirect subtype of head1.
tail1 and tail2 match according to the rules described by cases 1 to 4.

In the above example, @ matches if the item on the data stack is one of these:

CODE -> SIGNED
CODE -> CONST -> LOGICAL

The same version of @ does not match if the data type of the item on top of the data stack is any of the following:

CODE -> UNSIGNED-DOUBLE
CCODE -> CHARACTER
DATA -> SINGLE
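The four cases translate almost directly into a recursive function. The Python sketch below is a model, not the StrongForth implementation; in particular the subtype table PARENT is a simplified, assumed excerpt of the real hierarchy, chosen only so that the examples above come out as described.

```python
# Python model of matching cases 1 to 4. A data type is a tuple of basic
# data types, e.g. ("CODE", "CONST", "LOGICAL") for CODE -> CONST -> LOGICAL.
# PARENT is an assumed, simplified excerpt of the subtype hierarchy.
from typing import Dict, List, Optional, Tuple

DataType = Tuple[str, ...]

PARENT: Dict[str, Optional[str]] = {
    "SINGLE": None, "DOUBLE": None,
    "INTEGER": "SINGLE", "SIGNED": "INTEGER", "UNSIGNED": "INTEGER",
    "LOGICAL": "SINGLE", "FLAG": "LOGICAL",
    "CHARACTER": "SINGLE", "TOKEN": "SINGLE", "ADDRESS": "SINGLE",
    "DATA": "ADDRESS", "CONST": "ADDRESS", "CODE": "ADDRESS",
    "CDATA": "ADDRESS", "CCONST": "ADDRESS", "CCODE": "ADDRESS",
    "INTEGER-DOUBLE": "DOUBLE", "UNSIGNED-DOUBLE": "INTEGER-DOUBLE",
}


def is_subtype(t: Optional[str], ancestor: str) -> bool:
    """True if t is ancestor or a direct or indirect subtype of it."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False


def matches(param: DataType, item: DataType) -> bool:
    head1, tail1 = param[0], param[1:]
    head2, tail2 = item[0], item[1:]
    if not tail1:
        # Cases 1 and 2: a basic parameter only checks the item's head.
        return is_subtype(head2, head1)
    if not tail2:
        # Case 3: a compound parameter never matches a basic item.
        return False
    # Case 4: heads must match, then the tails are matched recursively.
    return is_subtype(head2, head1) and matches(tail1, tail2)


def match_inputs(params: List[DataType], heap: List[DataType]) -> bool:
    # The heap needs at least as many items as there are parameters;
    # the last parameter is compared with the item on top of the heap.
    if len(heap) < len(params):
        return False
    top = heap[len(heap) - len(params):]
    return all(matches(p, i) for p, i in zip(params, top))


fetch = ("CODE", "SINGLE")          # @ ( CODE -> SINGLE -- 2ND )
print(matches(fetch, ("CODE", "CONST", "LOGICAL")))   # True
print(matches(fetch, ("CCODE", "CHARACTER")))         # False
```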

Data Type References

But still, these four cases do not cover everything. What happens if an input parameter contains a data type reference? A data type reference can appear either as a basic data type on its own or as the last basic data type of a compound data type. This means that a basic data type containing a data type reference never has the prefix attribute:

2ND               \ Okay
DATA -> 1ST       \ Okay
3RD -> UNSIGNED   \ Wrong!

If a reference is part of a compound data type, all basic data types up to the reference are processed according to the matching rules described in cases 1 to 4. The basic data type containing the reference is then substituted by the referenced basic or compound data type. Any basic data type in the input parameter list whose index is lower than the index of the basic data type containing the reference can be referenced:

( UNSIGNED CHARACTER 1ST -- )   \ Okay
( CDATA -> CHARACTER 1ST -- )   \ Okay
( CDATA -> CHARACTER 2ND -- )   \ Okay
( CDATA -> CHARACTER 3RD -- )   \ Wrong!

Note that the referenced data type can be a basic data type, the head of a compound data type or even the tail of a compound data type.

But although the referenced data type is part of the input parameter list, the actual match is performed against the data types of the items on the data stack. While matching the input parameters one by one with the data types of the items on the data stack, each basic data type in the input parameter list is assigned to a unique basic data type on the data type heap. In order for the match to succeed, the basic or compound data type on the data type heap, which is assigned to the data type reference in the input parameter list, has to be identical to the basic or compound data type on the data type heap, which is assigned to the referenced basic or compound data type in the input parameter list. Note that subtype or prefix relationships are not considered when matching references. The data types have to be identical.

The matching rules for data type references are best explained by a number of examples.

! ( SINGLE DATA -> 1ST -- )

matches

FLAG DATA -> FLAG

and

CONST -> UNSIGNED DATA -> CONST -> UNSIGNED

1ST is a reference to the first basic data type in the input parameter list, which is SINGLE. However, the data type is substituted by the corresponding data type of the item on the data stack, which is FLAG or CONST -> UNSIGNED. According to this rule, the following data types will not match:

INTEGER DATA -> SINGLE
CODE -> SINGLE DATA -> CODE
CONST DATA -> CONST -> CHARACTER

The rationale for using a reference in this special case is that an item of a specific data type can only be stored in a memory location that is specified by an address of exactly the same data type.

- ( ADDRESS -> SINGLE 1ST -- INTEGER )

matches every pair of identical data types, provided the first parameter has passed the matching rule for ADDRESS -> SINGLE.

The last input parameter of

FILL ( DATA -> SINGLE UNSIGNED 2ND -- )

is a reference to the tail of the first parameter. It matches for

DATA -> TOKEN UNSIGNED TOKEN
DATA -> CONST -> SIGNED UNSIGNED CONST -> SIGNED

but not for

DATA -> LOGICAL UNSIGNED FLAG
DATA -> SINGLE UNSIGNED DATA -> SINGLE

To make things even worse, references are allowed to be recursive. Stack diagrams like in the following example are possible, and the matching rules for data type references are applied correctly:

( INTEGER CCONST -> 1ST DATA -> 2ND -- )

A word with this stack diagram can be found in the dictionary if the data type heap contains

INTEGER CCONST -> INTEGER DATA -> CCONST -> INTEGER

or something that is based on subtypes of data types INTEGER, CCONST and DATA, like

UNSIGNED CCONST -> UNSIGNED DATA -> CCONST -> UNSIGNED
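Extending the model to data type references requires remembering which actual data type each basic data type of the input parameter list has been assigned to. The Python sketch below does that; the subtype table is again an assumed excerpt, and the encoding of references as integers (1ST = 1, 2ND = 2, ...) is made up for the example.

```python
# Python model of input parameter matching with data type references.
# A reference is an int n in the last position of a parameter; n counts
# basic data types in the input parameter list, starting at 1.
# PARENT is an assumed, simplified excerpt of the subtype hierarchy.
from typing import Dict, List, Optional, Tuple, Union

Basic = Union[str, int]
PARENT: Dict[str, Optional[str]] = {
    "SINGLE": None, "INTEGER": "SINGLE", "SIGNED": "INTEGER",
    "UNSIGNED": "INTEGER", "LOGICAL": "SINGLE", "FLAG": "LOGICAL",
    "CHARACTER": "SINGLE", "TOKEN": "SINGLE", "ADDRESS": "SINGLE",
    "DATA": "ADDRESS", "CONST": "ADDRESS", "CODE": "ADDRESS",
    "CDATA": "ADDRESS", "CCONST": "ADDRESS",
}


def is_subtype(t: Optional[str], ancestor: str) -> bool:
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False


def match_with_refs(params: List[Tuple[Basic, ...]],
                    heap: List[Tuple[str, ...]]) -> bool:
    if len(heap) < len(params):
        return False
    items = heap[len(heap) - len(params):]
    assigned: Dict[int, Tuple[str, ...]] = {}  # index -> actual heap type
    index = 0
    for param, item in zip(params, items):
        for j, basic in enumerate(param):
            index += 1
            if j >= len(item):
                return False                  # case 3: item too short
            actual = item[j:]                 # basic or compound remainder
            if isinstance(basic, int):
                if not 1 <= basic < index:
                    raise ValueError("reference must point backwards")
                # Identity required: no subtype or prefix relationships.
                if assigned[basic] != actual:
                    return False
            elif not is_subtype(item[j], basic):
                return False
            assigned[index] = actual
    return True


store = [("SINGLE",), ("DATA", 1)]            # ! ( SINGLE DATA -> 1ST -- )
print(match_with_refs(store, [("FLAG",), ("DATA", "FLAG")]))       # True
print(match_with_refs(store, [("INTEGER",), ("DATA", "SINGLE")]))  # False
```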

Alias Definitions

In StrongForth, it is not unusual to define a new word that has exactly the same semantics as an already existing word. There are mainly two reasons for doing this:

Giving an existing definition a second name. This makes sense if a word is used for two different purposes. A good example is the word CELLS, which has the same semantics as 2* on machines where the size of a single cell is two address units.
Providing an overloaded version of a definition for input parameters of different data types. For example, if you have already defined a word that performs some calculation on items of data type INTEGER, and you need an overloaded version for items of data type ADDRESS, the semantics is usually exactly the same.

Of course, these two reasons can be combined. In some cases, you might want to define a word with the same semantics as an existing word, but with a different name and a different stack diagram. These so-called alias definitions can be defined by the StrongForth word ALIAS:

: ALIAS ( DEFINITION -- )
  ?EXECUTE >CODE (CREATE) END-CODE ;

ALIAS expects the existing definition on the data stack. From this definition, it extracts the contents of the code field and then creates a new definition with the same code field. Thus, the typical usage of ALIAS is as follows:

' name1 ALIAS name2 ( ... -- ... )

name1 is the existing definition, name2 is the alias definition. Now, here are the examples that were already mentioned in the above use cases:

' NOOP ALIAS CHARS ( INTEGER -- 1ST )
 OK
' 2* PREV ALIAS CELLS ( INTEGER -- 1ST )
 OK
' THEN ALIAS ENDIF ( ORIGIN -- )IMMEDIATE
 OK
: 2+ ( INTEGER -- 1ST )
  2 + ;
 OK
LATEST ALIAS 2+ ( ADDRESS -- 1ST )
 OK
7 2+ .
9  OK
HERE . HERE 2+ .
6852 6854  OK

The word PREV after ' 2* is required, because the first version of 2* that ' finds is the one for items of data type INTEGER-DOUBLE. Its predecessor in the dictionary is 2* for items of data type INTEGER:

' 2* .
2* ( INTEGER-DOUBLE -- 1ST )  OK
' 2* PREV .
2* ( INTEGER -- 1ST )  OK

Finally, here's one more example for an application of ALIAS. The definition of PARAMS, which has been presented earlier in this chapter, contains only one word besides a type cast. It is therefore an ideal candidate for an alias definition:

( FAR-ADDRESS -> DOUBLE -- 1ST )' 1+ ALIAS PARAMS
( DEFINITION -- FAR-ADDRESS -> DATA-TYPE )

)' needs to be used in order to select the correct overloaded version of 1+. This is actually the real definition of PARAMS. Executing this version saves one level of indirection compared to a colon definition.

ALIAS is a nice feature. However, the fact that the alias definition shares its code field with the one of the original definition means that the same token is used when both definitions are compiled. In the following example, both NEGATE and CFLAG are compiled into TEST with the same token:

( INTEGER -- 1ST )' NEGATE ALIAS CFLAG ( FLAG -- INTEGER )
 OK
TRUE CFLAG .
1  OK
FALSE CFLAG .
0  OK
: TEST +8140 NEGATE . TRUE CFLAG . ;
 OK
TEST
-8140 1  OK

Everything works fine. So what's the problem? The problem arises when you try to SEE the definition of TEST:

SEE TEST
: TEST ( -- )
  8140 CFLAG . TRUE CFLAG . ;  OK

SEE is not able to distinguish the two words and thus displays the name of the alias, because it was defined most recently. This is correct, but it's misleading, especially when it comes to defining stack movement words for mixed single-cell, double-cell and floating-point items. For example, OVER ( SINGLE FLOAT -- 1ST 2ND 1ST ) is defined as an alias of DUP ( SINGLE -- 1ST 1ST ), so SEE would display OVER in place of every DUP. Actually, almost all stack movement words will be displayed wrongly once the Floating-Point word set has been loaded. In order to fix this issue, StrongForth provides an alternative to ALIAS that creates an alias definition with a new code field. This word is called SYNONYM. It can only be used for code definitions, because code definitions usually don't have a data field. SYNONYM is simply used in a code definition as a replacement for the machine code instructions, like in this example:

CODE NEGATE ( INTEGER -- 1ST )
' CFLAG SYNONYM

SEE is now able to distinguish the tokens of NEGATE and CFLAG:

: TEST +8140 NEGATE . TRUE CFLAG . ;
 OK
SEE TEST
: TEST ( -- )
  8140 NEGATE . TRUE CFLAG . ;  OK

Since CODE already creates a new code field, SYNONYM just needs to replace its content with that of the other definition's code field. An exception is thrown if you try to apply SYNONYM to anything other than a code definition. The definition of SYNONYM already contains END-CODE to terminate the code definition:

: SYNONYM ( DEFINITION -- )
  ?EXECUTE >CODE @ LATEST >CODE
  DUP @ CODE-HERE <> IF -274 THROW THEN ! END-CODE ;

You should use SYNONYM instead of ALIAS whenever you want to define an alias for a code definition that has a different name than the original code definition.
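The difference between ALIAS and SYNONYM boils down to whether the new definition shares the old code field or only copies its content. A Python model with hypothetical names and addresses makes this visible, using object identity in place of the code field address.

```python
# Python model contrasting ALIAS and SYNONYM. The token of a definition is
# the address of its code field; here the identity of a CodeField object
# stands in for that address. All names and values are hypothetical.
from typing import List, Optional


class CodeField:
    def __init__(self, content: int) -> None:
        self.content = content  # address of the shared machine code


class Definition:
    def __init__(self, name: str, code_field: CodeField) -> None:
        self.name = name
        self.code_field = code_field


def alias(dictionary: List[Definition], old: Definition, name: str) -> Definition:
    # ALIAS: the new definition reuses the old code field, i.e. its token.
    new = Definition(name, old.code_field)
    dictionary.append(new)
    return new


def synonym(dictionary: List[Definition], old: Definition, name: str) -> Definition:
    # SYNONYM: a fresh code field receives a copy of the old content.
    new = Definition(name, CodeField(old.code_field.content))
    dictionary.append(new)
    return new


def see_name(dictionary: List[Definition], token: CodeField) -> Optional[str]:
    # Like SEE: name the latest definition that owns this token.
    for d in reversed(dictionary):
        if d.code_field is token:
            return d.name
    return None


with_alias: List[Definition] = [Definition("NEGATE", CodeField(0x2000))]
alias(with_alias, with_alias[0], "CFLAG")
print(see_name(with_alias, with_alias[0].code_field))      # CFLAG

with_synonym: List[Definition] = [Definition("NEGATE", CodeField(0x2000))]
synonym(with_synonym, with_synonym[0], "CFLAG")
print(see_name(with_synonym, with_synonym[0].code_field))  # NEGATE
```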

Forward References

As a general rule, every Forth word has to be defined before it can be either compiled or executed. However, there are cases in which it is desirable to compile a word before it is actually defined. Compiling a word that has not yet been defined is called a forward reference. A typical application of forward references is a recursion that extends over more than one word. If word A uses word B and word B in turn uses word A, neither of these two words can be defined first:

: A ( ... -- ... ) ... B ... ;
: B ( ... -- ... ) ... A ... ;

Using execution tokens is a possible solution for this problem:

( ... -- ... )PROCREATES T
NULL T VALUE (B)
: A ( ... -- ... ) ... (B) EXECUTE ... ;
: B ( ... -- ... ) ... A ... ;
DT T ?TOKEN B CAST T TO (B)

But there's a more elegant solution:

DEFER B ( ... -- ... )
: A ( ... -- ... ) ... B ... ;
:NONAME ( ... -- ... ) ... A ... ; IS B

The definition of B is deferred until after A has been defined. DEFER creates a deferred definition that executes a yet-to-be-defined word. IS calculates the token of this word and stores it in the data field of the deferred definition B. Of course, IS checks that

B is a deferred definition
B and the word have exactly the same stack diagram.

Here's a simple example of how to use DEFER and IS:

DEFER FLIP ( UNSIGNED -- 1ST )
 OK
: FLOP ( UNSIGNED -- 1ST )
  ." FLOP " DUP 1 > IF 1- FLIP THEN ;
 OK
:NONAME ( UNSIGNED -- 1ST )
  ." FLIP " FLOP ; IS FLIP
 OK
3 FLIP
FLIP FLOP FLIP FLOP FLIP FLOP  OK
3 FLOP
FLOP FLIP FLOP FLIP FLOP  OK

Now, let's have a look at the definitions of DEFER and IS:

: DEFER ( -- )
  ?EXECUTE CONST-HERE (CREATE) ['CODE] THROW CONST,
  ['TOKEN] NOOP CONST, END-CODE ;
  
: IS ( DEFINITION -- )
  DUP SPLIT DROP PARSE-WORD ROT DEFERRED SEARCH-ALL
  IF SWAP >CODE SWAP >BODY -> CONST -> CODE ! 
  ELSE DROP DROP -269 THROW
  THEN ;

DEFER simply creates a new definition with a given name and the same code field as the definition of THROW. This implies that THROW must be a deferred definition. Of course, there's a good reason to make THROW a deferred definition, and you'll find out why in a moment. The data field of the new definition is initialized with the token of NOOP. IS tries to find a deferred definition with the given name whose stack diagram is identical to the one of its input parameter. DEFERRED provides the special matching criteria for this purpose to SEARCH-ALL. If there's a match, IS stores the token of its input parameter, i. e., the address of the code field, in the data field of the deferred definition. The machine code that is shared by all deferred definitions simply executes the token that is stored in their data field.
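The mechanism can be modelled in Python, reproducing the FLIP/FLOP example from above. The names are illustrative; is_ is used because is is a Python keyword, and a Python exception stands in for -269 THROW.

```python
# Python model of DEFER and IS. A deferred word holds a token and simply
# executes it; IS only accepts a deferred word with an identical stack
# diagram. Names are illustrative; is_ avoids Python's keyword "is".
from typing import Callable, List

Stack = List[int]
Token = Callable[[Stack], None]


def noop(stack: Stack) -> None:
    """Like NOOP: leaves the stack unchanged."""


class Deferred:
    def __init__(self, name: str, diagram: str) -> None:
        self.name = name
        self.diagram = diagram
        self.token: Token = noop        # DEFER starts out with NOOP

    def __call__(self, stack: Stack) -> None:
        self.token(stack)               # shared runtime: run stored token


def is_(target: object, token: Token, diagram: str) -> None:
    # Reject non-deferred targets and different stack diagrams.
    if not isinstance(target, Deferred) or target.diagram != diagram:
        raise LookupError("no matching deferred definition")  # cf. -269 THROW
    target.token = token


out: List[str] = []
flip = Deferred("FLIP", "( UNSIGNED -- 1ST )")


def flop(stack: Stack) -> None:         # : FLOP ." FLOP " DUP 1 > IF 1- FLIP THEN ;
    out.append("FLOP")
    if stack[-1] > 1:
        stack[-1] -= 1
        flip(stack)


def flip_body(stack: Stack) -> None:    # :NONAME ." FLIP " FLOP ; IS FLIP
    out.append("FLIP")
    flop(stack)


is_(flip, flip_body, "( UNSIGNED -- 1ST )")
stack = [3]
flip(stack)
print(" ".join(out))                    # FLIP FLOP FLIP FLOP FLIP FLOP
```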

DEFER and IS are not specified in ANS Forth, but they are a de facto standard. That's one reason why they are an integral part of StrongForth. But there's a second reason. Certain word sets specified by ANS Forth require the semantics of some words belonging to other word sets to be modified. For example, when loading the Floating-Point word set, INTERPRET has to be extended for parsing floating-point numbers and for interpreting or compiling them as literals. ABORT has to initialize the floating-point hardware, and THROW has to save the floating-point stack pointer in the exception frame. That's why these words have deferred definitions. However, apart from a small performance drop when executing one of these words, you won't notice any difference.

An important detail of the definitions of DEFER and IS is the fact that the token of the definition to be executed is stored in the CONST memory area. In an embedded system, the CONST memory area is typically read-only. It is therefore not a good idea to reassign the token of a deferred definition at runtime. IS should only be used during compilation in order to resolve forward references, or to adapt the semantics of a definition to a newly loaded word set or library. At runtime, you can always use EXECUTE in connection with execution tokens that are stored on the data stack or at another location in the DATA memory area.

Dr. Stephan Becher - January 26th, 2008