Chapter 7: Data Types

What's In A Data Type?

In order to support StrongForth's data type system, the interpreter and the compiler have to store information about the data types of the items on the stack. The stack diagrams of all words have to be included in the dictionary. Interpreter and compiler use these data to find out which version of an overloaded word is to be interpreted or compiled next.

Not surprisingly, StrongForth's data type system is built on a data type called DATA-TYPE. For example, the data structures that describe the data types of the items currently on the stack are lists of items of this data type. Stack diagrams also consist of lists of items of data type DATA-TYPE. So, the first question to answer is: What's in a data type?

As shown in the picture below, each item of data type DATA-TYPE is composed of an identifier that specifies the data type, e. g., UNSIGNED or ORIGIN, and a number of attributes.

Identifier | Unused | P | I | O | Offset

Identifier: unique identifier of the data type
Unused:     reserved for implementation defined attributes
P:          prefix attribute
I:          input parameter attribute
O:          output parameter attribute
Offset:     offset attribute

A data type has four attributes, which are used in stack diagrams to distinguish input and output parameters, to build compound data types and to specify data type references. P, I and O are binary attributes, i. e., they each consist of one bit only. In the following stack diagram of the word @, all attributes are present:

@ ( DATA -> SINGLE -- 2ND )

The stack diagram consists of a list of three basic data types:

#  Identifier  P  I  O  Offset
1  DATA        1  1  0  0
2  SINGLE      0  1  0  0
3  SINGLE      0  0  1  2

The first two basic data types are input parameters, while the third one, being on the right side of --, is an output parameter. Setting the prefix attribute of the first basic data type connects it with the second basic data type to compose a compound data type. The offset attribute of the third basic data type indicates that it is a reference to the second basic data type. This means also that these two data types must have the same data type identifier (SINGLE or a subtype of SINGLE).

Obviously, the usual arithmetic and logic operations cannot be applied to items of data type DATA-TYPE. So how can you set and query the identifier and the attributes of a data type? First of all, you need a way to push an item of data type DATA-TYPE with a specific identifier onto the stack. For this purpose, StrongForth provides the words DT and [DT], which are used much like the ANS Forth words ' and [']:

DT LOGICAL

is used in interpretation state to create an item of data type DATA-TYPE with the identifier LOGICAL and no attributes. Similarly,

[DT] LOGICAL

can be used during compilation to compile a literal data type. You'll learn more about DT and [DT] in a later section of this chapter.

Now what about the attributes? First of all, StrongForth provides four constants of data type DATA-TYPE that are bit masks for the attributes:

DT-PREFIX ( -- DATA-TYPE )
DT-INPUT ( -- DATA-TYPE )
DT-OUTPUT ( -- DATA-TYPE )
DT-OFFSET ( -- DATA-TYPE )

DT-INPUT, for example, is a data type with no identifier and only the input parameter attribute. DT-OFFSET is a bit mask for the offset attribute, i. e., all bits of the offset attribute field are set to 1 and all other bits are 0.

To be able to set and clear the attributes of a data type, overloaded versions of the logical operators AND, OR, XOR and INVERT are available in StrongForth:

AND ( DATA-TYPE DATA-TYPE -- 1ST )
OR ( DATA-TYPE DATA-TYPE -- 1ST )
XOR ( DATA-TYPE DATA-TYPE -- 1ST )
INVERT ( DATA-TYPE -- 1ST )

As can be seen from the stack diagrams, these words can only be applied to items of data type DATA-TYPE and its subtypes. Following the general rule for arithmetical and logical operations, the data type of the output parameter is identical to the data type of the first input parameter. Here's a first example:

DT-PREFIX DT-OUTPUT OR

creates a data type with both the prefix attribute and the output parameter attribute. To clear an attribute, you can apply the logical AND of the inverted bit mask:

DT-OUTPUT INVERT AND

Note that the logical operations on data types only apply to the attributes, not to the identifier. This means that the identifier of the resulting data type is always identical to the identifier of the first data type parameter of a logical operation. For example, the result of

DT CHARACTER DT-OUTPUT OR DT FLAG DT-PREFIX OR OR

is a data type with identifier CHARACTER, output parameter attribute, and prefix attribute. The identifier FLAG of the second data type is not considered at all. The result of

DT DOUBLE DT-OFFSET OR INVERT

still has identifier DOUBLE and additionally the input parameter attribute, the output parameter attribute, and the prefix attribute.
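The attribute logic described above can be sketched in Python. This is only a model, not StrongForth code: the exact bit positions are an assumption (the text fixes only the 5-bit offset width and the one-bit P, I and O attributes), and the function names dt, dt_or, dt_and, dt_xor, dt_invert and show are hypothetical.

```python
# Python model of DATA-TYPE attribute logic (illustrative, not StrongForth code).
# ASSUMED bit layout: offset in bits 0-4, then O, I, P in bits 5, 6, 7.
OFFSET_BITS = 0b0001_1111
OUTPUT_BIT = 0b0010_0000
INPUT_BIT = 0b0100_0000
PREFIX_BIT = 0b1000_0000
ALL_ATTRIBUTES = PREFIX_BIT | INPUT_BIT | OUTPUT_BIT | OFFSET_BITS


def dt(identifier=None, attributes=0):
    """Model a data type as an (identifier, attribute bits) pair."""
    return (identifier, attributes & ALL_ATTRIBUTES)


# The four mask constants carry no identifier, only attribute bits.
DT_PREFIX = dt(None, PREFIX_BIT)
DT_INPUT = dt(None, INPUT_BIT)
DT_OUTPUT = dt(None, OUTPUT_BIT)
DT_OFFSET = dt(None, OFFSET_BITS)


# Each operation combines attribute bits only; the identifier of the
# first operand is always carried through unchanged.
def dt_or(a, b):
    return dt(a[0], a[1] | b[1])


def dt_and(a, b):
    return dt(a[0], a[1] & b[1])


def dt_xor(a, b):
    return dt(a[0], a[1] ^ b[1])


def dt_invert(a):
    return dt(a[0], ~a[1])


def show(a):
    """Render identifier, P/I/O flags and offset as text."""
    flags = "".join(
        letter if a[1] & bit else "-"
        for letter, bit in (("P", PREFIX_BIT), ("I", INPUT_BIT), ("O", OUTPUT_BIT))
    )
    return f"{a[0]} {flags} {a[1] & OFFSET_BITS}"


# DT CHARACTER DT-OUTPUT OR DT FLAG DT-PREFIX OR OR
print(show(dt_or(dt_or(dt("CHARACTER"), DT_OUTPUT), dt_or(dt("FLAG"), DT_PREFIX))))
# DT DOUBLE DT-OFFSET OR INVERT
print(show(dt_invert(dt_or(dt("DOUBLE"), DT_OFFSET))))
```

Under these assumptions the two calls print CHARACTER P-O 0 and DOUBLE PIO 0, which agrees with the results described in the text: the identifier FLAG is ignored, and inverting a data type with a fully set offset leaves offset 0 and all three binary attributes set.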

In addition to the standard logical operators, StrongForth provides three words that can be used to query the contents of a data type:

ATTRIBUTE? ( DATA-TYPE DATA-TYPE -- FLAG )
NULL? ( DATA-TYPE -- FLAG )
OFFSET ( DATA-TYPE -- UNSIGNED )

ATTRIBUTE? leaves TRUE on the stack if and only if the first data type has at least one of the attributes of the second data type. NULL? returns TRUE if and only if the data type does not contain a data type identifier, i. e., if all bits of the Identifier field are zero. OFFSET delivers the offset attribute of a data type as an unsigned number. Here's the definition of OFFSET:

: OFFSET ( DATA-TYPE -- UNSIGNED )
  DT-OFFSET AND CAST UNSIGNED ;

Finally, there's a word for changing the offset attribute of a data type. OFFSET+ adds an integer value to the offset attribute of a data type. Its definition demonstrates quite nicely the usage of OFFSET and some of the logical operations on items of data type DATA-TYPE:

: OFFSET+ ( DATA-TYPE INTEGER -- 1ST )
  OVER OFFSET + DUP CAST UNSIGNED 0 32 WITHIN
  IF SWAP [ DT-OFFSET INVERT ] LITERAL AND
     SWAP CAST DATA-TYPE OR
  ELSE DROP -259 THROW
  THEN ;

Since the offset attribute field is only 5 bits wide, an exception has to be thrown whenever the resulting offset is 32 or higher. Negative values are likewise rejected as offset attribute.
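The range check in OFFSET+ can be modelled in a few lines of Python. This is a sketch, not StrongForth code: it assumes that the 5-bit offset occupies the lowest bits of the attribute field, and a Python ValueError stands in for exception -259.

```python
# Python model of OFFSET and OFFSET+ (illustrative, not StrongForth code).
# ASSUMPTION: the 5-bit offset occupies the lowest bits of the attribute field.
OFFSET_BITS = 0b1_1111


def offset(attributes):
    """Extract the offset attribute, as OFFSET does with DT-OFFSET AND."""
    return attributes & OFFSET_BITS


def offset_plus(attributes, n):
    """Add n to the offset; reject results outside 0..31."""
    new_offset = offset(attributes) + n
    if not 0 <= new_offset < 32:
        # StrongForth throws exception -259 here; a Python exception stands in.
        raise ValueError(f"offset {new_offset} out of range 0..31")
    # Clear the old offset field, then merge in the new value.
    return (attributes & ~OFFSET_BITS) | new_offset
```

With these definitions, offset_plus(0b1010_0010, -1) yields 0b1010_0001: the offset drops from 2 to 1 and the other attribute bits are untouched. Both offset_plus(31, 1) and offset_plus(0, -1) raise the exception.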

Now it's time for some simple exercises, to get more familiar with all these words. First, let's define a word that prints a data type and all its attributes. As you will see later in this chapter, the overloaded version of . for items of data type DATA-TYPE just prints the identifier.

: .DT ( DATA-TYPE -- )
  DUP .
  DUP DT-PREFIX ATTRIBUTE? IF [CHAR] P ELSE [CHAR] - THEN .
  DUP DT-INPUT  ATTRIBUTE? IF [CHAR] I ELSE [CHAR] - THEN .
  DUP DT-OUTPUT ATTRIBUTE? IF [CHAR] O ELSE [CHAR] - THEN .
  SPACE OFFSET . ;
 OK

Going back to the example at the beginning of this section, let's create the basic data types in the stack diagram of @:

DT DATA DT-INPUT OR DT-PREFIX OR .DT
DATA PI- 0  OK
DT SINGLE DT-INPUT OR .DT
SINGLE -I- 0  OK
DT SINGLE DT-OUTPUT OR 2 OFFSET+ .DT
SINGLE --O 2  OK

Finally, here are some more examples of how to manipulate a data type:

DT TOKEN DT-PREFIX OR DT-OUTPUT OR 2 OFFSET+ CONSTANT DT1
 OK
DT1 .DT
TOKEN P-O 2  OK
DT1 INVERT .DT
TOKEN -I- 29  OK
DT1 DT ADDRESS DT-OUTPUT OR AND .DT
TOKEN --O 0  OK
DT1 DT-INPUT DT-OUTPUT OR XOR .DT
TOKEN PI- 2  OK
DT1 -1 OFFSET+ .DT
TOKEN P-O 1  OK
DT1 DT-INPUT DT-OUTPUT OR ATTRIBUTE? .
TRUE  OK
DT1 DT-INPUT ATTRIBUTE? .
FALSE  OK

Data and Return Stack

ANS Forth specifies the existence of two stacks, the data stack and the return stack. The data stack is used to store untyped items of cell size or double cell size, while the return stack is primarily used for storing return addresses, which also have cell size. Because it is sometimes very handy to have two data stacks available, the return stack may be abused for temporary data storage as well, provided certain precautions are taken to protect the return addresses.

In StrongForth, the two stacks are used in exactly the same way as in ANS Forth. There's no difference at all. As a result, the runtime performance of a StrongForth application is no worse than that of a corresponding ANS Forth application. Data type information is not stored on either of the two stacks.

Just like some other Forth systems, StrongForth provides words for initialising the stacks and for obtaining the stack pointers. Of course, both data and return stack are located in the DATA memory area:

SP! ( DATA -- )
RP! ( DATA -- )
SP0 ( -- DATA )
RP0 ( -- DATA )
SP@ ( -- DATA )
RP@ ( -- DATA )

SP! and RP! initialise the data stack pointer and the return stack pointer with an address of data type DATA, respectively. These words should be used with care, because access to parts of the stack contents might get lost. Furthermore, the data type system gets confused if you change the value of the data stack pointer without simultaneously updating the data type information. Changing the value of the return stack pointer within a word is even more dangerous, because this might cut off the way back to the point from where the execution of the word was initiated. However, in cases like system restart or exception handling, this behaviour might be intended. Chapter 16 explains in detail how SP! and RP! are used by words like QUIT, ABORT and CATCH.

SP0 and RP0 are address constants that deliver initialisation values for the data and return stack pointer, respectively. These constants are used together with SP! and RP! during system restart. The phrase SP0 SP! empties the data stack, while RP0 RP! empties the return stack.

SP@ and RP@ return the data stack pointer and the return stack pointer, respectively. Since these two low-level words don't know anything about the existence of StrongForth's data type system, they just return the pointer as an unspecified address of data type DATA. Note that SP@ returns the value of the data stack pointer before it is itself pushed onto the stack:

613 15 SP@ -> UNSIGNED @ . . .
15 15 613  OK

Data Type Heaps

But where, if not on the data and return stacks, does StrongForth keep the data type information? In two separate data structures called data type heaps. In contrast to a stack, which typically grows from high to low addresses, a heap grows from low to high addresses. Having different data structures for the data items and for the data type information just underlines their independence; otherwise it is insignificant. Using a heap instead of a stack brings some advantages for the implementation, because the order of the basic data types on a heap matches the order of the basic data types within a stack diagram.

Why does StrongForth need two data type heaps? Is there one for the data stack and one for the return stack? Good guess, but this is wrong. There's no data type heap for the return stack. Instead, StrongForth has two data type heaps for the items on the data stack, which are called interpreter data type heap and compiler data type heap.

The interpreter data type heap contains the data types of the items that are currently on the data stack. For each single or double cell item on the data stack, the interpreter data type heap contains a compound data type. Although this is an exact one-to-one relation, the memory requirements might look quite different. An item on the data stack takes one or two cells, while a compound data type on the data type heap consists of one or more basic data types, each of which has the size of an item of data type DATA-TYPE, which is a double cell on most systems. For example, data type

SIGNED-DOUBLE

requires one double cell on both the interpreter data type heap and the data stack, whereas

CDATA -> CHARACTER

requires two double cells on the interpreter data type heap and only one cell on the data stack.

The compiler data type heap is used during compilation to specify the data types of the items which will be on the stack at runtime. Let's view a simple example:

: ?EXECUTE
  STATE @ IF -29 THROW THEN ;

When compiling STATE, the compiler puts the compound data type DATA -> FLAG onto the compiler data type heap, because that's the stack effect of STATE. Since STATE is not actually executed, there's no change on the data stack and therefore no change on the interpreter data type heap. Compiling @ causes DATA -> FLAG to be removed from the compiler data type heap, and FLAG is put onto the heap instead. Nothing spectacular so far. But now it's getting interesting, because IF is an immediate word. Being executed immediately, it can affect the compiler data type heap as well as the interpreter data type heap. IF compiles 0BRANCH, which consumes FLAG from the compiler data type heap. Furthermore, it pushes an item of data type ORIGIN onto the data stack, so data type ORIGIN must be added to the interpreter data type heap. Compiling -29 puts SIGNED onto the compiler data type heap, which is then consumed by compiling THROW. Since THEN is another immediate word, it affects the data stack and the interpreter data type heap by consuming the item of data type ORIGIN that was created by IF.

The immediate word .S may be used to visualise the contents of the data type heaps at any time. During interpretation, it prints the contents of the interpreter data type heap:

17 -5 .S
UNSIGNED SIGNED  OK
+ .S .
UNSIGNED 12  OK

During compilation, .S prints the contents of the compiler data type heap. If you want to see what's on the interpreter data type heap during compilation, you have to temporarily switch to interpretation state and then execute .S. Here's again our example:

: ?EXECUTE
STATE .S
DATA -> FLAG @ .S
FLAG IF .S
[ .S ]
COLON-DEFINITION ORIGIN -29 THROW THEN
[ .S ] ;
COLON-DEFINITION  OK

Note that during compilation, an item of data type COLON-DEFINITION resides on the data stack. This item is used to pass information from : to ;. You'll later learn more about compilation in StrongForth.
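The heap traffic in the ?EXECUTE session above can be replayed with a small Python trace. It is a simplified model rather than StrongForth code: compound data types are kept as single strings, and the helper step and the list names are hypothetical.

```python
# Python trace of the two data type heaps while compiling
#   : ?EXECUTE  STATE @ IF -29 THROW THEN ;
# Illustrative model only: compound data types are kept as single strings.
compiler_heap = []
interpreter_heap = ["COLON-DEFINITION"]  # left on the data stack by :
trace = []


def step(word, heap, consumes, produces):
    """Remove `consumes` from the top of `heap`, push `produces`, log both heaps."""
    for expected in reversed(consumes):
        actual = heap.pop()
        assert actual == expected, f"{word}: expected {expected}, found {actual}"
    heap.extend(produces)
    trace.append((word, list(compiler_heap), list(interpreter_heap)))


# Ordinary words are compiled: only the compiler heap changes.
step("STATE", compiler_heap, [], ["DATA -> FLAG"])
step("@", compiler_heap, ["DATA -> FLAG"], ["FLAG"])
# IF is immediate: the 0BRANCH it compiles consumes FLAG from the compiler
# heap, and IF itself pushes ORIGIN onto the data stack.
step("IF (0BRANCH)", compiler_heap, ["FLAG"], [])
step("IF", interpreter_heap, [], ["ORIGIN"])
step("-29", compiler_heap, [], ["SIGNED"])
step("THROW", compiler_heap, ["SIGNED"], [])
# THEN is immediate as well and consumes the ORIGIN left by IF.
step("THEN", interpreter_heap, ["ORIGIN"], [])

for word, compiler, interpreter in trace:
    print(f"{word:<13} compiler={compiler} interpreter={interpreter}")
```

The logged states line up with the .S output in the session: after IF the interpreter heap holds COLON-DEFINITION ORIGIN, and at the end only COLON-DEFINITION remains while the compiler heap is empty.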

StrongForth provides a number of words that perform operations directly on the two data type heaps:

DTP@ ( -- DATA -> DATA-TYPE )
DTP! ( -- )
DTP| ( -- )
DEPTH ( -- UNSIGNED )
>DT ( DATA-TYPE -- )
DT> ( -- DATA-TYPE FLAG )

These words have one thing in common: they use the interpreter data type heap when the system is in interpretation state, and the compiler data type heap when it is in compilation state. Since none of them is an immediate word, they cannot be executed directly in compilation state. However, they may be compiled into immediate words.

DTP@ and DTP! perform similar operations on the currently selected data type heap as SP@, SP!, RP@ and RP! do on the data or return stack. DTP@ returns the address of the next free location of the data type heap, which is referred to as the data type heap pointer. DTP! clears the data type heap by resetting the data type heap pointer to the start address of the data type heap. Note that DTP!, unlike SP! and RP!, does not have an input parameter. Executing DTP! in interpretation state is potentially dangerous, because all data type information gets lost. It will normally only be used in connection with SP!. On the other hand, DTP! should always be used at the beginning of a compilation, immediately after entering compilation state. This will ensure that the compilation starts with a clean compiler data type heap.

DTP| has no counterpart for the data and return stack. It has no effect in interpretation state. In compilation state, it locks the compiler data type heap, i. e. it prevents any further usage of this heap. The compiler data type heap can only be unlocked by executing DTP! in compilation state. As long as the compiler data type heap is locked, DTP@ returns zero instead of the compiler data type heap pointer.

What can such a strange word be used for? There are some words, like BRANCH and EXIT, which break the control flow during compilation. Keeping data type information after compiling one of these words makes no sense at all, because the data flow is broken as well. If any word is compiled after BRANCH or EXIT, it has to start with new data type information, which may come from a different control flow that happens to touch down here. DTP| ensures that the data flow cannot be continued after the control flow has been interrupted. If this still sounds strange to you, please be patient and wait until words that actually compile BRANCH or EXIT are presented.

DEPTH is the next word in the list. In ANS Forth, DEPTH delivers the number of cells which are currently on the data stack, i. e. before the number itself is placed there. In StrongForth, DEPTH returns the number of basic data types on the interpreter or compiler data type heap. Since the interpreter data type heap is always updated before a word is executed, the number returned by DEPTH includes the stack effect of DEPTH itself, or of the word that directly or indirectly contains DEPTH. Here are two examples:

884003. STATE DEPTH .S .
UNSIGNED-DOUBLE DATA -> FLAG UNSIGNED 4  OK
: .DEPTH ( -- )
  DEPTH . ;
 OK
.S .DEPTH
UNSIGNED-DOUBLE DATA -> FLAG 3  OK
. .
384 884003  OK

There's no word in StrongForth to calculate the actual depth of the data stack. If you really need it, you can easily calculate it by subtracting SP@ from SP0, because the data stack grows towards lower addresses. But beware, the result is the depth of the data stack in address units. If you need the number of cells, you either have to divide it by the number of address units per cell, or make sure the address arithmetic is based on cells:

TRUE -816603. .S
FLAG SIGNED-DOUBLE  OK
SP@ SP0 SWAP - CAST UNSIGNED 1 CELLS / .
3  OK
SP@ -> SINGLE SP0 -> SINGLE SWAP - .
3  OK
. .
-816603 TRUE  OK

Another way to calculate the depth of the data stack is by iterating through the compound data types on the data type heap, and counting the number of single and double cells for all data types. For example, if the data type heap contains

UNSIGNED-DOUBLE DATA -> FLAG UNSIGNED

the data stack should be exactly 4 cells deep: two cells for an item of data type UNSIGNED-DOUBLE, and one cell each for the items of data types DATA -> FLAG and UNSIGNED. Here's the definition of the word that does it this way:

: DEPTH-SP ( -- UNSIGNED )
  0 TRUE DTP@ DUP DEPTH -
  ?DO IF I @ SIZE + THEN
     I @ DT-PREFIX ATTRIBUTE? INVERT
  LOOP DROP ;

Note that DEPTH-SP is included in StrongForth's Exception word set, which means that it is only available after the Exception word set has been loaded. DEPTH-SP calculates the bottom of the data type heap by subtracting DEPTH from the data type heap pointer, and then uses a ?DO loop to iterate through all basic data types. Inside the loop, the cell counter (initialised to 0) and a flag are kept on the stack. If the flag is true, the cell counter is incremented by 1 or 2, depending on the size in cells of the current basic data type. SIZE will be presented in chapter 9.

In the second part of the ?DO loop, the flag is recalculated. It is false, if the current basic data type has the prefix attribute, otherwise it is true. A false flag prevents incrementing the cell counter during the next iteration, because the current data type of the next iteration belongs to the tail of a compound data type. The flag calculated during the last iteration is discarded after the loop terminates, leaving only the cell counter on the stack.
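The counter-and-flag loop of DEPTH-SP translates almost directly into Python. The sketch below is a model, not StrongForth code: each heap entry is represented as a tuple of identifier, size in cells and prefix attribute, which is a layout chosen for illustration only.

```python
# Python model of the DEPTH-SP loop (illustrative, not StrongForth code).
# Each heap entry is (identifier, size in cells, has prefix attribute).
def depth_sp(heap):
    cells = 0
    count_next = True  # the flag kept on the stack next to the counter
    for _identifier, size, has_prefix in heap:
        if count_next:
            cells += size
        # A prefix means the next entry is the tail of this compound type.
        count_next = not has_prefix
    return cells


# UNSIGNED-DOUBLE  DATA -> FLAG  UNSIGNED
heap = [
    ("UNSIGNED-DOUBLE", 2, False),
    ("DATA", 1, True),
    ("FLAG", 1, False),
    ("UNSIGNED", 1, False),
]
print(depth_sp(heap))  # 4
```

FLAG is skipped because the preceding DATA carries the prefix attribute, so only the head of each compound data type contributes its size.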

DEPTH-SP can be used in interpretation state as well as in compilation state. Note that the result is only correct as long as data stack and data type heap are aligned. In interpretation state, DEPTH-SP returns the depth of the data stack after it has been executed, because its output parameter of data type UNSIGNED is included in the iteration. Here's a simple example:

CHAR A LATEST BASE DEPTH-SP .S
CHARACTER DEFINITION DATA -> UNSIGNED UNSIGNED  OK
.
5  OK

There are two more words that deal directly with the data type heaps:

>DT ( DATA-TYPE -- )
DT> ( -- DATA-TYPE FLAG )

Again, both words are state-smart in the sense that they operate on the interpreter or compiler data type heap depending on the current state. >DT pushes a new basic data type onto the heap, while DT> pops a basic data type from the data type heap. DT> also returns a flag that indicates whether the basic data type is a part of the tail of a compound data type. Note that this information cannot be obtained from the basic data type itself, because it depends on the prefix attribute of the previous basic data type on the heap.

>DT and DT> should be used with care, because they may corrupt StrongForth's data type system. They are typically used in immediate words like CAST and LITERAL. You'll see several examples of using >DT and DT> later in this chapter and in the following chapters.

Creating Stack Diagrams

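As you've already seen, a stack diagram is just a list of data types with various attributes. Stack diagrams are created when a new word is compiled into the dictionary. During compilation of a stack diagram, three pieces of information have to be tracked: the current size of the stack diagram, whether input or output parameters are currently being compiled, and whether the most recent parameter is the head of a compound data type.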

It's obvious that this information fits perfectly into an item of data type DATA-TYPE, because it looks very similar to the data type attributes that were described in the first section of this chapter. The offset attribute can be used to store the size of the stack diagram, the input and output parameter attributes determine whether input or output parameters are being compiled, and the prefix attribute stores the information about compound data types. Nevertheless, a stack diagram is not a data type. StrongForth provides a separate data type called STACK-DIAGRAM for this purpose. Since STACK-DIAGRAM is a direct subtype of DATA-TYPE, all words that can be applied to items of data type DATA-TYPE, like OR, ATTRIBUTE?, and OFFSET, can also be applied to items of data type STACK-DIAGRAM.

Data types are added to a stack diagram by simply executing a word with the name of the data type. Whenever words like UNSIGNED, CHARACTER, or even DATA-TYPE are executed, the corresponding data type is compiled as a parameter into a stack diagram. Since the semantics of all these words are identical, it is natural to define new data types by using a defining word. In StrongForth, this defining word is called PROCREATES. It creates a new subtype of an existing data type, which is provided as input parameter to PROCREATES. With PROCREATES, StrongForth's predefined data types can be created like this:

NULL DATA-TYPE    PROCREATES SINGLE 1 CONST,
DT SINGLE         PROCREATES INTEGER
DT INTEGER        PROCREATES UNSIGNED
DT INTEGER        PROCREATES SIGNED
DT INTEGER        PROCREATES CHARACTER
DT SINGLE         PROCREATES ADDRESS
DT ADDRESS        PROCREATES DATA
DT ADDRESS        PROCREATES CONST
DT ADDRESS        PROCREATES PORT
DT ADDRESS        PROCREATES CODE
DT ADDRESS        PROCREATES CADDRESS
DT CADDRESS       PROCREATES CDATA
DT CADDRESS       PROCREATES CCONST
DT CADDRESS       PROCREATES CPORT
DT CADDRESS       PROCREATES CCODE
DT SINGLE         PROCREATES LOGICAL
DT LOGICAL        PROCREATES FLAG
DT SINGLE         PROCREATES TOKEN
DT SINGLE         PROCREATES MEMORY-SPACE
DT SINGLE         PROCREATES FILE
DT SINGLE         PROCREATES WID

NULL DATA-TYPE    PROCREATES DOUBLE 2 CONST,
DT DOUBLE         PROCREATES INTEGER-DOUBLE
DT INTEGER-DOUBLE PROCREATES UNSIGNED-DOUBLE
DT INTEGER-DOUBLE PROCREATES SIGNED-DOUBLE
DT DOUBLE         PROCREATES CONTROL-FLOW
DT DOUBLE         PROCREATES DATA-TYPE
DT DATA-TYPE      PROCREATES STACK-DIAGRAM
DT DOUBLE         PROCREATES FAR-ADDRESS
DT FAR-ADDRESS    PROCREATES CFAR-ADDRESS
DT DOUBLE         PROCREATES DEFINITION
DT DEFINITION     PROCREATES COLON-DEFINITION

PROCREATES expects the parent data type of the new data type on the stack. Does this mean that you always have to create a new data type as a subtype of an already existing data type? No, it doesn't. SINGLE and DOUBLE are not subtypes of any other data type. You can create data types that have no parent by providing PROCREATES with a null data type:

NULL DATA-TYPE PROCREATES ADAM 1 CONST,
NULL DATA-TYPE PROCREATES EVE 1 CONST,

However, keep in mind that those new data types will not match any input parameter of StrongForth's vocabulary. Even words like DUP and DROP cannot be applied, because they are only defined for items of data types SINGLE and DOUBLE, and their respective subtypes:

25 CAST EVE
 OK
DROP

DROP ? undefined word
EVE

Nevertheless, new data types with no parent might be useful if their usage is to be restricted. You need to manually define or redefine all words that can be applied to items of these data types:

: DROP ( EVE -- )
  CAST SINGLE DROP ;

But what about the phrase that follows the definitions of SINGLE, DOUBLE, ADAM and EVE, for example 1 CONST, after SINGLE? This phrase compiles the size of the data type in cells. Data types that have no parents are called ancestors in StrongForth. Since each directly or indirectly derived data type has the same size as its ancestor, only the ancestor itself needs to be provided with the size information. In chapter 9, the word SIZE is presented, which calculates the size in cells of any data type.

Now, let's have a look at the definition of PROCREATES:

: PROCREATES ( DATA-TYPE -- )
  CREATE SPLIT CONST, DROP
  DOES> ( STACK-DIAGRAM CONST -- 1ST )
  0 SWAP 1 CELLS - MERGE CAST DATA-TYPE (PARAM) ;

The words between CREATE and DOES> compile the identifier of the parent data type into the data field of the new data type. Note that the identifier is located in the most significant part of the double cell that constitutes a data type.

The stack diagram of words that compile data types into a stack diagram is simply

( STACK-DIAGRAM -- 1ST )

where STACK-DIAGRAM keeps information about the status of the stack diagram. Consequently, the DOES> part of PROCREATES has the same stack diagram, with the addition of an input parameter CONST, which is the pointer to the data field. This pointer minus the size of one cell gives the address of the code field, and this is the identifier of the new data type. MERGE CAST DATA-TYPE makes a data type with no attributes from this identifier. (PARAM) finally compiles the data type into the stack diagram. Note that data types that are part of a stack diagram are usually called parameters in StrongForth.

If (PARAM) compiled a parameter directly into the current memory space, it would be difficult to set the prefix attribute correctly. Consider an example where you have to specify a compound data type in a stack diagram:

... CDATA -> CHARACTER ...

CDATA has to be compiled with the prefix attribute. But during execution of CDATA, -> has not even been parsed. (PARAM) has no means to know that CDATA is part of a compound data type. Therefore, (PARAM) defers the compilation of a parameter until it is called the next time, which happens when either the next parameter occurs in the stack diagram or one of the words -- or ) is executed. The identifier of the deferred parameter is simply stored in the corresponding field of STACK-DIAGRAM, which has been unused so far. Here's the definition of (PARAM):

: (PARAM) ( STACK-DIAGRAM DATA-TYPE -- 1ST )
  CAST STACK-DIAGRAM OVER DT-PREFIX INVERT AND OR
  SWAP DUP NULL?
  IF DROP
  ELSE [ DT-PREFIX DT-INPUT OR DT-OUTPUT OR ] LITERAL AND PARAM,
  THEN ;

The first line of code copies all attributes except for the prefix attribute from STACK-DIAGRAM to DATA-TYPE, which makes DATA-TYPE the new STACK-DIAGRAM. If the old STACK-DIAGRAM does not contain an identifier, no parameter has been deferred in a previous execution of (PARAM) within the same stack diagram. In this case, (PARAM) is already done. Otherwise, it uses PARAM, to actually compile the deferred parameter including the input parameter, the output parameter and the prefix attribute, but without the offset attribute. Remember that the offset attribute of STACK-DIAGRAM is the current size of the stack diagram.

PARAM, just compiles a parameter into a stack diagram and increments the size of the stack diagram by one. A stack diagram is always created in the current memory space:

: PARAM, ( STACK-DIAGRAM DATA-TYPE -- 1ST )
  , 1 OFFSET+ ;

Let's continue with stack diagrams. Apart from data types, stack diagrams usually contain a number of other words as well:

-- ( STACK-DIAGRAM -- 1ST )
-> ( STACK-DIAGRAM -- 1ST )
TH ( STACK-DIAGRAM UNSIGNED -- 1ST )
1ST ( STACK-DIAGRAM -- 1ST )
2ND ( STACK-DIAGRAM -- 1ST )
3RD ( STACK-DIAGRAM -- 1ST )

The item of data type STACK-DIAGRAM that is kept on the stack throughout the creation of a stack diagram contains the input or output parameter attribute of the parameter that will be compiled next. Since input parameters always come first, STACK-DIAGRAM starts with the input parameter attribute. Consequently, -- switches to output parameters by clearing the input parameter attribute and setting the output parameter attribute of STACK-DIAGRAM. But this is only the last action -- performs. Before doing that, -- makes sure that STACK-DIAGRAM really has the input parameter attribute, but no output parameter attribute and no prefix attribute. This is a simple syntax check for stack diagrams:

: BAD1 ( UNSIGNED -- 1ST -- FLAG )

: BAD1 ( UNSIGNED -- 1ST -- ? invalid stack diagram
COLON-DEFINITION MEMORY-SPACE FLAG STACK-DIAGRAM
: BAD2 ( DATA -> -- CHARACTER )

: BAD2 ( DATA -> -- ? invalid stack diagram
COLON-DEFINITION MEMORY-SPACE FLAG STACK-DIAGRAM

Don't worry about all that stuff like COLON-DEFINITION and MEMORY-SPACE. It will be explained below. To ensure that any deferred input parameters are compiled before starting with the output parameters, -- uses the phrase

NULL DATA-TYPE (PARAM)

which simply flushes STACK-DIAGRAM without deferring a new input parameter. Here's the definition of --:

: -- ( STACK-DIAGRAM -- 1ST )
  DUP [ DT-OUTPUT DT-PREFIX OR ] LITERAL ATTRIBUTE?
  OVER DT-INPUT ATTRIBUTE? INVERT OR
  IF -262 THROW
  ELSE NULL DATA-TYPE (PARAM)
     [ DT-OUTPUT DT-INPUT OR ] LITERAL XOR
  THEN ;

-> works similarly. It first does a syntax check and then changes the attributes of STACK-DIAGRAM. -> must always be preceded by a parameter that has been deferred and whose identifier is stored in STACK-DIAGRAM. Furthermore, -> must not be applied more than once to the same parameter:

: BAD3 ( LOGICAL -- -> 1ST )

: BAD3 ( LOGICAL -- -> ? invalid stack diagram
COLON-DEFINITION MEMORY-SPACE FLAG STACK-DIAGRAM
: BAD4 ( CCONST -> -> CHARACTER -- )

: BAD4 ( CCONST -> -> ? invalid stack diagram
COLON-DEFINITION MEMORY-SPACE FLAG STACK-DIAGRAM

If these conditions are met, -> simply sets the prefix attribute of the deferred parameter:

: -> ( STACK-DIAGRAM -- 1ST )
  DUP NULL? OVER DT-PREFIX ATTRIBUTE? OR
  IF -262 THROW
  ELSE DT-PREFIX OR
  THEN ;
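How deferred compilation, -- and -> interact can be sketched as a small Python state machine. This is a model under stated assumptions, not StrongForth code: the class StackDiagram, its method names and the closing flush that stands in for ) are hypothetical, and attributes are shown as three-character flag strings instead of bits.

```python
# Python model of deferred parameter compilation (illustrative only).
# It mirrors the roles of (PARAM), PARAM,, -- and ->; the method names and
# the closing flush standing in for ) are hypothetical.
class StackDiagram:
    def __init__(self):
        self.pending = None   # identifier of the deferred parameter, if any
        self.prefix = False   # prefix attribute of the deferred parameter
        self.output = False   # False while inputs are compiled, True after --
        self.compiled = []    # (identifier, "PIO"-style flags), in order

    def _flush(self, next_identifier):
        """Compile the deferred parameter, then defer next_identifier."""
        if self.pending is not None:
            flags = ("P" if self.prefix else "-") + ("-O" if self.output else "I-")
            self.compiled.append((self.pending, flags))
        self.pending = next_identifier
        self.prefix = False

    def param(self, identifier):
        self._flush(identifier)

    def arrow(self):  # ->
        if self.pending is None or self.prefix:
            raise ValueError("invalid stack diagram")  # StrongForth: -262 THROW
        self.prefix = True

    def dashes(self):  # --
        if self.output or self.prefix:
            raise ValueError("invalid stack diagram")  # StrongForth: -262 THROW
        self._flush(None)
        self.output = True

    def close(self):  # stands in for )
        self._flush(None)
        return self.compiled


# ( CDATA -> CHARACTER -- )
sd = StackDiagram()
sd.param("CDATA")
sd.arrow()
sd.param("CHARACTER")
sd.dashes()
print(sd.close())  # [('CDATA', 'PI-'), ('CHARACTER', '-I-')]
```

CDATA is compiled only when CHARACTER arrives, by which time -> has already set its prefix attribute. The four invalid diagrams shown earlier (a second --, a -- directly after ->, a -> with nothing deferred, and a repeated ->) all raise the exception in this model.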

TH is used in a stack diagram to create a reference to other data types within the same stack diagram. If n is an unsigned number, n TH specifies that the data type it represents should be exactly the same as the n-th basic data type of the stack diagram. 1ST, 2ND and 3RD are just specialised versions of TH that are more convenient to use and read more like natural English:

: 1ST ( STACK-DIAGRAM -- 1ST )
  1 TH ;
  
: 2ND ( STACK-DIAGRAM -- 1ST )
  2 TH ;
  
: 3RD ( STACK-DIAGRAM -- 1ST )
  3 TH ;

You've already seen quite a number of examples in which 1ST, 2ND, 3RD and TH have been used. However, since data type references belong to StrongForth's most basic concepts, let's go into details with some more examples:

! ( SINGLE DATA -> 1ST -- )

1ST refers to the first basic data type of !, which is SINGLE. If 1ST just means that this parameter should be identical to SINGLE, why can't you just write SINGLE instead? Well, during compilation of ! it actually makes no difference. But whenever the interpreter or the compiler parses the name !, it finds the above version of ! only if 1ST and SINGLE are really identical.

6153 VARIABLE N
 OK
CHAR $ N !

CHAR $ N ! ? undefined word
CHARACTER DATA -> UNSIGNED

Since CHARACTER and UNSIGNED are not identical, the interpreter does not find any version of ! that matches. In this particular case, it prevents storing a character into an unsigned number variable.

Reference parameters also work if the referenced parameters are compound parameters or tails of compound parameters. In the following stack diagram, 2ND refers to ADDRESS -> INTEGER.

DUMMY ( DATA -> ADDRESS -> INTEGER 2ND -- UNSIGNED )

Whether DUMMY is found by the interpreter depends on the data types of the two items on top of the stack. A few examples are shown in the following table.

next of stack                top of stack          match
DATA -> ADDRESS -> UNSIGNED  ADDRESS -> UNSIGNED   yes
DATA -> ADDRESS -> UNSIGNED  ADDRESS -> CHARACTER  no
DATA -> ADDRESS -> UNSIGNED  ADDRESS -> LOGICAL    no
DATA -> ADDRESS -> UNSIGNED  CONST -> UNSIGNED     no
DATA -> CONST -> UNSIGNED    CONST -> UNSIGNED     yes
DATA -> CONST                CONST                 no
DATA -> CCODE -> SIGNED      CCODE -> SIGNED       yes
DATA -> CCODE -> SIGNED      CCODE                 no
DATA -> DATA -> INTEGER      DATA -> INTEGER       yes
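
The rows of the table can be reproduced with a small Python model of the matching rule. This is a sketch under stated assumptions, not the interpreter's actual algorithm. The PARENT table is a simplified, hypothetical fragment of the type hierarchy, chosen only to cover the identifiers in the table. The rule that an actual compound data type may be longer than the parameter it is matched against is likewise an assumption of this model.

```python
# Hypothetical, simplified parent table: each type maps to its direct parent.
PARENT = {
    "INTEGER": "SINGLE", "UNSIGNED": "INTEGER", "SIGNED": "INTEGER",
    "CHARACTER": "INTEGER", "LOGICAL": "SINGLE",
    "ADDRESS": "SINGLE", "DATA": "ADDRESS", "CONST": "ADDRESS",
    "CCODE": "ADDRESS",
}


def is_subtype(actual, expected):
    """True if actual equals expected or is one of its (assumed) subtypes."""
    while actual is not None:
        if actual == expected:
            return True
        actual = PARENT.get(actual)
    return False


def dummy_matches(next_of_stack, top_of_stack):
    """Model of matching DUMMY ( DATA -> ADDRESS -> INTEGER 2ND -- UNSIGNED ).

    Both arguments list the basic data types of one stack item, head first.
    """
    pattern = ["DATA", "ADDRESS", "INTEGER"]
    if len(next_of_stack) < len(pattern):
        return False
    if not all(is_subtype(a, p) for a, p in zip(next_of_stack, pattern)):
        return False
    # 2ND must be identical to the actual tail starting at the 2nd basic type.
    return top_of_stack == next_of_stack[1:]


print(dummy_matches(["DATA", "ADDRESS", "UNSIGNED"], ["ADDRESS", "UNSIGNED"]))
print(dummy_matches(["DATA", "ADDRESS", "UNSIGNED"], ["CONST", "UNSIGNED"]))
```

The first call models the first row of the table and prints True. The second call models the fourth row and prints False, because the reference demands identity: a subtype of the referenced data type is not enough.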

TH can also be used as an output parameter, but the reference must always be to an input parameter. Whenever a word containing a reference as an output parameter is executed or compiled, the output parameter assumes the data type of the actually referenced input parameter. Note that the referenced input parameter might be a compound data type or the tail of a compound data type. A typical application is

@ ( DATA -> SINGLE -- 2ND )

as used in the following example:

BASE .S
DATA -> UNSIGNED  OK
@ .S .
UNSIGNED 10  OK
DATA-SPACE
 OK
HERE CAST DATA -> CCONST -> CHARACTER @ .S
CCONST -> CHARACTER  OK
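
How the output parameter 2ND picks up its data type can be modelled in a few lines of Python. This is a deliberately reduced sketch: the function name fetch_result_type is hypothetical, the item on the stack is represented as a plain list of identifiers, and subtypes of DATA are ignored for brevity.

```python
def fetch_result_type(actual):
    """Model of the result type of @ ( DATA -> SINGLE -- 2ND ).

    actual lists the basic data types of the item on top of the stack,
    head first. 2ND assumes everything from the second basic type onward.
    """
    if len(actual) < 2 or actual[0] != "DATA":
        raise TypeError("no matching version of @ in this sketch")
    return actual[1:]


print(" -> ".join(fetch_result_type(["DATA", "UNSIGNED"])))
print(" -> ".join(fetch_result_type(["DATA", "CCONST", "CHARACTER"])))
```

The two calls correspond to the two uses of @ in the session above and print UNSIGNED and CCONST -> CHARACTER, respectively.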

Now, let's have a look at the definition of TH:

: TH ( STACK-DIAGRAM UNSIGNED -- 1ST )
  SWAP NULL DATA-TYPE (PARAM) TUCK
  OFFSET OVER < OVER 0= OR
  IF DROP -261 THROW
  ELSE OVER OFFSET OVER 1- -
     HERE CAST DATA -> DATA-TYPE SWAP - @
     DUP [ DT-OFFSET DT-OUTPUT OR ] LITERAL ATTRIBUTE?
     IF DROP DROP -261 THROW
     ELSE NULL DATA-TYPE AND SWAP OFFSET+
        OVER [ DT-INPUT DT-OUTPUT OR ] LITERAL AND OR PARAM,
     THEN
  THEN ;

It's not as complicated as it looks. At the beginning, TH flushes any deferred data type that is kept in STACK-DIAGRAM. Next, the range of the index is checked. It should at least be 1 and must not be greater than the number of input parameters provided so far. Remember that the parameter count is stored in the offset attribute of STACK-DIAGRAM and can be obtained by OFFSET. If the index really points to a valid parameter, TH fetches this parameter. Another check is performed to make sure that it is an input parameter and that it is not a reference itself. The parameter TH finally compiles is constructed by taking the identifier of the referenced parameter, adding the index as offset attribute and copying the input or output parameter attribute from STACK-DIAGRAM. A reference parameter never has the prefix attribute, but the referenced parameter might have it.
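
The same steps can be written down as a Python sketch. It is a model under assumptions, not a translation of the StrongForth code: BasicType, InvalidReference (standing in for -261 THROW) and the function th are hypothetical names, the stack diagram is a plain list that is assumed to be flushed already, and a basic data type without the output attribute is taken to be an input parameter.

```python
from dataclasses import dataclass


class InvalidReference(Exception):
    """Stands in for -261 THROW in this sketch."""


@dataclass(frozen=True)
class BasicType:
    ident: str
    prefix: bool = False
    output: bool = False  # False means input parameter in this model
    offset: int = 0       # 0 = no reference, n = reference to n-th basic type


def th(params, n, in_output_list):
    """Append a reference to the n-th basic data type (1-based) to params."""
    # Range check: the index must point to a parameter provided so far.
    if n < 1 or n > len(params):
        raise InvalidReference("index out of range")
    target = params[n - 1]
    # The target must be an input parameter and must not be a reference.
    if target.output or target.offset != 0:
        raise InvalidReference("not a plain input parameter")
    # Identifier of the target, index as offset, never the prefix attribute.
    params.append(BasicType(target.ident, output=in_output_list, offset=n))
    return params


# Model of @ ( DATA -> SINGLE -- 2ND )
fetch = [BasicType("DATA", prefix=True), BasicType("SINGLE")]
th(fetch, 2, in_output_list=True)
print(fetch[2])
```

The appended item has the identifier SINGLE, the output attribute and the offset 2, which is the third row of the table shown for @ at the beginning of this chapter.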

NULL DATA-TYPE AND

clears all attributes of a data type (or a stack diagram) without changing its identifier.

In ANS Forth, stack diagrams are nothing but comments. It is good programming style to specify the stack diagram of each word in the source code, and most Forth programmers follow this rule. In StrongForth, stack diagrams are not comments, but the syntax is mostly identical to the usual form of stack diagrams in ANS Forth. Stack diagrams in StrongForth are enclosed in parentheses as well, which means that the parentheses have completely changed their semantics with respect to ANS Forth. StrongForth comments are always delimited by backslashes (\).

A stack diagram is syntactically initiated by a left parenthesis:

: ( ( -- MEMORY-SPACE FLAG STACK-DIAGRAM )
  SPACE@ STATE @ POSTPONE [ DT-INPUT CAST STACK-DIAGRAM
  LOCAL-SPACE ; IMMEDIATE

( is an immediate word, because it can be used during compilation. It enters interpretation state and makes the local name space the current memory space after remembering both the current memory space and the current state for later restoration by ). Stack diagrams are always at least temporarily created in the local name space. Finally, ( pushes STACK-DIAGRAM onto the stack, starting only with the input parameter attribute.

At the other end of a stack diagram, ) processes the stack diagram and restores the memory space and the state to what they were before the stack diagram. ) mostly relies on two words, which enclose the processing of the stack diagram:

: <DIAGRAM ( FLAG STACK-DIAGRAM -- 2ND )
  DUP [ DT-INPUT DT-PREFIX OR ] LITERAL ATTRIBUTE? INVERT
  OVER DT-OUTPUT ATTRIBUTE? AND
  IF NULL DATA-TYPE (PARAM)
  ELSE -262 THROW
  THEN SWAP STATE ! ;
  
: DIAGRAM> ( MEMORY-SPACE STACK-DIAGRAM -- )
  LOCAL-SPACE OFFSET CELLS 2* NEGATE ALLOT SPACE! ;

<DIAGRAM ensures that STACK-DIAGRAM has the output parameter attribute, but neither the input parameter attribute nor the prefix attribute. Stack diagrams that do not contain --, or that end with ->, are invalid. <DIAGRAM then flushes any deferred parameter of the stack diagram and restores the state.

After the stack diagram has been processed, DIAGRAM> deallocates it from the local name space and then restores the current memory space.

StrongForth actually provides two versions of ): a generic version for all words and a special version only for colon definitions. Let's begin with the generic version:

: ) ( MEMORY-SPACE FLAG STACK-DIAGRAM -- )
  <DIAGRAM LATEST #PARAMS
  IF -264 THROW
  ELSE ENCLOSE-DIAGRAM NAME-SPACE ?DO I @ , LOOP
  THEN TUCK DIAGRAM> END-DIAGRAM ;

The input parameters of ) come from (. ) starts processing the stack diagram by executing <DIAGRAM, and then verifies that the latest definition does not yet have a stack diagram. #PARAMS, which will be presented in chapter 8, determines the length of the stack diagram. Next, the stack diagram is copied from the local name space to the name space, in order to become a permanent part of the new definition. DIAGRAM> ends processing the stack diagram. Finally, END-DIAGRAM stores the length of the stack diagram in the attribute field of the current definition. The attribute field is part of a definition's memory image, which will be presented in the next chapter. ENCLOSE-DIAGRAM calculates the loop limit and the loop index of the DO loop that copies the stack diagram:

: ENCLOSE-DIAGRAM ( STACK-DIAGRAM -- 1ST DATA -> DATA-TYPE DATA -> DATA-TYPE )
  DUP OFFSET HERE CAST DATA -> DATA-TYPE DUP ROT - ;

: and :NONAME both push a NULL item of data type COLON-DEFINITION onto the stack. COLON-DEFINITION is a direct subtype of DEFINITION. The generic version of ) would also match in colon definitions if the interpreter didn't find the special version of ) first:

: ) ( COLON-DEFINITION MEMORY-SPACE FLAG STACK-DIAGRAM -- 1ST )
  ) DUP #PARAMS ALL-PARAMS>DT ;

In addition to the semantics of the generic version of ), the special version initialises the compiler data type heap with the input parameters of the colon definition's stack diagram. ALL-PARAMS>DT copies a definition's input parameters to the (compiler) data type heap and resolves all data type references in the input parameter list:

: ALL-PARAMS>DT ( DEFINITION UNSIGNED -- 1ST )
  0 ?DO DUP I PARAM@ DUP DT-OUTPUT ATTRIBUTE?
     IF DROP LEAVE THEN OVER SWAP PARAMS>DT
  LOOP ;

Using an index starting with 0, the ?DO loop iterates over all basic data types in the input parameter list of the colon definition. The loop terminates at the first output parameter or at the end of the stack diagram, if the definition has no output parameters. The parameter UNSIGNED, which is calculated by ), contains the total number of basic data types in the stack diagram of the definition.

PARAM@ returns a basic data type with a given index from the stack diagram of a definition. It doesn't matter whether the basic data type belongs to the input parameter list or the output parameter list. In the next chapter, you'll learn more about the memory image of definitions.

: PARAM@ ( DEFINITION UNSIGNED -- DATA-TYPE )
  SWAP CAST FAR-ADDRESS -> DATA-TYPE 1+ SWAP + @ ;

PARAMS>DT is a very interesting word. It pushes a data type onto the compiler data type heap while recursively resolving references to other input parameters:

: PARAMS>DT ( DEFINITION DATA-TYPE -- )
  DUP OFFSET
  IF OFFSET 1-
     BEGIN OVER OVER OVER SWAP PARAM@ RECURSE
        OVER OVER PARAM@ DT-PREFIX ATTRIBUTE?
     WHILE 1+
     REPEAT DROP DROP
  ELSE >DT DROP
  THEN ;

If the data type does not contain a reference, it is just pushed onto the compiler data type heap. Otherwise, the parameter at the referenced position substitutes the original parameter. Since the new parameter may be part of a compound data type, PARAMS>DT iterates through the following input parameters until it finds a data type with no prefix attribute. However, this data type might contain another reference. This is why PARAMS>DT has to be recursive.

As usual, recursive algorithms are rather difficult to understand. Let's try an example:

: EXAMPLE ( CDATA -> CHARACTER CCONST -> 2ND 3RD -- )
.S
CDATA -> CHARACTER CCONST -> CHARACTER CCONST -> CHARACTER

The first three basic data types are just copied sequentially onto the compiler data type heap. 2ND is the first reference. It is replaced by the parameter it references, which is just the basic data type CHARACTER. Recursion already stops at this point. 3RD is a reference to the compound data type CCONST -> 2ND. PARAMS>DT iterates through this compound data type by first pushing CCONST onto the compiler data type heap and then recursively resolving the reference 2ND, which is again CHARACTER.

The definition of EXAMPLE can now start with the above list of data types on the compiler data type heap.
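
If you prefer to trace the recursion in a more familiar notation, here is a Python sketch of ALL-PARAMS>DT and PARAMS>DT applied to EXAMPLE. It is a model under assumptions: BasicType, the function names and the use of a Python list as the compiler data type heap are hypothetical, and the stack diagram is given directly as a list instead of being read with PARAM@.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicType:
    ident: str
    prefix: bool = False
    output: bool = False
    offset: int = 0  # 0 = no reference, n = reference to n-th basic type


def params_to_dt(diagram, dt, heap):
    """Model of PARAMS>DT: push dt, resolving references recursively."""
    if dt.offset == 0:
        heap.append(dt)  # plain data type: just push it
        return
    index = dt.offset - 1  # 0-based position of the referenced parameter
    while True:
        params_to_dt(diagram, diagram[index], heap)
        if not diagram[index].prefix:
            break  # end of the referenced compound data type
        index += 1


def all_params_to_dt(diagram):
    """Model of ALL-PARAMS>DT: process input parameters only."""
    heap = []
    for dt in diagram:
        if dt.output:
            break
        params_to_dt(diagram, dt, heap)
    return heap


def render(heap):
    """Show the heap in stack diagram notation."""
    parts = []
    for dt in heap:
        parts.append(dt.ident)
        if dt.prefix:
            parts.append("->")
    return " ".join(parts)


# EXAMPLE ( CDATA -> CHARACTER CCONST -> 2ND 3RD -- )
EXAMPLE = [
    BasicType("CDATA", prefix=True),
    BasicType("CHARACTER"),
    BasicType("CCONST", prefix=True),
    BasicType("CHARACTER", offset=2),  # 2ND
    BasicType("CCONST", offset=3),     # 3RD
]
print(render(all_params_to_dt(EXAMPLE)))
```

The sketch prints CDATA -> CHARACTER CCONST -> CHARACTER CCONST -> CHARACTER, the same list that .S displays inside the definition of EXAMPLE.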


Dr. Stephan Becher - January 4th, 2008