Chapter 9: Data Types Revisited

Data Types in the Dictionary

Although data types were already the subject of chapter 7, those aspects that are related to the representation of data types in the dictionary had to be postponed until now. For example, you didn't learn in chapter 7 how DT works and how data types are formatted for printing. But let's start with the definition of a new data type. Picking up the thread from chapter 7, you already know that a new data type is created as a child of an existing data type by applying the defining word PROCREATES:

DT <parent data type> PROCREATES <child data type>

From the definition of PROCREATES, which is repeated here for convenience, you can see that it compiles the identifier of the parent data type into the data field of the definition that represents the child data type.

: PROCREATES ( DATA-TYPE -- )
  CREATE SPLIT CONST, DROP
  DOES> ( STACK-DIAGRAM CONST -- 1ST )
  0 SWAP 1 CELLS - MERGE CAST DATA-TYPE (PARAM) ;

At compilation time, PROCREATES uses SPLIT to extract the identifier from the parent data type. At runtime, the associated data type is constructed from its data field address in order to be processed by (PARAM). The identifier of the data type is equal to the data field address minus one cell, which is actually the code field address or the token of the definition. In summary: each data type is represented by a definition created by PROCREATES, the identifier of a data type is the token of that definition, and the data field of the definition holds the identifier of the parent data type.

For converting a definition into the data type it is associated with, StrongForth provides two words: ?DATA-TYPE and DT. ?DATA-TYPE expects an item of data type DEFINITION, and returns the associated data type:

: ?DATA-TYPE ( DEFINITION -- DATA-TYPE )
  >CODE DUP @ ['CODE] SINGLE =
  IF 0 SWAP MERGE CAST DATA-TYPE
  ELSE DROP NULL DATA-TYPE
  THEN ;

It first checks whether the definition has really been created by PROCREATES. This is true if the definition has the same code field as any other definition created by PROCREATES, for example SINGLE. ['CODE] is an immediate word that compiles the content of a definition's code field as a literal. The data type is then constructed by merging the code field address as data type identifier and null as data type attributes. If the definition has not been created by PROCREATES, ?DATA-TYPE just returns a null data type.
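The check performed by ?DATA-TYPE can be modelled in a few lines of Python. This is only a sketch and not StrongForth code: the class Definition, the two code field markers and the representation of a data type as an (attributes, identifier) pair are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for code field contents; real values are machine addresses.
DATA_TYPE_CODE = "code shared by all definitions created by PROCREATES"
COLON_CODE = "code of an ordinary colon definition"

@dataclass(frozen=True)
class Definition:
    name: str
    token: int        # code field address, which doubles as the data type identifier
    code_field: str   # content of the code field

NULL_DATA_TYPE = (0, 0)   # (attributes, identifier), both null

def to_data_type(definition):
    """Model of ?DATA-TYPE: compare the code field, then merge null attributes with the token."""
    if definition.code_field == DATA_TYPE_CODE:
        return (0, definition.token)
    return NULL_DATA_TYPE

single = Definition("SINGLE", 0x1000, DATA_TYPE_CODE)
dup = Definition("DUP", 0x2000, COLON_CODE)
print(to_data_type(single))  # (0, 4096)
print(to_data_type(dup))     # (0, 0)
```

The token values 0x1000 and 0x2000 are arbitrary. What matters is that the comparison looks only at the code field, never at the name.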

Whenever a literal data type is required, DT can be used. For example,

DT INTEGER

simply returns the data type INTEGER. DT parses the input source for the name of the data type and then looks this name up in the dictionary, considering only those definitions whose code field indicates that they are associated with a data type. If such a definition is found, it is converted to the data type using ?DATA-TYPE. Otherwise, an exception is thrown and DT returns a null data type. DT is actually a typical example of a dictionary search for words with a given code field (see chapter 8).

: DT ( -- DATA-TYPE )
  PARSE-WORD ['CODE] SINGLE CODE-FIELD SEARCH-ALL
  IF ?DATA-TYPE
  ELSE -260 THROW CAST DATA-TYPE
  THEN ;
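The filtered dictionary search behind DT can be sketched in Python as follows. The dictionary contents, the code field markers and the exception class ForthError are hypothetical. In particular, the second word named INTEGER is invented, only to show that a definition with a different code field is skipped.

```python
DATA_TYPE_CODE = "data type code field"   # hypothetical marker
OTHER_CODE = "some other code field"      # hypothetical marker

# Hypothetical dictionary, newest definition first: (name, token, code field content).
DICTIONARY = [
    ("INTEGER", 0x1020, OTHER_CODE),      # invented word that merely shares the name
    ("INTEGER", 0x1010, DATA_TYPE_CODE),
    ("SINGLE", 0x1000, DATA_TYPE_CODE),
]

class ForthError(Exception):
    """Carries a THROW code as its first argument."""

def dt(name):
    """Model of DT: find the first entry with this name and a data type code field."""
    for entry_name, token, code_field in DICTIONARY:
        if entry_name == name and code_field == DATA_TYPE_CODE:
            return (0, token)   # null attributes merged with the identifier
    raise ForthError(-260)

print(hex(dt("INTEGER")[1]))  # 0x1010
```

Looking up a name that has no data type definition raises ForthError(-260) in this model, which corresponds to the -260 THROW in the definition of DT.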

?DEFINITION is the inverse operation with respect to ?DATA-TYPE. This means, it converts a data type to the associated definition:

: ?DEFINITION ( DATA-TYPE -- DEFINITION )
  SPLIT NIP CAST TOKEN SEARCH-TOKEN DROP ;

Since there's no direct link from a data type to its associated definition, ?DEFINITION has to search the dictionary for the word with the execution token that identifies the data type. The data type identifier is extracted with SPLIT and then passed to SEARCH-TOKEN to find the associated definition.
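As a rough Python model of this reverse lookup, assume a dictionary of (name, token) pairs and a data type represented as an (attributes, identifier) pair. Both representations, as well as returning None when nothing is found, are assumptions of this sketch.

```python
# Hypothetical dictionary entries: (name, token).
DICTIONARY = [("FLAG", 0x1030), ("INTEGER", 0x1010), ("SINGLE", 0x1000)]

def to_definition(data_type):
    """Model of ?DEFINITION: drop the attributes, then search for the token."""
    _attributes, identifier = data_type
    for name, token in DICTIONARY:
        if token == identifier:
            return name
    return None   # assumption: no match yields a null definition

print(to_definition((0, 0x1010)))  # INTEGER
```

Note that the attributes play no role in the search. A data type with attributes set still maps to the same definition.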

The most obvious application for a data type to definition conversion is printing a data type. StrongForth provides an overloaded version of . for this purpose:

: . ( DATA-TYPE -- )
  ?DEFINITION NAME TYPE SPACE ;

. for items of data type DATA-TYPE is in turn used by .S. .S is just a loop that prints the contents of the data type heap, one basic data type at a time. If the prefix attribute of a basic data type is set, the current and the next basic data type belong to the same compound data type.

: .S ( -- )
  DTP@ DUP DEPTH -
  ?DO I @ DUP . DT-PREFIX ATTRIBUTE? IF ." -> " THEN
  LOOP ;
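The output format of .S can be illustrated with a small Python model. The bit value of DT_PREFIX and the representation of heap entries as (attributes, name) pairs are assumptions. The names stand in for the identifiers that . would resolve with ?DEFINITION.

```python
DT_PREFIX = 1   # hypothetical bit value for the prefix attribute

# Hypothetical heap contents for two items: CDATA -> CHARACTER and FLAG.
heap = [(DT_PREFIX, "CDATA"), (0, "CHARACTER"), (0, "FLAG")]

def dot_s(heap):
    """Model of .S: print each basic data type, plus an arrow if its prefix attribute is set."""
    parts = []
    for attributes, name in heap:
        parts.append(name + " ")      # what . prints for a data type
        if attributes & DT_PREFIX:
            parts.append("-> ")       # more of the same compound data type follows
    return "".join(parts)

print(dot_s(heap))  # CDATA -> CHARACTER FLAG
```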

Family Relations

Except for SINGLE and DOUBLE, each data type has a parent data type. PROCREATES expects the parent of a new data type as its input parameter. Since the parent data type is stored in the data field of the definition that is associated with a data type, it's pretty easy to retrieve:

: PARENT ( DATA-TYPE -- 1ST )
  SPLIT DUP
  IF CAST CONST -> SINGLE 1+ @
  THEN MERGE CAST DATA-TYPE ;

PARENT extracts the code field address of the associated definition, and fetches the identifier of the parent data type from the data field. Data type attributes remain unchanged by PARENT. Since SINGLE and DOUBLE contain zero in the data fields of their associated definitions, PARENT returns a null data type when applied to SINGLE or DOUBLE. The parent data type of a null data type is also a null data type.
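A minimal Python model of PARENT might look like this. It assumes a data type is an (attributes, identifier) pair, uses names instead of code field addresses as identifiers, and uses None as the null identifier. The table PARENT_OF plays the role of the data fields.

```python
# Hypothetical data fields: identifier -> identifier of the parent (None for an ancestor).
PARENT_OF = {
    "DOUBLE": None,
    "INTEGER-DOUBLE": "DOUBLE",
    "UNSIGNED-DOUBLE": "INTEGER-DOUBLE",
    "NUMBER-DOUBLE": "UNSIGNED-DOUBLE",
}

def parent(data_type):
    """Model of PARENT: replace the identifier, keep the attributes."""
    attributes, identifier = data_type
    if identifier is not None:          # a null data type stays null
        identifier = PARENT_OF[identifier]
    return (attributes, identifier)

data_type = (0, "NUMBER-DOUBLE")
while data_type[1] is not None:
    print(data_type[1])
    data_type = parent(data_type)
```

The loop prints NUMBER-DOUBLE, UNSIGNED-DOUBLE, INTEGER-DOUBLE and DOUBLE, one per line, and stops as soon as the identifier becomes null.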

If PARENT is applied repeatedly, starting with any data type, we'll sooner or later end up with either SINGLE or DOUBLE, because all data types in StrongForth are direct or indirect subtypes of these two data types:

DT NUMBER-DOUBLE DUP .
NUMBER-DOUBLE  OK
PARENT DUP .
UNSIGNED-DOUBLE  OK
PARENT DUP .
INTEGER-DOUBLE  OK
PARENT .
DOUBLE  OK

This rule provides an easy means to find out whether a data type belongs to a single-cell or a double-cell item. ANCESTOR repeatedly applies PARENT in order to determine the ancestor of a data type. The ancestor is a data type that has no parent, like SINGLE and DOUBLE.

: ANCESTOR ( DATA-TYPE -- 1ST )
  BEGIN DUP PARENT NULL? INVERT
  WHILE PARENT
  REPEAT NULL DATA-TYPE AND ;

Note that the phrase NULL DATA-TYPE AND clears the data type identifier without affecting the attributes. Here are three examples:

DT CONST ANCESTOR .
SINGLE  OK
DT DOUBLE ANCESTOR .
DOUBLE  OK
NULL DATA-TYPE ANCESTOR .
 OK

The ancestors of data types SINGLE and DOUBLE are SINGLE and DOUBLE, respectively. The ancestor of a null data type is the null data type. ANCESTOR is used by SIZE, which calculates the size of any data type:

: SIZE ( DATA-TYPE -- UNSIGNED )
  ANCESTOR SPLIT NIP CAST UNSIGNED DUP
  IF CAST CONST -> UNSIGNED 2 + @ THEN ;

The size information is only stored in the body of ancestors. It has to be compiled explicitly, as in the following examples you already know from chapter 7:

NULL DATA-TYPE PROCREATES ADAM 1 CONST,
NULL DATA-TYPE PROCREATES EVE 1 CONST,

Data types that are directly or indirectly derived from an ancestor inherit its size information. They always have the same size as their ancestor. Compiling size information for derived data types is therefore unnecessary. Note that for null data types, SIZE returns zero:

DT FLAG SIZE .
1  OK
DT UNSIGNED-DOUBLE SIZE .
2  OK
NULL DATA-TYPE SIZE .
0  OK
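How ANCESTOR and SIZE work together can be sketched in Python. For simplicity, this model ignores attributes and identifies data types by name. The table BODY and the derived type CHILD-OF-ADAM are hypothetical.

```python
# Hypothetical bodies: name -> (parent or None, size in cells or None).
# Only ancestors carry a size, as compiled by 1 CONST, in the ADAM example.
BODY = {
    "ADAM": (None, 1),
    "CHILD-OF-ADAM": ("ADAM", None),          # hypothetical derived data type
    "DOUBLE": (None, 2),
    "INTEGER-DOUBLE": ("DOUBLE", None),
    "UNSIGNED-DOUBLE": ("INTEGER-DOUBLE", None),
}

def ancestor(name):
    """Model of ANCESTOR: follow the parent links until there is no parent left."""
    while name is not None and BODY[name][0] is not None:
        name = BODY[name][0]
    return name

def size(name):
    """Model of SIZE: read the size from the ancestor's body, or 0 for a null data type."""
    root = ancestor(name)
    return 0 if root is None else BODY[root][1]

print(size("UNSIGNED-DOUBLE"))  # 2
print(size("CHILD-OF-ADAM"))    # 1
print(size(None))               # 0
```

Because only the ancestor's entry is consulted, a derived data type never needs a size of its own.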

The following overloaded version of PROCREATES offers a more convenient way to define an ancestor:

: PROCREATES ( UNSIGNED -- )
  NULL DATA-TYPE PROCREATES CONST, ;

However, since new ancestors are rarely defined in a StrongForth application, this overloaded version is not included in StrongForth.

Data Type Conversions

A very interesting application of SIZE is the implementation of CAST. As you already know, CAST replaces the data type of the item on top of the data stack with any other data type:

TRUE .S
FLAG  OK
CAST SIGNED .S .
SIGNED -1  OK

Basically, CAST uses DT> to remove the original data type from the data type heap. Then, it obtains a new data type using DT and pushes it onto the data type heap with >DT. Since CAST is an immediate word, it works both in interpretation and in compilation state. DT> and >DT automatically apply to the data type heap that is associated with the current state.

But it's not as easy as that. Two more things have to be considered. First, the data type of the item on top of the data stack might be a compound data type. To remove it completely from the data type heap, DT> has to be applied repeatedly, if necessary. This task is done by DTDROP:

: DTDROP ( -- DATA-TYPE )
  BEGIN DT> WHILE DROP REPEAT ;

DTDROP returns the head of the removed compound data type, because this is required by CAST. Remember that DT> returns a TRUE flag if and only if the data type it pops from the data type heap belongs to the tail of a compound data type.
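The loop in DTDROP can be modelled in Python under two assumptions: heap entries are (attributes, name) pairs, and a popped data type belongs to a tail exactly when the entry below it has its prefix attribute set. The second assumption is inferred from the description of the prefix attribute, not taken from the actual implementation of DT>.

```python
DT_PREFIX = 1   # hypothetical bit value for the prefix attribute

def dt_pop(heap):
    """Model of DT>: pop one basic data type and report whether it was part of a tail."""
    entry = heap.pop()
    is_tail = bool(heap) and bool(heap[-1][0] & DT_PREFIX)
    return entry, is_tail

def dtdrop(heap):
    """Model of DTDROP: pop until the head of the compound data type has been removed."""
    entry, is_tail = dt_pop(heap)
    while is_tail:
        entry, is_tail = dt_pop(heap)
    return entry

heap = [(0, "FLAG"), (DT_PREFIX, "CDATA"), (0, "CHARACTER")]
print(dtdrop(heap)[1])  # CDATA
print(heap)             # [(0, 'FLAG')]
```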

The second problem the implementation of CAST needs to take into account is the sizes of the old and the new data type. If, for example, the data type of a double-cell item is simply replaced with the data type of a single-cell item, data stack and data type heap would get out of sync. The next word that consumes the item on top of the stack would only remove one half of the double-cell item from the data stack, because it expects a single-cell item. To avoid this problem, size conversions have to be applied whenever the old and the new data type belong to items of different sizes. CAST compiles S>D when a single-cell item is being cast to a double-cell item, and it compiles D>S when a double-cell item is being cast to a single-cell item.

: CAST ( -- )
  DTDROP DT DUP >DT SWAP DUP >DT SIZE 10 * SWAP SIZE +
  CASE  0 OF              ENDOF
       11 OF              ENDOF
       12 OF POSTPONE S>D ENDOF
       21 OF POSTPONE D>S ENDOF
       22 OF              ENDOF
       -271 THROW
  ENDCASE DT> DROP DROP ; IMMEDIATE

Note that CAST temporarily adds the head of the removed (compound) data type back to the data type heap, in order to be able to compile the proper conversion word. But why does the CASE statement within the definition of CAST consider a case for the sizes of both the old and the new data type being zero? Until now, only data types with sizes of one and two cells have been presented. Size zero is reserved for so-called tuples, whose size cannot be determined at compile time. Obviously, CAST allows casting one tuple to another, but not a tuple to a single-cell or double-cell data type and vice versa. In chapter 11, tuples will be introduced in connection with the implementation of SAVE-INPUT and RESTORE-INPUT.
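The size dispatch inside CAST boils down to a small lookup on the number formed by the old size times ten plus the new size. The following Python sketch models only this decision. The function name conversion and the exception class ForthError are hypothetical.

```python
class ForthError(Exception):
    """Carries a THROW code as its first argument."""

def conversion(old_size, new_size):
    """Return the word CAST would compile, or None if no conversion is needed."""
    table = {
        0: None,     # tuple to tuple
        11: None,    # single-cell to single-cell
        12: "S>D",   # single-cell to double-cell
        21: "D>S",   # double-cell to single-cell
        22: None,    # double-cell to double-cell
    }
    key = old_size * 10 + new_size
    if key not in table:
        raise ForthError(-271)   # e.g. tuple to single-cell, or the other way round
    return table[key]

print(conversion(1, 2))  # S>D
print(conversion(2, 1))  # D>S
print(conversion(2, 2))  # None
```

Any combination involving exactly one tuple, such as conversion(0, 1), falls through to the default case and raises the exception.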

A simple but often very useful companion of CAST is NULL. It executes the single-cell constant 0 and then casts it to a given data type.

: NULL ( -- )
  POSTPONE 0 POSTPONE CAST ; IMMEDIATE

NULL can be used to create single-cell and double-cell items of any data type with a binary value of 0:

NULL FLAG .S .
FLAG FALSE  OK
NULL CHARACTER .S 65 + .
CHARACTER A  OK
NULL FAR-ADDRESS .S .
FAR-ADDRESS 0  OK

With CAST, you can easily cast a data type to any other basic data type. But in many cases, you actually need a compound data type like CDATA -> CHARACTER. If, for example, you want to store an item of data type CHARACTER into the data space, a syntax like

DATA-SPACE HERE CAST CDATA -> CHARACTER !

would come in very handy. And it really works this way! The reason is the existence of an overloaded version of ->, which appends a basic data type as a tail to the data type on top of the data type heap. Just like CAST, -> is an immediate word:

: -> ( -- )
  DT> DROP DT-PREFIX OR >DT DT >DT ; IMMEDIATE

It works similarly to CAST, although it doesn't need to consider whether it deals with a basic or a compound data type, or whether a single-cell or a double-cell item is affected. -> just sets the prefix attribute of the topmost basic data type and then pushes another basic data type, which is obtained by DT, onto the data type heap. Note that -> can be applied independently of CAST, and that it can be applied repeatedly to construct compound data types consisting of more than two basic data types, like in the following fragment:

... >BODY -> DATA -> LOGICAL ...

This is a rather typical application of ->. -> is often used together with words like >BODY or CONST-HERE, because these words leave unspecific addresses on the data stack.
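The effect of -> on the data type heap can be modelled in Python. As before, heap entries are assumed to be (attributes, name) pairs and the bit value of DT_PREFIX is arbitrary. The starting data type ADDRESS in the second example is a hypothetical placeholder for whatever unspecific address type a preceding word leaves behind.

```python
DT_PREFIX = 1   # hypothetical bit value for the prefix attribute

def arrow(heap, new_name):
    """Model of ->: set the prefix attribute on the top entry, then push the new tail."""
    attributes, name = heap.pop()
    heap.append((attributes | DT_PREFIX, name))
    heap.append((0, new_name))

heap = [(0, "CDATA")]          # state after CAST CDATA
arrow(heap, "CHARACTER")       # -> CHARACTER
print(heap)                    # [(1, 'CDATA'), (0, 'CHARACTER')]

heap = [(0, "ADDRESS")]        # hypothetical unspecific address type
arrow(heap, "DATA")            # -> DATA
arrow(heap, "LOGICAL")         # -> LOGICAL
print([name for _, name in heap])  # ['ADDRESS', 'DATA', 'LOGICAL']
```

Each application only touches the topmost entry, which is why -> needs no knowledge about the size or structure of the data type it extends.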


Dr. Stephan Becher - January 4th, 2008