Chapter 12: Constants And Variables

Global Constants And Variables

This section deals with the following ANS Forth defining words:

CONSTANT

CONSTANT has exactly the same semantics as specified by ANS Forth. Whenever the constant defined by CONSTANT is executed, it pushes a constant value onto the data stack. Is that all? Sure. But again, there's a small but significant difference between StrongForth and ANS Forth. In StrongForth, each constant has a data type, which is identical to the data type of the value that is provided to CONSTANT. To figure out how CONSTANT gets access to this data type, let's have a look at a small example:

42 CONSTANT ANSWER
 OK
LATEST .
ANSWER ( -- UNSIGNED )  OK

Immediately after interpreting 42, the topmost data type on the interpreter data type heap is UNSIGNED. Next, CONSTANT is about to be interpreted. The interpreter finds CONSTANT in the dictionary, updates the interpreter data type heap according to the stack diagram of CONSTANT and then calls the inner interpreter to execute CONSTANT. The stack diagram of CONSTANT indicates that this word expects an item of data type SINGLE on the stack. It does not have any output parameters:

: CONSTANT ( SINGLE -- )
  <VALUE (CONSTANT) ['CODE] FALSE CONST,
  ROT CONST, VALUE> ;

Data type SINGLE, or UNSIGNED in our example, is removed from the interpreter data type heap before CONSTANT is executed. Removing an item from a data type heap means that the heap pointer, which always points to the next free cell on the heap, is decremented. As a result, the interpreter data type heap pointer now points to UNSIGNED. And this is where CONSTANT gets the data type from.

The definition of CONSTANT contains three new words: <VALUE, (CONSTANT) and VALUE>. <VALUE creates a new definition and prepares for adding the stack diagram of a constant or a variable. VALUE> is the counterpart of <VALUE, finalizing the new definition and restoring the memory space to what it was before the execution of <VALUE.

: <VALUE ( -- MEMORY-SPACE STACK-DIAGRAM )
  ?EXECUTE CONST-HERE (CREATE)
  SPACE@ NAME-SPACE NULL STACK-DIAGRAM ;

: VALUE> ( MEMORY-SPACE STACK-DIAGRAM -- )
  END-DIAGRAM SPACE! END-CODE ;

(CONSTANT) compiles the data type of a constant as the output parameter into the stack diagram of the new constant. A loop is used to cover the general case of compiling a compound data type consisting of more than one basic data type:

: (CONSTANT) ( STACK-DIAGRAM -- 1ST )
  DTP@
  BEGIN TUCK @ DT-OUTPUT OR PARAM,
     SWAP DUP @ DT-PREFIX ATTRIBUTE?
  WHILE 1+
  REPEAT DROP ;
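
The loop in (CONSTANT) can also be modelled outside of Forth. The following Python sketch is an illustration only: the bit values chosen for DT_PREFIX and DT_OUTPUT, the numeric codes of the basic data types and the list standing in for the data type heap are assumptions, not StrongForth's actual encoding.

```python
# Hypothetical attribute bits; StrongForth's real encoding may differ.
DT_PREFIX = 0x4000   # another basic data type follows in this compound type
DT_OUTPUT = 0x8000   # marks an output parameter in a stack diagram

def compile_constant_type(heap, dtp):
    """Copy the compound data type starting at heap[dtp] as an output parameter.

    Mirrors the BEGIN ... WHILE ... REPEAT loop of (CONSTANT): each basic
    data type is copied with DT_OUTPUT set, and copying continues as long
    as the entry just copied carries the DT_PREFIX attribute.
    """
    params = []
    while True:
        entry = heap[dtp]
        params.append(entry | DT_OUTPUT)
        if not entry & DT_PREFIX:
            break
        dtp += 1
    return params

# Made-up codes for three basic data types.
DATA, UNSIGNED, FLAG = 1, 2, 3

# Heap holding the simple type UNSIGNED, then the compound type DATA -> FLAG.
heap = [UNSIGNED, DATA | DT_PREFIX, FLAG]

simple = compile_constant_type(heap, 0)     # one entry
compound = compile_constant_type(heap, 1)   # two entries
```

For a simple data type the loop body runs exactly once; for DATA -> FLAG it runs twice, because only the first entry carries the prefix attribute.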

CONSTANT compiles the machine code of the new constant, which is identical to the one of the predefined constant FALSE, into the code field. The actual value of the constant is stored in the data field.

CONSTANT as in the above definition creates single-cell constants only. For double-cell constants, StrongForth provides an overloaded version of CONSTANT, which looks almost identical to the single-cell version:

: CONSTANT ( DOUBLE -- )
  <VALUE (CONSTANT) ['CODE] DT-INPUT CONST,
  ROT CONST, VALUE> ;

The only visible difference besides the stack diagrams is the machine code address, which is compiled into the code field of the new definition. But there are also invisible differences. The single-cell and double-cell versions use different overloaded versions of ROT and CONST,, because these are the words dealing with the input parameter (SINGLE or DOUBLE).

ANS Forth specifies a separate word 2CONSTANT for defining double-cell constants or a couple of single-cell constants. Similar to 2DUP, 2DROP, 2SWAP and 2ROT, 2CONSTANT does not exist in StrongForth. However, be aware that the overloaded double-cell version of CONSTANT only covers the double-cell semantics of 2CONSTANT, but not the couple of single-cells semantics.

VARIABLE

The ANS Forth word VARIABLE does not expect any parameters on the stack. It just allocates one cell of memory and creates a new definition that returns the address of this memory cell when executed. So, how does the StrongForth version of VARIABLE know about the data type of the variable to be created? Since you already know some of StrongForth's system variables and how they are defined, you probably know the answer.

BASE .S DROP
DATA -> UNSIGNED  OK
STATE .S DROP
DATA -> FLAG  OK

StrongForth's version of VARIABLE has slightly different semantics from the ANS Forth version. It expects an input parameter, whose data type becomes the data type of the variable and whose value initializes the variable:

1973 VARIABLE YEAR
 OK
YEAR .S
DATA -> UNSIGNED  OK
@ .S .
UNSIGNED 1973  OK

The definition of VARIABLE itself is similar to the definition of CONSTANT.

: VARIABLE ( SINGLE -- )
  <VALUE (VARIABLE) ['CODE] BASE CONST,
  DATA-SPACE HERE CONST, ROT , VALUE> ;

Let's just discuss the differences. First, VARIABLE uses (VARIABLE) instead of (CONSTANT) to create the stack diagram of the new definition. (VARIABLE) compiles the data type of the initializer, which is preceded by the prefix DATA -> in order to add one level of indirection:

: (VARIABLE) ( STACK-DIAGRAM -- 1ST )
  [ DT DATA DT-OUTPUT DT-PREFIX OR OR ] LITERAL PARAM,
  (CONSTANT) ;

The second difference between VARIABLE and CONSTANT is that VARIABLE stores the value of the initializer in the data space and the variable's address in the data field. The reason why the variable's value is not directly stored in the data field should be obvious. The data field is located in the constant data space. In an embedded system, the constant data space is typically mapped to some kind of read-only memory, which cannot be written to at runtime. As a consequence, the variable's value would be frozen. In contrast to variables, constants and addresses of variables can safely be stored in the constant data space, because they are not changed at runtime.

The third and final thing that catches the eye is the code field. VARIABLE compiles the code field of the variable BASE instead of the code field of the constant FALSE. What's the difference? There is none! BASE actually has the same code field as FALSE. At runtime, both constants and variables push the contents of their data field onto the data stack. For constants, it is the value of the constant, while for variables, it is the address of the variable.
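
To make the shared runtime behaviour concrete, here is a small Python model. It is a sketch under invented assumptions: the Word class, the dictionary used as data space and the address 0x100 are all hypothetical and only illustrate that one and the same runtime action serves both constants and variables.

```python
# Illustrative model only: memory layout and addresses are invented.
ram = {}            # data space, writable at runtime
data_stack = []

class Word:
    """A definition whose runtime code pushes the content of its data field."""
    def __init__(self, data_field):
        self.data_field = data_field   # lives in the constant data space

    def execute(self):
        data_stack.append(self.data_field)

def constant(value):
    # The data field holds the value itself.
    return Word(value)

def variable(initial, address):
    # The data field holds the address; the value goes into the data space.
    ram[address] = initial
    return Word(address)

answer = constant(42)
year = variable(1973, address=0x100)

answer.execute()          # pushes the value 42
year.execute()            # pushes the address 0x100
addr = data_stack.pop()
ram[addr] = 2008          # like storing with ! ; the definition is untouched
```

Changing the variable only touches the data space. The data field of the definition still holds the same address, which is why it can live in read-only memory.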

Of course, StrongForth provides an overloaded version of VARIABLE for double-cell variables. Just as with CONSTANT, there's almost no difference between the single-cell and the double-cell versions. The double-cell version of VARIABLE even compiles the same code field as the single-cell version, because addresses of single-cell and double-cell variables do not differ at all. The only visible difference between the definitions of the two versions of VARIABLE is in the stack diagrams:

: VARIABLE ( DOUBLE -- )
  <VALUE (VARIABLE) ['CODE] BASE CONST,
  DATA-SPACE HERE CONST, ROT , VALUE> ;

VALUE

A value is a variable that is used like a constant. To see the differences between values, constants and variables, let's compare the definition of VALUE with the definitions of CONSTANT and VARIABLE:

: VALUE ( SINGLE -- )
  <VALUE (CONSTANT) ['CODE] SOURCE-ID CONST,
  DATA-SPACE HERE CONST, ROT , VALUE> ;

Because a value has the same stack diagram as a constant, it compiles the stack diagram of the new definition with (CONSTANT). On the other hand, to ensure that a value can be changed at runtime, it has to be stored in the data space, just like a variable. The data field of the definition contains the address of the value. The code field is the same as the one of SOURCE-ID, which is actually a predefined value.

The overloaded version of VALUE for double-cell values is almost identical. The only visible difference from the version for single-cell values is in the compilation of the code field. LATEST is a typical double-cell value that is predefined by StrongForth.

: VALUE ( DOUBLE -- )
  <VALUE (CONSTANT) ['CODE] LATEST CONST,
  DATA-SPACE HERE CONST, ROT , VALUE> ;

Locals

The Local Dictionary

Unlike global constants, variables and values, locals have a strictly limited lifetime. They are defined within the definition of another word, and are abandoned when the definition of this word is finished. On the other hand, they can be used just like any word that is being compiled into a definition.

StrongForth keeps locals in the dedicated local word set, which resides in the local name space. The local name space is located in the DATA memory area, because it needs to be re-written for each new definition that uses locals. Besides the local word set, the local name space contains other temporary data structures, which will be presented later. As explained in chapter 10, both : and :NONAME execute LOCALS! to initialize the local name space at the beginning of a new colon definition.

The memory image of a local is very similar to the one of an ordinary word. However, the local word set only occupies memory in the local name space, whereas the (global) word sets are spread over the name space, the constant data space and the code space. The memory image of a local within the local name space looks like this:


name field
link field
attribute field
data field
output parameter field

The name field and the link field are the same as for a (global) word in the name space. But where are the token and input parameter fields? The token field is not required, because all locals have the same compilation and execution semantics. They don't need a code field. The place of the token field is actually occupied by the data field, which specifies where to find the value of a local at execution time. Since the code field is omitted and the data field is now located in the local name space, locals don't occupy memory in the constant data space and in the code space. Finally, the output parameter field contains the compound data type of the local. Input parameters do not exist for locals.

At execution time, the values of locals are kept on the return stack. They are accessed relative to the return stack pointer,

RP@ offset +

where the positive offset is calculated at compilation time as

#LOCALS @ index - CELLS

#LOCALS is a system variable that contains the number of cells that will be occupied by locals on the return stack at runtime. The content of the data field is the index that is subtracted from this value in order to calculate the return stack offset. As an example, let's consider a definition with two locals:

: EXAMPLE ( DOUBLE SINGLE -- ... )
  LOCALS| S D | ... S ... D ... ;

At runtime, S and D are located on the return stack, with offsets 2 CELLS and 0 CELLS from the return stack pointer, respectively. At compilation time, #LOCALS is 3, because the two locals occupy three cells on the return stack. The data field of the dictionary entry for S contains 1, while the data field of the dictionary entry for D contains 3.
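
The offset arithmetic for this example can be checked with a few lines of Python. This is a sketch that follows the simplified bookkeeping of this section; the names n_locals, data_fields and create_local are made up for the illustration, and the cell size of 2 address units assumes a 16-bit system.

```python
CELL = 2          # address units per cell; assumes a 16-bit system

n_locals = 0      # plays the role of #LOCALS
data_fields = {}  # local name -> index stored in its data field

def create_local(name, size_in_cells):
    """Record a local the way the text describes: add its size to the
    running total and store the new total as the local's index."""
    global n_locals
    n_locals += size_in_cells
    data_fields[name] = n_locals

def offset_in_cells(name):
    # #LOCALS @ index -   (the result is in cells, not yet in address units)
    return n_locals - data_fields[name]

create_local("S", 1)   # single-cell local, defined first
create_local("D", 2)   # double-cell local, defined second

offsets = {name: offset_in_cells(name) for name in data_fields}
```

With these definitions, offsets maps S to 2 cells and D to 0 cells, in line with the figures given above. Multiplying by CELL yields the offset in address units.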

Creating, Finding and Forgetting Locals

StrongForth provides the dedicated word CREATE-LOCAL to create a local in the local name space:

: CREATE-LOCAL ( CDATA -> CHARACTER UNSIGNED INTEGER -- )
  ?COMPILE #LOCALS +!
  SPACE@ LOCAL-SPACE ROT ROT
  HERE ROT ROT ", ALIGN LOCAL-WORDLIST LINK DUP @ , !
  DTP@ DUP DT+ SWAP OVER OVER - , #LOCALS @ ,
  DO I @ DT-OUTPUT OR , LOOP SPACE! ;

Unlike CREATE, which obtains the name of the new definition by parsing the input source, CREATE-LOCAL expects the name of the local as a character string on the data stack. The name field is created in the same way as for ordinary words, using ", ALIGN. The pointer to the previous local is compiled into the link field, and the pointer to the latest local is updated. As long as locals are being defined, #LOCALS counts the number of cells occupied by the locals defined so far. CREATE-LOCAL adds its parameter INTEGER, which is the size in cells of the local to be created, to the system variable #LOCALS. The updated value of #LOCALS is then compiled into the data field. In the above example, S is the first local, occupying one cell. #LOCALS is 1. D occupies two cells, which are added to the one cell occupied by S. #LOCALS is now 3.

Finally, CREATE-LOCAL compiles the compound data type of the new local into the output parameter field. It is assumed that the data type has been popped from the compiler data type heap immediately before CREATE-LOCAL is executed. Therefore, the data type heap pointer now points to the data type of the local. DT+ traverses a compound data type in order to determine its length:

: DT+ ( DATA -> DATA-TYPE -- 1ST )
  BEGIN DUP 1+ SWAP @ DT-PREFIX ATTRIBUTE? INVERT
  UNTIL ;

Note that CREATE-LOCAL creates a complete local. Nothing else is required. This is different from CREATE, which leaves some fields of the new word's memory image to be filled later. For example, it is not necessary to first store a temporary value into the link field and then update this value after the definition is finished.

After all locals have been created, they can be found by SEARCH-LOCAL. This word is an application of SEARCH that is specialized for searching the local word set:

: SEARCH-LOCAL ( CDATA -> CHARACTER UNSIGNED -- DATA -> DATA-TYPE SIGNED )
  FALSE CODE-FIELD LOCAL-WORDLIST SEARCH
  IF CAST DATA -> DATA-TYPE DUP 1+ SWAP CAST DATA -> SIGNED 1+ @
  ELSE CAST DATA -> DATA-TYPE +0
  THEN ;

It expects the name of the local as a character string on the data stack. No other input parameters are required. If a local with the given name exists in the local dictionary, SEARCH-LOCAL returns a pointer to its output parameter field, and the content of its data field as a signed number. If a local with the given name cannot be found in the local dictionary, SEARCH-LOCAL returns a null pointer and 0 as the index.

StrongForth provides a means to remove locals from the local dictionary before the compilation is done. Is this really necessary? Locals defined by LOCALS| exist until the end of the definition, so there's no need to explicitly remove them. But there are other kinds of locals, whose scope is even smaller than that of ordinary locals. These are the ANS Forth words I, J and R@. In StrongForth, these words do not exist in the dictionary. Instead, they are defined as locals at compilation time. I, and possibly J, are locals that are created by DO and ?DO, and removed by LOOP and +LOOP. They exist only during the compilation of a DO loop. Similarly, R@ is created by >R, and removed by R>. The scope of R@ is the compiled code between >R and R>. Details on the compilation of loops and on using the return stack as temporary data storage will be given later.

FORGET-LOCAL removes the most recently defined local from the local dictionary:

: FORGET-LOCAL ( INTEGER -- )
  ?COMPILE NEGATE #LOCALS +!
  SPACE@ LOCAL-SPACE LOCAL-WORDLIST LINK @ HERE - ALLOT
  HERE DUP CAST CDATA -> UNSIGNED @ 1+ + ALIGNED
  CAST DATA -> ADDRESS @ LOCAL-WORDLIST LINK ! SPACE! ;

The size in cells of the local is expected on the data stack. This parameter is used to update #LOCALS. Next, FORGET-LOCAL deallocates the space occupied by the local. And finally, the pointer to the latest local has to be updated as well. This is a little bit tricky, because FORGET-LOCAL has to calculate the address of the link field of the removed local by traversing its name field. Remember that LINK returns a pointer to the name field of the latest entry in the specified word list.

Defining Locals

In order to define locals during the compilation of a new word, ANS Forth specifies a well-defined process. The ANS Forth word (LOCAL) shall send a message to the system for each local to be defined, and an additional last local message after the last local. Implementing this process obviously requires that the system knows whether the last local message has already been sent. StrongForth keeps this status information in the system variable #LOCALS:

#LOCALS    Semantics
zero       No locals defined yet.
negative   First local has been defined, but last local message has not yet been sent. The absolute value of #LOCALS is the total number of cells that have been allocated for locals so far.
positive   All locals have been defined and the last local message has been sent. The value of #LOCALS is the total number of cells that have been allocated for locals.

The definition of (LOCAL) is based on the status information contained in #LOCALS:

: (LOCAL) ( CDATA -> CHARACTER UNSIGNED -- )
  ?COMPILE DUP
  IF #LOCALS @ NEGATE 0<
     IF DROP DROP -263 THROW
     ELSE POSTPONE (>R) DTP@ @ SIZE NEGATE CREATE-LOCAL
     THEN
  ELSE DROP DROP #LOCALS @ NEGATE #LOCALS !
  THEN ;

First, let's see what happens if (LOCAL) is executed with a valid string as input parameter. If the last local message has already been sent, (LOCAL) throws an exception. Otherwise, it defines a new local by compiling code to push the value of the local onto the return stack, and by executing CREATE-LOCAL with the negated size in order to add the local to the local dictionary and to decrease the (zero or negative) value of #LOCALS by the number of cells to be allocated on the return stack.

(>R) is the runtime code of (LOCAL). It takes the item on top of the data stack and pushes it onto the return stack. As usual, StrongForth provides two overloaded versions for single-cell and double-cell items:

(>R) ( SINGLE -- )
(>R) ( DOUBLE -- )

The second case for (LOCAL) is when it is executed with a null string as the input parameter. This is the last local message. (LOCAL) just changes the status information by negating the value of #LOCALS. From now on until the end of the definition, the value of #LOCALS is positive.
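
The two cases can be summarized as a tiny state machine. The Python sketch below is a model, not StrongForth code: the class LocalsState and its method names are hypothetical, the size argument stands in for the value that (LOCAL) obtains with DTP@ @ SIZE, and a Python exception takes the place of -263 THROW.

```python
class LocalsState:
    """Toy model of the #LOCALS bookkeeping performed by (LOCAL)."""

    def __init__(self):
        self.n_locals = 0      # #LOCALS
        self.indices = {}      # name -> data field content

    def local(self, name, size_in_cells=1):
        if name:                                   # a new local
            if self.n_locals > 0:                  # last local message already sent
                raise RuntimeError("exception -263")
            self.n_locals -= size_in_cells         # CREATE-LOCAL gets the negated size
            self.indices[name] = self.n_locals
        else:                                      # null string: last local message
            self.n_locals = -self.n_locals

state = LocalsState()
state.local("A")          # #LOCALS: 0 -> -1
state.local("B", 2)       # #LOCALS: -1 -> -3
state.local("")           # last local message: -3 -> 3

try:
    state.local("C")      # too late, the message has been sent
    rejected = False
except RuntimeError:
    rejected = True
```

Note that sending the last local message while #LOCALS is still zero leaves it at zero, so the model stays in the "no locals defined yet" state in that case.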

(LOCAL) is just the system's low-level interface to defining locals. The high-level interface specified by ANS Forth is the immediate word LOCALS|. LOCALS| is typically executed somewhere at the beginning of a new definition, using the syntax

: name ... LOCALS| name-1 name-2 ... name-n | ... ;

The semantics of LOCALS| is to parse the input source and to execute (LOCAL) for each word up to but excluding |. Finally, it sends the last local message by executing (LOCAL) with a null string as input parameter. An exception is thrown if the list of locals is not terminated within the parse area. Here's the definition of LOCALS|:

: LOCALS| ( COLON-DEFINITION -- 1ST )
  NULL CDATA -> CHARACTER 0 (LOCAL)
  BEGIN PARSE-WORD OVER OVER " |" COMPARE
  WHILE DUP 0= IF -263 THROW THEN (LOCAL)
  REPEAT 1- (LOCAL) ; IMMEDIATE

In contrast to what ANS Forth specifies, StrongForth allows LOCALS| to be used multiple times within the same definition. This makes it possible to define sets of locals at different locations within a definition. However, locals cannot be defined within a conditional clause, within a loop, or between >R and R>. These rules are enforced by the dummy parameter COLON-DEFINITION, which is returned unchanged as 1ST. Therefore,

: VALID ( UNSIGNED -- SIGNED )
  LOCALS| A | -45 LOCALS| B | B A + ;

is a valid StrongForth definition and works as expected, whereas

: INVALID ( UNSIGNED FLAG -- )
  IF LOCALS| A | ... \ not allowed

will be rejected.

Compiling Locals

In the previous section, you've seen how locals are defined. Now, what needs to be done if a local is actually used during compilation? Let's have a look at a simple definition that uses locals. MSWAP swaps the contents of two memory cells:

: MSWAP ( DATA -> SINGLE 1ST -- )
  LOCALS| ADDR1 ADDR2 | ADDR1 @ ADDR2 @ ADDR1 ! ADDR2 ! ;

You might be tempted to say that it's probably much easier to implement the semantics of MSWAP by using simple stack movements like DUP, ROT and SWAP, but you'll be surprised how complicated things get if you try to avoid locals in this case. The solution with locals is more straightforward and much easier to read.

Next, we'll use SEE to see the virtual machine code that has been compiled by the above definition of MSWAP:

SEE MSWAP
: MSWAP ( DATA -> SINGLE 1ST -- )
  (>R) (>R) (R@) 2 @ (R@) 0 @ (R@) 2 ! (R@) 0 ! ;  OK

The first two tokens are compiled by LOCALS|. They push the two input parameters onto the return stack at runtime. From the rest of the virtual machine code, it's easy to see that ADDR1 is compiled into (R@) 2 and ADDR2 is compiled into (R@) 0. (R@) actually fetches one cell from an address relative to the return stack pointer, and pushes it onto the data stack. The offset in address units is a parameter to (R@), which is stored in the virtual machine code. It is calculated at compilation time as follows, with index being the absolute value of the local's data field:

#LOCALS @ index - CELLS

In the example of MSWAP, #LOCALS is 2, while the data fields of ADDR1 and ADDR2 contain -1 and -2, respectively. The return stack pointer directly points to ADDR2, i.e., with zero offset. Since ADDR1 was pushed onto the return stack before ADDR2, its offset is positive. On a 16-bit machine, an offset of one cell is the same as two address units.
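
To see that the listed virtual machine code really swaps two memory cells, it can be stepped through in a small simulation. The Python sketch below is only a model: the function run_mswap, the Python lists used as stacks and the sample addresses 100 and 102 are assumptions made for this illustration.

```python
CELL = 2  # address units per cell, as on the 16-bit machine mentioned above

def run_mswap(memory, addr2, addr1):
    """Interpret the virtual machine code shown by SEE MSWAP.

    addr1 is the item on top of the data stack, addr2 the one below it.
    """
    data, ret = [addr2, addr1], []

    def to_r():                 # (>R): move the top data stack item to the return stack
        ret.append(data.pop())

    def r_fetch(offset):        # (R@): copy the cell at RP + offset to the data stack
        data.append(ret[-1 - offset // CELL])

    def fetch():                # @
        data.append(memory[data.pop()])

    def store():                # !
        address = data.pop()
        memory[address] = data.pop()

    to_r(); to_r()              # (>R) (>R)
    r_fetch(2); fetch()         # (R@) 2 @    ADDR1 @
    r_fetch(0); fetch()         # (R@) 0 @    ADDR2 @
    r_fetch(2); store()         # (R@) 2 !    ADDR1 !
    r_fetch(0); store()         # (R@) 0 !    ADDR2 !
    return data, ret

memory = {100: 11, 102: 22}
data, ret = run_mswap(memory, 100, 102)
```

After the run, the two memory cells have exchanged their contents and the data stack is empty. The sketch stops where the listing stops, so the two locals are still on the modelled return stack.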

To compile locals, StrongForth provides the word LOCAL,. LOCAL, expects the output of SEARCH-LOCAL ABS on the data stack, which consists of a pointer to the compound data type of the local and the absolute value of the local's data field:

: LOCAL, ( DATA -> DATA-TYPE SIGNED -- )
  ?COMPILE #LOCALS @ SWAP - DUP 0<
   IF DROP DROP -263 THROW
   ELSE OVER @ SIZE
      CASE 1 OF ['TOKEN] (R@)  ENDOF
           2 OF ['TOKEN] (DR@) ENDOF
      -271 THROW ['TOKEN] NOOP SWAP ENDCASE
      CONST, CELLS CONST, @>DT
   THEN ;

An exception is thrown if the calculated return stack offset is a negative value for some reason. For example, this might happen if LOCAL, is executed before the last local message has been sent. If the local is a single-cell value, the token of (R@) is compiled as the virtual machine code. For a double-cell local, the token of (DR@) needs to be compiled. The semantics of (DR@) is similar to that of (R@). Instead of fetching only one cell from the return stack, (DR@) fetches two cells from the return stack and pushes them onto the data stack. (R@) and (DR@) are low-level words that should normally not be used directly in a definition:

(R@) ( -- SINGLE )
(DR@) ( -- DOUBLE )

After compiling the virtual machine code, consisting of the token of either (R@) or (DR@) plus the calculated return stack offset, LOCAL, adds the local's compound data type to the compiler data type heap. That's all.

Changing The Value Of Locals

ANS Forth specifies the word TO to change locals and values after they have been initialized. Let's first investigate how locals are being changed by looking at the virtual machine code of an example word:

: EXAMPLE ( UNSIGNED -- 1ST )
  3 + LOCALS| X | X 8 TO X X + ;
 OK
2 EXAMPLE .
13  OK
SEE EXAMPLE
: EXAMPLE ( UNSIGNED -- 1ST )
  3 + (>R) (R@) 0 8 (R) 0 ! (R@) 0 + ;  OK

You can see that TO actually compiles (R) 0 ! as virtual machine code. Just like with (R@), the constant 0 following (R) is a parameter indicating the return stack offset of the local to be changed. Again, (R) is a low-level word that shouldn't be used directly within a definition. It simply returns the address of the local:

(R) ( -- DATA )

TO compiles ! to store the value in the location on the return stack that is identified by this address:

: TO ( -- )
  POSTPONE ADDR POSTPONE ! ; IMMEDIATE

By using !, StrongForth prevents items from being stored in a local unless they have exactly the same data type as the local. For this to work correctly, however, the data type of the address returned by (R) is not specific enough. ! needs the address of an item whose data type is the one of the local. That's what ADDR takes care of:

: ADDR ( -- )
  PARSE-WORD OVER OVER SEARCH-LOCAL DUP
  IF ?COMPILE POSTPONE (R) DT> DROP DT-PREFIX OR >DT SWAP @>DT
     #LOCALS @ SWAP ABS - CELLS CONST, DROP DROP
  ELSE DROP DROP ?VALUE [ DT DATA DT-PREFIX OR ] LITERAL >DT
     DUP [ NULL DATA-TYPE 1 OFFSET+ ] LITERAL PARAMS>DT
     >BODY -> DATA @ STATE @
     IF LITERAL, ELSE ( DATA -- )CAST THEN
  THEN ; IMMEDIATE

ADDR is used in the same way as TO to calculate the address of a local or a value. It starts parsing the input source for the name of a local or a value, and then tries to find this name in the local dictionary. If a local with this name exists, ADDR asserts that the system is in compilation state. It compiles (R) and replaces the unspecified address DATA with the compound data type of the local's address on the compiler data type heap. Finally, the return stack offset is calculated and compiled as virtual machine code.

The ELSE branch of the definition of ADDR deals with values. If the name parsed by ADDR does not belong to a local, ADDR searches the (global) dictionary for a value with the given name. ?VALUE returns its definition, or throws an exception if no value with the given name exists. ADDR then executes or compiles the value's address as a literal with the proper data type, which is

DATA -> type

where type is the value's data type. ADDR adds the head and the tail of the compound data type to the data type heap. Remember that PARAMS>DT adds a complete compound data type to the data type heap, starting at the given position of a definition's stack diagram. In this case, it's the definition of the value, and the data type is the data type of the value's one and only parameter, starting at the first position of the stack diagram. The actual address is in the data field of the value's definition. In compilation state, the address is compiled into the virtual machine code, while in interpretation state it is simply left on the data stack. The phrase

STATE @ IF LITERAL, ELSE ( DATA -- )CAST THEN

is similar to the one in the definition of INTERPRET. However, ADDR does not need to distinguish between single-cell and double-cell literals, because addresses of data type DATA always occupy only one cell. Here's an example illustrating the usage of ADDR and TO in context with values:

0 VALUE COUNTER
 OK
: INCREMENT ( -- )
  COUNTER 1+ TO COUNTER ;
 OK
INCREMENT INCREMENT COUNTER .
2  OK
SEE INCREMENT
: INCREMENT ( -- )
  COUNTER 1+ 2972 ! ;  OK
ADDR COUNTER .S
DATA -> UNSIGNED  OK
DUP @ . 7 SWAP ! COUNTER .
2 7  OK

2972 is the address of the memory cell where the content of COUNTER is stored.

Now, what about ?VALUE? This word's definition is pretty straightforward, except for the fact that it's deferred:

DEFER ?VALUE ( CDATA -> CHARACTER UNSIGNED -- DEFINITION )

:NONAME ( CDATA -> CHARACTER UNSIGNED -- DEFINITION )
  LOCALS| COUNT ADDR |
  ADDR COUNT ['CODE] SOURCE-ID CODE-FIELD SEARCH-ALL
  IF EXIT THEN DROP
  ADDR COUNT ['CODE] LATEST CODE-FIELD SEARCH-ALL
  IF EXIT THEN -32 THROW ; IS ?VALUE

In order to find only values, ?VALUE selects a matching criterion for SEARCH-ALL that matches only words with the same code field as the predefined single-cell value SOURCE-ID or the predefined double-cell value LATEST. ?VALUE is a deferred definition, because its semantics will later be extended to find floating-point values as well. With an updated version of ?VALUE, ADDR and TO will work flawlessly with floating-point values without needing to be changed.

Since TO gets the compound data type of a local or a value from ADDR, all it needs to do is compile or execute ! in order to complete its operation. With negligible effort, it is even possible to define another word that is pretty similar to TO and has the same syntax:

: +TO ( -- ) 
  POSTPONE ADDR POSTPONE +! ; IMMEDIATE

+TO adds a number to a value or a local, provided a suitable overloaded version of +! exists:

COUNTER .
7  OK
8 +TO COUNTER
 OK
COUNTER .
15  OK

R@ - An Implicit Local

ANS Forth specifies the words >R, R@ and R> to transfer single-cell items between the data stack and the return stack. The stack diagrams contain everything you need to know about the semantics:

>R ( x -- ) ( R:  -- x )
R> ( -- x ) ( R:  x -- )
R@ ( -- x ) ( R:  x -- x )

Now, let's see what StrongForth has to offer:

WORDS >R
>R ( -- R-SIZE )
 OK
WORDS R>
R> ( R-SIZE -- )
 OK
WORDS R@
 OK

Oops. What's that? >R and R> seem to have a totally different stack diagram with a strange data type called R-SIZE. And R@ doesn't even seem to exist! This must be a mistake.

Well, by now you should be used to the fact that StrongForth offers some surprises. What is really happening here? Let's assume StrongForth had R@ in its dictionary. What would be the stack diagram? That's difficult to say, because it depends on what data type >R has actually pushed onto the return stack. In the definition

: X1 ... 8753 >R ... R@ ... R> ... ;

both R@ and R> are expected to have the stack diagram ( -- UNSIGNED ), while in

: X2 ... PAD >R ... R@ ... R> ... ;

the stack diagrams of R@ and R> should be ( -- CDATA -> CHARACTER ). Therefore, it makes no sense for StrongForth to provide a pre-defined version of R@ in the dictionary. The easiest way to solve this problem is to make R@ a local. Each local has a stack diagram that is specified at the definition of the local. In this case, >R defines R@ as a local, and R> removes the most recent local from the local dictionary. And that's exactly how it works. Both >R and R> are immediate words:

DT SINGLE PROCREATES R-SIZE

: >R ( -- R-SIZE )
  ?COMPILE POSTPONE (>R)
  DTP@ @ SIZE DUP " R@" TRANSIENT ROT CREATE-LOCAL
  CAST R-SIZE ; IMMEDIATE

: R> ( R-SIZE -- )
  ?COMPILE POSTPONE R@ CAST SIGNED DUP FORGET-LOCAL
  CASE +1 OF POSTPONE (RDROP)  ENDOF
       +2 OF POSTPONE (DRDROP) ENDOF
  -271 THROW ENDCASE ; IMMEDIATE

>R starts with compiling (>R) in order to push a single-cell or double-cell item onto the return stack. You already know the two overloaded versions of (>R) as the runtime code of (LOCAL). CREATE-LOCAL creates a local with the name R@ and with the same compound data type as the item that has been taken from the data stack. Furthermore, CREATE-LOCAL increments #LOCALS by 1 or 2, depending on whether the item that has just been pushed onto the return stack occupies one or two cells. Note that after compiling (>R), the compiler data type heap pointer still points to the data type of (>R)'s input parameter.

An interesting detail is the fact that the index of the local R@ is positive, whereas locals defined by (LOCAL) always have negative index values. This allows distinguishing R@ and other special locals from ordinary locals. On the other hand, you need to ensure that every calculation of the return stack offset, like the one in ADDR, which TO relies on, uses the absolute value of the index.
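
The effect of >R on the offsets of already defined locals can be played through in a short model. The Python sketch below is hypothetical bookkeeping only: the class Compiler and its methods are invented names that imitate what CREATE-LOCAL, FORGET-LOCAL and the last local message do to #LOCALS, using two single-cell locals as an example.

```python
class Compiler:
    """Toy model of #LOCALS and the local dictionary at compilation time."""

    def __init__(self):
        self.n_locals = 0
        self.locals = []                      # (name, data field), latest last

    def create_local(self, name, size):
        self.n_locals += size                 # size is negative for (LOCAL)
        self.locals.append((name, self.n_locals))

    def forget_local(self, size):
        self.n_locals -= size
        self.locals.pop()

    def last_local_message(self):
        self.n_locals = -self.n_locals

    def offset(self, name):
        """Return stack offset in cells: #LOCALS @ index ABS -"""
        for local_name, index in reversed(self.locals):
            if local_name == name:
                return self.n_locals - abs(index)
        raise KeyError(name)

c = Compiler()
c.create_local("ADDR1", -1)     # defined through (LOCAL): negative index
c.create_local("ADDR2", -1)
c.last_local_message()          # #LOCALS becomes 2
before = (c.offset("ADDR1"), c.offset("ADDR2"))

c.create_local("R@", 1)         # what >R does for a single-cell item
inside = (c.offset("ADDR1"), c.offset("ADDR2"), c.offset("R@"))

c.forget_local(1)               # what R> does
after = (c.offset("ADDR1"), c.offset("ADDR2"))
```

Between >R and R>, the offsets of the ordinary locals grow by one cell while R@ sits at offset zero. Once the local R@ is forgotten, the earlier offsets are back. Taking the absolute value is what lets negative and positive indices share one formula.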

>R returns the size of the local in cells as an item of data type R-SIZE. R-SIZE is a special data type to be used only by >R and R>. Using a dedicated data type ensures that >R and R> are always used in pairs and with proper nesting. Syntax violations like

... R> ... >R ... ;

or

... IF ... >R ... THEN ... R> ... ;

are simply rejected by the compiler. On the other hand, the requirement for pairwise use of >R and R> prevents usages like

... IF ... >R ... ELSE ... >R ... THEN ... R> ... ;

or

... >R ... IF ... R> ... ELSE ... R> ... THEN ... ;

which would be allowed in ANS Forth. Note that this strict syntax does not apply to R@. R@ is a local whose scope is between >R and R>, independently of the control structure in between. This means, for example, that you can use R@ within a DO loop even if >R and R> are outside of the loop:

... >R ... DO ... R@ ... LOOP ... R> ...

Another interesting consequence of R@ being a local is the fact that TO and +TO can be applied to it the same way as with all other locals:

: TEST ( UNSIGNED -- )
  >R R@ . R@ 2* TO R@ R> . ;
 OK
7 TEST
7 14  OK

However, you should be careful using R@ in such a way, because this is not ANS Forth compliant. Always keep in mind that >R and R> are immediate words that build a control structure, just like IF ... ELSE ... THEN, BEGIN ... UNTIL and DO ... LOOP, and that R@ is a local that is being defined by >R, and that is being removed by R>.

Finally, let's see how R> works. R> compiles R@, removes this local from the local dictionary, and then compiles virtual code to clean up the return stack. (RDROP) and (DRDROP) drop a single and a double cell from the return stack, respectively:

(RDROP) ( -- )
(DRDROP) ( -- )

But why is the virtual code split into R@ and (RDROP) or (DRDROP)? Why doesn't StrongForth provide a single low-level word called (R>) for this purpose? The reason is that only R@ contains the information on the data type and the return stack offset that is needed for R>. (RDROP) and (DRDROP) just provide the additional semantics of R> with respect to R@. Splitting the semantics into two low-level words is the easiest way to go. You can try to implement a more efficient version of R>, for example by defining something like (R>) and (DR>) as machine code words and patching the compiled token of R@ with their tokens. It would definitely be an interesting exercise.


Dr. Stephan Becher - February 8th, 2008