Chapter 2: Memory Access

Variables

To work with items on the stack, you normally don't need to care about their memory locations. Even though the stack is located in memory, access to its items is handled almost completely by the system. However, as soon as you want to write one or more items to a place in memory other than the stack, and later read them back from the same location, you need the address of this memory location.

How can you get a memory address? Well, the easiest way is to define a variable:

VARIABLE LONGITUDE

VARIABLE ? undefined word

What happened? This is supposed to work in ANS Forth. Doesn't StrongForth provide variables? Yes, of course it does. But since StrongForth is a strongly typed language, you need to specify the data type of the variable. And this is easily done by supplying an initialiser for each variable:

+49 VARIABLE LONGITUDE
 OK

VARIABLE expects an item of any data type (i.e., SINGLE or DOUBLE) on the stack, which explains the error message we got on the first try, where VARIABLE was applied to an empty stack. Note that StrongForth always issues ? undefined word if it doesn't find a word in the vocabulary that matches the name just parsed and whose stack diagram (left side) fits the items on the stack. If a word with the correct name exists but cannot be applied because its expected parameters are not on the stack, the word is ignored.

The parameter to VARIABLE, +49, has two meanings. First, it specifies the data type of the variable, which is SIGNED in this example. Second, it initialises the variable to the given value:

LONGITUDE @ .S
SIGNED  OK
.
49  OK

But what is the data type of the address of the variable? Let's see:

LONGITUDE .S
DATA -> SIGNED  OK

Oops, is this really a data type? It is. LONGITUDE is an address of data type DATA that points to an item of data type SIGNED. Whenever two or more basic data types like DATA and SIGNED are combined by the symbol ->, the result is called a compound data type. Compound data types are mostly used for addresses, because an address is incomplete without the information about what kind of item it points to. If LONGITUDE just pushed something of data type DATA onto the stack, @ wouldn't know the data type of whatever it should fetch from the address. You will later see that several overloaded versions of @ exist, and that each of these versions expects an address on the stack that has a compound data type similar to DATA -> SIGNED.

But why is the address of a signed number not ADDRESS -> SIGNED? As explained in the next section, ADDRESS is a data type with several subtypes that each describe a different area of memory. And variables are always stored in a memory area called DATA.

Memory Areas and Address Data Types

ANS Forth does not distinguish between different kinds of addresses. An address is an address, and any special meaning, if required, comes along with the word reading from or writing to memory. In contrast to that, StrongForth knows several kinds of addresses. Let's have a look at the data types that semantically are addresses:

ADDRESS
   DATA
   CONST
   CODE
   PORT
   CADDRESS
      CDATA
      CCONST
      CCODE
      CPORT

ADDRESS is a direct subtype of SINGLE. All other addresses are direct or indirect subtypes of ADDRESS. You will normally not directly deal with items of data type ADDRESS, because this data type is not specific enough to access a memory location. What does this mean? Well, in an embedded system you typically have different areas of memory.

First, there is RAM (random access memory), which is a part of the totally available memory that can be read from and written to. All non-constant data items are stored in RAM, including variables, the data stack and the return stack. Addresses of memory locations in RAM have the data type DATA, which is a direct subtype of ADDRESS.

Next, an embedded system always has at least one area of ROM (read only memory). As the name reveals, it is not possible to write data into this memory area. It is rather used to permanently store the program code of the embedded application as well as all constant data items. Because code and constant data may reside in different areas of ROM, and because the processor might use different instructions or addressing modes to access code and data, StrongForth provides two separate address data types for accessing ROM: CONST to access constant data and CODE to access the program code. Again, both are direct subtypes of ADDRESS.

Finally, each embedded system has some peripheral registers, which are accessed by reading from and writing to I/O addresses. Some processors use so-called memory-mapped I/O, which means that the contents of peripheral registers are accessed by normal memory access instructions. Other processors use a separate address space for accessing peripheral registers and provide specific input and output instructions to read from and write to these registers. Anyway, each peripheral register has an address, and in StrongForth this address has the data type PORT.

From the data type of an address, which might be either DATA, CONST, CODE or PORT, StrongForth knows which memory area is to be accessed. Although not all processors require this distinction, it is pretty handy on systems that don't have a linear address space or that don't have memory-mapped I/O. If the compiler knows the type of memory, it can easily select the required memory access instructions and/or addressing modes.

Now, what about the data type CADDRESS and its subtypes CDATA, CCONST, CCODE and CPORT? These are addresses of memory locations that store data items in character size instead of in cell size. In ANS Forth, you need to use words like C! and C@ to make sure that a character size item is read or written. This is not necessary in StrongForth, because the compiler is able to find out whether it should access a cell size item or a character size item.

This is an important point. The information about what is stored at a memory location, like the data type and the size (character, single cell or double cell) of the item, is an attribute of the memory location itself. You, the programmer, need not worry about whether to use 2!, C!, P! (port write) or whatever instead of !, because the compiler will always select the semantically correct word for you. Consequently, ! is a heavily overloaded word:

WORDS !
! ( SINGLE CFAR-ADDRESS -> 1ST -- )
! ( DOUBLE FAR-ADDRESS -> 1ST -- )
! ( SINGLE FAR-ADDRESS -> 1ST -- )
! ( SINGLE CCODE -> 1ST -- )
! ( DOUBLE CODE -> 1ST -- )
! ( SINGLE CODE -> 1ST -- )
! ( SINGLE CCONST -> 1ST -- )
! ( DOUBLE CONST -> 1ST -- )
! ( SINGLE CONST -> 1ST -- )
! ( SINGLE CDATA -> 1ST -- )
! ( DOUBLE DATA -> 1ST -- )
! ( SINGLE DATA -> 1ST -- )
 OK

@ is even more overloaded. But before going into detail with ! and @, have a closer look at the stack diagrams of !. There is no overloaded version of ! that directly applies to addresses of data type ADDRESS, because ADDRESS is not associated with a specific memory area. For each one of the memory areas, StrongForth provides three versions of !: one for single-cell items, one for double-cell items and one for character size items. In ANS Forth, these three versions actually match !, 2! and C!.

You already know the four memory areas DATA, CONST, CODE and PORT. However, there seems to be a fifth memory area called FAR-ADDRESS. What about that? Well, FAR-ADDRESS is a processor-specific kind of address to be used on systems where the full address range cannot be represented by a single-cell item. ANS Forth specifies that addresses are single-cell items, so ADDRESS has to be a subtype of SINGLE. For example, on a 16-bit system, single-cell items are 16 bits wide. If a full address on this system is wider than 16 bits, let's say 20 bits, the system normally uses techniques called banking or segmentation to calculate full addresses by combining a 16-bit address offset with a so-called bank or segment address. Since the DATA, CODE and CONST memory areas are nothing else but predefined banks or segments, calculating full addresses within these memory areas just requires a 16-bit address offset. That's why addresses fit into single-cell items even on systems with banking or segmentation.
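As a rough illustration of segmentation, the following Python sketch models one well-known scheme, the one used by x86 real mode, where a 16-bit segment is shifted left by four bits and added to a 16-bit offset. This scheme is an assumption chosen for illustration; the text does not say how a particular StrongForth target combines the two parts. The names full_address and DATA_SEGMENT are hypothetical.

```python
def full_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address.

    Assumption: x86 real-mode style (segment * 16 + offset).
    Other processors use other banking or segmentation schemes.
    """
    if not 0 <= segment <= 0xFFFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("segment and offset must each fit into 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF  # wrap around at 20 bits


# Within a predefined segment, the 16-bit offset alone selects the location,
# so such an address fits into a single cell.
DATA_SEGMENT = 0x2000  # hypothetical segment value for the DATA memory area
print(hex(full_address(DATA_SEGMENT, 0x0010)))  # 0x20010
```

A location outside the predefined segments needs both parts, which is why a full address occupies a double cell.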

However, it is sometimes necessary to access a memory location that is not in one of the predefined banks or segments. A full address, consisting of a bank or segment address and a cell size address offset, can only be kept in double-cell items. In StrongForth, the data types of full addresses are FAR-ADDRESS and CFAR-ADDRESS. FAR-ADDRESS is a direct subtype of DOUBLE, and CFAR-ADDRESS is a direct subtype of FAR-ADDRESS. You'll later see that the name space, which is a part of the dictionary, can only be accessed by full addresses, because it is not kept in one of the predefined memory areas.

Memory Store

Let's now have a closer look at the overloaded versions of !, starting at the bottom of the above list. The stack diagram reveals that this version of ! expects two items on the stack. The first one is any single-cell item and the second one is a DATA address of an item. Specifying the compound data type DATA -> 1ST guarantees that SINGLE can only be stored at a memory location for exactly the same data type. Here's an example using the variable LONGITUDE as defined in the previous section:

-32 LONGITUDE !
 OK
15 LONGITUDE !

15 LONGITUDE ! ? undefined word
UNSIGNED DATA -> SIGNED

Obviously, the second command failed because StrongForth does not allow storing an item of data type UNSIGNED in a variable of data type SIGNED. On the other hand, the first command worked well, because -32 actually is of data type SIGNED.

The second overloaded version of ! (from bottom to top) is for storing items of data type DOUBLE and its subtypes into locations in the DATA memory area. It (partly) replaces the ANS Forth word 2!. Addresses for double-cell items can be created by defining DOUBLE variables:

2500000. VARIABLE INHABITANTS
 OK
2630000. INHABITANTS !
 OK
INHABITANTS .S
DATA -> UNSIGNED-DOUBLE  OK

Note that StrongForth provides two overloaded versions of VARIABLE, one for single-cell and one for double-cell items. In ANS Forth, 2VARIABLE would have to be used to define the variable in the last example.

The ANS Forth words 2VARIABLE, 2@ and 2! do not exist in StrongForth. To define double-cell variables and to fetch and store double-cell items, you can use the overloaded versions of VARIABLE, @ and !, respectively. But defining two single-cell variables simultaneously is not supported by StrongForth. Each variable has to be defined individually, because each may have a unique data type. In ANS Forth, this functionality comes for free, because it is just a reuse of 2VARIABLE, 2@ and 2!. Though it is possible to define 2VARIABLE, 2@ and 2! in StrongForth, this is considered to be more harmful than useful, because it might obscure your source code.

The third overloaded version of ! has the stack diagram ( SINGLE CDATA -> 1ST -- ). It stores SINGLE into a character size memory location. If the size of a character in bits is less than the size of a cell, only as many low-order bits of SINGLE as fit into a character are stored in the memory location. Actually, the semantics of this version of ! are identical to those of the ANS Forth word C!.

Character size memory locations are normally used for packed arrays of data. Defining a single variable as character sized won't save any memory space, because the necessary alignment for the next variable would just skip the saved memory. Therefore, something like CVARIABLE does not exist.

A typical example of a packed array of data is a character string, and a predefined transient area for character strings is PAD:

CHAR P PAD !
 OK
PAD .S
CDATA -> CHARACTER  OK
@ .
P OK

However, using packed arrays is not restricted to character strings. You can easily create packed arrays of integer numbers, flags, or any other data type. For example, CDATA -> UNSIGNED is the address of a character size memory location containing the lower bits of an item of data type UNSIGNED. Of course, you have to make sure that the numeric range of the unsigned numbers you store in this location is limited accordingly. This condition is normally not fulfilled for addresses. Note also that items of data type DOUBLE and its direct and indirect subtypes cannot be stored in a character size memory location.

So far, you've seen three overloaded versions of ! that handle addresses in the DATA memory area. Addresses of data type DATA will be the vast majority of addresses you have to deal with. Addresses of other data types, namely CONST, CODE, FAR-ADDRESS and PORT, are handled in the same way. For each of these data types except for PORT, StrongForth provides three overloaded versions of ! to store single-cell and double-cell items as well as items of character size. ! for PORT addresses is not included in StrongForth, because most operating systems prevent direct access to I/O registers. For systems running on an embedded controller, the three missing versions of ! for PORT addresses can easily be defined as machine code definitions.

But why does StrongForth provide overloaded versions of ! for CODE and CONST addresses, although it was stated earlier in this chapter that these memory areas are read-only? Well, CODE and CONST are indeed read-only in typical embedded systems, but during the development phase it is nevertheless necessary to write program code and constant data into these memory areas. So, if you choose to develop your software directly on the target hardware, it may become necessary to temporarily replace ROM, EPROM, Flash or whatever kind of read-only memory by RAM. If you develop software on a PC, you don't even have to consider read-only memory, because it can be assumed that all available memory areas are in RAM.

Memory Fetch

As mentioned in the previous section, @ has even more overloaded versions than !:

WORDS @
@ ( CFAR-ADDRESS -> FLAG -- 2ND )
@ ( CFAR-ADDRESS -> SIGNED -- 2ND )
@ ( CFAR-ADDRESS -> SINGLE -- 2ND )
@ ( FAR-ADDRESS -> DOUBLE -- 2ND )
@ ( FAR-ADDRESS -> SINGLE -- 2ND )
@ ( CCODE -> FLAG -- 2ND )
@ ( CCODE -> SIGNED -- 2ND )
@ ( CCODE -> SINGLE -- 2ND )
@ ( CODE -> DOUBLE -- 2ND )
@ ( CODE -> SINGLE -- 2ND )
@ ( CCONST -> FLAG -- 2ND )
@ ( CCONST -> SIGNED -- 2ND )
@ ( CCONST -> SINGLE -- 2ND )
@ ( CONST -> DOUBLE -- 2ND )
@ ( CONST -> SINGLE -- 2ND )
@ ( CDATA -> FLAG -- 2ND )
@ ( CDATA -> SIGNED -- 2ND )
@ ( CDATA -> SINGLE -- 2ND )
@ ( DATA -> DOUBLE -- 2ND )
@ ( DATA -> SINGLE -- 2ND )
 OK

For each memory area, StrongForth provides 5 versions of @. Let's again start from the bottom. Similar to !, there are three versions of @ to fetch single-cell, double-cell and character size items from memory locations. These three versions are the semantic equivalents to the ANS Forth words @, 2@ and C@, respectively.

Normally, a character occupies less space than a cell. When fetching an item from a character size memory location, it is zero extended to fit into a cell. Zero extension means that the high-order bits of the value are filled with zero bits. This is the behaviour that ANS Forth specifies for C@, and it is certainly correct when fetching a character or an unsigned number from a character size memory location. However, zero extension is not desirable when you want to fetch a signed number or a flag from a character size memory location, as the following example demonstrates:

-119 PAD CAST CDATA -> SIGNED !
 OK
PAD CAST CDATA -> SINGLE @ .
137  OK

What happened? The bit pattern of decimal -119 on a 16-bit system is 1111111110001001. If this is stored in a character size memory location, the bit pattern in memory is 10001001, assuming character size is 8 bits. Fetching this value with zero-extending yields a result of 0000000010001001, which is 137 decimal. The correct way to handle this case would have been to use sign extension instead of zero extension. Sign extension fills the high-order bits of the single cell with copies of the most significant bit of the contents of the character size memory location, which is 1 in the above example. Fetching binary 10001001 with sign extension correctly yields 1111111110001001, or -119 decimal.

PAD CAST CDATA -> SIGNED @ .
-119  OK

That's the reason why StrongForth provides more than one overloaded version for fetching from character size memory locations in the DATA memory area. Sign extension has to be applied to items of data type SIGNED or FLAG. Items of all other data types have to be zero extended when fetching them from a character size memory location. Zero extending a character size TRUE flag would result in binary 0000000011111111, which is not a valid flag, while sign extending the same value correctly results in binary 1111111111111111.
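The difference between the two kinds of extension can be checked with a small Python model. It is only a sketch of the arithmetic, not StrongForth code, and it assumes 16-bit cells and 8-bit characters as in the example above. The function names are hypothetical.

```python
CELL_BITS = 16  # assumption: 16-bit cells, as in the example above
CHAR_BITS = 8   # assumption: 8-bit characters


def store_char(value: int) -> int:
    """Keep only the low-order CHAR_BITS bits, as a character size store does."""
    return value & ((1 << CHAR_BITS) - 1)


def fetch_zero_extended(char_value: int) -> int:
    """Fill the high-order bits of the cell with zeros."""
    return char_value


def fetch_sign_extended(char_value: int) -> int:
    """Fill the high-order bits of the cell with copies of the character's top bit."""
    sign_bit = 1 << (CHAR_BITS - 1)
    return (char_value ^ sign_bit) - sign_bit


def cell_pattern(value: int) -> str:
    """Bit pattern of a value as it would appear in one cell."""
    return format(value & ((1 << CELL_BITS) - 1), f"0{CELL_BITS}b")


stored = store_char(-119)                   # bit pattern 10001001
print(fetch_zero_extended(stored))          # 137
print(fetch_sign_extended(stored))          # -119

true_flag = store_char(-1)                  # all bits set: 11111111
print(cell_pattern(fetch_zero_extended(true_flag)))  # 0000000011111111
print(cell_pattern(fetch_sign_extended(true_flag)))  # 1111111111111111
```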

Since data types SIGNED and FLAG are not subtypes of the same data type, it is necessary to provide separate overloaded versions of @ for each of them. Note that the dictionary order is important here. The special versions

@ ( CDATA -> FLAG -- 2ND )
@ ( CDATA -> SIGNED -- 2ND )

with sign extension are found before the general version

@ ( CDATA -> SINGLE -- 2ND )

with zero extension. This is a general rule. When implementing overloaded words, you always have to define the general version before defining the special versions, because the interpreter and the compiler search the dictionary starting with the most recently defined words. If a general version were defined after the special versions, the interpreter and the compiler would never find the special versions, because the general version also applies to the special cases and would thus hide the special versions of the overloaded word. In ANS Forth, a word always hides previously defined words with the same name.
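The effect of the definition order can be modelled in a few lines of Python. This is a deliberately simplified sketch: the dictionary is a list searched from the newest entry backwards, and the slice of the data type tree in PARENT is an assumption made for the model, not a complete or authoritative description of StrongForth's type system. All names are hypothetical.

```python
# Assumed, simplified slice of the data type tree: child -> parent.
PARENT = {
    "SINGLE": None,
    "INTEGER": "SINGLE",
    "SIGNED": "INTEGER",
    "UNSIGNED": "INTEGER",
    "FLAG": "SINGLE",
}


def is_subtype(data_type, ancestor):
    """True if data_type equals ancestor or is a direct or indirect subtype of it."""
    while data_type is not None:
        if data_type == ancestor:
            return True
        data_type = PARENT[data_type]
    return False


def find(dictionary, pointee):
    """Search from the most recently defined entry backwards; the first match wins."""
    for accepted, behaviour in reversed(dictionary):
        if is_subtype(pointee, accepted):
            return behaviour
    return None


# General version defined first, special versions afterwards.
general_first = [("SINGLE", "zero extend"),
                 ("SIGNED", "sign extend"),
                 ("FLAG", "sign extend")]

# General version defined last: it hides the special versions.
general_last = [("SIGNED", "sign extend"),
                ("FLAG", "sign extend"),
                ("SINGLE", "zero extend")]

print(find(general_first, "SIGNED"))    # sign extend
print(find(general_first, "UNSIGNED"))  # zero extend
print(find(general_last, "SIGNED"))     # zero extend
```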

Finally, let's have a look at the following example:

LONGITUDE VARIABLE POINTER
 OK
POINTER .S
DATA -> DATA -> SIGNED  OK
@ .S
DATA -> SIGNED  OK
@ .S
SIGNED  OK
.
-32  OK

POINTER is a variable that contains the address of another variable, LONGITUDE. Though the data type of POINTER looks strange, it should be rather obvious what happens here. POINTER is an address of an address of a signed number, DATA -> DATA -> SIGNED, so applying @ results in an item of data type DATA -> SIGNED, and applying @ again finally yields a signed number. Items of compound data types like DATA -> SIGNED can be stored in variables just like items of basic data types.
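One way to picture this is to treat a compound data type as a chain of basic data types, where each fetch removes the leading address type. The Python sketch below models only this bookkeeping. Representing a compound type as a tuple and the helper name fetch_type are assumptions made for illustration; they say nothing about how StrongForth stores type information internally.

```python
def fetch_type(compound):
    """Data type left on the stack after fetching from an address of this type."""
    if len(compound) < 2:
        raise TypeError("not an address of a typed item: " + " -> ".join(compound))
    return compound[1:]  # drop the leading address type


pointer = ("DATA", "DATA", "SIGNED")        # DATA -> DATA -> SIGNED
after_first_fetch = fetch_type(pointer)
after_second_fetch = fetch_type(after_first_fetch)

print(" -> ".join(after_first_fetch))       # DATA -> SIGNED
print(" -> ".join(after_second_fetch))      # SIGNED
```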

Note that overloaded versions of @ for PORT addresses are not provided by default. Most multitasking operating systems protect I/O registers from being directly accessed by applications. You can define these words yourself as machine code definitions if you need them for an application running on an embedded controller.

Memory Blocks

ANS Forth specifies a set of words that perform operations on memory blocks: MOVE, CMOVE, CMOVE>, FILL and ERASE. MOVE and ERASE apply to address units, while CMOVE, CMOVE> and FILL apply to characters and may thus be interpreted as string operations. As usual, StrongForth provides overloaded versions of these words, which can be applied to all available data types. Let's start with FILL.

The StrongForth equivalent to the ANS Forth word FILL is

FILL ( CDATA -> SINGLE UNSIGNED 2ND -- )

Note that CDATA -> SINGLE is not exactly an address of a character. It's an address of a character size item. The last parameter 2ND ensures that the item to be filled in has the same data type as the one the address points to.

However, StrongForth provides two additional versions of FILL:

FILL ( DATA -> SINGLE UNSIGNED 2ND -- )
FILL ( DATA -> DOUBLE UNSIGNED 2ND -- )

Obviously, these two versions can be used to fill a block of memory with any single-cell or double-cell item, which is useful for initialising arrays that do not consist of characters. The second parameter UNSIGNED directly specifies the number of items to be filled into the memory block. If, for example, single-cell items occupy two address units in memory, the phrase

DATA-SPACE HERE CAST DATA -> UNSIGNED 5 1000 FILL

affects a total of 10 address units by storing the unsigned number 1000 five times into consecutive memory cells.
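The relationship between the item count and the number of address units can be modelled in Python as follows. The sketch assumes two address units (bytes) per single cell, as in the text, and additionally assumes little-endian byte order, which the text does not specify. The function name fill_cells is hypothetical.

```python
ADDRESS_UNITS_PER_CELL = 2  # assumption taken from the text


def fill_cells(memory: bytearray, start: int, count: int, value: int) -> int:
    """Store value into count consecutive single cells.

    Returns the number of address units that were written.
    Assumption: little-endian byte order within a cell.
    """
    cell = value.to_bytes(ADDRESS_UNITS_PER_CELL, "little")
    for i in range(count):
        at = start + i * ADDRESS_UNITS_PER_CELL
        memory[at:at + ADDRESS_UNITS_PER_CELL] = cell
    return count * ADDRESS_UNITS_PER_CELL


memory = bytearray(16)
written = fill_cells(memory, 0, 5, 1000)
print(written)              # 10
print(memory[:10].hex())    # e803e803e803e803e803
```

The count is given in items, so the caller never multiplies by the cell size.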

FILL can only be applied to memory blocks in the DATA memory area, because this is the only memory area that can be written to. It wouldn't make sense to try filling into CONST and CODE memory areas, because these memory areas are read-only.

Directly derived from FILL is ERASE. Unlike in ANS Forth, ERASE does not apply to address units. Instead, ERASE uses FILL to fill the specified number of single cells, double cells or character size items with binary zero. Here's how the three versions of ERASE are defined in StrongForth:

: ERASE ( DATA -> SINGLE UNSIGNED -- )
  NULL SINGLE FILL ;

: ERASE ( DATA -> DOUBLE UNSIGNED -- )
  NULL DOUBLE FILL ;

: ERASE ( CDATA -> SINGLE UNSIGNED -- )
  NULL SINGLE FILL ;

Finally, StrongForth provides 9 overloaded versions of MOVE:

WORDS MOVE
MOVE ( CFAR-ADDRESS -> SINGLE CDATA -> 2ND UNSIGNED -- )
MOVE ( FAR-ADDRESS -> DOUBLE DATA -> 2ND UNSIGNED -- )
MOVE ( FAR-ADDRESS -> SINGLE DATA -> 2ND UNSIGNED -- )
MOVE ( CCONST -> SINGLE CDATA -> 2ND UNSIGNED -- )
MOVE ( CONST -> DOUBLE DATA -> 2ND UNSIGNED -- )
MOVE ( CONST -> SINGLE DATA -> 2ND UNSIGNED -- )
MOVE ( CDATA -> SINGLE CDATA -> 2ND UNSIGNED -- )
MOVE ( DATA -> DOUBLE DATA -> 2ND UNSIGNED -- )
MOVE ( DATA -> SINGLE DATA -> 2ND UNSIGNED -- )
 OK

Again, you can see that MOVE can be applied to areas of memory that contain either single-cell items, double-cell items or character size items. Therefore, it's not necessary to provide a version that operates on address units. Using CHARS and CELLS to calculate the number of address units to be moved, as in ANS Forth, is not necessary.

As with FILL and ERASE, the destination memory block of MOVE has to be in the DATA memory area. However, the source memory block may be in either DATA or CONST memory area, or in a location specified by an address that includes a given bank or segment, because this memory block does not require write access. Using a memory block in the CONST memory area as source for a memory move is useful whenever an array is initialised with values that can all be calculated at compilation time, like in the following example:

CONST-SPACE HERE CAST CONST -> UNSIGNED
 OK
675 , 318 , 812 , 550 ,
 OK
CONSTANT INIT
 OK
0 VARIABLE ARRAY DATA-SPACE 3 CELLS ALLOT
 OK
: INIT-ARRAY
  INIT ARRAY 4 MOVE ;
 OK
INIT-ARRAY
 OK
ARRAY 1+ @ .
318  OK

First, INIT is defined as the address of four consecutive unsigned numbers located in the CONST memory area. Then, ARRAY is defined as an array of four unsigned numbers. Using a simple MOVE, INIT-ARRAY initialises the contents of ARRAY with the four unsigned numbers.

CMOVE and CMOVE> are not available in StrongForth. In most cases, they can be substituted by MOVE. However, there's no replacement for the propagation feature of CMOVE and CMOVE>. This feature is considered to be obsolete, because its main application, filling memory blocks with single-cell or double-cell size patterns, can easily be achieved by the respective versions of FILL. The few special applications can be handled by hand-written code. For example, MOVE cannot fill the PAD with repeating sequences of 3 characters in the same way a single CMOVE could have done. In cases like that, you have to use MOVE repeatedly:

: 5*ABC
  [CHAR] A PAD ! [CHAR] B PAD 1+ ! [CHAR] C PAD 2 + !
  PAD 15 + PAD 3 + DO PAD I 3 MOVE 3 +LOOP ;
 OK
5*ABC
 OK
PAD 15 TYPE
ABCABCABCABCABC OK
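For readers unfamiliar with the propagation feature, the following Python sketch models the ANS Forth behaviour of CMOVE, which copies character by character from lower to higher addresses, and contrasts it with a move that behaves as if the source block were read completely before anything is written. The helper names are hypothetical, and the bytearray merely stands in for the PAD.

```python
def cmove(memory: bytearray, src: int, dst: int, count: int) -> None:
    """Copy one character at a time, from lower to higher addresses."""
    for i in range(count):
        memory[dst + i] = memory[src + i]


def move(memory: bytearray, src: int, dst: int, count: int) -> None:
    """Copy as if the whole source block were read before writing."""
    memory[dst:dst + count] = bytes(memory[src:src + count])


pad = bytearray(b"ABC" + b"." * 12)
cmove(pad, 0, 3, 12)          # overlapping blocks: the pattern propagates
print(pad.decode("ascii"))    # ABCABCABCABCABC

pad = bytearray(b"ABC" + b"." * 12)
move(pad, 0, 3, 12)           # no propagation: the old contents are copied
print(pad.decode("ascii"))    # ABCABC.........
```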

Dr. Stephan Becher - October 8th, 2007