Chapter 20: Dynamic Memory Allocation

Chunks And Data Blocks

StrongForth provides a simple implementation of the ANS Forth Memory-Allocation word set. Dynamic memory can be allocated from a dedicated region in the DATA memory area. The dynamic memory region is divided into contiguous chunks, which have a predefined structure:

size = n

data block (n cells)

The first cell always contains the size in cells of the actual data block. After ALLOCATE has obtained a chunk, it returns the address of the first cell of its data block as an item of data type DATA. The size of a data block can easily be determined with SIZE:

: SIZE ( DATA -- UNSIGNED )
  -> UNSIGNED 1- @ CELLS ;

Note that the sizes of chunks and data blocks are always multiples of one cell, although ALLOCATE according to the ANS Forth specification expects the number of address units to be allocated. The requested number of address units can be converted into the number of required cells with the word UNITS>CELLS:

: UNITS>CELLS ( UNSIGNED -- 1ST )
  1 CELLS /MOD + ;

Chunks that have not yet been allocated or that have been released with FREE are included in a linked list of free chunks. A free chunk contains a pointer to the next free chunk in the first cell of its data block, or zero if it is the last free chunk:

size = n
pointer to next free chunk

n-1 free cells of data

As long as no chunk has been allocated, the whole dynamic memory region is included in a single chunk. FREELIST is a pointer to the first cell of the dynamic memory region, and a pointer to the first free chunk. The total size of the dynamic memory region is 1024, but it can easily be changed by defining a different value as the constant SIZE-FREELIST.

0    VARIABLE FREELIST
1024 CONSTANT SIZE-FREELIST

: CLOBBER ( -- )
  SIZE-FREELIST 1- FREELIST ! 0 FREELIST 1+ ! ;

DATA-SPACE SIZE-FREELIST 1- CELLS ALLOT CLOBBER

CLOBBER initializes the dynamic memory region by making the whole dynamic memory region a single free chunk. It actually frees all allocated data blocks and thus makes the whole region again available for allocating.

Two words are provided for easyly walking from one chunk to the next. NEXT-CHUNK expects the address of a free chunk and returns the address of the pointer to the next free chunk. Since the address of the free chunk and the pointer to the next free chunk both point to a size parameter, the input parameter of NEXT-CHUNK has data type DATA -> UNSIGNED, and the output parameter has data type DATA -> DATA -> UNSIGNED:

: NEXT-CHUNK ( DATA -> UNSIGNED -- DATA -> 1ST )
  1+ CAST DATA -> DATA -> UNSIGNED ;

The second word calculates from any chunk, either allocated or free, the adjacent chunk towards higher addresses in the dynamic memory region. It simply adds the size of the chunk's data block plus one for the cell containing the size attribute to the address of the chunk:

: ADJACENT-CHUNK ( DATA -> UNSIGNED -- 1ST )
  DUP @ 1+ + ;

ALLOCATE

Now it's time to have a look at the implementation of the higher-level memory-allocation words. This is the definition of ALLOCATE:

: ALLOCATE ( UNSIGNED -- DATA SIGNED )
  UNITS>CELLS DUP FIND-CHUNK DUP
  IF SWAP SPLIT-CHUNK 1+ CAST DATA +0
  ELSE NIP CAST DATA -268
  THEN ;

ALLOCATE scans the list of free chunks for a chunk that is large enough for allocating a data block of the desired size. If a free chunk of that size exists, it is spitted into two chunks. The first one stays in the free chunk list, while the second one becomes an allocated chunk. ALLOCATE returns the address of the allocated chunk's data block as an address of data type DATA. If no sufficiently large free chunk is available, ALLOCATE returns a null pointer and an error code.

In order for the allocated data block to fit into a free chunk, the data block of the free chunk has to be at least two cells larger than the required numbers of cells to be allocated, because the free chunk remains after the allocation. This technique has two advantages. First, the linked list of free chunks does not need to be touched by ALLOCATE, apart from changing the size of one free chunk. And secondly, it is ensured that the dynamic memory region always begins with a free chunk, even if its size may shrink down to one cell only, which is not enough for allocating another new data block. As a result, several special cases do not need to be considered in ALLOCATE, FREE, and RESIZE.

: FIND-CHUNK ( UNSIGNED -- DATA -> 1ST )
  FREELIST
  BEGIN DUP
  WHILE OVER 2 + OVER @ >
  WHILE NEXT-CHUNK @
  REPEAT
  THEN NIP ;

: SPLIT-CHUNK ( DATA -> UNSIGNED 2ND -- 1ST )
  >R DUP DUP @ R@ 1+ - DUP ROT ! 1+ + R> OVER ! ;

FIND-CHUNK is a factor of ALLOCATE that tries to locate free chunk large enough to allocate a data block of the given number cells from. It returns a pointer to the free chunk, or a null pointer if no free chunk is large enough. SPLIT-CHUNK expects a pointer to a free chunk and the number of cells to be allocated, and returns a pointer to the chunk containing the allocated data block. After execution of SPLIT-CHUNK, the memory previously occupied by the free chunk looks like this, with n being the size of the original free chunk, and m being the number of cells to be allocated:

size = n-m-1
pointer to next free block (unchanged)

n-m-2 free cells of data

size = m

data block (m cells)

FREE

Freeing an allocated chunk is more complicated than allocating one. A number of different cases have to be distinguished. If the allocated chunk is preceeded or succeeded by a free chunk, it has to be merged with it. It is even possible that an allocated chunk is squeezed in between two free chunks, in which case all three chunks have to be merged and the result is one big free chunk. To keep things simple, the chunks in the free chunk list are strictly ordered according to their addresses. FREELIST is always the address of the first free chunk, and the successor of a free chunk (if one exists) is always one with a higher memory address. Assuming such an ordered free chunk list, ENCLOSE-CHUNK locates the free chunks before and after a given chunk. But beware that the two free chunks are not necessarily adjacent to the given chunk, and that not all chunks have a predecessor and a successor.

: ENCLOSE-CHUNK ( DATA -> UNSIGNED -- 1ST 1ST )
  LOCALS| ADDR | FREELIST DUP
  BEGIN NIP DUP NEXT-CHUNK @ DUP ADDR > OVER 0= OR
  UNTIL ;

To merge two adjacent chunks, it is sufficient to add the size of the second one, plus one for the cell containing its size, to the size of the first one. This is what MERGE-CHUNKS does. If both chunks are free chunks, it is necessary to additionally remove the second one from the free chunks list. MERGE-FREE-CHUNKS accomplishes this task by storing the pointer of the second free chunk as the pointer of the first free chunk:

: MERGE-CHUNKS ( DATA -> UNSIGNED 1ST -- )
  @ 1+ SWAP +! ;

: MERGE-FREE-CHUNKS ( DATA -> UNSIGNED 1ST -- )
  OVER OVER MERGE-CHUNKS 1+ @ SWAP 1+ ! ;

ENCLOSE-CHUNK, MERGE-CHUNKS and MERGE-FREE-CHUNKS are used by FREE. FREE has to distinguish four cases:

Case Previous Chunk Next Chunk
1 free free
2 free allocated or none
3 allocated free
4 allocated allocated or none

Note that the first chunk in the dynamic memory region, the one pointed to by FREELIST, is always a free chunk. Here's the definition of FREE:

: FREE ( DATA -- SIGNED )
  DUP 0= IF DROP +0 EXIT THEN
  -> UNSIGNED 1- LOCALS| ADDR |
  ADDR ENCLOSE-CHUNK OVER ADJACENT-CHUNK ADDR =
  IF OVER ADDR MERGE-CHUNKS OVER ADJACENT-CHUNK OVER =
     IF MERGE-FREE-CHUNKS
     ELSE DROP DROP
     THEN
  ELSE DUP ADDR ADJACENT-CHUNK =
     IF ADDR SWAP MERGE-FREE-CHUNKS
     ELSE ADDR NEXT-CHUNK !
     THEN ADDR SWAP NEXT-CHUNK !
  THEN +0 ;

FREE simply exits if it is executed with a null address instead of a valid memory address. It keeps the address of the allocated chunk to be freed in a local called ADDR. If the allocated chunk is immediately preceeded by a free chunk (cases 1 and 2), it merges the allocated chunk with the free chunk. If the merged free chunk is succeeded by another free chunk (case 1), these two chunks are merged as well. The ELSE branch of the outer conditional clause handles cases 3 and 4 If the allocaed chunk is succeeded by a free chunk, the two chunks are merged. Otherwise, the allocaed chunk becomes a new free chunk, that has to be inserted into the free chunks list. In both cases, the links in the free chunks list have to be updated.

RESIZE

The definition of RESIZE also needs to consider quite a number of different cases:

: RESIZE ( DATA UNSIGNED -- 1ST SIGNED )
  OVER 0= IF NIP ALLOCATE EXIT THEN
  UNITS>CELLS OVER -> UNSIGNED 1- LOCALS| ADDR SIZE |
  SIZE ADDR @ >
  IF ADDR ENCLOSE-CHUNK ADDR ADJACENT-CHUNK OVER =
     OVER @ SIZE ADDR @ - > AND
     IF ADDR SIZE 1+ + OVER @ ADDR @ + SIZE - OVER !
        SWAP 1+ @ OVER 1+ ! SWAP NEXT-CHUNK ! SIZE ADDR ! +0
     ELSE DROP DROP SIZE CELLS ALLOCATE DUP 0=
        IF DROP OVER -> SINGLE OVER -> SINGLE ADDR @ MOVE
           SWAP FREE
        ELSE NIP
        THEN
     THEN
  ELSE SIZE ADDR @ 1- <
     IF ADDR SIZE SPLIT-CHUNK' 1+ FREE
     ELSE +0
     THEN
  THEN ;

First of all, RESIZE checks whether the address that specifies the block to be resized is zero. If it is, no previous block exists and RESIZE just performs the semantics of ALLOCATE. Otherwise, the address of the chunk that contains the allocated data block, and the desired new size, are kept in locals ADDR and SIZE, respectively. The IF branch of the outer conditional clause handles the case where the desired size is larger than the size of the existing chunk. If the allocated chunk is directly succeeded by a sufficiently large free chunk, the size of the allocated chunk is increased, and the size of free chunk is decreased by the same amount. Otherwise, a new chunk with the desired size needs to be allocated. After copying the contents of the old allocated chunk's data block to the new allocated chunk's data block, the old allocated chunk is freed. The application has to consider the fact that a resized data block might appear at a different location.

If the new size is smaller than the allocated chunk, things get fairly easy. If the allocated chunk is at least 2 cells larger, it is split into two blocks, one of which is then freed. Otherwise, the allocated chunk remains unchanged, even if its size is now up to one cell larger than the desired size. The semantics of SPLIT-CHUNK' is similar to SPLIT-CHUNK. The difference is that SPLIT-CHUNK' expects the size of the first chunk on the stack, whereas SPLIT-CHUNK expects the size of the second chunk:

: SPLIT-CHUNK' ( DATA -> UNSIGNED 2ND -- 1ST )
  >R DUP DUP @ R@ 1+ - R@ ROT ! SWAP R> 1+ + SWAP OVER ! ;

Using Memory Allocation

The following example illustrates the usage of memory allocation:

100 ALLOCATE THROW -> SIGNED VALUE MEM
 OK
MEM SIZE .
100  OK
\ ... using memory ...
 OK
MEM 120 RESIZE THROW TO MEM
 OK
\ ... using more memory ...
 OK
MEM FREE THROW
 OK

Note that the memory address returned by ALLOCATE usually has to be casted to make it point to a specific data type. This is what -> SIGNED does in this example. If memory is allocated in order to store character-size data you'd have to write something like this:

255 ALLOCATE THROW CAST CDATA -> CHARACTER VALUE PAD2

This phrase can be shortened by using an overloaded version of ALLOCATE that directly returns an address of data type CDATA instead of DATA:

255 CALLOCATE THROW -> CHARACTER VALUE PAD2
 OK
PAD2 .S
CDATA -> CHARACTER  OK

Consequently, additional versions of SIZE, FREE and RESIZE are available as well:

' ALLOCATE ALIAS CALLOCATE ( UNSIGNED -- CDATA SIGNED )
' SIZE ALIAS SIZE ( CDATA -- UNSIGNED )
' FREE ALIAS FREE ( CDATA -- SIGNED )
' RESIZE ALIAS RESIZE ( CDATA UNSIGNED -- 1ST SIGNED )

Dr. Stephan Becher - November 22nd, 2007