Chapter 19: Exception Handling

StrongForth supports ANS Forth's Exception and Exception Extension word sets. Exception handling is available after loading blocks 175 to 179. If the Floating-Point word set has been loaded, some extensions from blocks 895 to 898 have to be loaded as well:

175 179 THRU
 OK
895 898 THRU
 OK

Exception handling in ANS Forth is based on the two words CATCH and THROW. CATCH creates an exception frame and executes a token. If no exception was thrown during execution of the token, CATCH removes the exception frame and continues execution. If, on the other hand, an exception is thrown before the execution of the token is finished, the exception frame is removed by THROW. THROW ensures that the flow of execution is continued at the same point where CATCH would have returned if the execution of the token had terminated normally.

Specialized CATCH

The ANS Forth stack diagram of CATCH looks quite similar to the one of EXECUTE, because both words execute a token:

EXECUTE ( i*x xt -- j*x )
CATCH   ( i*x xt -- j*x 0 | i*x n )

In order to keep the consistency of StrongForth's data type system, CATCH needs to consider the stack effect of executing the token. But the actual value of the token at runtime is not yet known at compile time. This is the same problem as with EXECUTE, and consequently, it is resolved in the same way. If the token to be provided to CATCH is a qualified token, the compiler knows its stack effect. Does this mean that we need to define a separate version of CATCH for each qualified token, just as with EXECUTE? No, this is not necessary. You will see later in this section how this problem is resolved.

But there's another problem. CATCH as specified by ANS Forth does not have unique output parameters. If no exception is thrown during execution of the token, CATCH has the same stack effect as if the token had been executed by EXECUTE, plus an additional output parameter of data type SIGNED. But if an exception is thrown, the depth of the data stack is supposed to remain unchanged. For example, if the stack effect of the token is

( addr u -- flag )

the stack effect of CATCH would be

( addr u token -- flag 0 | addr u n )

with output parameters addr u having undefined values. Since StrongForth cannot handle ambiguous stack diagrams, CATCH needs to have the same stack effect in both cases, i. e.,

( addr u token -- flag n )

This is an important deviation from the ANS Forth standard.

The implementation of one universal version of CATCH that can be applied to all qualified tokens is pretty complicated. Therefore, let's start the easy way by defining a version of CATCH for one specific qualified token:

NULL DATA VARIABLE HANDLER
( LOGICAL UNSIGNED -- 1ST )PROCREATES (LU--1)

: CATCH ( LOGICAL UNSIGNED (LU--1) -- 1ST SIGNED )
  >IN @ >R
  SOURCE-ID >R
  SOURCE-SPEC >R
  SP@ -> SINGLE 1+ >R
  HANDLER @ >R
  RP@ HANDLER !
  EXECUTE
  R> HANDLER !
  R> DROP
  R> DROP
  R> DROP
  R> DROP
  +0 ;

HANDLER contains zero as long as no exception frame is present. It is being initialized by an updated version of QUIT. ABORT needs to be updated as well in order to incorporate the new version of QUIT:

: QUIT ( -- )
  NULL DATA HANDLER ! QUIT ;

:NONAME ( -- )
  POSTPONE [ SP0 SP! DTP! CR QUIT ; IS ABORT

CATCH creates an exception frame on the return stack, whose memory image looks like this (from high to low addresses):

value of >IN
value of SOURCE-ID

value of SOURCE-SPEC

data stack pointer after CATCH
pointer to previous exception frame

The top three entries represent the input source specification at the point immediately before the execution of CATCH. Storing the input source specification is necessary, because it needs to be restored if an exception is thrown during execution of the token. Note that SOURCE-SPEC is a double-cell value. The next entry is the value of the data stack pointer after CATCH is done. Because the depth of the data stack after executing CATCH is one cell less than immediately before executing CATCH, the size of one cell has to be added to the present value of the data stack pointer. Of course, the net effect on the data stack depth depends on the stack effect of the qualified token. For (LU--1), the net effect is "one cell less". The final entry in the exception frame is the present value of the system variable HANDLER, i. e., a pointer to the previous exception frame. CATCH stores a pointer to the current exception frame in HANDLER, where it can be obtained by THROW. Saving the old value of variable HANDLER in the exception frame ensures that CATCH can be nested.

After creating the exception frame, CATCH executes the token. If no exception is being thrown, CATCH simply removes the exception frame from the return stack and pushes zero as SIGNED on the data stack to indicate that execution terminated normally.

Now, what happens if somewhere during the execution of the token, an exception is thrown? An exception is thrown by executing THROW with an appropriate error code. THROW uses the data in the exception frame to restore data and return stack and continue execution at exactly the same point as CATCH would have done if the execution had terminated normally. Here's the (deferred) definition of THROW:

:NONAME ( SIGNED -- )
  DUP
  IF HANDLER @ 0=
     IF ERROR
     ELSE HANDLER @ RP!
        RP@ -> DATA @ HANDLER !
        RP@ -> SIGNED !
        RP@ -> DATA -> SIGNED 1+ @ 1+ SP!
        RP@ -> SIGNED @ ( SIGNED -- )CAST
        RP@ 2 CELLS + -> DOUBLE @ TO SOURCE-SPEC
        RP@ 4 CELLS + -> FILE @ TO SOURCE-ID
        RP@ 5 CELLS + -> UNSIGNED @ >IN !
        (RDROP) (RDROP) (DRDROP) (RDROP) (RDROP)
        ?REFILL
     THEN
  ELSE DROP
  THEN ; IS THROW

THROW does nothing if the error code of data type SIGNED is zero. Otherwise, it checks the value of HANDLER. If no exception frame exists, HANDLER still contains the null pointer it has been initialized with by QUIT. In this case, the exception handling is actually done by ERROR. In the other case, i. e., if HANDLER contains a valid pointer to an exception frame, THROW starts with cleaning up the return stack by making the return stack pointer point to the latest exception frame. The first entry in the exception frame is a pointer to the previous exception frame, or a null pointer if no previous exception frame exists. The semantics of

RP@ -> DATA @

is actually the same as the one of R@ in ANS Forth. It pushes a copy of the top of the return stack to the data stack. In StrongForth, R@ is not available at this point, because R@ is a local variable created by >R. Since THROW accesses a cell on the return stack that has been placed there by a different word (CATCH), a low-level phrase with a type cast has to be used instead.

After the previous value of variable HANDLER has been restored, THROW reuses the first cell of the exception frame to store a temporary copy of the error code. This is necessary, because THROW's next action, restoring the data stack pointer, will make the error code unavailable. The second entry of the exception frame is the calculated value of the data stack pointer if CATCH had returned normally. This value has to be corrected by an offset of one cell for the error code. After restoring the data stack pointer, THROW retrieves the error code from the exception frame and uses a type cast in order to forget about it at compile time. According to its stack diagram, THROW may not leave anything on the data stack at all. Instead, THROW is actually returning the error code in the name of CATCH.

The contents of the next four cells of the exception frame are used to restore the input source specification to the state immediately before the corresponding CATCH was executed. Eventually THROW removes the exception frame from the return stack. Again, it has to use low-level words for this purpose, because R> would only work if THROW had created the exception frame itself. At the end, THROW uses ?REFILL to perform any additional tasks that are required to fully restore the input source specification. ?REFILL is a deferred definition with no semantics so far. A non-trivial semantics will become necessary once files as input sources are introduced with the File word set.

CATCH As CATCH Can

Now, let's get back to the problem of implementing a universal version of CATCH. Remember that the simplified version, which was presented at the beginning of this section, only applies to one specific qualified token. An equivalent version for a different qualified token would only differ in two places:

  1. The stack diagram, which is the same as the one for the corresponding version of EXECUTE, plus an additional output parameter of data type SIGNED.
  2. The net effect on the data stack depth, which is incorporated into the above version of CATCH as 1+.

In StrongForth, different stack diagrams cannot be assigned to one word. But it is possible to define a universal version of CATCH as a state-smart immediate word, which takes care of the stack effect and calculates the net effect on the data stack depth. This word may then compile or execute a low-level word with a generic stack effect and the net effect on the data stack depth as a parameter. Here's the definition of this low-level word:

: (CATCH) ( TOKEN INTEGER -- SIGNED )
  >IN @ >R SOURCE-ID >R SOURCE-SPEC >R
  SP@ -> SINGLE SWAP + >R HANDLER @ >R RP@ HANDLER !
  (EXECUTE)
  R> HANDLER ! R> DROP R> DROP R> DROP R> DROP +0 ;

It looks very similar to the previous version of CATCH for qualified tokens of data type (LU--1). The stack effect of executing the token has been removed, and EXECUTE has been replaced by (EXECUTE). An additional input parameter of data type INTEGER contains the net effect on the data stack depth. (CATCH) is compiled or executed by the state-smart, immediate word CATCH:

: CATCH ( -- )
  " EXECUTE" TRANSIENT FALSE MATCH SEARCH-ALL
  IF DEPTH-SP SWAP STATE @ DT>DT DROP DEPTH-SP -
     STATE @
     IF [DT] TOKEN >DT [LITERAL] POSTPONE (CATCH)
     ELSE ( UNSIGNED -- TOKEN INTEGER )CAST (CATCH)
        ( SIGNED -- )CAST [DT] SIGNED >DT
     THEN
  ELSE DROP -13 THROW
  THEN ; IMMEDIATE

CATCH does not have an explicit stack diagram, because it calculates its stack effect dynamically. It first tries to find a version of EXECUTE that matches the contents of the data type heap. DT>DT applies EXECUTE's stack diagram to the interpreter or compiler data type heap, depending on the value of system variable STATE. The net effect of EXECUTE's stack diagram on the data stack depth is calculated by subtracting the results of DEPTH-SP before and after applying the stack diagram. DEPTH-SP returns the depth of the data stack in cells based on the contents of the data type heap. See chapter 7 for a detailed description of DEPTH-SP:

: DEPTH-SP ( -- UNSIGNED )
  0 TRUE DTP@ DUP DEPTH -
  ?DO IF I @ ?SIZE + THEN I @ DT-PREFIX ATTRIBUTE? INVERT
  LOOP DROP ;

The only difference between this version of DEPTH-SP and the one presented in chapter 7 is that this one uses ?SIZE instead of SIZE. ?SIZE does the same thing as SIZE, but it throws an exception if the size of the data type cannot be determined. This happens with the null data type and with data type TUPLE or its direct or indirect subtypes:

: ?SIZE ( DATA-TYPE -- UNSIGNED )
  SIZE DUP 0= IF -271 THROW THEN ;

This means that CATCH cannot be applied to words whose stack diagram contains one or more tuples. Since the size of a tuple cannot be determined at compile time, it is not possible to calculate the net effect on the data stack depth. For example, it is not possible catch an exception in SAVE-INPUT or RESTORE-INPUT:

( -- INPUT-SOURCE )PROCREATES (--IS)
 OK
DT (--IS) ?TOKEN SAVE-INPUT CAST (--IS) CATCH

DT (--IS) ?TOKEN SAVE-INPUT CAST (--IS) CATCH ? invalid item size
INPUT-SOURCE

Now let's get back to the definition of CATCH, to the point after calculating the change in the data stack depth. From now on, CATCH has to distinguish between interpretation and compilation state. In order to compile (CATCH), two parameters of data types TOKEN and INTEGER have to be provided. Data type TOKEN is just added to the compiler data type heap, because the qualified token is still on top of the data stack. [LITERAL] compiles the calculated net effect on the data stack depth as a literal of data type UNSIGNED, which is a subtype of INTEGER. Finally, (CATCH) is compiled.

Messing around with the data type heap might look somewhat confusing to you. To make this easier to understand, here are the contents of the compiler data type heap during the execution of CATCH with a qualified token of data type (LU--1):

Immediately before DT>DT:  LOGICAL UNSIGNED (LU--1)
Immediately after DT>DT:  LOGICAL
Immediately before POSTPONE:  LOGICAL TOKEN UNSIGNED
Immediately after POSTPONE:  LOGICAL SIGNED

If CATCH is used in interpretation state, the data type manipulations are different. Because CATCH does not have a stack diagram, a type cast is required to make the qualified token visible. For the same reason, another type cast has to remove data type SIGNED afterwards. But since (CATCH) actually returns an item of data type SIGNED, data type SIGNED is then manually pushed to the interpreter data type heap.

To summarize, the two type casts before and after (CATCH) just correct the obvious mistake that CATCH is defined without a stack diagram. Since state-smart words generally have different stack effects in interpretation and compilation state, their implementation in StrongForth is often difficult. The necessary data type manipuations can make state-smart words pretty complicated.

An Example

This section contains a small example about how exception handling may be used in StrongForth. First, we define a new version of / that throws an exception if the divisor is zero:

: / ( UNSIGNED-DOUBLE UNSIGNED -- 1ST )
  DUP 0= IF DROP -10 THROW ELSE / THEN ;
 OK

To be able to provide CATCH with the token of this version of /, we need to create a suitable qualified token:

( UNSIGNED-DOUBLE UNSIGNED -- 1ST )PROCREATES (UDU--1)
 OK

We can now try to catch exceptions thrown by /:

605686950. 825 DT (UDU--1) ?TOKEN / CAST (UDU--1) CATCH .S . .
UNSIGNED-DOUBLE SIGNED 0 734166  OK
605686950. 0 DT (UDU--1) ?TOKEN / CAST (UDU--1) CATCH .S . .
UNSIGNED-DOUBLE SIGNED -10 605686950  OK

In the first case, / does not throw an exception, because the divisor is positive. CATCH returns 0 to indicate that the operation terminated normally. In the second case, the divisor is zero, and CATCH returns -10 as the error code. The result of the division is undefined, because the operation could not be terminated. Remember that StrongForth's version of CATCH has a unique stack diagram, no matter whether the operation terminated normally or not.

The Exception Extension Word Set

StrongForth supports the ANS Forth Exception Extension word set, which consists of the words ABORT and ABORT". ABORT is defined exactly as suggested in the standard:

: ABORT ( -- )
  -1 THROW ;

ABORT" is specified to execute -2 THROW. If no exception frame is present, THROW with the parameter -2 shall display the given character string. In order to pass the character string to THROW, (ABORT") stores the address of the first character and the length of the string in the two variables ERROR-ADDR and ERROR-COUNT, respectively. The implementation of ABORT" itself is unchanged with respect to the one from the Core word set, because ABORT" always compiles the latest version of (ABORT"):

NULL CCONST -> CHARACTER VARIABLE ERROR-ADDR
NULL UNSIGNED VARIABLE ERROR-COUNT

: (ABORT") ( SINGLE CCONST -> CHARACTER UNSIGNED -- )
  ROT
  IF ERROR-COUNT ! ERROR-ADDR ! -2 THROW
  ELSE DROP DROP
  THEN ;
  
: ABORT" ( -- )
  ?COMPILE POSTPONE " POSTPONE (ABORT") ; IMMEDIATE

If an exception is thrown without an exception frame being present, THROW simply executes ERROR. ERROR in turn executes the original version of ABORT from the Core word set. The version from the Core word set is only used by ERROR. In the StrongForth dictionary, the Core version of ABORT is hidden by the version from the Exception Extension word set. Note that the version from the Core word set is a deferred definition, while the version from the Exception Extension word set is not deferred.

But ERROR still has to deal with the two special cases, that are yet not considered. ABORT or (ABORT") expect special treatments for their error codes -1 and -2. This means, ERROR has to be replaced by a new version, which is to be compiled into the definition of THROW:

: ERROR ( SIGNED -- )
  CASE
     -1 OF ABORT ENDOF
     -2 OF ERROR-ADDR @ ERROR-COUNT @ TYPE ABORT ENDOF
     DUP ERROR
  ENDCASE ;

If the error code is -1, i. e., if the exception is thrown by ABORT, ERROR executes the Core version of ABORT without displaying an error message. If the error code is -2, ERROR displays the character string provided by (ABORT") before executing ABORT. In all other cases where the error code is non-zero, the new version of ERROR just displays the error message generated by the old version of ERROR.

Exception Handling With Blocks

If blocks are supported, variable BLK becomes a part of the input source specification. The exception frame created by (CATCH) has to be extended by one cell containing the value of BLK. (CATCH) stores the value of BLK in the exception frame, while THROW uses this value to restore BLK as part of the input source specification. This is the memory image of the extended exception frame:

value of BLK
value of >IN
value of SOURCE-ID

value of SOURCE-SPEC

data stack pointer after CATCH
pointer to previous exception frame

The necessary extensions of (CATCH) and THROW are straight-forward. Additions with respect to the non-block definitions are emphasized. Note that CATCH remains unchanged, because the exception frame is created solely by (CATCH):

: (CATCH) ( TOKEN INTEGER -- SIGNED )
  BLK @ >R >IN @ >R SOURCE-ID >R SOURCE-SPEC >R
  SP@ -> SINGLE SWAP + >R HANDLER @ >R RP@ HANDLER !
  (EXECUTE) 
  R> HANDLER ! R> DROP R> DROP R> DROP R> DROP R> DROP +0 ;

: CATCH ( -- )
  " EXECUTE" TRANSIENT FALSE MATCH SEARCH-ALL
  IF DEPTH-SP SWAP STATE @ DT>DT DROP DEPTH-SP -
     STATE @
     IF [DT] TOKEN >DT [LITERAL] POSTPONE (CATCH)
     ELSE ( UNSIGNED -- TOKEN INTEGER )CAST (CATCH)
        ( SIGNED -- )CAST [DT] SIGNED >DT
     THEN
  ELSE DROP -13 THROW
  THEN ; IMMEDIATE
  
:NONAME ( SIGNED -- )
  DUP
  IF HANDLER @ 0=
     IF ERROR
     ELSE HANDLER @ RP!
        RP@ -> DATA @ HANDLER !
        RP@ -> SIGNED !
        RP@ -> DATA -> SIGNED 1+ @ 1+ SP!
        RP@ -> SIGNED @ ( SIGNED -- )CAST
        RP@ 2 CELLS + -> DOUBLE @ TO SOURCE-SPEC
        RP@ 4 CELLS + -> FILE @ TO SOURCE-ID
        RP@ 5 CELLS + -> UNSIGNED @ >IN !
        RP@ 6 CELLS + -> UNSIGNED @ BLK !
        (RDROP) (RDROP) (DRDROP) (RDROP) (RDROP) (RDROP)
		?REFILL
     THEN
  ELSE DROP
  THEN ; IS THROW

Exception Handling With Floating-Point Numbers

The introduction of a hardware floating-point stack affects StrongForth's exception handling, because the floating-point stack needs to be separately restored when an exception is being thrown. The exception frame has to be extended by a field containing the value of the floating-point stack pointer as it would be after CATCH returned successfully:

floating-point stack pointer after CATCH
value of BLK
value of >IN
value of SOURCE-ID

value of SOURCE-SPEC

data stack pointer after CATCH
pointer to previous exception frame

The extended exception frame requires changes to (CATCH), CATCH and THROW. The new version of (CATCH) has one additional parameter of data type INTEGER for the change in the floating-point stack pointer caused by the word to be executed.

: (CATCH) ( TOKEN INTEGER INTEGER -- SIGNED )
  FP@ SWAP + >R
  BLK @ >R
  >IN @ >R
  SOURCE-ID >R
  SOURCE-SPEC >R
  SP@ -> SINGLE SWAP + >R
  HANDLER @ >R
  RP@ HANDLER ! (EXECUTE)
  R> HANDLER !
  R> DROP R> DROP R> DROP R> DROP R> DROP R> DROP +0 ;

The changes of the data stack pointer and the floating-point stack pointer implied by the word to be executed can be calculated from its stack diagram. CATCH uses DEPTH-SP and DEPTH-FP for this purpose. The original version of DEPTH-SP, which was used by the non floating-point version of CATCH, is obsolete, because it counts each floating-point number as four cells on the data stack. This is wrong, because floating-point numbers are not stored on the data stack. The new version of DEPTH-SP simply skips all floating-point numbers. DEPTH-FP, on the other hand, only counts the floating-point numbers and ignores all other data types. Both words use FLOAT? to distinguish floating-point numbers from other data types:

: FLOAT? ( DATA-TYPE -- FLAG )
  ANCESTOR [DT] FLOAT = ;

: DEPTH-SP ( -- UNSIGNED )
  0 TRUE DTP@ DUP DEPTH -
  ?DO IF I @ DUP FLOAT? IF DROP ELSE ?SIZE + THEN
      THEN I @ DT-PREFIX ATTRIBUTE? INVERT
  LOOP DROP ;

: DEPTH-FP ( -- UNSIGNED )
  0 TRUE DTP@ DUP DEPTH -
  ?DO IF I @ FLOAT? IF 1+ THEN
     THEN I @ DT-PREFIX ATTRIBUTE? INVERT
  LOOP DROP ;

: CATCH ( -- )
  " EXECUTE" TRANSIENT FALSE MATCH SEARCH-ALL
  IF DEPTH-FP DEPTH-SP ROT STATE @ DT>DT DROP
     DEPTH-SP - SWAP DEPTH-FP - 8 MOD STATE @
     IF [DT] TOKEN >DT SWAP [LITERAL] [LITERAL] POSTPONE (CATCH)
     ELSE ( UNSIGNED UNSIGNED -- TOKEN INTEGER INTEGER )CAST
        (CATCH) ( SIGNED -- )CAST [DT] SIGNED >DT
     THEN
  ELSE DROP -13 THROW
  THEN ; IMMEDIATE

Finally, THROW needs to be replaced by a version that restores the floating-point stack pointer from the value stored in the exception frame. Since THROW is a deferred definition, the changes apply to all existing words that execute THROW.

:NONAME ( SIGNED -- )
  DUP
  IF HANDLER @ 0=
     IF ERROR
     ELSE HANDLER @ RP! RP@ -> DATA @ HANDLER ! RP@ -> SIGNED !
        RP@ -> DATA -> SIGNED 1+ @ 1+ SP!
        RP@ -> SIGNED @ ( SIGNED -- )CAST
        RP@ 2 CELLS + -> DOUBLE @ TO SOURCE-SPEC
        RP@ 4 CELLS + -> FILE @ TO SOURCE-ID
        RP@ 5 CELLS + -> UNSIGNED @ >IN !
        RP@ 6 CELLS + -> UNSIGNED @ BLK !
        RP@ 7 CELLS + -> UNSIGNED @ FP!
        (RDROP) (RDROP) (DRDROP) (RDROP) (RDROP) (RDROP) (RDROP)
        ?REFILL
     THEN
  ELSE DROP
  THEN ; IS THROW

Dr. Stephan Becher - August 23rd, 2007