# sanctuary (working title)

sanctuary is a 64-bit subroutine threaded forth for amd64 linux systems.

## stack effect notation

labels other than the ones listed here are specific to a certain word's documentation, and will either be obvious or be explained in that word's description.

- `a`: memory address
- `c`: one byte value
- `e`: error code
- `n`: signed integer
- `u`: unsigned integer
- `z`: null-terminated string
- `?`: boolean flag
- `xt`: execution token
- `ht`: header token
- `""`: string in input buffer
- `|`: 'or'
- `,`: separates multiple stack effects when multiple are needed

## Glossary

the following is a list of words available in this forth. certain constants for linux system interaction are not documented; they are identical to their C versions.

### `! ( u a -- )`

store the 64 bit value u into the memory address a.

### `# ( u -- u' )`

divide u by `base`, add the remainder to the numeric output buffer as a digit, and leave the quotient u'.

### `#> ( u -- a u )`

drop the remaining digit processing debris from the stack and return the contents of the numeric output buffer as a string.

### `#pad ( -- u )`

length of the pad buffer.

### `#tib ( -- u )`

variable containing the number of characters in the input buffer.

### `' ( "word" -- xt )`

read a word from the input buffer and push its execution token to the stack.

### `'h ( "word" -- ht )`

read a word from the input buffer and push its header token to the stack.

### `( ( -- ) IMMEDIATE`

start a comment which lasts until the next closing bracket. if the unclosed bracket in the description above bothers you, have a closing bracket: ).

### `(0handler) ( -- )`

the very early error handler, which simply quits the program.

### `(abort") ( a u -- )`

perform the runtime actions of `abort"`: print the string to error output and abort.

### `(create) ( -- )`

the default behaviour of a word made by `create`, which simply pushes the address following the definition to the stack.
this messes with the return stack and is not meant to be called outside of its specific context.

### `(defer) ( -- )`

activate a deferred word from the address following this word's call. this messes with the return stack and is not meant to be called outside of its specific context.

### `(does>) ( -- )`

run the non-default behaviour of a `create`d word. pushes the data location onto the stack and calls the word immediately following the `(does>)` call. this messes with the return stack and is not meant to be called outside of its specific context.

### `(header) ( a u -- ht )`

create a dictionary header for a word named by the given string. this word does not set the code field, and does not update `latest`.

### `(hide) ( ht -- )`

set the smudge bit on the header ht.

### `(unhide) ( ht -- )`

unset the smudge bit on the header ht.

### `* ( u1 u2 -- u )`

multiply u1 and u2.

### `*/mod ( n1 n2 n3 -- n4 n5 )`

multiply n1 and n2, then divide the result by n3. the remainder is in n4 and the quotient is in n5.

### `+ ( u1 u2 -- u )`

add u2 to u1.

### `+! ( a -- )`

add one to the value at memory address a.

### `+to ( comp: "name" -- , intr: u "name" -- ) IMMEDIATE`

compile or execute (depending on `state`) code to add u to the contents of a `value`. (in compile mode, u is whatever is already on the stack when the compiled code runs.)

### `, ( u -- )`

write a 64 bit value to user memory and increment the user memory pointer.

### `- ( u1 u2 -- u )`

subtract u2 from u1.

### `-! ( a -- )`

subtract one from the value at memory address a.

### `-to ( comp: "name" -- , intr: u "name" -- ) IMMEDIATE`

compile or execute (depending on `state`) code to subtract u from the contents of a `value`. (in compile mode, u is whatever is already on the stack when the compiled code runs.)

### `-rot ( u1 u2 u3 -- u3 u1 u2 )`

rotate the three topmost values on the stack so that the topmost value is moved to third place.

### `." ( -- ) IMMEDIATE COMPILE-ONLY`

compile into the current definition code that writes the following string (terminated by `"`) to output.
### `/buffer ( -- u )`

the size of an input buffer.

### `/linebuf ( -- u )`

the size of a line buffer.

### `/mod ( u1 u2 -- u3 u4 )`

divide u1 by u2. the quotient is in u4 and the remainder is in u3.

### `[ ( -- ) IMMEDIATE`

set the system to interpret mode.

### `['] ( "word" -- ) IMMEDIATE COMPILE-ONLY`

read a word from the input buffer and compile into the current definition a stack push of the word's xt.

### `[compile] ( "word" -- ) IMMEDIATE COMPILE-ONLY`

compile into the current definition a call to a normally immediate word.

### `] ( -- ) IMMEDIATE`

set the system to compiling mode.

### `: ( "name" -- )`

start compilation of the word 'name'.

### `:noname ( -- xt )`

start compiling a nameless, headerless word and yield its xt.

### `; ( -- ) IMMEDIATE`

end compilation of the currently compiling word.

### `\ ( -- ) IMMEDIATE`

start a comment that lasts until the end of the current line.

### `@ ( a -- u )`

fetch the 64 bit value at memory address a.

### `= ( n1 n2 -- ? )`

return true if n1 and n2 are equal.

### `< ( n1 n2 -- ? )`

return true if n1 is less than n2.

### `<# ( -- )`

initialise the system for numeric output.

### `<= ( n1 n2 -- ? )`

return true if n1 is less than or equal to n2.

### `<> ( n1 n2 -- ? )`

return true if n1 and n2 are not equal.

### `> ( n1 n2 -- ? )`

return true if n1 is greater than n2.

### `>= ( n1 n2 -- ? )`

return true if n1 is greater than or equal to n2.

### `>body ( ht -- xt )`

yield the code field of the header token ht.

### `>defer ( xt -- a )`

yield the address at which the deferred word xt stores the execution token it runs.

### `>errno ( u -- val err )`

transform the result of a system call into a value/error pair. if no error occurred, err is zero and val is the result. if an error occurred, val is zero and err is a negative integer (the exact value depends on the error).

### `>in ( -- a )`

variable containing the index of the first unparsed character in the input buffer.

### `>mark ( -- a )`

mark the source of a forward branch.
### `>r ( u -- ) ( R: -- u )`

move a value from the working stack to the return stack.

### `>resolve ( a -- )`

mark the destination of a forward branch.

### `?allocate ( u -- a e )`

allocate a dynamic block of memory, producing an error on failure.

### `?branch ( -- )`

compile into user memory an incomplete conditional branch. at runtime, the branch is taken if the value on the stack is zero. a 32 bit branch offset must be written immediately after.

### `?dup ( n -- 0 | n n )`

if n is not zero, perform `dup`.

### `?find ( a u -- ht )`

look in the dictionary for the word a (of u characters). if a word is found, its header token is returned. if no word is found or the string is of length zero, abort.

### `?componly? ( -- )`

produce a compile-only error.

### `?notfound? ( -- )`

produce a word not found error.

### `?overflow? ( -- )`

produce a stack overflow error.

### `?underflow? ( -- )`

produce a stack underflow error.

### `0= ( n -- ? )`

return true if n is equal to zero.

### `0< ( n -- ? )`

return true if n is less than zero.

### `0<= ( n -- ? )`

return true if n is less than or equal to zero.

### `0<> ( n -- ? )`

return true if n is not equal to zero.

### `0> ( n -- ? )`

return true if n is greater than zero.

### `0>= ( n -- ? )`

return true if n is greater than or equal to zero.

### `1+ ( u -- u' )`

add one to u.

### `1- ( u -- u' )`

subtract one from u.

### `2drop ( u1 u2 -- )`

remove the two topmost values from the stack.

### `2dup ( u1 u2 -- u1 u2 u1 u2 )`

duplicate the two topmost values on the stack.

### `abort ( -- )`

call the error handler (the address of which is in the variable `handler`).

### `abort" ( -- ) IMMEDIATE COMPILE-ONLY`

write the message that follows (terminated by `"`) to error output, then call the error handler (the address of which is in the variable `handler`).

### `again ( -- ) IMMEDIATE COMPILE-ONLY`

complete an infinite loop begun by the word `begin`.

### `allocate ( u -- a e )`

allocate a dynamic block of memory.
### `allot ( u -- )`

reserve u bytes of user memory.

### `and ( u1 u2 -- u )`

perform bitwise AND on u1 and u2.

### `base ( -- a )`

a variable containing the current numeric input/output base. by default this is 10.

### `base-buffer ( -- a )`

the address of the base input buffer, which reads from the terminal.

### `base-linebuffer ( -- a )`

the address of the base line input buffer.

### `begin ( -- ) IMMEDIATE COMPILE-ONLY`

mark the beginning of a begin-again, begin-until, or begin-while-repeat loop.

### `binary ( -- )`

set the current base to binary.

### `branch ( -- )`

compile into user memory an incomplete branch. a 32 bit branch offset must be written immediately after.

### `brk@ ( -- a )`

yield the current program break.

### `bye ( -- )`

exit the forth system.

### `c, ( c -- )`

write an 8 bit value to user memory and increment the user memory pointer.

### `c! ( u a -- )`

store the 8 bit value u into the memory address a.

### `c@ ( a -- c )`

fetch the 8 bit value at memory address a.

### `cell+ ( u -- u' )`

increment u by the size of one cell.

### `cell- ( u -- u' )`

decrement u by the size of one cell.

### `cells ( u -- u' )`

convert a number of cells into a number of bytes.

### `char ( "c" -- c )`

yield the value of the first character of the next word in the input stream.

### `close-file ( fd -- e )`

close the file descriptor fd.

### `cmove ( a1 a2 u -- )`

copy u bytes of memory from a1 to a2. bytes are copied in low memory to high memory order.

### `cmove, ( a u -- )`

copy u bytes of memory from a to `here`, then increment `here` appropriately. bytes are copied in low memory to high memory order.

### `cmove> ( a1 a2 u -- )`

copy u bytes of memory from a1 to a2. bytes are copied in high memory to low memory order.

### `compile, ( xt -- )`

compile a call to xt into user memory.

### `compile-only ( -- )`

mark the most recently defined word as compile-only.

### `compile-only? ( ht -- ? )`

true if ht is marked compile-only, false otherwise.
### `constant ( u "name" -- )`

create a word that pushes the cell value u to the stack.

### `create ( "name" -- )`

create a word in the dictionary that, by default, pushes the address directly following the header to the stack. this behaviour can be modified with `does>`.

### `d, ( n -- )`

write a 32 bit value to user memory and increment the user memory pointer.

### `d! ( u a -- )`

store the 32 bit value u into the memory address a.

### `decimal ( -- )`

set the current base to decimal.

### `defer ( "name" -- )`

create a new word whose behaviour can be controlled with `defer!`, `defer@`, `is` and `action-of`. initially it is set to yield an error.

### `defer! ( xt1 xt2 -- )`

set the deferred word xt2's behaviour to xt1.

### `defer@ ( xt -- xt' )`

retrieve the xt' which the deferred word xt is set to execute.

### `does> ( -- )`

modify the behaviour of the most recently `create`d word. (non-`create`d words will be corrupted.)

### `dp ( -- a )`

a variable that contains the address of the lowest free byte of user memory.

### `dp0 ( -- a )`

a variable that contains the address of the first byte of user memory.

### `dp$ ( -- a )`

a variable that contains the address of the last available byte of user memory.

### `drop ( u -- )`

remove the value at the top of the stack.

### `dup ( u -- u u )`

duplicate the value at the top of the stack.

### `e." ( -- ) IMMEDIATE COMPILE-ONLY`

compile into the current definition code that writes the following string (terminated by `"`) to error output.

### `else ( -- ) IMMEDIATE COMPILE-ONLY`

update the current if statement to branch here when the flag is false, and to skip to `then` when the corresponding `if` flag was true.

### `emit ( c -- )`

print the single character c to output.

### `executable ( a u -- )`

mark the u bytes starting at address a as executable. this is used primarily to mark the program break, which is used as the user memory space.

### `execute ( xt -- )`

call the word xt.
### `exit ( -- ) IMMEDIATE COMPILE-ONLY`

compile into the current definition a return instruction.

### `false ( -- u )`

a cell with no bits set.

### `find ( a u -- a u 0 | a -1 )`

look in the dictionary for the word a (of u characters). if no word is found, zero is returned along with the original string. if a word is found, its link field address is returned along with the true flag.

### `free ( a u -- e )`

free the block of u bytes at a, previously created by `allocate`.

### `grow ( u -- )`

grow the user memory space by u bytes and mark the new space as executable.

### `handler ( -- a )`

variable containing the address of the current error handler.

### `here ( -- a )`

yield the address of the first available byte in user memory.

### `hex ( -- )`

set the current base to hexadecimal.

### `hide ( "word" -- )`

set the smudge bit on the given word.

### `hijacks ( xt "word" -- )`

'hijack' an existing definition to perform the action of xt. this word *will* corrupt the dictionary if used outside its very specific context (replacing core assembly words with better versions written in forth), so it should be avoided in favour of `defer` and friends.

### `hld ( -- a )`

the address of the beginning of the used section of the pad buffer.

### `hold ( c -- )`

add the given character to the numeric output buffer.

### `if ( ? -- ) IMMEDIATE COMPILE-ONLY`

if the flag is true, execute the code that follows, up to the matching `else` or `then`.

### `init-source ( -- n )`

yield the value of `source-id` when processing the initialisation script.

### `immediate ( -- )`

mark the most recently defined word as immediate.

### `immediate? ( ht -- ? )`

true if ht is marked immediate, false otherwise.

### `interpret ( -- )`

interpret the contents of the terminal input buffer until it runs out.

### `invert ( u -- u' )`

invert all bits in u.

### `is ( xt "name" -- ) IMMEDIATE`

set the deferred word name to execute xt.

### `latest ( -- a )`

a variable containing the execution token of the most recently created word.
### `literal ( n -- ) IMMEDIATE COMPILE-ONLY`

compile a push of the literal value n into the currently compiling word.

### `mmap ( offset fd flags prot u a -- u )`

perform an mmap(2) system call.

### `munmap ( u a -- u )`

perform a munmap(2) system call.

### `nip ( u1 u2 -- u2 )`

drop the second-highest value from the stack.

### `nonaming ( -- ? )`

a `value`: true if the currently compiling word is a `:noname` word.

### `number ( a u -- n -1 | 0 )`

convert the given string into a number along with a flag. if parsing fails, 0 (false) is returned and no number is provided.

### `octal ( -- )`

set the current base to octal.

### `open-file ( z mode -- fd e )`

open the given file with the given mode (usually `r/w`, `r/o` or `w/o`) and yield the resulting file descriptor. note that this word uses the pad.

### `or ( u1 u2 -- u )`

perform bitwise OR on u1 and u2.

### `over ( u1 u2 -- u1 u2 u1 )`

push a copy of the second-highest value on the stack onto the top of the stack.

### `pad ( -- a )`

the address of the start of the pictured numeric output buffer.

### `pad$ ( -- a )`

the address of the end of the pictured numeric output buffer. this is useful because numeric output fills the buffer starting from high memory.

### `parse ( "name" c -- a u )`

parse one word from the input buffer, delimited by a newline or the character c, and return it as a string.

### `parse-name ( "name" -- a u )`

parse one whitespace-separated word from the input buffer and return it as a string. tabs (ascii 0x09), newlines (ascii 0x0a) and spaces (ascii 0x20) are considered whitespace.

### `postpone ( "name" -- ) IMMEDIATE COMPILE-ONLY`

compile the execution behaviour of a word into the current definition. if the word is immediate, this compiles a call to it, so it executes when the current definition runs (like `[compile]`). if the word is not immediate, this compiles code that compiles that word.

### `private{ ( -- )`

mark the start of a private section closed by `}private` and activated with `privatise`.
### `}private ( -- )`

mark the end of a private section opened by `private{` and activated with `privatise`.

### `privatise ( -- )`

activate a private section.

### `r/o ( -- 0 )`

a constant, meaning 'read only', used for file I/O.

### `r/w ( -- 2 )`

a constant, meaning 'read and write', used for file I/O.

### `r> ( -- u ) ( R: u -- )`

move a value from the return stack to the working stack.

### `rdrop ( R: u -- )`

remove the value at the top of the return stack.

### `read-file ( a u fd -- u' e )`

read u bytes from fd into memory location a. u' is the number of bytes read.

### `repeat ( -- ) IMMEDIATE COMPILE-ONLY`

in a begin-while-repeat loop, loop back to the condition.

### `rot ( u1 u2 u3 -- u2 u3 u1 )`

rotate the top three values on the stack so that the third highest value is moved to the top.

### `rp ( -- a )`

yield the current value of the return stack pointer. note that the address points to the return stack *before* this word was called.

### `rp0 ( -- a )`

a variable containing the value of the return stack pointer at the beginning of the program.

### `s" ( "string" -- , COMPILES: -- a u ) IMMEDIATE COMPILE-ONLY`

compile into the definition code to push the given string, terminated by a double quote. the string data and length are stored inline in the definition.

### `s>z, ( a u -- a )`

compile into user memory a copy of the given regular string converted to a null-terminated string.

### `sign ( n -- )`

add a minus sign to the numeric output buffer if n is less than zero.

### `smudge ( -- )`

toggle the smudge bit on the xt in `latest`.

### `source-id ( -- n )`

yield either the file descriptor of the current input file, -1 if the current input is from a string, or -2 if the current input is the initialisation script built into the binary.

### `sp ( -- a )`

yield the current value of the stack pointer. note that the address points to the stack *before* this value is pushed.

### `sp-reset ( -- )`

reset the working stack pointer to its starting value.
### `state ( -- a )`

a variable containing a boolean value. if 0 (false), the system is in interpreting mode; if -1 (true), the system is in compiling mode.

### `stderr ( -- 2 )`

push the file descriptor of stderr to the stack.

### `stdin ( -- 0 )`

push the file descriptor of stdin to the stack.

### `stdout ( -- 1 )`

push the file descriptor of stdout to the stack.

### `string-source ( -- n )`

yield the value of `source-id` when processing a string.

### `swap ( u1 u2 -- u2 u1 )`

swap the two topmost values on the stack.

### `sys-close ( fd -- n )`

perform a `close(2)` system call on the given file descriptor.

### `sys-open ( mode flag z -- n )`

perform an `open(2)` system call on the file named in the null-terminated string z, using the given file mode and flags.

### `sys-read ( u a fd -- n )`

perform a `read(2)` system call, reading into the buffer `u a` from file descriptor `fd`. n is the resulting value of the register `rax`.

### `sys-write ( u a fd -- n )`

perform a `write(2)` system call, writing the string `u a` to file descriptor `fd`. n is the resulting value of the register `rax`.

### `syscall0 ( rax -- u )`

perform the syscall with the id in `rax`, and push the value of the `rax` register to the stack.

### `syscall1 ( rdi rax -- u )`

perform the syscall with the id in `rax`, taking one parameter placed in `rdi`, and push the value of the `rax` register to the stack.

### `syscall2 ( rsi rdi rax -- u )`

perform the syscall with the id in `rax`, taking two parameters placed in `rdi` and `rsi`, and push the value of the `rax` register to the stack.

### `syscall3 ( rdx rsi rdi rax -- u )`

perform the syscall with the id in `rax`, taking three parameters placed in `rdi`, `rsi` and `rdx`, and push the value of the `rax` register to the stack.

### `syscall4 ( r10 rdx rsi rdi rax -- u )`

perform the syscall with the id in `rax`, taking four parameters placed in `rdi`, `rsi`, `rdx` and `r10`, and push the value of the `rax` register to the stack.
### `syscall5 ( r8 r10 rdx rsi rdi rax -- u )`

perform the syscall with the id in `rax`, taking five parameters placed in `rdi`, `rsi`, `rdx`, `r10` and `r8`, and push the value of the `rax` register to the stack.

### `syscall6 ( r9 r8 r10 rdx rsi rdi rax -- u )`

perform the syscall with the id in `rax`, taking six parameters placed in `rdi`, `rsi`, `rdx`, `r10`, `r8` and `r9`, and push the value of the `rax` register to the stack.

### `then ( -- ) IMMEDIATE COMPILE-ONLY`

conclude an if statement.

### `tib ( -- a )`

a variable containing the address of the current input buffer.

### `to ( comp: "name" -- , intr: u "name" -- ) IMMEDIATE`

compile or execute (depending on `state`) code to modify the contents of a `value`.

### `true ( -- u )`

a cell with all bits set.

### `tuck ( u1 u2 -- u2 u1 u2 )`

place a copy of the topmost value on the stack below the second-highest value.

### `type ( a u -- )`

write u characters at a to output.

### `u< ( u1 u2 -- ? )`

return true if u1 is less than u2.

### `u<= ( u1 u2 -- ? )`

return true if u1 is less than or equal to u2.

### `u<> ( u1 u2 -- ? )`

return true if u1 and u2 are not equal.

### `u> ( u1 u2 -- ? )`

return true if u1 is greater than u2.

### `u>= ( u1 u2 -- ? )`

return true if u1 is greater than or equal to u2.

### `until ( ? -- ) IMMEDIATE COMPILE-ONLY`

if the given flag is false, loop back to `begin`; otherwise continue past `until`.

### `value ( u "name" -- )`

create a value called name, with the initial value u.

### `variable ( "name" -- )`

create a variable word, which yields an address that can be written to and read from.

### `w/o ( -- 1 )`

a constant, meaning 'write only', used for file I/O.

### `warn ( a u -- )`

write u characters at a to error output.

### `while ( ? -- ) IMMEDIATE COMPILE-ONLY`

if the given flag is true, continue the current begin-while-repeat loop; otherwise branch to just after the matching `repeat`.

### `write-file ( a u fd -- u' e )`

write u bytes from a into fd. u' is the number of bytes written.
### `xor ( u1 u2 -- u )`

perform bitwise XOR on u1 and u2.

### `z" ( "string" -- , COMPILES: -- a ) IMMEDIATE COMPILE-ONLY`

compile into the definition code to push the given string, terminated by a double quote. the string is null-terminated and does not store a length; this is meant for interfacing with the linux system.

### `zstrlen ( a -- u )`

the length of a null-terminated string in bytes. the terminating null byte is not counted.

## dictionary format

note that the one-byte string length field limits a word's name to 255 characters.

| field | size |
| :---- | :--- |
| link to previous word | 8 bytes |
| flag field | 1 byte |
| string length | 1 byte |
| string | up to 255 bytes |
| code | variable length |

## reserved registers

the register `r15` is reserved for the parameter stack pointer.

## differences from standard forth

for the most part this forth intends to be in line with the standard, but it diverges in a few notable places:

- the most visually obvious one by far: this forth uses lower case names for core words.
- `find` takes `a u` instead of a counted string, and does not return 1 for immediate words.
- pictured numeric output words (`<# # #>` etc.) work with single cell numbers. this is because this forth has no double number support. (128 bit integer arithmetic does not seem all that useful to me.)
- the dynamic allocation word `free` requires a length. this is because munmap requires a length.
- `abort"` does not take a flag and always executes.

## license

public domain. the exception is a modified version of john hayes' test suite, which is under a 'distribute, but you have to include the copyright notice' license; that notice is included in the source.