lib4th details



lib4th (L4) is

an (almost) ANS-Forth compliant implementation of the Forth environment, with some minor exceptions required by this particular implementation. It is intended for real-time control and multi-tasking (both of which remain to be implemented) and for scripting, and it compiles to and interprets immediately executable, subroutine-threaded native code.
The Forth kernel is supplied as an independent, shareable, dynamically loadable library which requires no other libraries to be present (no "libc"!), as well as in statically linked executables. It enables high-level Forth programming as well as easy and much simplified direct access at assembly level; L4 compilation includes all system calls of the respective Linux host (2.2 and 2.4 tested), which are then available by name in the Forth system.

Btw, neither execution speed nor sophisticated code optimization (which currently is not implemented at all) was ever considered an overly important design goal. Still, the 'subroutine threaded compiled native code interpreting' concept, chosen mainly for ease of implementation and access, worked out such that lib4th provides quite efficient, fast-executing code. A simple example, screenfile to sequential conversion, proved about twice as fast as its "C"-ish counterpart, even though it compiles the additionally required words from text source "on the fly". Timing was taken with the Linux "time ..." command, which isn't highly exact, i.e. it deviates by about 30% (and more); the results apparently are a function of 'physical' library size, too, due to memory page swapping &c.

The sources are (about 1MB of) plain IA-32, Intel syntax assembly, commented in every detail, and thus might also serve as some tutorial: not on how things should be done (I leave that to the "experts") but on how things work.

Applications
Not many ready, yet...
The publicly available ANS-4th testing programs, "coretest", "bench", etc., were run successfully, though, as were an assembler and a disassembler. The 'block-file' editor "f8ed" is an example of initial command execution by a basic binary, which compiles the editor from the Forth source <edit.f8>. Further, a minimal example application, "f8", which enables access to the Forth compiler & interpreter, is just 1K in size. "calc4th", another small example whose name doesn't mean more than the original intention of what soon grew into "lib4th", is a quite comfortable command-line or interactive r.p.n. calculator tool for integer values in the range -(2^63)..2^63, for unlimited-size "BIGNUMS" (including negative values) or for "REAL NUMBERS" in floating point or integer-with-fractions display mode, wrt (almost) *any* number radix 2...256. It is not much different from "f8", only the minimal program I managed to do, 628 bytes, simply the entry to the Forth kernel library.
"htmlxref.f8" generates the asm sources' cross references in about 5 min ("compile-by-disp" mode), fairly fast compared to the 50 min taken by the previously used bash script. It scans about 1M of source text for any occurrences of 2200+ Forth entries and generates a list with links to the respective words' assembly source, to the asm files and lines where they are referred to, and to the corresponding glossary entries. It is also an example of how to use {MMAP}-ed files efficiently (more than three times faster than with direct file access) and of {EVALUATE}, which generates {VALUES} etc. for file pointers, memory addresses and running pointers from the source-file names passed on the command line.

ANS-4th specific IEEE 754 style FLOATING words won't be implemented, for which I have two strong reasons (and one excuse:):

The human-readable representation of such numbers is not clearly defined.

Fancy rounding algorithms applied to floats add another uncertainty.

lib4th implements "real numbers", conversion to/from IEEE format and unlimited size integer arithmetic.

ANS-Forth wordsets are complete, per v1.0.3, with the exception of a few words,

CORE - fig-4th version LEAVE supplied, ANS-word additionally from file.

EXCEPTION - CATCH/THROW not implemented, (untested) ANS-examples from file.

Additional words in separate vocabularies,

ANS

BIGNUM

BLKFILE

COMPILER

EDITOR

FIG

FLOAT

LINUX

RATIONAL

VT

.. &c ..

DPANS94 conformance:

"lib4th"

implements all ANS-Forth wordsets (with the above mentioned exceptions) but is not an "ANS-Forth" system.

can execute ANS-Forth programs which do not rely on (silly) programming "tricks".

can be used for ANS-Forth compliant programming.

consistently stores all data items in "little endian" byte order.

Notes to

Documentation

Compiling an ANS-Forth program

FLOATing words

Numeric input

Local Values

>BODY

>DOES

BLOCK

Strings

Terminal

See the glossary files for further information (or the German page for a more detailed overview).

Documentation
is supplied as the ASCII text 'Glossary' and as several HTML files, ordered and linked by different criteria, 'l4gls.html', 'l4toc.html' &c, plus an assembly labels' cross reference list, 'l4xref.html'. Those files can be derived from the L4 sources and by default are installed to '/usr/share/doc/lib4th/', plus some examples to '/usr/local/lib/f8/' (all glossary files archive).
Within an L4 program, { HELP forthword } can be used at any time to display the respective glossary entry, as found in the sequence of all {voc-link}ed definitions; { HV HELP forthword } restricts the search to the currently stacked vocabularies, top (most recent) down.
{ V forthword } displays the respective "lfa" and the name of the vocabulary in which the word is defined, or "-?-" if the word isn't known to the entire system. Use { VOCS } to see which vocabularies exist and { ORDER } to view the currently stacked search order and the name of the vocabulary currently being extended.

Compiling an ANS-Forth program
most probably requires very few modifications. Unless a program 'ticks' immediate, state-dependent words and compiles them into non-immediate definitions, or applies "assumptions" about mixing arithmetic "numbers" with bit-wise or flag operations, no modifications should be required at all. For instance, deriving a '2s-complement signed number' by { 0 INVERT 1 RSHIFT INVERT CONSTANT MIN-INT } probably is 'numeric overflow into sign bit' but otherwise plain nonsense (re DPANS94 3.1.3.2), which yields a meaningless bit pattern ("negative zero", i.e. ordinary "zero" with the sign bit set).
"Little endian" ordered data storage also conflicts with some ANS-Forth "assumptions" which the standard document justifies by the 'requirement' of enabling inconsistent data access, e.g. { .. >R .. >R ... 2R> .. }, which won't always give the expected result! Consequently, store and fetch all data items with the respective words in a strictly consistent manner and do not "assume" anything about multi-cell order in memory, to safely avoid any such problems. Besides, data-stack push by 'increment & store' would comply, but return-stack and memory operations won't.
Most such words are included in the {ans} vocabulary in a compliant operating mode, to efficiently isolate the basic system from those ANS-ish oddities.
There are many more aspects where it would not be wise to 'assume' or to apply nebulous 'common practice', which could yield quite impressively stupid programming faults, not just with lib4th...
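
The MIN-INT derivation quoted above can be traced bit by bit. The following is only a Python model, not lib4th code; the 32-bit cell width and all function names are assumptions chosen for illustration. It shows that the resulting bit pattern reads as the most negative integer only under a two's-complement interpretation, while under a sign-magnitude reading it is zero with the sign bit set.

```python
CELL_BITS = 32  # assumed cell width, for illustration only


def min_int_pattern(bits=CELL_BITS):
    """Bit pattern left by { 0 INVERT 1 RSHIFT INVERT } on a cell of `bits` bits."""
    mask = (1 << bits) - 1
    x = ~0 & mask   # 0 INVERT : all bits set
    x >>= 1         # 1 RSHIFT : all bits set except the top one
    x = ~x & mask   # INVERT   : only the top (sign) bit set
    return x


def as_twos_complement(x, bits=CELL_BITS):
    """Read the pattern as a two's-complement signed number."""
    return x - (1 << bits) if x >> (bits - 1) else x


def as_sign_magnitude(x, bits=CELL_BITS):
    """Read the pattern as sign bit plus magnitude."""
    magnitude = x & ((1 << (bits - 1)) - 1)
    return -magnitude if x >> (bits - 1) else magnitude


pattern = min_int_pattern()
print(hex(pattern))                 # only the sign bit is set
print(as_twos_complement(pattern))  # most negative two's-complement value
print(as_sign_magnitude(pattern))   # zero magnitude with the sign bit set
```

The same bits thus mean different things depending on a representation the program silently assumes.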

FLOATing words
in the "rational" and "FLOAT" vocabularies implement the complete ANS wordsets and more but are not standard: floating point numbers are implemented as (multiple-cell groups of) ordered cell pairs representing the numerator and denominator of (improper) fractions, which easily interface with the integer Forth operations, facilitate high accuracy and often resolve to something quite trivial, e.g. the reciprocal is, basically, just a 2SWAP, division by { 2ROT D* 2ROT 2ROT D* }. The wordset includes a FOR/NEXT loop construct which operates on floats, and provides additional words for conversion between the internal 'ranum' and the IEEE 754 normalized double float representation.
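
The idea of numbers kept as numerator/denominator pairs can be sketched in a few lines. This is a Python model of the arithmetic only: it does not reproduce the Forth stack sequences, and the function names as well as the reduction to lowest terms are assumptions for illustration, not a description of what lib4th does internally.

```python
from math import gcd


def normalize(num, den):
    """Return the fraction in lowest terms with a positive denominator (assumed policy)."""
    if den == 0:
        raise ZeroDivisionError("denominator is zero")
    if den < 0:
        num, den = -num, -den
    g = gcd(num, den)
    return num // g, den // g


def reciprocal(r):
    """Reciprocal of a fraction: exchange numerator and denominator."""
    num, den = r
    return normalize(den, num)


def divide(a, b):
    """a / b by cross multiplication: (a_num * b_den) / (a_den * b_num)."""
    return normalize(a[0] * b[1], a[1] * b[0])


print(reciprocal((3, 4)))      # the pair, exchanged
print(divide((1, 2), (3, 4)))  # (1*4) / (2*3), reduced
```

Because only integer multiplications are involved, no rounding occurs at any step.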

Numeric input
according to the BASE setting, whether "FLOAT" or integer, and by prefix:

%(num) binary

!(num) quaternary (base 4)

&(num) octal

@(num) octal

#(num) decimal

§(num) duodecimal

$(num) sedecimal

0x(num) sedecimal

-(prefix)(num) the sign character "-" may precede any base-setting prefix, or follow a single-character one.
\(prefix)(num) FLOATing (double "ranum") format; a base-setting prefix may follow,
or fetches a "quad" integer (128 bits) if in persistent {FLT} mode, which defaults to FLOATing input.

^(char) ascii control code by (char.code - 64)

"(chars) up to four characters - leading "-" negates

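The integer prefixes listed above can be modelled by a small parser. This is a Python sketch under assumptions, not the L4 number conversion itself: the function name, the digit alphabet beyond 9 (letters a..z, case-insensitive) and the error handling are hypothetical, and the "\" FLOATing prefix and the quoted-characters form are left out.

```python
# Base-setting prefixes as listed in the text.
PREFIX_BASE = {"%": 2, "!": 4, "&": 8, "@": 8, "#": 10, "§": 12, "$": 16}
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed digit alphabet


def parse_number(text, default_base=10):
    """Convert prefixed numeric input to an int (illustrative model only)."""
    if len(text) == 2 and text[0] == "^":
        return ord(text[1]) - 64          # ^(char): ASCII control code
    negative = False
    base = default_base
    if text.startswith("-"):              # sign may precede any prefix
        negative, text = True, text[1:]
    if text[:2].lower() == "0x":
        base, text = 16, text[2:]
    elif text[:1] in PREFIX_BASE:
        base, text = PREFIX_BASE[text[0]], text[1:]
        if not negative and text.startswith("-"):
            negative, text = True, text[1:]   # or follow a single-char prefix
    if not text:
        raise ValueError("no digits")
    value = 0
    for ch in text.lower():
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            raise ValueError(f"invalid digit {ch!r} for base {base}")
        value = value * base + digit
    return -value if negative else value


for sample in ("%1010", "!123", "@17", "§10", "$-ff", "-0x10", "^A"):
    print(sample, parse_number(sample))
```
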
Local Values
are implemented on a common local space allocation scheme of unlimited size which does not interfere with any other data access or reference method, i.e. stacks, dictionary, word names, etc., in compiling or in interpreting state, enabling 'dynamic' allocation at runtime. The number of locals is not limited other than by the available code-memory and data-stack area (at runtime) or data-memory (while compiling). Locals' names are 'normal' Forth names, up to 255 non-ctrl chars (bytes), without any specific restrictions. I.e. locals are just locally valid, temporary VALUEs, etc., within a particular Forth word, accessible across any number of nested levels, nothing more, nothing less!

>BODY
is not a constant, neither wrt the word header nor any other reference! This also applies to >DATA, the equivalent for kernel-defined variables which were copied to writable memory at lib4th startup.

>DOES
is not clearly described in the dpans94 document wrt 'multiple DOES>' modifying the same definition. Although the document mentions #the# DOES>, which I'd interpret as prohibiting 'multiple..', L4 follows a 'behaviour' apparently agreed upon in the respective discussions at c.l.f: if __mdoes was defined at library compilation, then after invocation of a newly defined word it consecutively invalidates the most recently executed DOES> and executes the remaining ones, until the last, which always remains effective. Note that although dpans94 is quite imprecise here, section A.6.1.1250 clearly prohibits nesting DOES> in any kind of control flow structure!

BLOCK
"screenfile" buffers are allocated as required, in Linux "PAGE_SIZE"d blocks (4k) of four block-buffers at page-aligned addresses in data-space, which thus may also be <mmap>ped and can be removed with {forget}. Buffers are self-maintaining with the blockfile wordsets.
{L-LOAD}, a high-level definition from <load.f8>, provides high-level Forth library access by a selector name and the (optional) library-file name. {LOAD} and {L-LOAD} may be nested to as many levels as the available memory permits; LOAD uses about 30 cells of return-stack, L-LOAD additionally uses 4 cells plus the size of a previously valid selector word in local data-space and, thus somewhat limited, a temporary entry in the file "channels" table (of up to 54 user items), until return. {-->} proceeds to the next screen fast, by {REFILL} and without any memory or stack overhead, effectively obsoleting the use of {THRU}.
Please note that BLK on its own cannot be used to revert to keyboard input as long as SOURCE-ID holds a non-keyboard channel, which, btw, enables LOADing from block 0 of a screenfile.

Strings
are 'counted strings' of up to 255 characters (bytes) by default; if that limit is exceeded they are automatically stored in zero-counted "asciz" format by {."}, {S"}, {PLACE} &c, and modified as required by {+PLACE}. L4 "asciz" strings begin and end with a <nul> byte, which enables the {COUNT} word to fetch the correct value on its own. {Z"} and {SZLITERAL} can be used to force storing a string in true "asciz" mode, i.e. without the leading <nul> byte; {ZCOUNT} is provided to get the length of that variant. Consequently, there isn't #the# count byte, and { 1+ DUP 1- C@ } is not necessarily equivalent to {COUNT}; lazy 'common practice' might be badly misleading...
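
The two storage layouts just described can be modelled as follows. This is a Python sketch under assumptions, not the L4 implementation: the function names are hypothetical, the text is assumed to contain no <nul> bytes, and an empty string is assumed to be stored in the zero-counted form.

```python
def place(text: bytes) -> bytes:
    """Store text as a counted string, or zero-counted "asciz" if too long."""
    if 1 <= len(text) <= 255:
        return bytes([len(text)]) + text   # count byte, then the characters
    return b"\x00" + text + b"\x00"        # leading <nul>, text, trailing <nul>


def count(memory: bytes, addr: int):
    """Model of a COUNT that copes with both layouts: (text address, length)."""
    n = memory[addr]
    if n:                                  # ordinary counted string
        return addr + 1, n
    end = memory.index(0, addr + 1)        # zero-counted: find the closing <nul>
    return addr + 1, end - (addr + 1)


def naive_count(memory: bytes, addr: int):
    """The 'common practice' { 1+ DUP 1- C@ }: trusts the first byte blindly."""
    return addr + 1, memory[addr]


short = place(b"abc")
long_ = place(b"x" * 300)
print(count(short, 0), naive_count(short, 0))  # both agree
print(count(long_, 0), naive_count(long_, 0))  # the naive variant reports length 0
```
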
Other string related words:

Determining a character's position, per byte, by {skip}, {scan}, {rskip}, {rscan}.
{scan} searches for the 1st character equal to a supplied one, {skip} for the 1st character not equal to the supplied one; {rskip} and {rscan} work correspondingly but begin at the respective string's end. All return the pointer to, and the remaining count at, the found position.
A special mode, selected by two <bl>-s coded as 8224, can be used to include the control characters, codes 0..31, when scanning for the blank space; in L4 this is used, for instance, with {PARSE} and related words to also recognize the line end as a word delimiter.
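
The forward variants can be modelled in Python as below. This is only a sketch: an index into the byte string stands in for the pointer, the result for "not found" (end of string, count 0) is an assumption, and the reverse variants {rskip} and {rscan} are not modelled because their exact return values are not spelled out here.

```python
TWO_BLANKS = 8224  # two <bl>-s, 0x2020: blank plus control codes 0..31


def _matches(byte: int, char: int) -> bool:
    if char == TWO_BLANKS:
        return byte <= 32          # blank or any control character
    return byte == char


def scan(text: bytes, char: int):
    """Find the 1st byte equal to char: (index, remaining count)."""
    for i, b in enumerate(text):
        if _matches(b, char):
            return i, len(text) - i
    return len(text), 0            # assumed result when nothing is found


def skip(text: bytes, char: int):
    """Find the 1st byte not equal to char: (index, remaining count)."""
    for i, b in enumerate(text):
        if not _matches(b, char):
            return i, len(text) - i
    return len(text), 0            # assumed result when nothing is found


print(scan(b"foo bar", 32))           # stops at the blank
print(scan(b"foo\nbar", 32))          # plain blank: the line end is not a delimiter
print(scan(b"foo\nbar", TWO_BLANKS))  # special mode: the line end delimits, too
```
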

String slicing
{bslice} w/ delimiting characters' positions, {cslice} slice delimited by chars, or {sslice} by enclosing text segments.

Exchanging text sequences determined from two separate lists, with or without a given marker sequence, by {replace} and {substitute}.

{type} is by default vectored to {[type]}, which is deferred to an escape-sequence evaluating variant, {[etype]}. It recognizes the 'bash' abbreviations, "\n" &c, plus "\-" as a 'noop' operator which can be used to delimit numeric "\nnn" escapes followed by regular numeric characters, and allows insertion of any character by its ascii code, octal or, optionally prefixed, binary, decimal, sedecimal.
{e\stg} is provided separately for the escape sequence conversion.
No other words contain the conversion; in particular, all strings are stored as unmodified source text and converted only by {type}, if vectored to {[etype]}, or with {e\stg}. Revert to the "normal", non-escaping {type} e.g. with { ' [ctype] is [type] }.

Terminal
'capabilities' by "termcap" as well as "terminfo" are in an utterly confused, bad state. Linux distributions and even single programs supply their own definitions, which their authors for whatever reason seem to consider superior to the defaults, but none of those substitutes appears less confused. Therefore "lib4th" won't care at all but leaves the respective definitions up to the user. This isn't that difficult, because all those terminals and emulations provide some means of configuration. Only, they can't be relied upon by an externally supplied program because of the rotten state of 'termcap' (etc.).
Consequently, to provide a minimum of certainty, the respective L4 words are implemented such that the Linux console and the "rxvt" terminal emulator for X-Windows are equally controllable, as far as possible; the screen-file editor, "f8ed", may serve as an example.

All of which is, hopefully, sufficiently explained in the docs and shouldn't give rise to any problems.
I'd be much interested in comments and suggestions; please mail me.

# L4-Forth vs C:
"f8" is a simple lib4th script which enters the 4th system, then compiles and executes the words required for screenfile to sequential conversion:
	    	#! /usr/local/bin/4 ss
		editor also hidden also linux also forth
		in-chan ch-pipe?
		[IF] in-chan [ELSE] 4 argstg r/o open-file drop [THEN] 
		prot-r 2 rot mmap-file
		out-chan ch-pipe? 
		[IF] out-chan work to-chan [ELSE]
		    5 argstg
		    -dup 0= if drop 4 argstg endif
		    w/o create-file 2drop
		[THEN] 
		-2 value cl (eol) @ integer <nl>
		enter begin
		 -trailing cl over if drop -2 endif 1+ dup to cl
		 0< if <nl> work write-char endif work write-file 2drop
		entry readm-c/l ?d0= until 
		<nl> work write-char 0 bye-r
	
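For readers unfamiliar with screenfiles, the core of the conversion performed by the script above can be sketched in Python. This is a model under assumptions, not a translation of the script: it assumes the traditional layout of 64-character lines without line terminators, uses a hypothetical function name, and leaves out the pipe/file channel handling, the memory mapping and any special treatment of empty lines.

```python
def screens_to_text(data: bytes, line_len: int = 64) -> str:
    """Cut a screenfile image into fixed-length lines and drop trailing blanks."""
    lines = []
    for offset in range(0, len(data), line_len):
        line = data[offset:offset + line_len].rstrip(b" ")  # like -TRAILING
        lines.append(line.decode("latin-1"))
    return "\n".join(lines) + "\n"


# Three 64-byte lines: text, an empty line, text.
image = b"abc".ljust(64) + b" " * 64 + b"def".ljust(64)
print(screens_to_text(image), end="")
```
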
The timing figures, with a ca. 200K <forth_scr> screenfile:

$ L4-1-1-1+70: time blk2seq <forth.scr >forth.scr.seq
real 0m0.060s
user 0m0.030s
sys 0m0.030s

$ C: time fromblock <forth.scr >forth.scr.seq
real 0m0.131s
user 0m0.130s
sys 0m0.000s



H.-Peter Recktenwald, Berlin, 12 June 2000 = .hpr.l0 = :