snobol4 - SNOBOL4 interpreter
SYNOPSIS
snobol4 [ options ... ] [ file(s) ... ] [ params ... ]
DESCRIPTION
This manual page describes a port of the original Bell
Telephone Labs (BTL) Macro Implementation of SNOBOL4
(MAINBOL) to machines with 32-bit C compilers by Philip L.
Budne. The language and its implementation are described
in [1] and [2]. Extensions from Catspaw SNOBOL4+, SPITBOL
and SITBOL have been added.
Limitations
All aspects of the language are implemented except:
o Trapping of arithmetic exceptions.
o Dynamic use of the LOAD() function is not supported
on platforms that use neither the standard dlopen(3)
interface for shared libraries nor the a.out(5)
executable format. User functions can be statically
linked (poor man's loading) on ALL platforms. See
load.doc for more information.
Changes
The following behaviors have been changed from the
original Macro SNOBOL4:
o Listings are disabled by default. Default listing
side (when enabled by -LIST or the -l option) is
LEFT. Listings are directed to standard output.
o Error messages, the startup banner and statistics
are directed to standard error. Compilation error
messages (including erroneous lines) appear on
standard error as well as in the listings. Error
messages now reference the source file name and
line number.
o Character set (see below).
o The PUNCH output variable no longer exists (see
TERMINAL variable below).
o I/O is not performed using FORTRAN I/O. The 3rd
argument to the OUTPUT() and INPUT() functions is
interpreted as a string of I/O options (see below).
o Listing statement numbers show the statement number
of the LAST statement on the line (rather than the
first).
o Setting the &ABEND keyword causes a core dump upon
termination!
o The value of the &CODE keyword determines the exit
status of the snobol4 application.
o The DATE() function returns strings of the form:
MM/DD/YYYY HH:MM:SS. See extensions section for
arguments to the DATE() function.
o Keyword &STLIMIT now defaults to -1. When &STLIMIT
is less than zero there is no limit to the number
of statements executed, and &STCOUNT is not incre-
mented.
o VALUE tracing applies to variables modified by
immediate value assignment ($ operator) and value
assignment (. operator) during pattern matching.
o The BACKSPACE() function is not implemented. See
SET() below.
o I/O unit numbers up to 256 can be used.
Character set
The internal character set and collating sequence are
USASCII-8. Any 8-bit byte pattern is accepted as a SNOBOL
datum or in a string constant of a SNOBOL source program.
The value of the SNOBOL protected keyword &ALPHABET is a
256-character string of all bytes from 0 to 255, in
ascending order. Programs may be entered in mixed case.
By default lower case identifiers are folded to upper case
(see &CASE and -CASE extensions).
The following operator character sequences are permitted
and represent a cross between PDP-10 MAINBOL (a.k.a.
DECBOL), SITBOL and Catspaw SPITBOL usage:
Exponentiation:      ^ **
Alternation:         | !
Unary negation:      ~ \
Assignment:          = _
Comment line:        * # | ; !
Continuation line:   + .
Both brackets ([]) and angle-brackets (<>) may be used to
subscript arrays and tables. The TAB (ASCII 9) character
is accepted as white space. The pound sign (#) comment
character permits a "#!" interpreter sequence at the
top of a file (e.g. "#!/usr/local/bin/snobol4 -b").
Underscore (_) and period (.) are legal within
identifiers and labels.
Extensions
ARRAY/TABLE access
Multiple ARRAY and/or TABLE index operations may
appear in a row, without having to resort to use of
the ELEMENT function, so long as no intervening
spaces (or line continuations) appear.
BREAKX()
The BREAKX() function is a pattern function used
for fast scanning. BREAKX(str) is equivalent to
BREAK(str) ARBNO(LEN(1) BREAK(str)). In other
words BREAKX matches a sequence of ever larger
strings terminated by a break set. BREAKX can be
used as a faster matching replacement for ARB since
BREAKX('S') 'STRING' always runs faster than ARB
'STRING' since it only attempts matching 'STRING'
at locations where an 'S' has been detected.
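The speed claim above can be illustrated outside SNOBOL4.
The following Python sketch is only a model of the cursor
positions each pattern visits, not the interpreter's
matching algorithm. The function names are hypothetical,
and matching is assumed to start at the left end of the
subject.

```python
def breakx_cursors(subject, break_chars, start=0):
    """Yield each cursor position a BREAKX match would stop at, shortest first."""
    for pos in range(start, len(subject)):
        if subject[pos] in break_chars:
            yield pos


def find_with_breakx(subject, needle):
    """Model BREAKX(first char of needle) needle; return (position, attempts)."""
    attempts = 0
    for pos in breakx_cursors(subject, needle[0]):
        attempts += 1
        if subject.startswith(needle, pos):
            return pos, attempts
    return None, attempts


def find_with_arb(subject, needle):
    """Model ARB needle: the needle is tried at every cursor position in turn."""
    attempts = 0
    for pos in range(len(subject) + 1):
        attempts += 1
        if subject.startswith(needle, pos):
            return pos, attempts
    return None, attempts


print(find_with_breakx("A SHORT STRING", "STRING"))  # (8, 2)
print(find_with_arb("A SHORT STRING", "STRING"))     # (8, 9)
```

In this model both searches find the string at offset 8,
but the BREAKX form tries the literal only twice (once per
'S') where the ARB form tries it nine times.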
Case folding
By default the compiler folds identifiers and
directives to upper case, so programs can be
entered in either case. To disable case folding
use the directive -CASE 0 or -CASE. To re-enable
case folding use directive -CASE n where n is a
non-zero integer. The status of case folding may
be examined and controlled from a running program
by the unprotected system keyword &CASE.
CHAR()
The CHAR() function takes an integer from 0 to 255
and returns the n'th character in &ALPHABET.
DATE() For compatibility with new versions of Catspaw
SPITBOL, DATE(0) returns strings of the form
MM/DD/YY HH:MM:SS, and DATE(2) returns strings of
the form YYYY-MM-DD HH:MM:SS. With any other
argument DATE() returns strings of the form
MM/DD/YYYY HH:MM:SS.
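The three output forms can be written as strftime(3)
style formats. The Python sketch below is illustrative
only: snobol_date is a hypothetical name, and the DATE(2)
form is assumed to be year-month-day.

```python
from datetime import datetime

# Argument value -> strftime format, following the forms listed above.
DATE_FORMATS = {
    0: "%m/%d/%y %H:%M:%S",   # MM/DD/YY HH:MM:SS
    2: "%Y-%m-%d %H:%M:%S",   # assumed year-month-day form
}
DEFAULT_FORMAT = "%m/%d/%Y %H:%M:%S"  # MM/DD/YYYY HH:MM:SS


def snobol_date(arg=None, now=None):
    """Return a timestamp string in the form selected by arg."""
    now = now or datetime.now()
    return now.strftime(DATE_FORMATS.get(arg, DEFAULT_FORMAT))


stamp = datetime(1997, 3, 5, 14, 7, 9)
print(snobol_date(0, stamp))   # 03/05/97 14:07:09
print(snobol_date(2, stamp))   # 1997-03-05 14:07:09
print(snobol_date(1, stamp))   # 03/05/1997 14:07:09
```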
-ERROR/-NOERRORS
Directives -ERROR and -NOERRORS control execution
of programs with compiler errors. If the -ERROR
directive is given, the program will be executed
(but any attempt to execute a statement with a
compiler error will cause a fatal execution error).
By default programs with compiler errors will not
be started; this behavior can be restored using
-NOERRORS.
&ERRTEXT
After an otherwise fatal error is curtailed due to
a non-zero value in &ERRLIMIT, the protected
keyword &ERRTEXT will contain the error message.
-EXECUTE/-NOEXECUTE
Directives -EXECUTE and -NOEXECUTE control
execution of programs. If the -NOEXECUTE directive
is given, the program will not be executed after
compilation. -EXECUTE cancels any previous
-NOEXECUTE.
FREEZE()/THAW()
The FREEZE() function prohibits new entry creation
in the referenced table. This is useful once a
table has been initialized to avoid the creation of
empty entries that normally occurs on table entry
lookup failure. This can greatly improve program
speed, since frozen tables will not become clogged
with random entries. Lookups for uninitialized
entries will return the null string; however,
attempts to create a new entry will cause a
"Variable not present where required" error. The
THAW() function restores normal entry creation
behavior.
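The difference between normal and frozen tables can be
modelled in a few lines of Python. This is a sketch of the
behavior described above, not of the interpreter's table
implementation. The Table class and its method names are
hypothetical.

```python
class Table:
    """Toy model of a SNOBOL4 table with FREEZE()/THAW() semantics."""

    def __init__(self):
        self._entries = {}
        self._frozen = False

    def freeze(self):
        self._frozen = True

    def thaw(self):
        self._frozen = False

    def lookup(self, key):
        """Return the entry value, or the null string if there is none."""
        if key in self._entries:
            return self._entries[key]
        if not self._frozen:
            # A normal table creates an empty entry on a failed lookup.
            self._entries[key] = ""
        return ""

    def assign(self, key, value):
        """Store a value; a frozen table rejects new keys."""
        if self._frozen and key not in self._entries:
            raise RuntimeError("Variable not present where required")
        self._entries[key] = value

    def __len__(self):
        return len(self._entries)


t = Table()
t.assign("apple", 1)
t.lookup("pear")      # creates an empty "pear" entry
t.freeze()
t.lookup("plum")      # returns "" and creates nothing
print(len(t))         # 2
```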
HOST()
A limited simulation of the SPITBOL HOST() function
is included. It does not perform argument type
conversions, so all arguments must be supplied as
either INTEGER or STRING data types.
HOST() with no parameters returns a string describ-
ing the system the program is running on. The
string contains three parts, separated by colons.
The first part describes the physical architecture,
the second describes the operating system, and the
third describes the language implementation name.
Example: sun4c:SunOS 4.1.4:C-MAINBOL 0.98.3
HOST(0) returns a string containing the command
line parameter supplied to the -u option, if any.
If no -u option was given, HOST(0) returns the con-
catenation of all user parameters following the
input filename(s).
HOST(1,string) passes the string to the system(3) C
library function, and returns the subprocess exit
status.
HOST(2,n) for integer n returns the n'th command
line argument (regardless of whether the argument
was the command name, an option, a filename or a
user parameter) as a string, or failure if n is out
of range.
HOST(3) returns an integer indicating the first
command line argument available as a user
parameter.
HOST(4,string) returns the value of the environment
variable named string.
-INCLUDE
The -INCLUDE directive causes the compiler to
interpolate the contents of the named file enclosed
in single or double quotes. Any filename will be
included only once; this can be overridden by
appending a trailing space to the filename.
Trailing spaces are removed from the filename before
use. If the file is not found in the current
working directory, an attempt will be made to find it
in the directory specified by the SNOLIB environment
variable, or if that is not set, a predetermined
library directory.
-COPY is a synonym for -INCLUDE for compatibility
with SPITBOL/370.
IO_FINDUNIT()
The IO_FINDUNIT() function returns an unused I/O
unit number for use with the INPUT() or OUTPUT()
functions. IO_FINDUNIT() is meant for use in sub-
routines which can be reused. IO_FINDUNIT() will
never return a unit number below 20.
&LINE/&FILE
The &LINE and &FILE keywords can be used to
determine the source line number and source file
associated with the current statement. The
&LASTLINE and &LASTFILE keywords return the source
line number and source file associated with the
previous statement.
-LINE
The -LINE directive can be used to alter SNOBOL's
idea of the current source file and line (i.e. for
use by preprocessors). -LINE takes a line number
and an optional quoted string filename.
Lexical comparison
A full set of lexical (string) comparison
predicates has been added to complement the standard
LGT() function: LEQ(), LGE(), LLE(), LLT(), LNE().
LPAD()/RPAD()
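The LPAD() and RPAD() functions take the first
argument (subject) string, and pad it out to the
length specified in the second argument, using the
first character of the optional third argument. If
the third argument is missing or null, the subject
is padded with spaces. LPAD() pads on the left and
RPAD() pads on the right. The subject will be
returned unmodified if already long enough.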
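A minimal Python model of the padding rules follows. It
assumes that a missing or null third argument means
padding with spaces. The lower-case function names are
hypothetical stand-ins for LPAD() and RPAD().

```python
def lpad(subject, length, pad=""):
    """Pad subject on the left to length using the first character of pad."""
    fill = pad[0] if pad else " "   # assumed default: a space
    return subject.rjust(length, fill)


def rpad(subject, length, pad=""):
    """Pad subject on the right to length using the first character of pad."""
    fill = pad[0] if pad else " "   # assumed default: a space
    return subject.ljust(length, fill)


print(repr(lpad("42", 5)))        # '   42'
print(repr(rpad("ab", 4, "xy")))  # 'abxx'
print(repr(lpad("hello", 3)))     # 'hello'
```

Only the first character of the pad argument is used, and
a subject that is already long enough is returned as is.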
Named files
Filenames can be supplied to the INPUT() and OUT-
PUT() functions via an optional fourth argument.
If the filename begins with a vertical bar (|), the
remainder is used as a shell command whose stdin
(in the case of OUTPUT()) or stdout (in the case of
INPUT()) will be connected to the file variable.
The filename - (hyphen) is interpreted as stdin on
INPUT() and stdout on OUTPUT(). The magic file-
names /dev/stdin, /dev/stdout, and /dev/stderr
refer to the current process standard input, stan-
dard output and standard error I/O streams respec-
tively regardless of whether those special file-
names exist on your system. The magic pathname
/dev/fd/n, opens a new I/O stream associated with
file descriptor number n. The magic pathname
/tcp/hostname/service can be used to open a
connection to a TCP server. If the path ends in the
optional suffix /priv the connection will be
originated from a privileged local port using
rresvport(3).
&PARM
The entire command line is available via the &PARM
protected keyword for compatibility with Catspaw
SNOBOL4+. Use of the SPITBOL compatible HOST()
function is probably preferable.
REVERSE()
REVERSE() returns its subject string in reverse
order.
Scientific notation
REAL number syntax has been expanded to allow expo-
nents of the form:
ANY('Ee') ('+' | '-' | '') SPAN('0123456789').
Exponential format reals need not contain a decimal
point.
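For readers more familiar with regular expressions, the
exponent pattern above corresponds to the expression in
this Python sketch. It covers only the exponent suffix,
not the whole REAL syntax, and is_exponent is a
hypothetical helper name.

```python
import re

# ANY('Ee') ('+' | '-' | '') SPAN('0123456789') as a regular expression.
EXPONENT = re.compile(r"[Ee][+-]?[0-9]+")


def is_exponent(text):
    """Return True if text is a complete exponent suffix such as E10 or e-3."""
    return EXPONENT.fullmatch(text) is not None


for sample in ("E10", "e-3", "E+05", "E", "e+"):
    print(sample, is_exponent(sample))
```

The last two samples are rejected because SPAN requires at
least one digit.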
SET()
The SET() function can be used to seek the file
pointer of an open file. The first argument is an
I/O unit number, the second is an integer offset.
The third argument, an integer, determines from
whence the file pointer will be adjusted. If
whence is zero the starting point is the beginning
of the file, if whence is one, the starting point
is the current file pointer, and if whence is two,
the starting point is the end of the file. SET()
returns the new file pointer. On systems with
64-bit file pointers (e.g. 4.4BSD on i386) the
return value will be truncated to 32 bits, and only
the first and last 4 gigabytes of a file can be
accessed directly.
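The three whence values are the same ones used by
lseek(2), and by the seek method of Python file objects,
so the rules can be tried out as below. This sketch works
on an in-memory stream rather than a SNOBOL4 I/O unit, and
set_pointer is a hypothetical name.

```python
import io


def set_pointer(stream, offset, whence):
    """Move the stream position as SET(unit, offset, whence) is described."""
    if whence not in (0, 1, 2):
        raise ValueError("whence must be 0 (start), 1 (current) or 2 (end)")
    return stream.seek(offset, whence)


data = io.BytesIO(b"0123456789")
print(set_pointer(data, 4, 0))    # 4: four bytes from the beginning
print(set_pointer(data, 2, 1))    # 6: two bytes past the current position
print(set_pointer(data, -3, 2))   # 7: three bytes before the end
print(data.read())                # b'789'
```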
SITBOL file functions
FILE(string) is a predicate which returns the null
string if its argument is the name of a file that
exists, and fails if it does not. DELETE(string)
is a predicate which tries to remove the file named
by its argument, and fails if it cannot.
RENAME(string1,string2) is a predicate which
attempts to rename the file named by string2 to the
file named by string1. Unlike the SITBOL version,
if the target file exists, it will be removed.
SNOBOL4+ real functions
EXP(), LOG() and CHOP() functions are available for
compatibility with SNOBOL4+. EXP() returns the
value e ** x for argument x, LOG() returns the
natural logarithm of its REAL argument, and CHOP()
truncates the fractional part of its REAL argument
(rounding towards zero), and returns a REAL.
SORT()/RSORT()
The SORT() and RSORT() functions take two
arguments, A and F. A is an array (or a table,
which is first converted to an array). A may be a
singly-dimensioned array, in which case F, if
non-null, indicates the field of a
programmer-defined data type on which the sort is
based. A may also be a table or a doubly-dimensioned
array. In these cases F may be an integer indicating
the column on which to sort. If F is null, it is
taken to be 1. The array A is not modified; a new
array is allocated and returned. SORT() sorts
elements in ascending order, while RSORT() sorts in
descending order.
SPITBOL operators
The SPITBOL scan (?) and assignment (=) operators
have been added. A pattern match can appear within
an expression, and returns the matched string as
its value. Similarly assignment can appear in an
expression, and returns the assigned value. An
assignment after a scan (i.e. STRING ? PATTERN =
VALUE) performs a scan and replace. Assignment is
right associative, and has the lowest precedence,
while scan is left associative and has a precedence
just higher than assignment.
The SPITBOL selection/alternative construction can
be used in any expression. It consists of a comma
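separated list of expressions inside parentheses.
The expressions are evaluated from left to right
until one succeeds, and its value is returned. Use
of this construction may result in incomprehensible
code.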
The type NUMERIC with CONVERT() and the removal of
leading spaces from strings converted to numbers
(implicitly or explicitly) are also legal when
SPITBOL extensions are enabled. SPITBOL extensions
can be enabled and disabled using the -PLUSOPS
directive. -PLUSOPS 0 disables SPITBOL operators,
while -PLUSOPS or -PLUSOPS n where n is a non-zero
integer enables them. SPITBOL extensions are
enabled by default.
SQRT()
The SQRT() function is available for compatibility
with SPARC SPITBOL. SQRT() fails if the argument
is negative, but does not cause a fatal error.
SUBSTR()
SUBSTR() takes a subject string as its first
argument, and returns the substring starting at the
position specified by the second argument
(one-based) with a length specified by the third
argument. If the third argument is missing or zero,
the remainder of the string is returned.
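The indexing rule is easy to get wrong when coming from
zero-based languages, so a small Python model may help.
It covers only the cases described above; behavior for
out-of-range positions is not stated here and is not
modelled. The name substr is a hypothetical stand-in for
SUBSTR().

```python
def substr(subject, start, length=0):
    """Return length characters of subject from one-based position start.

    A length of zero (or an omitted length) selects the rest of the string.
    """
    begin = start - 1            # convert the one-based position to an index
    if length == 0:
        return subject[begin:]
    return subject[begin:begin + length]


print(substr("SNOBOL4", 3, 3))   # OBO
print(substr("SNOBOL4", 4))      # BOL4
```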
TERMINAL I/O variable
The variable TERMINAL is associated with the stan-
dard error file descriptor for both input and out-
put.
Trig functions
SIN(), COS() and TAN() functions are available for
compatibility with SPARC SPITBOL and take arguments
in radians.
&UCASE/&LCASE
Protected keywords &UCASE and &LCASE contain upper
and lower case characters respectively.
I/O Associations
I/O is performed by associating a variable name with a
numbered I/O unit using the INPUT() and OUTPUT() func-
tions. The following associations are available by
default;
Variable   Unit   Association
INPUT      5      standard input
OUTPUT     6      standard output
TERMINAL   7      standard error (output)
TERMINAL   8      standard error (input)
The third argument of the INPUT() and OUTPUT() functions
is interpreted as a string of single letter options;
commas are ignored. Some options affect only the I/O
variable named in the first argument, others affect any
variable associated with the unit number in the second
argument.
digits A span of digits will set the input record length
for the named I/O variable. This controls the max-
imum string that will be returned for regular text
I/O, and the number of bytes returned for binary
I/O. Record length is per-variable; multiple vari-
ables may be associated with the same unit, but
with different record lengths.
A For OUTPUT() the unit will be opened for append
access (noop for INPUT()).
B The unit will be opened for binary access. On
input newline characters have no special meaning;
the number of bytes transferred depends on record
length (see above). On output no newline is
appended. For terminal devices, all I/O to this
unit will be performed in "raw" mode, however I/O
on other units continues to perform I/O in "cooked"
mode.
C Character at a time I/O. A synonym for B,1.
T Terminal mode. No newline characters are added on
output, and any newline characters are returned on
input. Terminal mode affects only the referenced
unit. Terminal mode is useful for outputting
prompts in interactive programs.
Q Quiet mode. Turns off input echo on terminals.
Affects only input from this unit.
U Update mode. The unit is opened for both input and
output.
W Unbuffered writes. Each output variable assignment
causes an I/O transfer to occur.
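To show how an option string such as "B,80" breaks down,
here is a small Python parser for the syntax described
above. It is a sketch rather than the interpreter's own
parser. The name parse_io_options is hypothetical, and
rejecting unknown letters is an assumption, since the
handling of unrecognized options is not described here.

```python
def parse_io_options(options):
    """Split an I/O option string into (set of flags, record length or None)."""
    flags = set()
    record_length = None
    digits = ""
    for ch in options + ",":          # the trailing comma flushes pending digits
        if ch in "0123456789":
            digits += ch
            continue
        if digits:
            record_length = int(digits)
            digits = ""
        if ch == ",":
            continue                  # commas are ignored
        if ch == "C":
            flags.add("B")            # C is a synonym for B,1
            record_length = 1
        elif ch in "ABTQUW":
            flags.add(ch)
        else:
            # Assumption: unknown option letters are treated as errors.
            raise ValueError("unknown I/O option: " + ch)
    return flags, record_length


print(parse_io_options("B,80"))   # ({'B'}, 80)
print(parse_io_options("C"))      # ({'B'}, 1)
```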
OPTIONS
-b Toggle startup banner output (by default on).
-d BBB Allocate BBB bytes of "dynamic storage" for
program code and data. A suffix of k multiplies the
number by 1024. A larger dynamic region may
result in fewer garbage collections (storage
regenerations). Most programs do not need an
increased dynamic region to run. If your program
terminates with an "Insufficient storage to
continue" message you need to increase the
dynamic storage region size.
-f Toggle folding of identifiers to upper case (see
-CASE and &CASE).
-h Give help. Shows usage message, includes default
sizes for "dynamic region" and pattern match
stack.
-k Toggle running programs with compilation errors
(see -ERROR and -NOERRORS extensions). By
default programs with compilation errors will
not be run.
-l Re-enable listing to stdout. (default is
-UNLIST). Default listing side is LEFT.
-n Toggle running programs after compilation (see
-EXECUTE and -NOEXECUTE extensions). By default
programs are run after compilation.
-p Toggle SPITBOL extensions (also controlled by
-PLUSOPS).
-r Toggle reading INPUT from input file(s) after
END label. Otherwise INPUT defaults (back) to
standard input after program compilation is com-
plete.
-s Toggle termination statistics (off by default).
-u params specifies a parameter string available via
HOST(0).
-- Terminates processing items as options. Any
remaining strings are treated as files or user
parameters.
-M Specifies that all items left on the command
line after option processing is complete are to
be treated as filenames. The files are read in
turn until an END statement is found (any
remaining data is available via the INPUT
variable if the -r option is also given). A --
terminates processing of arguments as files, and
makes the remaining arguments available as user
parameters (see the HOST() function).
-P DDD Allocate DDD descriptors for the pattern match
stack. A suffix of k multiplies the number by 1024.
The pattern match stack is used to
save backtracking and conditional assignment
information. If your program terminates with an
"Overflow during pattern matching" message you
need to increase the pattern match stack size.
SEE ALSO
[1] R. E. Griswold, J. F. Poage, and I. P. Polonsky,
The SNOBOL4 Programming Language, 2nd ed.,
Prentice-Hall Inc., 1971.
[2] R. E. Griswold, The Macro Implementation of
SNOBOL4, W. H. Freeman and Co., 1972.
AUTHOR
Philip L. Budne
BUGS
I/O retains some record oriented flavor.
I/O is still tied to unit numbers.
"Dynamic" storage cannot be expanded after startup.