snobol4 - SNOBOL4 interpreter


SYNOPSIS

       snobol4 [ options ...  ] [ file(s) ...  ] [ params ...  ]



DESCRIPTION

       This  manual  page  describes  a port of the original Bell
       Telephone  Labs  (BTL)  Macro  Implementation  of  SNOBOL4
       (MAINBOL) to machines with 32-bit C compilers by Philip L.
       Budne.  The language and its implementation are described
       in [1] and [2].  Extensions from Catspaw SNOBOL4+, SPITBOL
       and SITBOL have been added.


   Limitations
       All aspects of the language are implemented except;

       o      Trapping of arithmetic exceptions.

       o      Dynamic loading of external functions with LOAD()
              is not supported on platforms that use neither the
              standard dlopen(3) interface for shared libraries
              nor the a.out(5) executable format.  User functions
              can be statically linked ("poor man's loading") on
              ALL platforms.  See load.doc for more information.


   Changes
       The  following behaviors have been changed from the origi-
       nal Macro SNOBOL4;

       o      Listings are disabled by default.  The default
              listing side (when listings are enabled by -LIST or
              the -l option) is LEFT.  Listings are directed to
              standard output.

       o      Error messages, the startup banner  and  statistics
              are  directed to standard error.  Compilation error
              messages  (including  erroneous  lines)  appear  on
              standard  error  as well as in the listings.  Error
              messages now reference the  source  file  name  and
              line number.

       o      Character set (see below).

       o      The  PUNCH  output  variable  no longer exists (see
              TERMINAL variable below).

       o      I/O is not performed using FORTRAN I/O.  The 3rd
              argument to the OUTPUT() and INPUT() functions is
              interpreted as a string of I/O options (see below).

       o      Listing statement numbers show the statement number
              of the LAST statement on the line (rather than  the
              first).

       o      Setting  the &ABEND keyword causes a core dump upon
              termination!

       o      The value of the &CODE keyword determines the  exit
              status of the snobol4 application.

       o      The  DATE()  function  returns strings of the form:
              MM/DD/YYYY HH:MM:SS.  See the Extensions section for
              arguments to the DATE() function.

       o      Keyword &STLIMIT now defaults to -1.  When &STLIMIT
              is less than zero there is no limit to  the  number
              of  statements executed, and &STCOUNT is not incre-
              mented.

       o      VALUE tracing  applies  to  variables  modified  by
              immediate  value  assignment ($ operator) and value
              assignment (. operator) during pattern matching.

       o      The BACKSPACE() function is  not  implemented.  See
              SET() below.

       o      I/O unit numbers up to 256 can be used.


   Character set
       The  internal  character  set  and  collating sequence are
       USASCII-8.  Any 8-bit byte pattern is accepted as a SNOBOL
       datum  or in a string constant of a SNOBOL source program.
       The value of the SNOBOL protected keyword &ALPHABET  is  a
       256-character  string  of  all  bytes  from  0  to 255, in
       ascending order.  Programs may be entered in  mixed  case.
       By default lower case identifiers are folded to upper case
       (see &CASE and -CASE extensions).

       The following operator character sequences  are  permitted
       and  represent  a  cross  between  PDP-10  MAINBOL (a.k.a.
       DECBOL), SITBOL and Catspaw SPITBOL usage;

       Exponentiation:     ^ **
       Alternation:        | !
       Unary negation:     ~ \
       Assignment:         = _
       Comment line:       * # | ; !
       Continuation line:  + .

       Both brackets ([]) and angle-brackets (<>) may be used  to
       subscript  arrays and tables.  The TAB (ASCII 9) character
       sequence    at    the    top    of     a     file     (ie;
       "#!/usr/local/bin/snobol4 -b").  Underscore (_) and period
       (.) are legal within identifiers and labels.


   Extensions
       ARRAY/TABLE access
              Multiple ARRAY and/or TABLE  index  operations  may
              appear in a row, without having to resort to use of
              the ELEMENT function, so  long  as  no  intervening
              spaces (or line continuations) appear.

       BREAKX()
              The  BREAKX()  function  is a pattern function used
              for fast scanning.  BREAKX(str)  is  equivalent  to
              BREAK(str)   ARBNO(LEN(1)  BREAK(str)).   In  other
              words BREAKX matches  a  sequence  of  ever  larger
              strings  terminated  by a break set.  BREAKX can be
              used as a faster replacement for ARB, since
              BREAKX('S') 'STRING' always runs faster than ARB
              'STRING': it only attempts to match 'STRING' at
              locations where an 'S' has been found.
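              The equivalence above can be modelled in Python as a
              sketch (breakx_lengths and match_breakx_then are
              hypothetical names, not part of snobol4).  BREAKX
              offers its candidate lengths shortest first, each
              ending just before a break character, so the
              following literal is only tried at those positions.

```python
def breakx_lengths(subject, pos, chars):
    """Yield, shortest first, the lengths BREAKX(chars) can match at pos.

    Models BREAK(chars) ARBNO(LEN(1) BREAK(chars)): every alternative
    stops just before a character from `chars`.
    """
    i = pos
    while True:
        # BREAK(chars): scan forward to the next break character.
        while i < len(subject) and subject[i] not in chars:
            i += 1
        if i == len(subject):
            return  # BREAK fails when no break character remains
        yield i - pos
        i += 1  # LEN(1): step over the break character, then extend


def match_breakx_then(subject, pos, chars, literal):
    """Length matched by BREAKX(chars) when followed by `literal`, or None."""
    for n in breakx_lengths(subject, pos, chars):
        # The literal is only attempted where a break character was found.
        if subject.startswith(literal, pos + n):
            return n
    return None


print(match_breakx_then("A STRAY STRING", 0, "S", "STRING"))  # prints 8
```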

       Case folding
              By  default  the  compiler  folds  identifiers  and
              directives  to  upper  case,  so  programs  can  be
              entered  in  either  case.  To disable case folding
              use the directive -CASE 0 or -CASE.   To  re-enable
              case  folding  use  directive  -CASE n where n is a
              non-zero integer.  The status of case  folding  may
              be  examined  and controlled from a running program
              by the unprotected system keyword &CASE.

       CHAR()
              The CHAR(n) function takes an integer n from 0 to 255
              and returns the character with code n, that is, the
              character at zero-based position n in &ALPHABET.

       DATE() For  compatibility  with  new  versions of Catspaw
              SPITBOL  DATE(0)  returns  strings  of   the   form
              MM/DD/YY  HH:MM:SS,  and DATE(2) returns strings of
              the form YYYY-MM-DD HH:MM:SS.  With any other argu-
              ments DATE() returns strings of the form MM/DD/YYYY
              HH:MM:SS.
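              The three DATE() layouts correspond to the following
              strftime(3)-style formats; the Python function below
              is a hypothetical illustration, not part of snobol4.

```python
from datetime import datetime

# strftime equivalents of the DATE() formats listed above.
DATE_FORMATS = {
    0: "%m/%d/%y %H:%M:%S",  # DATE(0): MM/DD/YY HH:MM:SS
    2: "%Y-%m-%d %H:%M:%S",  # DATE(2): YYYY-MM-DD HH:MM:SS
}
DEFAULT_FORMAT = "%m/%d/%Y %H:%M:%S"  # any other argument: MM/DD/YYYY HH:MM:SS


def snobol_date(arg=None, now=None):
    """Format `now` (default: current local time) the way DATE(arg) would."""
    now = now or datetime.now()
    return now.strftime(DATE_FORMATS.get(arg, DEFAULT_FORMAT))
```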

       -ERROR/-NOERRORS
              Directives -ERROR and -NOERRORS control execution
              of programs with compiler errors.  If the -ERROR
              directive is given, the program will be executed
              (but any attempt to execute a statement with a
              compiler error will cause a fatal execution error).
              By default, programs with compiler errors are not
              started; -NOERRORS restores this default.
              After an otherwise fatal error is curtailed because
              of a non-zero value in &ERRLIMIT, the protected
              keyword &ERRTEXT contains the error message.

       -EXECUTE/-NOEXECUTE
              Directives -EXECUTE and -NOEXECUTE control execu-
              tion of programs.  If the -NOEXECUTE directive is
              given, the program will not be executed after
              compilation.  -EXECUTE cancels any previous
              -NOEXECUTE.

       FREEZE()/THAW()
              The FREEZE() function prohibits new entry  creation
              in  the  referenced  table.   This is useful once a
              table has been initialized to avoid the creation of
              empty  entries  that normally occurs on table entry
              lookup failure. This can  greatly  improve  program
              speed,  since frozen tables will not become clogged
              with  random  entries.  Lookups  for  uninitialized
              entries will return the null string; however,
              attempts to create a new entry will cause a
              "Variable not present where required" error.  The
              THAW() function restores normal entry creation
              behavior.
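              A minimal Python sketch of the table behaviour
              described above; the class and method names are
              hypothetical, and the SNOBOL4 execution error is
              modelled as a KeyError.

```python
class SnobolTable(dict):
    """Hypothetical model of a SNOBOL4 TABLE with FREEZE()/THAW()."""

    def __init__(self):
        super().__init__()
        self.frozen = False

    def freeze(self):
        self.frozen = True   # FREEZE(): no new entries from now on

    def thaw(self):
        self.frozen = False  # THAW(): normal entry creation again

    def lookup(self, key):
        if key not in self:
            if self.frozen:
                return ""    # frozen: null string, no entry is created
            self[key] = ""   # unfrozen: a failed lookup adds an empty entry
        return self[key]

    def assign(self, key, value):
        if self.frozen and key not in self:
            # Stands in for the "Variable not present where required" error.
            raise KeyError("Variable not present where required")
        self[key] = value
```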

       HOST()
              A limited simulation of the SPITBOL HOST() function
              is  included.   It  does  not perform argument type
              conversions, so all arguments must be  supplied  as
              either INTEGER or STRING data types.

              HOST() with no parameters returns a string describ-
              ing the system the  program  is  running  on.   The
              string  contains  three parts, separated by colons.
              The first part describes the physical architecture,
              the  second describes the operating system, and the
              third describes the language  implementation  name.
              Example: sun4c:SunOS 4.1.4:C-MAINBOL 0.98.3

              HOST(0)  returns  a  string  containing the command
              line parameter supplied to the -u option,  if  any.
              If no -u option was given, HOST(0) returns the con-
              catenation of all  user  parameters  following  the
              input filename(s).

              HOST(1,string) passes the string to the system(3) C
              library function, and returns the  subprocess  exit
              status.

              HOST(2,n)  for  integer  n returns the n'th command
              line argument (regardless of whether  the  argument
              was  the  command  name, an option, a filename or a
              user parameter) as a string, or fails if n is out
              of range.  HOST(3) returns an integer indicating
              the first command line argument available as a
              user parameter.

              HOST(4,string) returns the value of the environment
              variable named string.

       -INCLUDE
              The -INCLUDE directive causes the compiler to
              interpolate the contents of the named file, whose
              name is enclosed in single or double quotes.  Each
              file is included only once; this can be overridden
              by appending a trailing space to the filename.
              Trailing spaces are removed from the filename
              before use.  If the file is not found in the
              current working directory, an attempt will be made
              to find it in the directory specified by the SNOLIB
              environment variable or, if that is not set, in a
              predetermined library directory.

              -COPY is a synonym for -INCLUDE  for  compatibility
              with SPITBOL/370.
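              The lookup order and include-once rule can be sketched
              in Python as follows.  DEFAULT_LIBDIR is a
              hypothetical stand-in for the predetermined library
              directory, and keying the include-once check on the
              name as written (trailing spaces included) is an
              assumption about how the trailing-space override
              works.

```python
import os
import posixpath

# Hypothetical stand-in for the "predetermined library directory";
# the real location is chosen when snobol4 is built.
DEFAULT_LIBDIR = "/usr/local/lib/snobol4"


class IncludeResolver:
    """Sketch of -INCLUDE file lookup with include-once bookkeeping."""

    def __init__(self, env=None, exists=os.path.isfile):
        self.seen = set()
        self.env = os.environ if env is None else env
        self.exists = exists

    def resolve(self, name):
        """Return the path to read for -INCLUDE 'name', or None if skipped."""
        # Assumption: the check uses the name as written, so 'x.sno '
        # (with a trailing space) counts as different from 'x.sno'.
        if name in self.seen:
            return None
        self.seen.add(name)
        path = name.rstrip(" ")  # trailing spaces are removed before use
        if self.exists(path):
            return path  # found in the current working directory
        # SNOLIB if set (and non-empty), else the built-in directory.
        libdir = self.env.get("SNOLIB") or DEFAULT_LIBDIR
        return posixpath.join(libdir, path)
```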

       IO_FINDUNIT()
              The  IO_FINDUNIT()  function  returns an unused I/O
              unit number for use with the  INPUT()  or  OUTPUT()
              functions.   IO_FINDUNIT() is meant for use in sub-
              routines which can be reused.   IO_FINDUNIT()  will
              never return a unit number below 20.
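              A sketch of the allocation rule in Python.  Choosing
              the lowest free unit, treating 256 as an inclusive
              upper bound, and returning None when no unit is free
              are all assumptions; the manual only promises an
              unused unit number of at least 20.

```python
MIN_FREE_UNIT = 20  # IO_FINDUNIT() never returns a unit below 20
MAX_UNIT = 256      # "I/O unit numbers up to 256"; assumed inclusive


def io_findunit(units_in_use):
    """Return an unused unit number >= 20, or None if none is free."""
    for unit in range(MIN_FREE_UNIT, MAX_UNIT + 1):
        if unit not in units_in_use:
            return unit  # assumption: lowest free unit first
    return None
```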

       &LINE/&FILE
              The &LINE and &FILE keywords can be used to deter-
              mine the source file line and name associated with
              the current statement.  The &LASTLINE and &LASTFILE
              keywords return the source file line and name
              associated with the previous statement.

       -LINE
              The  -LINE  directive can be used to alter SNOBOL's
              idea of the current source file and line  (ie;  for
              use  by  preprocessors).  -LINE takes a line number
              and an optional quoted string filename.
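              A preprocessor could emit -LINE directives as in this
              Python sketch.  The directive layout (-LINE, a line
              number, then a double-quoted filename) follows the
              description above; the assumption that the number
              applies to the line after the directive, as with the
              C preprocessor's #line, is not confirmed by this
              manual.  The function name is hypothetical.

```python
def with_line_directives(chunks):
    """Join source chunks, preceding each with a -LINE directive.

    chunks: iterable of (filename, first_line_number, list_of_lines).
    Filenames containing double quotes are not handled.
    """
    out = []
    for filename, first_line, lines in chunks:
        # Assumed to set the file and line of the line that follows.
        out.append('-LINE %d "%s"' % (first_line, filename))
        out.extend(lines)
    return "\n".join(out) + "\n"
```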

       Lexical comparison
              A full set of lexical  (string)  comparison  predi-
              cates  has been added to complement the standard
              LGT() function; LEQ(), LGE(), LLE(), LLT(),  LNE().
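              The predicates can be modelled in Python as follows;
              success is represented by the null string and failure
              by None.  Comparing the strings as Latin-1 bytes gives
              the &ALPHABET (byte value) collating order; the helper
              name is hypothetical.

```python
import operator

# One comparison per lexical predicate.
_LEXICAL = {
    "LGT": operator.gt,
    "LGE": operator.ge,
    "LLT": operator.lt,
    "LLE": operator.le,
    "LEQ": operator.eq,
    "LNE": operator.ne,
}


def lexical(name, a, b):
    """Model of a lexical predicate: "" on success, None on failure."""
    # SNOBOL4 strings are byte strings, so only 8-bit characters are
    # expected; bytes compare by unsigned byte value, shorter prefix first.
    ok = _LEXICAL[name](a.encode("latin-1"), b.encode("latin-1"))
    return "" if ok else None
```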

       LPAD()/RPAD()
              The LPAD() and RPAD() functions take the first
              argument (subject) string and pad it out, on the
              left and right respectively, to the length
              specified in the second argument, using the first
              character of the optional third argument.  If the
              third argument is omitted or null, blanks are
              used.  The subject will be returned unmodified if
              already long enough.
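              A Python sketch of the padding rules; the function
              names are hypothetical, and the blank default for a
              missing third argument is an assumption here.

```python
def lpad(subject, length, fill=""):
    """LPAD(): pad on the left using the first character of `fill`."""
    pad = fill[:1] or " "  # assumed default: a blank
    return subject.rjust(length, pad)  # unchanged if already long enough


def rpad(subject, length, fill=""):
    """RPAD(): pad on the right using the first character of `fill`."""
    pad = fill[:1] or " "
    return subject.ljust(length, pad)
```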

       Named files
              Filenames can be supplied to the INPUT()  and  OUT-
              PUT()  functions  via  an optional fourth argument.
              If the filename begins with a vertical bar (|), the
              remainder  is  used  as a shell command whose stdin
              (in the case of OUTPUT()) or stdout (in the case of
              INPUT())  will  be  connected to the file variable.
              The filename - (hyphen) is interpreted as stdin on
              INPUT()  and  stdout  on OUTPUT().  The magic file-
              names  /dev/stdin,  /dev/stdout,  and   /dev/stderr
              refer  to the current process standard input, stan-
              dard output and standard error I/O streams  respec-
              tively  regardless  of  whether those special file-
              names exist on your  system.   The  magic  pathname
              /dev/fd/n opens a new I/O stream associated with
              file  descriptor  number  n.   The  magic  pathname
              /tcp/hostname/service can be used to open a con-
              nection to a TCP server.  If the path ends in the
              optional suffix /priv, the connection will be
              originated from a privileged local port using
              rresvport(3).
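              The filename conventions above can be summarized as a
              Python classification function.  It only labels the
              cases and opens nothing; the function name and the
              return tuples are a hypothetical representation.

```python
import re


def classify_filename(name, for_input):
    """Describe how INPUT()/OUTPUT() would treat a 4th-argument filename."""
    if name.startswith("|"):
        # Shell command: its stdout feeds INPUT(), its stdin takes OUTPUT().
        return ("pipe", name[1:])
    if name == "-":
        return ("stdin",) if for_input else ("stdout",)
    if name in ("/dev/stdin", "/dev/stdout", "/dev/stderr"):
        # Magic names, honoured even if these files do not exist.
        return (name[len("/dev/"):],)
    m = re.fullmatch(r"/dev/fd/(\d+)", name)
    if m:
        return ("fd", int(m.group(1)))  # new stream on descriptor n
    m = re.fullmatch(r"/tcp/([^/]+)/([^/]+)(/priv)?", name)
    if m:
        # The last field is True for /priv (privileged local port).
        return ("tcp", m.group(1), m.group(2), m.group(3) is not None)
    return ("file", name)
```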

       &PARM
              The  entire command line is available via the &PARM
              protected keyword for  compatibility  with  Catspaw
              SNOBOL4+.  Use  of  the  SPITBOL  compatible HOST()
              function is probably preferable.

       REVERSE()
              REVERSE() returns its subject string in reverse
              order.

       Scientific notation
              REAL number syntax has been expanded to allow expo-
              nents of the form:
              ANY('Ee') ('+'  |  '-'  |  '')  SPAN('0123456789').
              Exponential format reals need not contain a decimal
              point.
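              The extended literal syntax can be approximated with a
              regular expression.  The exponent follows the pattern
              above; the mantissa form (digits, optionally followed
              by a point and more digits) is an assumption, and the
              function name is hypothetical.

```python
import re

# Assumed mantissa: digits, optionally '.' and more digits.  The exponent
# mirrors ANY('Ee') ('+' | '-' | '') SPAN('0123456789').
_NUMBER = re.compile(r"[0-9]+(\.[0-9]*)?([Ee][+-]?[0-9]+)?")


def numeric_literal_type(text):
    """Return 'INTEGER', 'REAL', or None for a candidate numeric literal."""
    m = _NUMBER.fullmatch(text)
    if m is None:
        return None
    # A decimal point or an exponent makes it REAL; an exponent alone will do.
    return "REAL" if m.group(1) or m.group(2) else "INTEGER"
```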

       SET()
              The SET() function can be used  to  seek  the  file
              pointer  of an open file.  The first argument is an
              I/O unit number, the second is an  integer  offset.
              The  third  argument,  an  integer  determines from
              whence the  file  pointer  will  be  adjusted.   If
              whence  is  zero the starting point is the beginning
              of the file, if whence is one, the  starting  point
              is  the current file pointer, and if whence is two,
              the starting point is the end of the  file.   SET()
              returns the resulting file pointer.  On systems
              with file offsets wider than 32 bits (ie; 4.4BSD on
              i386) the return value will be truncated to 32 bits,
              and only the first and last 4 gigabytes of a file
              can be accessed directly.
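              The whence values correspond to the lseek(2) constants
              SEEK_SET, SEEK_CUR and SEEK_END.  The Python sketch
              below works on a raw file descriptor rather than a
              SNOBOL4 unit number and returns the new offset, as
              lseek() does; snobol_set is a hypothetical name.

```python
import os

# SET() whence codes mapped onto lseek(2) whence constants.
_WHENCE = {0: os.SEEK_SET, 1: os.SEEK_CUR, 2: os.SEEK_END}


def snobol_set(fd, offset, whence):
    """Move the file pointer of descriptor `fd`; return the new offset."""
    return os.lseek(fd, offset, _WHENCE[whence])
```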

       SITBOL file functions
              FILE(string)  is a predicate which returns the null
              string if its argument is the name of a file that
              exists,  and  fails if it does not.  DELETE(string)
              is a predicate which tries to remove the file named
              by its argument, and fails if it cannot.
              RENAME(string1,string2)  is   a   predicate   which
              attempts to rename the file named by string2 to the
              file named by string1.  Unlike the SITBOL  version,
              if the target file exists, it will be removed.
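              A Python sketch of the three predicates, with success
              modelled as the null string and failure as None.  Note
              the argument order of RENAME: the new name comes
              first.  The function names are hypothetical.

```python
import os


def sitbol_file(name):
    """FILE(name): succeed if the file exists."""
    return "" if os.path.exists(name) else None


def sitbol_delete(name):
    """DELETE(name): try to remove the file; fail if that is impossible."""
    try:
        os.remove(name)
    except OSError:
        return None
    return ""


def sitbol_rename(new, old):
    """RENAME(string1, string2): rename `old` (string2) to `new` (string1)."""
    try:
        # os.replace() removes an existing target, as this port does.
        os.replace(old, new)
    except OSError:
        return None
    return ""
```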

       SNOBOL4+ real functions
              EXP(), LOG() and CHOP() functions are available for
              compatibility with  SNOBOL4+.   EXP()  returns  the
              value  e  ** x, LOG() returns the natural logarithm
              of its REAL argument, and CHOP() truncates the
              fractional part of its REAL argument (rounding
              towards zero), and returns a REAL.
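              In Python terms (hypothetical helper names), the three
              functions behave roughly as follows; error handling for
              arguments outside each function's domain is not
              modelled.

```python
import math


def snobol_exp(x):
    return math.exp(x)  # EXP(): e ** x


def snobol_log(x):
    return math.log(x)  # LOG(): natural logarithm


def chop(x):
    """CHOP(): drop the fractional part, rounding toward zero; result is REAL."""
    return float(math.trunc(x))
```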

       SORT()/RSORT()
              The SORT(A,F) and RSORT(A,F) functions take an
              array A (or a table, which is first converted to an
              array).  A may be singly-dimensioned, in which case
              F, if non-null, indicates the field of a
              programmer-defined data type on which the sort is
              based.  A may also be a table or a doubly-
              dimensioned array; in these cases F may be an
              integer indicating the column on which to sort.  If
              F is null, it is taken to be 1.  The array A is not
              modified; a new array
              is  allocated  and returned.  SORT() sorts elements
              in ascending order, while RSORT() sorts in descend-
              ing order.
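              A Python sketch of SORT()/RSORT() on a doubly-
              dimensioned array represented as a list of rows.
              Sorting on a field of a programmer-defined data type
              and comparisons between mixed data types are not
              modelled, and the names are hypothetical.

```python
def snobol_sort(rows, column=None, reverse=False):
    """Sketch of SORT(A,F) for a 2-D array given as a list of rows.

    column: 1-based column number; None (a null F) means column 1.
    Returns a new list, leaving `rows` itself unchanged (the rows
    are shared, not copied).
    """
    col = (column or 1) - 1
    return sorted(rows, key=lambda row: row[col], reverse=reverse)


def snobol_rsort(rows, column=None):
    """Sketch of RSORT(A,F): the same, in descending order."""
    return snobol_sort(rows, column, reverse=True)
```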

       SPITBOL operators
              The  SPITBOL  scan (?) and assignment (=) operators
              have been added.  A pattern match can appear within
              an  expression,  and  returns the matched string as
              its value.  Similarly, assignment can appear in an
              expression,  and  returns  the  assigned  value. An
              assignment after a scan (ie;  STRING  ?  PATTERN  =
              VALUE)  performs a scan and replace.  Assignment is
              right associative, and has the  lowest  precedence,
              while scan is left associative and has a precedence
              just higher than assignment.

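              The SPITBOL selection/alternative construction  can
              be  used in any expression.  It consists of a comma
              separated list of expressions  inside  parentheses.
              The expressions are evaluated from left to right
              until one succeeds, and its value becomes the value
              of the construction; if all of them fail, the
              construction fails.  Heavy use of this construction
              may result in incomprehensible code.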

              The  type NUMERIC with CONVERT() and the removal of
              leading spaces from strings  converted  to  numbers
              (implicitly  or  explicitly)  are  also  legal when
              SPITBOL extensions are enabled.  SPITBOL extensions
              can  be  enabled  and  disabled  using the -PLUSOPS
              directive.  -PLUSOPS 0 disables SPITBOL  operators,
              while  -PLUSOPS or -PLUSOPS n where n is a non-zero
              integer  enables  them.   SPITBOL  extensions   are
              enabled by default.
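              The selection construction can be modelled in Python,
              assuming the usual SPITBOL semantics: alternatives are
              tried from left to right, the first one that succeeds
              supplies the value, and the construction fails only if
              they all fail.  Each alternative is a callable that
              returns None to signal failure; select is a
              hypothetical name.

```python
def select(*alternatives):
    """Model of the SPITBOL (e1, e2, ..., en) construction."""
    for alt in alternatives:
        value = alt()        # evaluate alternatives lazily, left to right
        if value is not None:
            return value     # first success wins; "" (null) is a success
    return None              # every alternative failed
```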

       SQRT()
              The  SQRT() function is available for compatibility
              with SPARC SPITBOL.  SQRT() fails if  the  argument
              is negative, but does not cause a fatal error.

       SUBSTR()
              SUBSTR() takes a subject string as its first argu-
              ment, and returns the  substring  starting  at  the
              position  specified  by  the  second argument (one-
              based) with a length specified by the  third  argu-
              ment.   If  the  third argument is missing or zero,
              the remainder of the string is returned.
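              A Python sketch of SUBSTR() with one-based positions;
              the function name is hypothetical.  Failing (returning
              None) when the requested range does not fit is an
              assumption; the manual does not describe that case.

```python
def substr(subject, start, length=0):
    """SUBSTR(subject, start, length): 1-based start; length 0 means "rest"."""
    if start < 1 or start > len(subject) + 1:
        return None  # assumed failure: start outside the subject
    begin = start - 1
    if length == 0:
        return subject[begin:]  # missing or zero length: remainder
    if length < 0 or begin + length > len(subject):
        return None  # assumed failure: range runs past the end
    return subject[begin:begin + length]
```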

       TERMINAL I/O variable
              The variable TERMINAL is associated with the  stan-
              dard  error file descriptor for both input and out-
              put.

       Trig functions
              SIN(), COS() and TAN() functions are available  for
              compatibility with SPARC SPITBOL and take arguments
              in radians.

       &UCASE/&LCASE
              Protected keywords &UCASE and &LCASE contain  upper
              and lower case characters respectively.


   I/O Associations
       I/O  is  performed  by  associating a variable name with a
       numbered I/O unit using the  INPUT()  and  OUTPUT()  func-
       tions.    The  following  associations  are  available  by
       default;

       Variable  Unit      Association
       INPUT     5         standard input
       OUTPUT    6         standard output
       TERMINAL  7         standard error (output)
       TERMINAL  8         standard error (input)

       The third argument of the INPUT() and  OUTPUT()  functions
       is  interpreted as a string of single letter options; com-
       mas are ignored.  Some options affect only the I/O vari-
       able  named in the first argument, others affect any vari-
       able associated with the unit number in the  second  argu-
       ment.

       digits A  span  of digits will set the input record length
              for the named I/O variable.  This controls the max-
              imum  string that will be returned for regular text
              I/O, and the number of bytes  returned  for  binary
              I/O.  Record length is per-variable; multiple vari-
              ables may be associated with  the  same  unit,  but
              with different record lengths.

       A      For OUTPUT() the unit will be opened for append
              access (a no-op for INPUT()).

       B      The unit will be  opened  for  binary  access.   On
              input  newline  characters have no special meaning;
              the number of bytes transferred depends on  record
              length   (see  above).  On  output  no  newline  is
              appended.  For terminal devices, all  I/O  to  this
              unit  will  be performed in "raw" mode, however I/O
              on other units continues to perform I/O in "cooked"
              mode.

       C      Character at a time I/O.  A synonym for B,1.

       T      Terminal  mode.  No newline characters are added on
              output, and any newline characters are returned  on
              input.   Terminal  mode affects only the referenced
              unit.   Terminal  mode  is  useful  for  outputting
              prompts in interactive programs.

       Q      Quiet  mode.   Turns  off  input echo on terminals.
              Affects only input from this unit.

       U      Update mode.  The unit is opened for both input and
              output.

       W      Unbuffered writes.  Each output variable assignment
              causes an I/O transfer to occur.
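              The option string can be decoded as in this Python
              sketch.  It does not distinguish per-variable from
              per-unit options, assumes upper-case option letters,
              and treats an unknown letter as an error; all three
              are assumptions, and parse_io_options is a
              hypothetical name.

```python
def parse_io_options(options):
    """Split an INPUT()/OUTPUT() third argument into (record_length, flags)."""
    record_length = None
    flags = set()
    digits = ""
    for ch in options + ",":  # a trailing ',' flushes the last digit run
        if ch in "0123456789":
            digits += ch
            continue
        if digits:
            record_length = int(digits)  # a span of digits: record length
            digits = ""
        if ch == ",":
            continue  # commas are ignored
        if ch == "C":
            flags.add("B")  # C is a synonym for B,1
            record_length = 1
        elif ch in "ABTQUW":
            flags.add(ch)
        else:
            raise ValueError("unknown I/O option %r" % ch)
    return record_length, flags
```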



OPTIONS

       -b        Toggle startup banner output (by default on).

       -d BBB    Allocate BBB bytes of "dynamic storage" for pro-
                 gram code and data. A suffix of k multiplies the
                 number by 1024.  A  larger  dynamic  region  may
                 result  in  fewer  garbage  collections (storage
                 garbage collect.  Most programs do not  need  an
                 increased  dynamic  region to run.  If your pro-
                 gram terminates with an "Insufficient storage to
                 continue"  message  you  need  to  increase  the
                 dynamic storage region size.

       -f        Toggle folding of identifiers to upper case (see
                 -CASE and &CASE).

       -h        Give help. Shows usage message, includes default
                 sizes for "dynamic  region"  and  pattern  match
                 stack.

       -k        Toggle  running programs with compilation errors
                 (see  -ERROR  and  -NOERRORS  extensions).    By
                 default  programs  with  compilation errors will
                 not be run.

       -l        Re-enable  listing  to  stdout.    (default   is
                 -UNLIST).  Default listing side is LEFT.

       -n        Toggle  running  programs after compilation (see
                 -EXECUTE and -NOEXECUTE extensions).  By default
                 programs are run after compilation.

       -p        Toggle  SPITBOL  extensions  (also controlled by
                 -PLUSOPS).

       -r        Toggle reading INPUT from  input  file(s)  after
                 END  label.   Otherwise INPUT defaults (back) to
                 standard input after program compilation is com-
                 plete.

       -s        Toggle  termination statistics (off by default).

       -u params Specifies a parameter string available via
                 HOST(0).

       --        Terminates  processing  items  as  options.  Any
                 remaining strings are treated as files  or  user
                 parameters.

       -M        Specifies  that  all  items  left on the command
                 line after option processing is complete are  to
                 be  treated as filenames.  The files are read in
                 turn  until  an  END  statement  is  found  (Any
                 remaining  data is available via the INPUT vari-
                 able if the -r option is also given).  A -- ter-
                 minates  processing  of  arguments as files, and
                 makes the remaining arguments available as  user
                 parameters (see the HOST() function).

                 ber  by 1024. The pattern match stack is used to
                 save  backtracking  and  conditional  assignment
                 information.  If your program terminates with an
                 "Overflow during pattern matching"  message  you
                 need to increase the pattern match stack size.



SEE ALSO

       [1]    R. E. Griswold, J. F. Poage, and I. P. Polonsky,
              The SNOBOL4 Programming Language, 2nd ed.,
              Prentice-Hall Inc., 1971.

       [2]    R.   E.   Griswold,  The  Macro  Implementation  of
              SNOBOL4, W. H. Freeman and Co., 1972.



AUTHOR

       Philip L. Budne



BUGS

       I/O retains some record oriented flavor.

       I/O is still tied to unit numbers.

       "Dynamic" storage cannot be expanded after startup.