4.1.7.5 Syntax of Tokens as Character Strings

SICStus Prolog supports wide characters (up to 31 bits wide), interpreted as a superset of UNICODE.

Each character in the code set has to be classified as belonging to one of the character categories, such as small-letter, digit, etc. This classification is called the character-type mapping, and it is used for defining the syntax of tokens.

Only character codes 0..255, i.e. the ISO 8859/1 (Latin 1) subset of UNICODE, can be part of tokens [1]. This restriction may be lifted in the future.

layout-char
These are character codes 0..32 and 127..160. This includes characters such as <TAB>, <LFD>, and <SPC>.
small-letter
These are character codes 97..122, i.e. the letters `a' through `z', as well as the non-ASCII character codes 223..246 and 248..255.
capital-letter
These are character codes 65..90, i.e. the letters `A' through `Z', as well as the non-ASCII character codes 192..214 and 216..222.
digit
These are character codes 48..57, i.e. the digits `0' through `9'.
symbol-char
These are character codes 35, 36, 38, 42, 43, 45..47, 58, 60..64, 92, 94, and 126, i.e. the characters:
          + - * / \ ^ < > = ~ : . ? @ # $ &
In addition, the non-ASCII character codes 161..191, 215, and 247 belong to this character type.
solo-char
These are character codes 33 and 59, i.e. the characters `!' and `;'.
punctuation-char
These are character codes 37, 40, 41, 44, 91, 93, and 123..125, i.e. the characters:
          % ( ) , [ ] { | }
quote-char
These are character codes 34, 39, and 96, i.e. the characters `"', `'', and ``'.
underline
This is character code 95, i.e. the character `_'.

All other characters, i.e. those with codes above 255, are unclassified and may only appear in comments.
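
The character-type mapping above can be written down as a small lookup table. The following Python sketch is illustrative only: the names CATEGORY_RANGES and char_type are hypothetical and are not part of SICStus Prolog; the ranges are transcribed directly from the list above.

```python
# Character-type mapping for codes 0..255, transcribed from the list above.
# CATEGORY_RANGES and char_type are illustrative names, not SICStus identifiers.
CATEGORY_RANGES = {
    "layout-char": [(0, 32), (127, 160)],
    "small-letter": [(97, 122), (223, 246), (248, 255)],
    "capital-letter": [(65, 90), (192, 214), (216, 222)],
    "digit": [(48, 57)],
    "symbol-char": [(35, 36), (38, 38), (42, 43), (45, 47), (58, 58),
                    (60, 64), (92, 92), (94, 94), (126, 126),
                    (161, 191), (215, 215), (247, 247)],
    "solo-char": [(33, 33), (59, 59)],
    "punctuation-char": [(37, 37), (40, 41), (44, 44), (91, 91), (93, 93),
                         (123, 125)],
    "quote-char": [(34, 34), (39, 39), (96, 96)],
    "underline": [(95, 95)],
}


def char_type(code):
    """Return the category name for a character code, or None if unclassified."""
    for name, ranges in CATEGORY_RANGES.items():
        for lo, hi in ranges:
            if lo <= code <= hi:
                return name
    return None
```

For example, char_type(ord("a")) yields "small-letter" and char_type(ord("|")) yields "punctuation-char". The listed ranges assign every code in 0..255 to exactly one category, so only codes above 255 come back as None.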

token ::= name
| natural-number
| unsigned-float
| variable
| string
| punctuation-char
| layout-text
| full-stop

name ::= quoted-name
| word
| symbol
| solo-char
| [ ?layout-text ]
| { ?layout-text }

word ::= small-letter ?alpha...

symbol ::= symbol-char... { except in the case of a full-stop or where the first 2 chars are `/*' }

natural-number ::= digit...
| base-prefix alpha... { where each alpha must be a digit of the base indicated by base-prefix, treating a,b,... and A,B,... as 10,11,... }
| 0 ' char-item { yielding the character code for char-item }
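
As an illustration of the three natural-number forms, the following Python sketch computes the integer a token denotes. It is a simplified reading of the grammar, not SICStus code: the names BASE_PREFIXES, _digit_value and natural_number_value are hypothetical, and for the 0' form only a plain character or a doubled quote is handled (escape sequences and the quoted-char category check are left out).

```python
# Hypothetical helper names; a sketch of the natural-number production only.
BASE_PREFIXES = {"0b": 2, "0o": 8, "0x": 16}


def _digit_value(ch):
    """Value of ch as a digit, treating a,b,... and A,B,... as 10,11,...; else None."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "a" <= ch <= "z":
        return ord(ch) - ord("a") + 10
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    return None


def natural_number_value(token):
    """Return the integer denoted by a natural-number token, or None if it does not fit."""
    # 0 ' char-item: only a plain character or the doubled quote '' is handled here.
    if token.startswith("0'"):
        item = token[2:]
        if item == "''":
            return ord("'")
        if len(item) == 1 and item not in ("'", "\\"):
            return ord(item)
        return None
    # digit... or base-prefix alpha...
    base, digits = 10, token
    if token[:2] in BASE_PREFIXES:
        base, digits = BASE_PREFIXES[token[:2]], token[2:]
    if not digits:
        return None
    value = 0
    for ch in digits:
        d = _digit_value(ch)
        if d is None or d >= base:
            return None  # underline or a digit outside the base
        value = value * base + d
    return value
```

Although alpha includes the underline character, the side condition on the rule requires every alpha to be a digit of the base, so the sketch rejects a token such as 0x_1.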

unsigned-float ::= simple-float
| simple-float exp exponent

simple-float ::= digit... . digit...

exp ::= e | E

exponent ::= digit... | sign digit...

sign ::= - | +
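
The unsigned-float rules above amount to a short regular expression. The Python sketch below is one way to express them; UNSIGNED_FLOAT and is_unsigned_float are hypothetical names. The class [0-9] is used instead of \d because the digit category covers only character codes 48..57.

```python
import re

# simple-float, optionally followed by exp and exponent.
UNSIGNED_FLOAT = re.compile(r"[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?")


def is_unsigned_float(token):
    """Return True if token matches the unsigned-float production as a whole."""
    return UNSIGNED_FLOAT.fullmatch(token) is not None
```

Under this grammar a float needs at least one digit on each side of the decimal point, so 1.0 and 1.0E-3 match while 1., .5 and 1e10 do not.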

variable ::= underline ?alpha...
| capital-letter ?alpha...

string ::= " ?string-item... "

string-item ::= quoted-char { other than `"' or `\' }
| ""
| \ escape-sequence

quoted-atom ::= ' ?quoted-item... '

quoted-item ::= quoted-char { other than `'' or `\' }
| ''
| \ escape-sequence

backquoted-atom ::= ` ?backquoted-item... `

backquoted-item ::= quoted-char { other than ``' or `\' }
| ``
| \ escape-sequence

layout-text ::= layout-text-item...

layout-text-item ::= layout-char | comment

comment ::= /* ?char... */ { where ?char... must not contain `*/' }
| % ?char... <LFD> { where ?char... must not contain <LFD> }
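
A tokenizer typically consumes layout-text in a single loop over layout characters and both comment forms. The Python sketch below shows one possible shape; is_layout and skip_layout_text are hypothetical names. As an assumption that goes beyond the grammar, a % comment that reaches the end of the input without a <LFD> is accepted.

```python
def is_layout(ch):
    """layout-char: character codes 0..32 and 127..160."""
    code = ord(ch)
    return code <= 32 or 127 <= code <= 160


def skip_layout_text(text, pos=0):
    """Return the index just past any layout-text that starts at pos."""
    n = len(text)
    while pos < n:
        if is_layout(text[pos]):
            pos += 1
        elif text.startswith("/*", pos):
            # The comment body ends at the first occurrence of */.
            end = text.find("*/", pos + 2)
            if end < 0:
                raise ValueError("unterminated /* comment")
            pos = end + 2
        elif text[pos] == "%":
            # Runs up to and including <LFD>; end of input is accepted too (assumption).
            end = text.find("\n", pos + 1)
            pos = n if end < 0 else end + 1
        else:
            break
    return pos
```

Because the scan for the closing */ stops at its first occurrence, bracketed comments do not nest in this sketch, which matches the side condition on the rule.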

full-stop ::= . { the following token, if any, must be layout-text }

char ::= layout-char
| printing-char

printing-char ::= alpha
| symbol-char
| solo-char
| punctuation-char
| quote-char

alpha ::= capital-letter | small-letter | digit | underline

escape-sequence ::= b { backspace, character code 8 }
| t { horizontal tab, character code 9 }
| n { newline, character code 10 }
| v { vertical tab, character code 11 }
| f { form feed, character code 12 }
| r { carriage return, character code 13 }
| e { escape, character code 27 }
| d { delete, character code 127 }
| a { alarm, character code 7 }
| other-escape-sequence

quoted-name ::= quoted-atom
| backquoted-atom

base-prefix ::= 0b { indicates base 2 }
| 0o { indicates base 8 }
| 0x { indicates base 16 }

char-item ::= quoted-item

other-escape-sequence ::= x alpha... \ { hex character code, where each alpha is in the range [0..15], treating a,b,... and A,B,... as 10,11,... }
| o digit... \ { in the range [0..7], octal character code }
| <LFD> { ignored }
| \ { stands for itself }
| ' { stands for itself }
| " { stands for itself }
| ` { stands for itself }

quoted-char ::= <SPC>
| printing-char
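
The string-item, quoted-item and backquoted-item rules share one shape: a plain character, a doubled quote, or a backslash followed by an escape-sequence. The Python sketch below decodes the text between the quotes into character codes, following the grammar as printed above (including the x and o forms closed by a backslash). SIMPLE_ESCAPES and decode_quoted_items are hypothetical names, and the check that plain characters belong to quoted-char is omitted for brevity.

```python
# Single-character escapes and the characters that stand for themselves.
SIMPLE_ESCAPES = {"b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
                  "e": 27, "d": 127, "a": 7,
                  "\\": 92, "'": 39, '"': 34, "`": 96}
HEX_DIGITS = set("0123456789abcdefABCDEF")
OCTAL_DIGITS = set("01234567")


def decode_quoted_items(body, quote="'"):
    """Return the character codes denoted by the items between the quotes."""
    codes = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch == quote:
            # Inside the body, the quote character must be doubled.
            if body[i:i + 2] != quote * 2:
                raise ValueError("unescaped quote at index %d" % i)
            codes.append(ord(quote))
            i += 2
        elif ch == "\\":
            if i + 1 >= n:
                raise ValueError("dangling backslash")
            esc = body[i + 1]
            if esc in SIMPLE_ESCAPES:
                codes.append(SIMPLE_ESCAPES[esc])
                i += 2
            elif esc == "\n":
                i += 2  # backslash followed by <LFD> is ignored
            elif esc in ("x", "o"):
                allowed, base = (HEX_DIGITS, 16) if esc == "x" else (OCTAL_DIGITS, 8)
                end = body.find("\\", i + 2)
                digits = body[i + 2:end] if end >= 0 else ""
                if not digits or not set(digits) <= allowed:
                    raise ValueError("malformed numeric escape at index %d" % i)
                codes.append(int(digits, base))
                i = end + 1
            else:
                raise ValueError("unknown escape at index %d" % i)
        else:
            codes.append(ord(ch))
            i += 1
    return codes
```

The quote parameter selects which production is being decoded: the default handles quoted-atom, while passing the double quote or the backquote handles string and backquoted-atom respectively.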

Footnotes

[1] Characters outside this range can still be included in quoted atoms and strings by using escape sequences (see ref-syn-syn-esc).


