
4.1.7.5 Syntax of Tokens as Character Strings

SICStus Prolog supports wide characters (up to 31 bits wide), interpreted as a superset of UNICODE.

Each character in the code set has to be classified as belonging to one of the character categories, such as small-letter, digit, etc. This classification is called the character-type mapping, and it is used for defining the syntax of tokens.

Only character codes 0..255, i.e. the ISO 8859/1 (Latin 1) subset of UNICODE¹, can be part of tokens. This restriction may be lifted in the future.

layout-char

These are character codes 0..32 and 127..160. This includes characters such as <TAB>, <LFD>, and <SPC>.

small-letter

These are character codes 97..122, i.e. the letters `a' through `z', as well as the non-ASCII character codes 223..246, and 248..255.

capital-letter

These are character codes 65..90, i.e. the letters `A' through `Z', as well as the non-ASCII character codes 192..214, and 216..222.

digit

These are character codes 48..57, i.e. the digits `0' through `9'.

symbol-char

These are character codes 35, 36, 38, 42, 43, 45..47, 58, 60..64, 92, 94, and 126, i.e. the characters:

          + - * / \ ^ < > = ~ : . ? @ # $ &

In addition, the non-ASCII character codes 161..191, 215, and 247 belong to this character type.

solo-char

These are character codes 33 and 59, i.e. the characters `!' and `;'.

punctuation-char

These are character codes 37, 40, 41, 44, 91, 93, and 123..125, i.e. the characters:

          % ( ) , [ ] { | }

quote-char

These are character codes 34, 39, and 96, i.e. the characters `"', `'', and ``'.

underline

This is character code 95, i.e. the character `_'.

Other characters are unclassified and may only appear in comments.
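The character-type mapping above can be written down as a small lookup table. The following is only an illustrative sketch, not how SICStus implements the mapping: `class_range/3` and `char_class/2` are hypothetical names, and the category names use `_` instead of `-` so that they are plain atoms.

```prolog
% class_range(Low, High, Class): codes Low..High belong to Class.
% The ranges are copied from the category descriptions above.
class_range(0,   32,  layout_char).
class_range(127, 160, layout_char).
class_range(97,  122, small_letter).
class_range(223, 246, small_letter).
class_range(248, 255, small_letter).
class_range(65,  90,  capital_letter).
class_range(192, 214, capital_letter).
class_range(216, 222, capital_letter).
class_range(48,  57,  digit).
class_range(35,  36,  symbol_char).
class_range(38,  38,  symbol_char).
class_range(42,  43,  symbol_char).
class_range(45,  47,  symbol_char).
class_range(58,  58,  symbol_char).
class_range(60,  64,  symbol_char).
class_range(92,  92,  symbol_char).
class_range(94,  94,  symbol_char).
class_range(126, 126, symbol_char).
class_range(161, 191, symbol_char).
class_range(215, 215, symbol_char).
class_range(247, 247, symbol_char).
class_range(33,  33,  solo_char).
class_range(59,  59,  solo_char).
class_range(37,  37,  punctuation_char).
class_range(40,  41,  punctuation_char).
class_range(44,  44,  punctuation_char).
class_range(91,  91,  punctuation_char).
class_range(93,  93,  punctuation_char).
class_range(123, 125, punctuation_char).
class_range(34,  34,  quote_char).
class_range(39,  39,  quote_char).
class_range(96,  96,  quote_char).
class_range(95,  95,  underline).

% char_class(+Code, ?Class): Class is the category of character code Code.
% Fails for codes that the table leaves unclassified (anything above 255).
char_class(Code, Class) :-
    integer(Code),
    class_range(Low, High, Class),
    Code >= Low,
    Code =< High,
    !.
```

With these clauses loaded, the query `char_class(0'a, Class)` binds `Class` to `small_letter`, while `char_class(256, Class)` fails because the code is unclassified.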

`token`	::= `name`
	\| `natural-number`
	\| `unsigned-float`
	\| `variable`
	\| `string`
	\| `punctuation-char`
	\| `layout-text`
	\| `full-stop`

`name`	::= `quoted-name`
	\| `word`
	\| `symbol`
	\| `solo-char`
	\| `[` `?layout-text` `]`
	\| `{` `?layout-text` `}`

`word`	::= `small-letter` `?alpha...`

`symbol`	::= `symbol-char...`	{ except in the case of a `full-stop` or where the first 2 chars are ``/*`' }

`natural-number`	::= `digit...`
	\| `base-prefix` `alpha...`	{ where each `alpha` must be a digit of the base indicated by `base-prefix`, treating a,b,... and A,B,... as 10,11,... }
	\| `0` `'` `char-item`	{ yielding the character code for `char-item` }
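As an informal illustration of the three `natural-number` forms, the sketch below compares a few literals with the integers they should denote. The predicate name `natural_number_examples/0` is hypothetical and exists only for this example.

```prolog
% Succeeds if every literal on the left denotes the integer on the right.
natural_number_examples :-
    007    =:= 7,      % digit...: a plain decimal numeral
    0b1010 =:= 10,     % base 2
    0o17   =:= 15,     % base 8
    0xff   =:= 255,    % base 16, small letters as digits
    0xFF   =:= 255,    % base 16, capital letters as digits
    0'a    =:= 97,     % character code of a
    0'A    =:= 65,     % character code of A
    0'''   =:= 39.     % the char-item '' denotes a single quote
```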

`unsigned-float`	::= `simple-float`
	\| `simple-float` `exp` `exponent`

`simple-float`	::= `digit...` `.` `digit...`

`exp`	::= `e` \| `E`

`exponent`	::= `digit...` \| `sign` `digit...`

`sign`	::= `-` \| `+`
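The float rules can be illustrated in the same way. This is a sketch under the grammar as stated here, and `float_examples/0` is a hypothetical name.

```prolog
% Succeeds if each literal is read as a float with the expected value.
float_examples :-
    float(1.0),             % simple-float: digits on both sides of the dot
    float(0.5),
    2.5e3   =:= 2500,       % exponent without a sign
    2.5E3   =:= 2500,       % exp may be e or E
    1.0e+2  =:= 100,        % explicit + sign
    25.0e-1 =:= 2.5,        % explicit - sign
    integer(10).            % no dot, so this is a natural-number

% By the rules above, 1e10, 1. and .5 are not unsigned-float tokens,
% because simple-float requires at least one digit on each side of the dot.
```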

`variable`	::= `underline` `?alpha...`
	\| `capital-letter` `?alpha...`
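The first character alone decides whether an unquoted alphanumeric token is a `word` (and hence a name) or a `variable`. A minimal sketch, with the hypothetical predicate name `word_and_variable_examples/0`:

```prolog
% Succeeds if each token is read as the kind of term noted in the comment.
word_and_variable_examples :-
    atom(foo),          % word: starts with a small-letter
    atom(fooBar_2),     % capital letters, digits and _ may follow
    var(_Tmp),          % variable: starts with underline
    var(_),             % underline on its own is also a variable
    var(Result),        % variable: starts with a capital-letter
    Result = done,
    atom(Result).       % now bound to the word done
```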

`string`	::= `"` `?string-item...` `"`

`string-item`	::= `quoted-char`	{ other than ``"`' or ``\`' }
	\| `""`
	\| `\` `escape-sequence`

`quoted-atom`	::= `'` `?quoted-item...` `'`

`quoted-item`	::= `quoted-char`	{ other than ``'`' or ``\`' }
	\| `''`
	\| `\` `escape-sequence`

`backquoted-atom`	::= `` ` `` `?backquoted-item...` `` ` ``

`backquoted-item`	::= `quoted-char`	{ other than a backquote or ``\`' }
	\| ``` `` ```
	\| `\` `escape-sequence`

`layout-text`	::= `layout-text-item...`

`layout-text-item`	::= `layout-char` \| `comment`

`comment`	::= `/*` `?char...` `*/`	{ where `?char...` must not contain ``*/`' }
	\| `%` `?char...` <LFD>	{ where `?char...` must not contain <LFD> }

`full-stop`	::= `.`	{ the following token, if any, must be `layout-text` }

`char`	::= `layout-char`
	\| `printing-char`

`printing-char`	::= `alpha`
	\| `symbol-char`
	\| `solo-char`
	\| `punctuation-char`
	\| `quote-char`

`alpha`	::= `capital-letter` \| `small-letter` \| `digit` \| `underline`

`escape-sequence`	::= `b`	{ backspace, character code 8 }
	\| `t`	{ horizontal tab, character code 9 }
	\| `n`	{ newline, character code 10 }
	\| `v`	{ vertical tab, character code 11 }
	\| `f`	{ form feed, character code 12 }
	\| `r`	{ carriage return, character code 13 }
	\| `e`	{ escape, character code 27 }
	\| `d`	{ delete, character code 127 }
	\| `a`	{ alarm, character code 7 }
	\| `other-escape-sequence`

`quoted-name`	::= `quoted-atom`
	\| `backquoted-atom`

`base-prefix`	::= `0b`	{ indicates base 2 }
	\| `0o`	{ indicates base 8 }
	\| `0x`	{ indicates base 16 }

`char-item`	::= `quoted-item`

`other-escape-sequence`	::= `x` `alpha...` `\`	{ treating a,b,... and A,B,... as 10,11,..., each in the range [0..15], hex character code }
	\| `o` `digit...` `\`	{ in the range [0..7], octal character code }
	\| <LFD>	{ ignored }
	\| `\`	{ stands for itself }
	\| `'`	{ stands for itself }
	\| `"`	{ stands for itself }
	\| `` ` ``	{ stands for itself }

`quoted-char`	::= <SPC>
	\| `printing-char`
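The quoting and escape rules can be checked by looking at the character codes of a few quoted atoms. The sketch below uses the standard built-in `atom_codes/2`; the predicate name `escape_examples/0` is hypothetical, and the last goal assumes that codes above 255 are accepted through escape sequences, as footnote 1 states.

```prolog
% Succeeds if each quoted atom consists of exactly the listed character codes.
escape_examples :-
    atom_codes('\a', [7]),              % alarm
    atom_codes('\b', [8]),              % backspace
    atom_codes('\t', [9]),              % horizontal tab
    atom_codes('\n', [10]),             % newline
    atom_codes('\x41\', [65]),          % hex escape, closed by a backslash
    atom_codes('\\', [92]),             % backslash stands for itself
    atom_codes('\'', [39]),             % escaped single quote
    atom_codes('''', [39]),             % doubled single quote (quoted-item '')
    atom_codes('a\"b', [97,34,98]),     % escaped double quote
    atom_codes('\x3B1\', [945]).        % code outside 0..255, via an escape
```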

Footnotes

[1] Characters outside this range can still be included in quoted atoms and strings by using escape sequences (see ref-syn-syn-esc).
