SICStus Prolog supports character codes up to 31 bits wide. Codes in the range shared with Unicode are interpreted as Unicode code points.
When a character code (a “code point” in Unicode terminology) is read from or written to a stream, it must be decoded from or encoded into a byte sequence. The method by which each character code is encoded to or decoded from a byte sequence is called a “character encoding”.
The following character encodings are currently supported by SICStus Prolog.
ANSI_X3.4-1968
ISO-8859-1
ISO-8859-15
windows 1252
UTF-8
UTF-16
UTF-16LE
UTF-16BE
UTF-32
UTF-32LE
UTF-32BE
LE and BE denote little endian and big endian, respectively.
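The effect of choosing one encoding over another can be observed by writing a single character code to a file and reading the file back in binary mode. The following is a minimal sketch, not taken from the manual: the predicate names encoded_bytes/3 and read_bytes/2 and the scratch file name are invented for illustration, and the sketch assumes that the open/4 option encoding_signature(false) suppresses any signature so that only the encoded character reaches the file.

```prolog
% encoded_bytes(+Encoding, +Code, -Bytes)
% Writes the character code Code to a scratch file using Encoding,
% then reads the file back in binary mode to obtain the raw bytes.
% 'enc_demo.tmp' is a placeholder file name chosen for this sketch.
encoded_bytes(Encoding, Code, Bytes) :-
    File = 'enc_demo.tmp',
    % Assumption: encoding_signature(false) keeps a BOM out of the file.
    open(File, write, Out,
         [encoding(Encoding), encoding_signature(false)]),
    put_code(Out, Code),
    close(Out),
    open(File, read, In, [type(binary)]),
    read_bytes(In, Bytes),
    close(In).

% read_bytes(+Stream, -Bytes)
% Collects all remaining bytes of a binary input stream into a list.
read_bytes(In, Bytes) :-
    get_byte(In, B),
    (   B =:= -1 ->                 % -1 signals end of file
        Bytes = []
    ;   Bytes = [B|Rest],
        read_bytes(In, Rest)
    ).
```

For example, the UTF-8 encoding of U+20AC (the euro sign) is the three bytes 0xE2 0x82 0xAC, so under the assumptions above the query encoded_bytes('UTF-8', 0x20AC, Bs) would be expected to give Bs = [226,130,172]. Running the same query with 'UTF-16LE' and 'UTF-16BE' should give the same two bytes in opposite order, which is the difference that LE and BE describe.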
The UTF encodings above can be auto-detected if a Unicode signature is present in a file opened for reading. A Unicode signature is also known as a byte order mark (BOM).
The encoding to use can be specified with the encoding/1 option of open/4 and similar predicates. When opening a file for input, the encoding can often be determined automatically. If no encoding is specified and none can be detected from the file contents, the default is ISO-8859-1.
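As an illustration of the encoding/1 option, the sketch below writes one line of text as UTF-8 and reads it back with the same encoding. The predicate names save_line/2 and load_line/2 are hypothetical helpers introduced here, not part of SICStus Prolog, and the encoding name is assumed to be passed as an atom.

```prolog
% save_line(+File, +Atom)
% Writes Atom followed by a newline to File, encoded as UTF-8.
save_line(File, Atom) :-
    open(File, write, Out, [encoding('UTF-8')]),
    write(Out, Atom),
    nl(Out),
    close(Out).

% load_line(+File, -Codes)
% Reads the first line of File as a list of character codes,
% decoding the bytes as UTF-8.
load_line(File, Codes) :-
    open(File, read, In, [encoding('UTF-8')]),
    read_line(In, Codes),
    close(In).
```

A possible use is save_line('demo.txt', 'price: \x20AC\ 5') followed by load_line('demo.txt', Codes). Stating the encoding explicitly on both sides avoids relying on auto-detection or on the ISO-8859-1 default.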
The encoding used by a text stream can be queried using stream_property/2.
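A query of this kind might look as follows. This is a sketch under the assumption that the relevant stream property is written encoding(Encoding), mirroring the open/4 option; file_encoding/2 is a hypothetical helper name.

```prolog
% file_encoding(+File, -Encoding)
% Opens File for reading without an encoding/1 option and reports
% the encoding that the resulting text stream uses.
file_encoding(File, Encoding) :-
    open(File, read, In),
    % Assumption: the property term has the form encoding(Encoding).
    stream_property(In, encoding(Encoding)),
    close(In).
```

Because no encoding is given to open/3 here, the reported value reflects whatever was detected from the file. For a file with no signature and no otherwise detectable encoding, the text above implies the result would be ISO-8859-1.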