| CONCEPT |
| unicode |
| |
| DESCRIPTION |
| LPC strings come in two flavors: As byte sequences and as unicode |
| strings. For both types almost the full range of string operations |
| is available, but the types are not to be mixed. So for example |
| you cannot add a byte sequence to an unicode string or vice versa. |
| |
| Byte sequences can store only bytes (values from 0 to 255), |
| but unicode strings can store the full unicode character set |
| (values from 0 to 1114111). |
| |
| There are two conversion functions to convert between byte sequences |
| and unicode strings: to_text() which will return a unicode string, |
| and to_bytes() which returns a byte sequence. Both take either |
| a string or an array, and when converting between bytes and unicode |
| also the name of the encoding (to be) used for the byte sequence. |
| |
| -- File handling -- |
| |
| When a file is accessed either by compiling, read_file(), write_file() |
| (not read_bytes() or write_bytes(), or when an explicit encoding was |
| given), the master is asked via the driver hook H_FILE_ENCODING for |
| the encoding of the file. If none is given, 7 bit ASCII is assumed. |
| Whenever codes are encounted that are not valid in the given encoding |
| a compile or runtime error will be raised. |
| |
| -- File names -- |
| |
| The filesystem encoding can be set with a call to |
| configure_driver(DC_FILESYSTEM_ENCODING, <encoding>). The default |
| encoding is derived from the LC_CTYPE environment setting. |
| If there is no environment setting (or it is set to the default |
| "C" locale), then UTF-8 is used. |
| |
| -- Interactives -- |
| |
| Each interactive has its own encoding. It can be set with |
| configure_interactive(IC_ENCODING, <encoding>). The default is |
| "ISO-8859-1//TRANSLIT" which maps each incoming byte to the |
| first 256 unicode characters and uses transliteration to encode |
| characters that are not in this character set. If an input or |
| output character can not be converted to/from the configured |
| encoding it will be silently discarded. |
| |
| -- ERQ / UDP -- |
| |
| Only byte sequences can be sent to the ERQ or via UDP, |
| and only byte sequences can be received from them. |
| |
| HISTORY |
| Introduced in LDMud 3.6. |
| |
| SEE ALSO |
| to_text(E), to_bytes(E), configure_driver(E) |