Blame - doc/concepts/unicode - mudlib-public

CONCEPT
unicode

DESCRIPTION
LPC strings come in two flavors: byte sequences and Unicode
strings. Almost the full range of string operations is available
for both types, but the two types cannot be mixed. For example,
you cannot add a byte sequence to a Unicode string or vice versa.

Byte sequences can store only bytes (values from 0 to 255),
whereas Unicode strings can store the full Unicode character set
(values from 0 to 1114111).

Two functions convert between byte sequences and Unicode strings:
to_text() returns a Unicode string and to_bytes() returns a byte
sequence. Both accept either a string or an array. When the
conversion crosses between bytes and Unicode, they also take the
name of the encoding used (or to be used) for the byte sequence.
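
The conversions described above might look like this in practice.
This is a minimal sketch, not taken from the driver sources: the
function name demo_conversion() is made up for illustration, and the
encoding names are assumed to be ones the driver's conversion library
accepts.

```lpc
// Hypothetical helper, for illustration only.
void demo_conversion()
{
    // Array variant: build a Unicode string from code points ("Grüße").
    string text = to_text(({ 71, 114, 252, 223, 101 }));

    // Unicode string -> byte sequence: the encoding name is required.
    bytes utf8   = to_bytes(text, "UTF-8");       // ü and ß need two bytes each
    bytes latin1 = to_bytes(text, "ISO-8859-1");  // one byte per character

    // Byte sequence -> Unicode string, again naming the encoding.
    string again = to_text(utf8, "UTF-8");

    // Character count versus the two encoded lengths.
    printf("%d %d %d %d\n",
           sizeof(text), sizeof(again), sizeof(utf8), sizeof(latin1));

    // The two types do not mix: an expression such as (text + utf8)
    // is rejected instead of being converted implicitly.
}
```

Building the string from code points keeps the example independent of
the encoding of the source file itself.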

-- File handling --

When a file is accessed by the compiler, read_file() or write_file(),
the master is asked via the driver hook H_FILE_ENCODING for the
encoding of the file. The hook is not consulted for read_bytes() or
write_bytes(), nor when an explicit encoding is given. If the hook
provides no encoding, 7-bit ASCII is assumed. Whenever codes are
encountered that are not valid in the given encoding, a compile or
runtime error is raised.
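
Setting the hook could be sketched as follows. The function name
setup_file_encoding(), the header name and the path test are
assumptions made for illustration. That the hook accepts either a
plain encoding name or a closure receiving the file name is stated
here from memory of the hook documentation and should be checked
against it.

```lpc
#include <driver_hook.h>  // assumption: header location depends on the mudlib

// Hypothetical helper; driver hooks are normally set from the master object.
void setup_file_encoding()
{
    // Simplest form: one encoding for every file.
    set_driver_hook(H_FILE_ENCODING, "UTF-8");

    // Assumed alternative: decide per file. The "/legacy/" prefix is made up.
    // set_driver_hook(H_FILE_ENCODING,
    //     function string (string filename)
    //     {
    //         return (strstr(filename, "/legacy/") == 0) ? "ISO-8859-1"
    //                                                    : "UTF-8";
    //     });
}
```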

-- File names --

The filesystem encoding can be set with a call to
configure_driver(DC_FILESYSTEM_ENCODING, <encoding>). The default
encoding is derived from the LC_CTYPE environment setting.
If there is no environment setting (or it is set to the default
"C" locale), UTF-8 is used.
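
A minimal sketch of overriding the default, assuming the constant is
provided by the configuration header and that the call is made from a
suitably privileged object. The function name is hypothetical.

```lpc
#include <configuration.h>  // assumption: provides DC_FILESYSTEM_ENCODING

// Hypothetical helper, e.g. called once during startup.
void setup_filesystem_encoding()
{
    // Treat file names as UTF-8 regardless of the LC_CTYPE setting.
    configure_driver(DC_FILESYSTEM_ENCODING, "UTF-8");
}
```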

-- Interactives --

Each interactive has its own encoding. It can be set with
configure_interactive(IC_ENCODING, <encoding>). The default is
"ISO-8859-1//TRANSLIT", which maps each incoming byte to one of the
first 256 Unicode characters and uses transliteration to encode
characters that are not in this character set. If an input or
output character cannot be converted to or from the configured
encoding, it is silently discarded.
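
Switching a single connection to UTF-8 might look like the sketch
below. The call shown above is shorthand: as far as the efun
documentation is remembered here, configure_interactive() takes the
interactive object as its first argument. The function name
use_utf8() is hypothetical.

```lpc
#include <configuration.h>  // assumption: provides IC_ENCODING

// Hypothetical helper: switch one connection to UTF-8.
void use_utf8(object player)
{
    // First argument: the interactive whose encoding is changed.
    configure_interactive(player, IC_ENCODING, "UTF-8");
}
```

A mudlib would typically call such a helper after the client has
indicated which encoding it uses.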

-- ERQ / UDP --

Only byte sequences can be sent to the ERQ or via UDP,
and only byte sequences can be received from them.
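
In practice this means encoding before sending and decoding after
receiving. A minimal sketch, assuming the peer expects UTF-8. Both
function names are hypothetical.

```lpc
// Hypothetical helper: send a Unicode string as a UDP datagram.
int send_text(string host, int port, string msg)
{
    // Encode explicitly; the Unicode string itself cannot be sent.
    return send_udp(host, port, to_bytes(msg, "UTF-8"));
}

// Hypothetical helper: decode the payload of a received datagram.
string decode_datagram(bytes data)
{
    // Raises an error if the data is not valid in the named encoding.
    return to_text(data, "UTF-8");
}
```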

HISTORY
Introduced in LDMud 3.6.

SEE ALSO
to_text(E), to_bytes(E), configure_driver(E)