Added public files
Roughly added all public files. Probably missed some, though.
diff --git a/doc/concepts/.synonym b/doc/concepts/.synonym
new file mode 100644
index 0000000..06bb40a
--- /dev/null
+++ b/doc/concepts/.synonym
@@ -0,0 +1,19 @@
+effizient effizienz
+optimieren effizienz
+optimierung effizienz
+optimal effizienz
+performanz effizienz
+speicher memory
+swap memory
+swapping memory
+objekt objects
+objekte objects
+objektorientiert oop
+objektorientierung oop
+property properties
+props properties
+terminal terminals
+stil goodstyle
+style goodstyle
+guterstil goodstyle
+kosten ticks
diff --git a/doc/concepts/concepts b/doc/concepts/concepts
new file mode 100644
index 0000000..54bfae0
--- /dev/null
+++ b/doc/concepts/concepts
@@ -0,0 +1,9 @@
+NAME
+ concepts
+
+DESCRIPTION
+ This directory contains man pages about basic concepts of the
+ LPC language as it is provided by Amylaar's parser/interpreter.
+
+SEE ALSO
+ driver(D), efun(E), applied(A), master(M), lpc(LPC)
diff --git a/doc/concepts/effizienz b/doc/concepts/effizienz
new file mode 100644
index 0000000..3f99c48
--- /dev/null
+++ b/doc/concepts/effizienz
@@ -0,0 +1,222 @@
+Efficiency
+ DESCRIPTION:
+ Efficiency in programming is unfortunately not easy to describe,
+ because it depends heavily on how programs are processed underneath.
+ It is best explained by example.
+
+ In general, readability and maintainability of code take precedence over
+ its efficiency, precisely because the really work-intensive methods live
+ in the lib. Moreover, it is generally not advisable to put (much) effort
+ into optimizing code as long as it is not clear that this is necessary
+ at all.
+ Readability/maintainability and efficient style are not mutually
+ exclusive, though, and a few (simple) ground rules are easy to follow.
+
+ For those of you who are only just getting to know LPC, the especially
+ important points are marked with (*). In the long run you should skim
+ all entries, though. The first one is worth taking to heart for
+ everyone here:
+
+ LPC is not optimized when loaded:
+ What you write is executed exactly as written: no loops are
+ optimized, no unnecessary assignments are removed, nothing is
+ changed:
+ - so think your code through carefully if it sits in a critical place
+ or costs a lot of computing time (e.g. nested loops)
+ - simply test some variants and ask on -lpc about optimization!
+
+ call_out and heart_beat create constant load:
+ Every call_out() sits in a list that is searched at the same rate as
+ the heart_beat(). Both cost time. Both methods also prevent the
+ object in question from being swapped out. This is why room messages
+ (AddRoomMessage works via call_out()) and the heart_beat() of
+ /std/npc switch themselves off after the last player has left the
+ room.
+ * - please take care to avoid unnecessary call_out/heart_beat.
+ (Moving NPCs in particular should switch themselves off again at
+ some point - there is a working MNPC with these properties under
+ /p/service/padreic/mnpc.)
+ - for regular calls in an object where the exact time does not matter
+ to within a few seconds, reset() together with set_next_reset() is
+ also an option
+ - instead of running call_out() chains in a room, you can also
+ remember the last activation and start a call_out() of suitable
+ length again on an init()
+
+ Memory and everything around it:
+ The memory situation is no longer desperate. That does not mean,
+ however, that memory can be wasted. At the same time, allocating
+ memory and garbage collection, i.e. collecting memory released
+ when variables are freed (as with x+y, x=0 (x,y==array/mapping)),
+ are always expensive. Here are a few tips on the
+ subject:
+ Size:
+ - if possible, free global variables after use - use #defines where
+ appropriate: but be careful with mappings/arrays (see below)
+ - keep mappings/arrays/strings that are global or stored in
+ properties small and only extend them dynamically
+ - if you write the same code in many places, it makes sense to put
+ it into a file/class of its own and inherit from
+ that - this saves memory and is easier to maintain
+ - please use replace_program only if you know what it does;
+ /std/room already uses it automagically
+ * - objects in rooms and NPCs should be added via AddItem(), because
+ the general cleanup function /std/room::clean_up() then knows
+ whether the room can be removed
+ - there should be no eternal sources of objects
+ - Blueprints:
+ - If there should only ever be one object of something, place the
+ blueprint there via AddItem(...,...,1).
+ Caution: reloading blueprints is expensive compared to cloning.
+ Keep this in mind especially for NPCs (which are destroyed
+ on death).
+ - The BP of cloned objects does not always have to be initialized;
+ especially for complex objects it can pay off to abort the
+ initialization of the BP in create(). (Usually only its program
+ is of interest.)
+ protected void create() {
+ if(!clonep(this_object())) {
+ set_next_reset(-1); // in case the clones do something
+ return; // in reset()
+ }
+ ::create(); ...
+ }
+
+ Costs:
+ * - it pays off to reserve local mappings or arrays of known size at
+ their full required size via allocate() or m_allocate() before
+ filling them:
+ instead of:
+ int *x = ({}); foreach(int i: 10) x+=({i});
+ prefer:
+ int *x = allocate(10); foreach(int i: 10) x[i] = i;
+ * - avoid repeatedly cutting slices out of arrays; this constantly
+ allocates new memory and frees used memory:
+ instead of:
+ int *x; ...; while(sizeof(x)) { x[0]...; x=x[1..]; }
+ prefer:
+ int *x; ...; i=sizeof(x); while(j<i) { x[j]...; j++; }
+ * - direct mappings/arrays ({}), ([]) in methods (e.g. via #define)
+ do save global space, but cost construction time on every call
+ of these methods - for frequently called methods, large data
+ structures should be constructed once globally
+ instead of: #define GROSSES_MAPPING ([....])
+ void haeufige_fun() { ... GROSSES_MAPPING ... }
+ prefer: mapping GROSSES_MAPPING = ([....]);
+ void haeufige_fun() { ... GROSSES_MAPPING ... }
+ * - various efuns are as fast to access as variables, so their result
+ only has to be stored in a variable if the value can change:
+ this_player(), this_interactive(), environment(), previous_object(),
+ this_object().
+ * - instead of assigning all_inventory() to a variable and iterating
+ over it, you can often walk an inventory with first_inventory()
+ and next_inventory()
+
+ Methods:
+ The methods of an object are stored in a list that is searched
+ when a method is called via call_other() (or o->fun()).
+ This has the following consequences:
+ * - every public method is searched on call_other(), and that costs
+ time; so if a method does not have to be public, put a
+ "protected" in front of it, and if it does not have to be visible
+ in inheriting classes either: "private"
+ - if you use a foreign method several times (e.g. QueryProp), then in
+ very critical places it makes sense to look it up once and bind it
+ to an lfun closure; further calls are faster:
+ closure cl;
+ cl=symbol_function("QueryProp",this_player());
+ funcall(cl, P_LEVEL); funcall(cl, P_SIZE); ...
+ As an aside:
+ - there is no so-called early binding in LPC; "this_object()->function();"
+ is almost always unnecessary and almost always just a sign of being too
+ lazy to include/write the correct prototypes.
+
+ Lambdas:
+ Lambda closures are not only hard to read but often also slower than
+ other closures. In particular, the lambda is created anew every time
+ lambda() is encountered.
+ Take the time to turn a lambda closure into an lfun closure, or at
+ least to bind it to a global closure variable, so that it can be
+ executed quickly. #define is not a good
+ fit here.
+ instead of: filter(users(),
+ lambda(({'x}), ({#'call_other,'x,
+ "QueryProp",P_SECOND})));
+ prefer: private static int _isasec(object o) {
+ return o->QueryProp(P_SECOND);
+ }
+ ...
+ filter(users(), #'_isasec);
+ or: closure cl;
+ cl=lambda(({'x}), ... );
+ ...
+ filter(users(), cl);
+ or:
+ Inline closures (man inline-closures) are, by the way, a better
+ alternative to lambdas; they are considerably faster and easier to read.
+ filter(users(), function mixed (pl)
+ {
+ return pl->QueryProp(P_SECOND);
+ }
+ );
+
+
+ Simul-efuns and the burden of the past:
+ There are some simul-efuns that are used in place of a similar efun
+ but are slower. Example: the sefun m_copy_delete() does almost the
+ same as m_delete(), but always creates a copy first.
+ If you do not need the copy, you should prefer m_delete().
+
+
+ General remarks:
+ *** - LAG arises above all when too many things are to be identified,
+ moved, loaded, cloned or copied at once
+ (in a single command, in one reset(), ...)
+ - split such tasks into small chunks with call_out/heart_beat
+ - have an archwizard look it over
+ * - variables are always initialized to 0,
+ allocate() arrays are initialized with 0 or a value of your choice.
+ - loop-invariant code should be moved out of loops,
+ e.g. when iterating over an array the sizeof() belongs before the
+ loop, not in the test
+ * - for identifying unique objects, present_clone() is considerably
+ cheaper than a present() plus protected IDs
+ * - "-" can remove many identical values from an array at once, so
+ for delete operations it makes sense to set the values to be
+ deleted to one particular value and then remove that value
+ via array-=({wert}).
+ Here we remove all killed NPCs, i.e. all destructed objects,
+ from a list: meinelistemitnpcs-=({0})
+ - efuns are often faster than hand-written constructs, especially
+ where arrays are concerned. This cannot be generalized, though; you
+ always have to take the necessary memory allocation into account too!
+ Together with a reference, sort_array(), filter(), map() etc. are
+ nevertheless often your friend:
+ instead of: t=allocate(0);
+ for (i=sizeof(a1); i--; )
+ if (member(a2,a1[i])>=0) t+=({a1[i]});
+ prefer: private static mixed _is_member(mixed x, mixed a) {
+ if (member(a,x)>=0) return 1;
+ else return 0;
+ }
+ ...
+ t=filter(a1, #'_is_member, &a2);
+ or, even better here:
+ t=a1&a2;
+ - x&y is sometimes the worse choice for two large arrays:
+ instead of: t=all_inventory(TO)&users(); // two arrays
+ prefer: t=filter(all_inventory(TO), // one array!
+ #'query_once_interactive);
+ It may be worthwhile here to iterate over the room directly with
+ first_inventory() and next_inventory() and to perform the desired
+ operations on everything that is query_once_interactive().
+ - foreach() is often the better alternative to for() (slightly
+ faster, simpler to write)
+ - further fast efuns:
+ query_verb(), interactive(), query_once_interactive(), living(),
+ stringp(), intp(), closurep(), objectp(), ...
+
+ SEE ALSO:
+ memory, objekte, mudrechner, goodstyle, ticks
+
+ 6. Sep 2012 Gloinson
diff --git a/doc/concepts/erq b/doc/concepts/erq
new file mode 100644
index 0000000..378d59d
--- /dev/null
+++ b/doc/concepts/erq
@@ -0,0 +1,578 @@
+CONCEPT
+ erq - External Request Demon
+
+DESCRIPTION
+ Up to version 3.2.1@61, LPMud utilized two external programs
+ in an ad-hoc manner to solve problems: the 'hname' program to
+ resolve IP addresses into meaningful hostnames, and the
+ 'indent' program to properly indent LPC files.
+ In version 3.2.1@61 both functions were united in a
+ generalized 'erq' process, to which additional functions may
+ be attached. Unfortunately it was never documented by Amylaar,
+ so the information presented here had to be reverse engineered
+ from the sources - better take it with a grain of salt.
+
+ The erq feature is available if the driver is compiled with
+ ERQ_DEMON defined (in config.h).
+
+ When the driver starts up, it tries to fork off the program
+ 'BINDIR/erq --forked <other args>' (with BINDIR defined in
+ the Makefile). If this succeeds, the erq may talk with
+ the driver through stdin and stdout (piped through AF_UNIX
+ sockets). The erq has to signal its successful start by
+ writing the character '1' back to the driver.
+
+ The erq has to understand these commandline arguments:
+
+ --forked: explained above
+ --execdir <dir>: The directory where the callable executables
+ can be found. If not specified, ERQ_DIR is used.
+ <dir> must not end in a '/' and should be absolute.
+
+ At runtime, the erq may be changed/removed from within the
+ mudlib using the efun attach_erq_demon(). This efun is given
+ an interactive object as argument, and takes the connection
+ away(!) from this object and stores it as the erq connection
+ to use (an old erq connection is closed first). The object
+ (which now no longer is interactive) is then no longer needed,
+ but may continue to exist.
+ The erq attached this way of course has to use the sockets it
+ opened to communicate with the driver.
+
+ Most of the communication between erq and driver is going to
+ be initiated by the driver (the erq has to look up the
+ hostnames for given IP addresses), but using the efun
+ send_erq() the mudlib may talk with the erq as well.
+
+ The communication between driver and erq is done using
+ messages of specified structures and constants (defined in
+ util/erq.h resp. sys/erq.h). The 'int32's are signed integers
+ of four byte length, and are sent with the MSByte first.
+ Every message must be sent atomically!
+
+ The head of the messages is always the same:
+
+ struct erq_msghead {
+ int32 msglen; /* Total size of message in bytes */
+ int32 handle; /* Identification number */
+ }
+
+ The 'handle' number is set by the driver (do not make
+ assumptions about its value) and is used to associate the erq
+ responses with the pending requests. This way the erq is free
+ to respond in an order different from that of the incoming
+ requests.
+
+ The messages sent to the erq follow this symbolic format:
+
+ struct to_erq_msg {
+ int32 msglen;
+ int32 handle;
+ char request;
+ char data[0];
+ }
+
+ The 'request' denotes which service is requested from the erq;
+ the size and content of 'data' depend on the requested
+ service.
+
+ The answer message from the erq to the driver (if there is one
+ at all) may have two forms:
+
+ struct from_erq_msg {
+ int32 msglen;
+ int32 handle;
+ char data[0];
+ }
+
+ struct from_erq_keep_msg {
+ int32 msglen;
+ const int32 keep = ERQ_KEEP_HANDLE;
+ int32 handle;
+ char data[0];
+ }
+
+ The replied data from the erq is stored in 'data', whose size
+ and content depend on the request answered. The answer is
+ identified by 'header.handle'. Normally, one request results
+ in just one response sent by the erq using struct from_erq_msg,
+ so the handle is recycled after this response.
+ Should the erq send several responses (or break one response
+ into several parts), the struct from_erq_keep_msg has to be
+ used for all but the last response - this message with its
+ included special handle keeps the real handle alive.
+
+
+ Mudlib generated erq-calls specify the 'request' and the
+ 'data' to be sent, and receive the 'data' replied. When
+ dealing with spawned programs, the first byte of the returned
+ 'data' determines the content type of the received message.
+ The actual 'data' which the lpc programs get to see is sent
+ and retrieved as arrays of byte integers (integers in the
+ range of 0..255).
+
+
+ The actual interface between erq demon and driver is limited
+ to the general message formats and the hostname lookup
+ mechanism. The driver is meant to withstand erq demon failures
+ at least in a garbage-in garbage-out fashion. You could add
+ new requests to the erq demon, or write your own from scratch,
+ without changing the driver.
+
+
+ Currently five services are predefined in the supplied
+ erq-demon (util/erq.c in the driver source archive): looking
+ up a hostname, execution, forking or spawning an external
+ program, authentication of a connection, and handling of
+ external UDP/TCP connections. As mentioned above, only the
+ hostname-lookup is a true must.
+
+ For a program to be executable for erq, it must be placed in
+ or below ERQ_DIR (defined in config.h). On most unix systems,
+ it is possible to use a symlink instead of the whole program
+ if you want a standard binary. You could even symlink entire
+ directories like /usr/sbin, but chances are you make a big
+ security hole this way :-)
+
+
+ Hostname lookup:
+
+ request : ERQ_RLOOKUP
+ data sent: struct in_addr.s_addr addr // the address to resolve
+ data recv: struct in_addr.s_addr addr // the resolved address
+ char[] name // the hostname (if any)
+
+ If the sent address can't be resolved, just the address is
+ to be returned. The string need not be 0-terminated.
+
+
+ Hostname lookup:
+
+ request : ERQ_LOOKUP
+ data sent: char[] name // the name to resolve
+ data recv: struct in_addr.s_addr addr // the resolved address
+
+ If the sent name can't be resolved, no data is returned (the
+ driver will get a message with just the header).
+
+
+ Hostname lookup - IPv6:
+
+ request : ERQ_RLOOKUPV6
+ data sent: char[] addr // the address to resolve
+ data recv: char[] data // the resolved name
+
+ If the address could be resolved, the returned data is a string,
+ with exactly one space, in the form "<addr> <name>". <addr> is
+ the address passed to the erq, <name> is the hostname of the
+ address or, if there is no reverse-IPv6 entry for <addr>, the
+ IPv6 address which may or may not be different from <addr>.
+
+ If the address can not be resolved, the returned data is
+ an error message without a space (currently, just "invalid-format"
+ and "out-of-memory" are returned).
+
+
+ Execute/Fork program:
+
+ request : ERQ_EXECUTE/ERQ_FORK
+ data sent: char[] command // the command to execute
+ data recv: char status = CHILD_FREE
+ char rc // the success/error code
+ char info // additional information
+
+ The erq executes the sent command using execv().
+ The erq does the processing of the command line arguments
+ (which must not contain '\') and checks the validity of the
+ command (it must not start with '/' nor contain '..'), which
+ is interpreted relative to ERQ_DIR.
+ The external program is executed from a fork()ed instance of
+ the erq; however, with ERQ_EXECUTE the erq waits until the
+ external program has finished before sending its response, while
+ with ERQ_FORK the response is sent back immediately.
+
+ Possible return codes are:
+ ERQ_OK : Operation succeeded.
+ ERQ_E_ARGLENGTH: Command too long.
+ ERQ_E_ARGFORMAT: Illegal argument given (contains '\').
+ ERQ_E_ARGNUMBER: Too many arguments (>= 96).
+ ERQ_E_ILLEGAL : Command from outside ERQ_DIR requested.
+ ERQ_E_PATHLEN : Command path too long.
+ ERQ_E_FORKFAIL : Command could not be forked;
+ info holds the errno value.
+
+ ERQ_EXECUTE features some more return codes:
+ ERQ_OK : Operation succeeded, <info> holds the exit status.
+ ERQ_SIGNALED : Command terminated by the signal <info>.
+ ERQ_E_NOTFOUND : No process found to wait() for.
+ ERQ_E_UNKNOWN : Unknown exit condition from wait().
+
+
+ Spawn program:
+
+ request : ERQ_SPAWN
+ data sent: char[] command // the command to execute
+ data recv: Spawn failed:
+ char rc // the error code (see ERQ_FORK)
+ char info // additional information
+ data recv: Spawn succeeded:
+ char rc = ERQ_OK
+ char[] ticket // the spawn ticket.
+
+ The erq executes the sent command as if given an ERQ_FORK
+ command, but returns additional information about the
+ started process to allow further communication.
+ In contrast to ERQ_FORK, ERQ_SPAWNed processes may be
+ controlled via ERQ_KILL, receive data from the mud via
+ ERQ_SEND on their stdin, and output from their stdout/stderr
+ is sent back to the mud.
+ The spawned process is identified by its <ticket> (don't
+ make any assumptions about its length or content), the transaction
+ itself by <handle>.
+
+
+ Send data to spawned program:
+
+ request : ERQ_SEND
+ data sent: char[] ticket // the addressed process ticket.
+ char[] text // the text to send.
+ data recv: char rc // the success/error code.
+ int32 info // opt: additional info.
+
+ The <text> is sent to the stdin of the spawned process
+ identified by <ticket>.
+
+ Possible return codes are:
+ ERQ_OK : Operation succeeded, no <info> is replied.
+ ERQ_E_TICKET : The given ticket is invalid, no <info> replied.
+ ERQ_E_INCOMPLETE: Only <info> chars of the text have been
+ sent.
+ If a callback is specified, the erq will send
+ an ERQ_OK message once all data has been sent
+ (this may never happen).
+ ERQ_E_WOULDBLOCK: Error E_WOULDBLOCK (also stored in <info>)
+ happened while sending the text.
+ ERQ_E_PIPE : Error E_PIPE (also stored in <info>)
+ happened while sending the text.
+ ERQ_E_UNKNOWN : The error with code <info> happened
+ while sending the data.
+
+ Amylaar-erq doesn't try to re-send the remaining data after
+ an ERQ_E_INCOMPLETE, so there will never be an ERQ_OK.
+
+
+ Send a signal to a spawned program:
+
+ request : ERQ_KILL
+ data sent: char[] ticket // the addressed process ticket
+ int32 signal // the signal to send
+ data recv: char rc // the success/error code
+
+ The <signal> is sent to the spawned process identified by <ticket>.
+
+ Possible return codes are:
+ ERQ_OK : Operation succeeded, no <info> is replied.
+ ERQ_E_TICKET : The given ticket is invalid, no <info> replied.
+ ERQ_E_ILLEGAL : The given signal is illegal.
+
+
+ Data replies from spawned programs:
+
+ data recv: char out_or_err // type of text output
+ char[] text // text output by child process
+
+ The child process controlled by the erq did output <text>
+ on stdout (<out_or_err> == ERQ_STDOUT) resp. on stderr
+ (<out_or_err> == ERQ_STDERR).
+
+
+ Exit notifications from spawned programs:
+
+ data recv: char rc // the exit code
+ char info // additional information.
+
+ The child process controlled by the erq did terminate.
+ Possible exit codes are:
+ ERQ_EXITED : Process exited with status <info>.
+ ERQ_SIGNALED : Process terminated by signal <info>.
+ ERQ_E_UNKNOWN : Process terminated for unknown reason.
+
+
+ Authenticate connection (see rfc 931):
+
+ request : ERQ_AUTH
+ data sent: struct sockaddr_in remote // the address to check
+ int32 port // the mud port
+ or
+ data sent: int32 remote_ip // remote ip to check
+ int16 remote_port // remote port to check
+ int16 local_port // the mud port
+
+ data recv: char[] reply // the data received by authd
+
+ The erq attempts to connect to the authd on the remote system
+ and to verify the connection between the remote port and the
+ mud port. The latter will normally be the port number of the
+ socket on the gamedriver's side, retrievable by
+ query_ip_number().
+
+ The answer from the authd (one line of text), if there is any,
+ is returned as the result.
+
+ The second form of the ERQ_AUTH command is recognized by
+ the xerq as an alternative.
+
+
+ Open a UDP port:
+
+ request : ERQ_OPEN_UDP
+ data sent: char[2] port // the port number to open (network order)
+ data recv: Open failed:
+ char rc // the success/error code.
+ char info // opt: additional info.
+ data recv: Open succeeded:
+ char rc = ERQ_OK
+ char[] ticket // the connection ticket.
+
+ The erq opens a UDP port on the host machine with the given
+ port number.
+ Possible exit codes are:
+ ERQ_OK : Operation succeeded.
+ ERQ_E_ARGLENGTH : The port number given does not consist
+ of two bytes.
+ ERQ_E_NSLOTS : The max number of child processes (given
+ in <info>) is exhausted.
+ ERQ_E_UNKNOWN : Error <info> occurred in one of the system
+ calls done to open the port.
+
+ Once the port is open, it is treated as if it were just another
+ spawned program.
+
+
+ Send data over a UDP port:
+
+ request : ERQ_SEND
+ data sent: char[] ticket // the addressed port's ticket.
+ struct in_addr.s_addr addr // address of receiver.
+ struct addr.sin_port port // port of receiver.
+ char[] text // the text to send.
+ data recv: char rc // the success/error code.
+ int32 info // opt: additional info.
+
+ The <text> is sent from our port <ticket> to the network
+ address <addr>, port <port>.
+
+ Possible return codes are:
+ ERQ_OK : Operation succeeded, no <info> is replied.
+ ERQ_E_TICKET : The given ticket is invalid, no <info> replied.
+ ERQ_E_INCOMPLETE: Only <info> chars of the text have been
+ sent. The erq will send an ERQ_OK message
+ once all data has been sent.
+ ERQ_E_WOULDBLOCK: Error E_WOULDBLOCK (also stored in <info>)
+ happened while sending the text.
+ ERQ_E_PIPE : Error E_PIPE (also stored in <info>)
+ happened while sending the text.
+ ERQ_E_UNKNOWN : The error with code <info> happened
+ while sending the data.
+
+
+ Close a UDP port:
+
+ request : ERQ_KILL
+ data sent: char[] ticket // the addressed port's ticket
+ int32 signal // the signal to send (ignored)
+ data recv: char rc = ERQ_OK
+
+ The port <ticket> is closed. The <signal> must be sent, but
+ its value is ignored.
+
+
+ Data received over a UDP connection:
+
+ data recv: char out_or_err = ERQ_STDOUT
+ struct in_addr.s_addr addr // ip-address of sender
+ struct addr.sin_port port // port of sender
+ char[] text // data received
+
+ The UDP port controlled by the erq received <text> over
+ the network from the sender at <addr>, reply port number <port>.
+
+
+ Open a TCP port to listen for connections:
+
+ request : ERQ_LISTEN
+ data sent: struct addr.sin_port port // the port number to open
+ data recv: Open failed:
+ char rc // the success/error code.
+ char info // opt: additional info.
+ data recv: Open succeeded:
+ char rc = ERQ_OK
+ char[] ticket // the connection ticket.
+
+ The erq opens a TCP port on the host machine with the given
+ port number to listen for connections.
+ Possible exit codes are:
+ ERQ_OK : Operation succeeded.
+ ERQ_E_ARGLENGTH : The port number given does not consist
+ of two bytes.
+ ERQ_E_NSLOTS : The max number of child processes (given
+ in <info>) is exhausted.
+ ERQ_E_UNKNOWN : Error <info> occurred in one of the system
+ calls done to open the port.
+
+ Once the port is open, it is treated as if it were just another
+ spawned program.
+
+
+ Open a TCP port:
+
+ request : ERQ_OPEN_TCP
+ data sent: struct in_addr.s_addr ip // the ip to address
+ struct addr.sin_port port // the port to address
+ data recv: Open failed:
+ char rc // the success/error code.
+ char info // opt: additional info.
+ data recv: Open succeeded:
+ char rc = ERQ_OK
+ char[] ticket // the connection ticket.
+
+ The erq opens a TCP port on the host machine and tries to connect
+ it to the address <ip>:<port>.
+ Possible exit codes are:
+ ERQ_OK : Operation succeeded.
+ ERQ_E_ARGLENGTH : The port number given does not consist
+ of two bytes.
+ ERQ_E_NSLOTS : The max number of child processes (given
+ in <info>) is exhausted.
+ ERQ_E_UNKNOWN : Error <info> occurred in one of the system
+ calls done to open the port.
+
+ Once the port is open, it is treated as if it were just another
+ spawned program.
+
+
+ Send data over a TCP connection:
+
+ request : ERQ_SEND
+ data sent: char[] ticket // the addressed connection's ticket.
+ char[] text // the text to send.
+ data recv: char rc // the success/error code.
+ int32 info // opt: additional info.
+
+ The <text> is sent over the TCP connection
+ identified by <ticket>.
+
+ Possible return codes are:
+ ERQ_OK : Operation succeeded, no <info> is replied.
+ ERQ_E_TICKET : The given ticket is invalid, no <info> replied.
+ ERQ_E_INCOMPLETE: Only <info> chars of the text have been
+ sent. The erq will send an ERQ_OK message
+ once all data has been sent.
+ ERQ_E_WOULDBLOCK: Error E_WOULDBLOCK (also stored in <info>)
+ happened while sending the text.
+ ERQ_E_PIPE : Error E_PIPE (also stored in <info>)
+ happened while sending the text.
+ ERQ_E_UNKNOWN : The error with code <info> happened
+ while sending the data.
+
+
+ Data ready to read on TCP connection:
+
+ data recv: char out_or_err = ERQ_OK
+ char[] ticket // ticket of this connection
+
+ There is data available to read on the specified TCP connection.
+
+
+ Data received over a TCP connection:
+
+ data recv: char out_or_err = ERQ_STDOUT
+ char[] text // data received
+
+ The TCP port controlled by the erq did receive <text>.
+
+
+ TCP connection closes on error:
+
+ data recv: char out_or_err = ERQ_E_UNKNOWN
+ char errno // errno from socket operation
+
+ The TCP connection caused an error <errno> and has been closed.
+
+
+ TCP connection closed:
+
+ data recv: char out_or_err = ERQ_EXITED
+
+ The TCP connection closed regularly (End Of File).
+
+
+ Connection pending on TCP socket:
+
+ data recv: char out_or_err = ERQ_STDOUT
+
+ The TCP 'listen' port controlled by the erq received
+ a connection request.
+
+
+ Accept a pending connection:
+
+ request : ERQ_ACCEPT
+ data sent: char[] ticket // the ticket of this socket
+ data recv: Accept failed:
+ char rc // the success/error code.
+ char info // opt: additional info.
+ data recv: Accept succeeded:
+ char rc = ERQ_OK
+ struct in_addr.s_addr ip // remote side's ip
+ struct addr.sin_port port // remote side's port
+ char[] ticket // the new ticket.
+
+ The erq accepts a new connection on an accept-TCP-port, creates
+ a child and ticket for it and returns its ticket together with
+ the remote side's <ip>:<port> number (in network byte order).
+ Possible exit codes are:
+ ERQ_OK : Operation succeeded.
+ ERQ_E_ARGLENGTH : The port number given does not consist
+ of two bytes.
+ ERQ_E_NSLOTS : The max number of child processes (given
+ in <info>) is exhausted.
+ ERQ_E_TICKET : the ticket didn't match
+ ERQ_E_UNKNOWN : Error <info> occurred in one of the system
+ calls done to open the port.
+
+ Once the port is open, it is treated as if it were just another
+ spawned program.
+
+
+EXAMPLE
+ Assume you have a script 'welcome-mail' to send a welcome mail
+ to a new player. Put this script into the directory for the callable
+ executables, then you can use it like this:
+
+ void erq_response(mixed * data)
+ {
+ write_file( "WELCOMELOG"
+ , sprintf("rc %d, info %d\n", data[0], data[1]));
+ }
+
+ void send_mail(string player_name, string player_email)
+ {
+ send_erq( ERQ_EXECUTE
+ , "welcome-mail '"+player_name+"' '"+player_email+"'"
+ , #'erq_response);
+ }
+
+
+HISTORY
+ The erq was introduced with 3.2.1@61.
+ ERQ_AUTH was introduced with 3.2.1@81.
+ ERQ_SEND, ERQ_SPAWN, ERQ_KILL were introduced with 3.2.1@82.
+ ERQ_OPEN_UDP, ERQ_OPEN_TCP, ERQ_LISTEN were introduced with 3.2.1@98.
+ ERQ_RLOOKUPV6 was introduced in 3.2.8.
+ LDMud 3.2.9 added the '--execdir' argument to erq, and the ERQ_OK
+ after ERQ_E_INCOMPLETE protocol.
+
+SEE ALSO
+ attach_erq_demon(E), send_erq(E), stale_erq(M), rfc 931
+ query_ip_number(E)
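Note on doc/concepts/erq: the framing rules described in that file (a header of two int32 values sent MSByte first, followed by a one-byte request and the payload) can be sketched in C roughly as below. This is only an illustrative sketch, not code taken from util/erq.c. The helper names erq_pack_request() and erq_unpack_header() are hypothetical, the request byte is passed in by the caller instead of being taken from erq.h, and msglen is assumed to count the header as well, as "total size of message" suggests.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value with the most significant byte first. */
static void put_be32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v >> 24);
    dst[1] = (unsigned char)(v >> 16);
    dst[2] = (unsigned char)(v >> 8);
    dst[3] = (unsigned char)v;
}

/* Read a 32-bit value that was stored most significant byte first. */
static uint32_t get_be32(const unsigned char *src)
{
    return ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16)
         | ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}

/* Hypothetical helper: lay out a to_erq_msg in buf as
 *   msglen (4 bytes) | handle (4 bytes) | request (1 byte) | data.
 * Assumption: msglen is the total size including the header.
 * Returns the total message length, or 0 if buf is too small.
 */
size_t erq_pack_request(unsigned char *buf, size_t cap, int32_t handle,
                        unsigned char request,
                        const void *data, size_t len)
{
    size_t total = 4 + 4 + 1 + len;

    if (total > cap)
        return 0;
    put_be32(buf, (uint32_t)total);
    put_be32(buf + 4, (uint32_t)handle);
    buf[8] = request;
    if (len > 0)
        memcpy(buf + 9, data, len);
    return total;
}

/* Hypothetical helper: decode the erq_msghead at the start of a reply.
 * Returns 1 on success, 0 if fewer than 8 bytes are available.
 */
int erq_unpack_header(const unsigned char *buf, size_t avail,
                      int32_t *msglen, int32_t *handle)
{
    if (avail < 8)
        return 0;
    *msglen = (int32_t)get_be32(buf);
    *handle = (int32_t)get_be32(buf + 4);
    return 1;
}
```

Because every message must be sent atomically, a caller would typically pack the whole message into one buffer like this and hand it to a single write call, rather than writing header and payload separately.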
diff --git a/doc/concepts/files b/doc/concepts/files
new file mode 100644
index 0000000..56e6e64
--- /dev/null
+++ b/doc/concepts/files
@@ -0,0 +1,16 @@
+CONCEPT
+ files
+
+DESCRIPTION
+ As a wizard, you are working with files. Each file represents
+ the building plan for one or more objects (except text or doc files
+ of course).
+
+ The mudlib has a root, and when working with filenames, you
+ can always specify a full pathname from the root by starting
+ with a '/' (slash) at the beginning of the file name.
+
+ (oops, truncated - why when where did it happen?)
+
+SEE ALSO
+ objects(C), create(A), reset(A)
diff --git a/doc/concepts/goodstyle b/doc/concepts/goodstyle
new file mode 100644
index 0000000..15ea04b
--- /dev/null
+++ b/doc/concepts/goodstyle
@@ -0,0 +1,74 @@
+Good style
+ DESCRIPTION:
+ Good style can be learned. At least in programming. Good style
+ means above all: write it so that others can read and understand
+ it. (Otherwise become a /secure/ archwizard; due to built-in
+ paranoia they have to write in a self-encrypting way.)
+
+ You can also learn by example: clean code can be found under
+ /d/gebirge/room/, /d/gebirge/obj/, /d/gebirge/mon/ and
+ /doc/beispiele/.
+
+ Tips on appearance:
+ - Keep program lines no longer than 80 characters, because 80-character
+ wide windows are still the norm
+ - Code can continue on the next line:
+ filter(all_inventory(environment(TP)),
+ #'einetollesortierfunktion, &extravariablen);
+ - Strings can be broken up (without concatenating with +):
+ "Jemand "<EOL> "jammert" == "Jemand jammert"
+ "Jemand \<EOL>jammert" == "Jemand jammert"
+
+ - Indent blocks (code enclosed in {}) so that the intended program
+ flow is easy to see
+ - 2 to 4 characters, not a whole tab (see the first rule)
+ - feel free to put { and } on lines of their own
+
+ - Declare variables in the block in which they are needed, so they
+ are quicker to find
+
+ - Do not overuse #define; when complex functions are built with it,
+ the reader often loses track of the code
+ - #defines should be placed in #included header files
+ - if it is a frequently used function, write it as an lfun
+ - is it perhaps already defined in /sys/*.h or /secure/*.h?
+
+ Tips on code:
+ - program in an object-oriented way
+ - hide whatever others do not need to see or call from outside with
+ "protected" or "private" when inheriting
+
+ - avoid return in the middle of the code where possible, since it
+ fragments the program flow - a single function exit is easier to
+ follow
+ - an exception can be to check (most) exclusion conditions for
+ something at the start of a function and then leave the function
+ immediately.
+
+ - use correct types for variables and functions so the reader can
+ tell which thing is what
+ - #pragma strong_types or even #pragma strict_types helps
+ - also use runtime type checks (#pragma rtt_checks)
+ - for objects that are inherited, always add #pragma save_types too
+
+ Tips on files:
+ - it is best to split your areas into separate directories, which
+ makes it easy to find your way around:
+ - NPCs, objects, rooms, master (weapons, armour too if there are many)
+
+ - Paths should be kept in one central #include file, which can then
+ be #included relatively. This makes any directory changes that
+ become necessary easier for later wizards.
+ instead of: AddItem("/d/ebene/<magier>/sumpf/npc/schleimi", ...);
+ prefer:
+ #include "../local.h"
+ [which contains a
+ #define SUMPFNPC(x) ("/d/ebene/<magier>/sumpf/npc/" x)]
+ ...
+ AddItem(SUMPFNPC("schleimi"), ...);
+
+ SEE ALSO:
+ inheritance, effizienz, pragma, oop
+
+05.06.2014, Zesstra
+
diff --git a/doc/concepts/hooks b/doc/concepts/hooks
new file mode 100644
index 0000000..108da08
--- /dev/null
+++ b/doc/concepts/hooks
@@ -0,0 +1,144 @@
+CONCEPT
+ driver hooks
+
+DESCRIPTION
+ To allow a greater flexibility of the muds, the gamedrivers
+ since 3.2.1 moved several once hardcoded 'underground'
+ activities from the driver into the mudlib. This includes for
+ example the differences between compat and native mode.
+
+ The hooks are set with the privileged efun set_driver_hook().
+ Some of the hooks are mandatory, some not. Most hooks accept
+ unbound lambda closures as values, some also lfun closures or
+ even strings.
+ The hooks are identified by an ordinal number, for which
+ symbolic names are defined in /sys/driverhooks.h.
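+
+ As a rough illustration of the above, a master object might install
+ a few hooks like this. This is only a sketch: the include path follows
+ the header named above, and the chosen values ("What?\n", "reset",
+ "/sys/") are assumptions for the example, not defaults of any mudlib.
+
+   #include "/sys/driverhooks.h"
+
+   void inaugurate_master(int arg)
+   {
+       // string value: default message for commands that fail to parse
+       set_driver_hook(H_NOTIFY_FAIL, "What?\n");
+
+       // string value: name of the lfun to call when an object resets
+       set_driver_hook(H_RESET, "reset");
+
+       // array of strings: directories searched for <>-type includes
+       set_driver_hook(H_INCLUDE_DIRS, ({ "/sys/" }));
+   }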
+
+ H_MOVE_OBJECT0
+ H_MOVE_OBJECT1
+ Mandatory hooks to implement the efun void move_object().
+
+
+ H_LOAD_UIDS
+ H_CLONE_UIDS
+ Mandatory hooks to determine the uid and euid of loaded or cloned
+ objects.
+
+
+ H_CREATE_SUPER
+ H_CREATE_OB
+ H_CREATE_CLONE
+ Optional hooks to initialize an object after creation.
+
+ H_CREATE_SUPER is called for blueprints implicitly loaded
+ by inheritance, H_CREATE_OB for explicitly loaded
+ blueprints/objects, and H_CREATE_CLONE for cloned objects.
+
+
+ H_RESET
+ Optional hook to reset an object.
+
+
+ H_CLEAN_UP
+ Optional hook to clean up an object.
+
+
+ H_DEFAULT_METHOD
+ Optional hook to provide a default implementation for unresolved
+ calls.
+
+
+ H_DEFAULT_PROMPT
+ Optional hook for the command prompt. If this hook is not used,
+ the driver will use "> " as the command prompt.
+
+
+ H_PRINT_PROMPT
+ Optional hook to print the current command prompt. If this hook is
+ not set, the driver will just print the prompt to the user.
+
+
+ H_COMMAND
+ Optional hook to parse and execute commands. If this hook is used,
+ it bypasses the normal command parsing done by the driver (including
+ the MODIFY_COMMAND and NOTIFY_FAIL hooks).
+
+
+ H_MODIFY_COMMAND
+ Optional hook to modify commands (both entered or given by a
+ call to command()) before the parser sees them (this includes
+ special commands like 'status').
+
+
+ H_MODIFY_COMMAND_FNAME
+ Mandatory hook specifying the name of the 'modify_command'
+ lfun to call for newly entered commands as result of a
+ set_modify_command().
+
+
+ H_NOTIFY_FAIL
+ Mandatory hook to issue the default message if an entered
+ command couldn't be parsed and no notify_fail() command is
+ in effect.
+
+
+ H_SEND_NOTIFY_FAIL
+ Optional hook to send the notify fail message, regardless
+ of how it was determined, to the player. If the hook is not
+ set, the message is delivered using tell_object() internally.
+
+
+ H_NO_IPC_SLOT
+ Optional hook specifying the message given to logins
+ rejected due to space limitations (MAX_PLAYER).
+
+
+ H_INCLUDE_DIRS
+ Semi-mandatory hook specifying the directories where <>-type
+ include files are searched (this includes ""-includes not
+ found as specified).
+
+
+ H_AUTO_INCLUDE
+ Optional hook specifying a string to be included before
+ the source of every compiled LPC object.
+
+
+ H_TELNET_NEG
+ Optional hook to specify how to perform a single telnet
+ negotiation. If not set, most telnet options are rejected (read:
+ only a very minimal negotiation takes place).
+
+
+ H_NOECHO
+ Optional hook to specify how to perform the telnet actions
+ to switch the echo mode (used for e.g. password input_to()s).
+ If not set, a default handling is performed.
+
+ IMPORTANT: If this hook is used, the control of all telnet
+ negotiation is transferred to the mudlib (you must combine it
+ with H_TELNET_NEG to conform to the telnet protocol).
+
+
+ H_ERQ_STOP
+ Optional hook to notify the mudlib about the termination of
+ the erq demon.
+
+
+HISTORY
+ The hooks concept was introduced in 3.2.1.
+ H_MOVE_OBJECT0/1 were introduced in 3.2.1@1.
+ H_CLEAN_UP was introduced in 3.2.1@34.
+ H_MODIFY_COMMAND was introduced in 3.2.1@51.
+ H_MODIFY_COMMAND_FNAME was 'hooked' in 3.2.1@109.
+ H_NOTIFY_FAIL and H_NO_IPC_SLOT were introduced in 3.2.1@55.
+ H_INCLUDE_DIRS was introduced in 3.2.1@57.
+ H_TELNET_NEG was introduced in 3.2.1@60.
+ H_NOECHO and H_ERQ_STOP were introduced in 3.2.1@85.
+ H_COMMAND was introduced in 3.2.7.
+ H_SEND_NOTIFY_FAIL and H_AUTO_INCLUDE were introduced in 3.2.9.
+ H_DEFAULT_METHOD was introduced in 3.3.113.
+ H_DEFAULT_PROMPT and H_PRINT_PROMPT were introduced in 3.3.163.
+
+SEE ALSO
+ native(C), set_driver_hook(E), all in (H)
diff --git a/doc/concepts/hsregexp b/doc/concepts/hsregexp
new file mode 100644
index 0000000..04266be
--- /dev/null
+++ b/doc/concepts/hsregexp
@@ -0,0 +1,99 @@
+SYNOPSIS
+ Henry Spencer Regular Expressions
+
+
+DESCRIPTION
+ This document describes the regular expressions supported by the
+ implementation by Henry Spencer (the traditional package for
+ LPMud).
+
+
+OPTIONS
+ The following bitflag options modify the behaviour of the
+ regular expressions - both interpretation and actual matching.
+
+ The efuns may understand additional options.
+
+ RE_EXCOMPATIBLE
+
+ If this bit is set, the pattern is interpreted as the UNIX ed
+ editor would do it: () match literally, and the \( \) group
+ expressions.
+
+
+REGULAR EXPRESSION DETAILS
+ A regular expression is a pattern that is matched against a
+ subject string from left to right. Most characters stand for
+ themselves in a pattern, and match the corresponding
+ characters in the subject. As a trivial example, the pattern
+
+ The quick brown fox
+
+ matches a portion of a subject string that is identical to
+ itself. The power of regular expressions comes from the
+ ability to include alternatives and repetitions in the
+ pattern. These are encoded in the pattern by the use of
+ meta-characters, which do not stand for themselves but instead
+ are interpreted in some special way.
+
+ There are two different sets of meta-characters: those that
+ are recognized anywhere in the pattern except within square
+ brackets, and those that are recognized in square brackets.
+ Outside square brackets, the meta-characters are as follows:
+
+ . Match any character.
+
+ ^ Match beginning of line.
+
+ $ Match end of line.
+
+ \< Match beginning of word.
+
+ \> Match end of word.
+
+ \B not at edge of a word (supposed to be like the emacs
+ compatibility one in gnu egrep)
+
+ x|y Match regexp x or regexp y.
+
+ () Match enclosed regexp like a 'simple' one (unless
+ RE_EXCOMPATIBLE is set).
+
+ x* Match any number (0 or more) of regexp x.
+
+ x+ Match any number (1 or more) of regexp x.
+
+ [..] Match one of the characters enclosed.
+
+ [^ ..] Match none of the characters enclosed. The .. are to
+ be replaced by single characters or character ranges:
+
+ [abc] matches a, b or c.
+
+ [ab0-9] matches a, b or any digit.
+
+ [^a-z] matches any character that is not a lowercase letter.
+
+ \c match character c even if it's one of the special
+ characters.
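+
+ A few of the metacharacters above in use. This is a sketch which
+ assumes a driver that provides the RE_TRADITIONAL flag in
+ <regexp.h> to select this package; the function name and the word
+ list are made up for the example.
+
+   #include <regexp.h>
+
+   void regexp_demo()
+   {
+       string *words = ({ "foo", "bar", "baz", "a cat", "concat" });
+       string *hits;
+
+       // '^' anchors at the beginning: selects "bar" and "baz"
+       hits = regexp(words, "^ba", RE_TRADITIONAL);
+
+       // \< and \> need a doubled backslash inside an LPC string:
+       // selects "a cat", but not "concat"
+       hits = regexp(words, "\\<cat\\>", RE_TRADITIONAL);
+
+       // character class with a range, repeated with '+':
+       // selects "bar" and "baz"
+       hits = regexp(words, "^b[a-z]+$", RE_TRADITIONAL);
+   }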
+
+
+NOTES
+ The \< and \> metacharacters from Henry Spencer's package
+ are not available in PCRE, but can be emulated with \b,
+ as required, also in conjunction with \W or \w.
+
+ In LDMud, backtracks are limited by the EVAL_COST runtime
+ limit, to avoid freezing the driver with a match
+ like regexp(({"=XX==================="}), "X(.+)+X").
+
+
+AUTHOR
+ Mark H. Colburn, NAPS International (mark@jhereg.mn.org)
+ Henry Spencer, University of Toronto (henry@utzoo.edu)
+ Joern Rennecke
+ Ian Phillipps
+
+
+SEE ALSO
+ regexp(C), pcre(C)
diff --git a/doc/concepts/imp b/doc/concepts/imp
new file mode 100644
index 0000000..27d9ab9
--- /dev/null
+++ b/doc/concepts/imp
@@ -0,0 +1,63 @@
+CONCEPT
+ imp
+
+LAST UPDATED
+ Deepthought, 10-Nov-92
+ Pepel, 18-Nov-93
+
+DESCRIPTION
+ This document describes IMP, the intermud message protocol,
+ also known as Intermud-1.
+
+ Imp messages are exchanged between muds using UDP (User
+ Datagram Protocol, an unreliable transport) packets. Each mud
+ provides a connection endpoint which is given by the ip host
+ address and the UDP port number. Muds may then send messages
+ to this port by using the efun send_udp(). The applied function
+ receive_udp will be called by the driver in the master
+ object if an imp message arrives at the mud's UDP port.
+
+ Imp message packets have the following format:
+
+ password@objectname@functionname[[@argument]...]
+
+ <password> is the connection password to verify incoming
+ imp packets. It is encoded using crypt(E) and compared to
+ the stored password. Each mud participating in the imp
+ network has a secret password which is encoded by the
+ admin and distributed to remote muds with which the mud
+ should have a direct connection. Encrypted passwords may also
+ be propagated to other muds over already secure channels.
+
+ <objectname> is a logical name which is not to be confused
+ with mudlib object filenames. It is used by receive_msg in
+ the master object to route the message to another object by
+ associating the logical object name with a real mudlib file
+ name. A good idea would be to reserve a special directory
+ for imp objects, e.g. /secure/net/<objectname>.
+
+ <functionname> is the function which is called by the master
+ object in the object described by <objectname>.
+
+ <argument> are additional arguments which are handed to the
+ function <functionname>. The exact definition of functions
+ and arguments are left to the imp applications.
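+
+ As an illustration of the packet format, the following sketch
+ builds and splits such a message. The helper names, and the
+ assumption that no field contains a '@', are made up for this
+ example and are not part of the protocol description.
+
+   string make_imp_packet(string password, string objectname,
+                          string functionname, string *arguments)
+   {
+       // all fields are joined with '@' in the documented order
+       return implode(({ password, objectname, functionname })
+                      + arguments, "@");
+   }
+
+   // returns ({ password, objectname, functionname, argument... })
+   string *split_imp_packet(string packet)
+   {
+       return explode(packet, "@");
+   }
+
+ With these helpers, make_imp_packet("secret", "mail", "deliver",
+ ({ "bob", "hi" })) yields "secret@mail@deliver@bob@hi".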
+
+AUTHOR
+ originally Deepthought
+
+NOTE
+ The above is only partially correct nowadays. Recently a
+ package named ``inetd'' was published, that is based on the IMP
+ mechanism in the driver (send_udp() and receive_udp()), but
+ it uses a different message format. That package seems to
+ enjoy much publicity and is installed in a number of muds. For
+ details look into the inetd description.
+
+ Another method of inter mud connection is the Mudlink
+ package, which uses a normal user connection that is connected
+ to a special user object, and an auxiliary process that does
+ the connection to other muds.
+
+SEE ALSO
+ send_udp(E), receive_udp(M), intermud(C)
diff --git a/doc/concepts/inheritance b/doc/concepts/inheritance
new file mode 100644
index 0000000..46bf338
--- /dev/null
+++ b/doc/concepts/inheritance
@@ -0,0 +1,177 @@
+CONCEPT
+ Inheritance
+
+DESCRIPTION
+ Have you noticed how many objects in the system have the same
+ functionality in common? Let's look at rooms for instance, they
+ all have the ability to host people and provide commands. It's
+ not that every room is programmed with the same basic functions
+ again and again, rather it will use a model room and then make
+ some special changes to it. That doesn't work by copying the
+ file.. Ouch! Don't replicate code! But by putting a tiny inherit
+ declaration
+
+ inherit "<model-class>";
+
+ at the beginning of your new file. This must come before any local
+ variables or functions. Once inherited your class will behave just
+ like the model class, because all the public methods are available
+ to the outside world. Now it is in your hands to change such an
+ inherited behaviour. You have the following tools to do so:
+
+ * Access to variables
+
+ It is one of the best design decisions in LPC that variables
+ are not accessible from outside, but you can use inherited
+ variables just as if they were your own. Modifiers apply however.
+
+ * Method overloading
+
+ int method_that_also_exists_in_the_model() {
+ <your new code>
+ }
+
+ You can simply rewrite a method that is also defined in the model
+ class, and thus change how it behaves. Contrary to other languages,
+ in LPC method overloading only matches the name of the method, so
+ even by changing the number and type of parameters you will mask
+ out the original version of the method. You can even give it
+ different modifiers than the original.
+
+ * Calling inherited methods
+
+ int method_that_also_exists_in_the_model() {
+ <your new code>
+ return ::method_that_also_exists_in_the_model();
+ }
+
+ You can add to the behaviour of a method by redefining it,
+ then calling it from within your new version. You can actually
+ call inherited methods from anywhere in your code. The double
+ colon tells the compiler you are looking for the inherited
+ variant.
+
+EXAMPLE
+
+ Let's imagine very simple food in a file called "/the/food.c":
+
+ // unless "modified" variables are accessible by inheritors
+ int vitamins = 10;
+
+ // please overload this function with your own description
+ public short() { return "something edible"; }
+
+ // let's do some standard action for food
+ public consume() {
+ this_player() -> nourish(vitamins);
+ destruct(this_object());
+ }
+
+ And now someone else decides to do some italian cooking in a
+ file called "/the/fusilli.c"
+
+ inherit "/the/food";
+
+ // we have our own variables.
+ int gone_cold = 0;
+
+ // and we simply redefine the short() function to replace it
+ public short() {
+ // description changes depending on gone_cold
+ return "a "+( gone_cold ? "stinking" : "steaming" )
+ +" plate of fusilli";
+ }
+
+ // we have a new function to make food go cold
+ private deteriorate() {
+ gone_cold = 1;
+ write("The fusilli have gone cold.\n");
+ }
+
+ // assume this gets called at creation
+ private create() {
+ // we can access the variable we inherited from food.c
+ vitamins = 44; // tomato has plenty of vitamins
+
+ // go cold in 5 minutes
+ call_out( #'deteriorate, 5 * 60 );
+ }
+
+ // we can overload the function even with new parameters
+ public consume(how) {
+ // fetch the name of the person, or use "Someone"
+ string name = this_player() -> name() || "Someone";
+
+ if (!gone_cold) {
+ write("You enjoy a delicious plate of fusilli.\n");
+ say(name +" guzzles a plate of hot fusilli.\n");
+ }
+ else if (how == "quickly") {
+ write("You eat the fusilli so quickly you "
+ "hardly notice they have gone cold.\n");
+ say(name +" wolfs down a plate of cold fusilli.\n");
+ }
+ else {
+ write("You eye the plate and wonder if you "
+ "really feel like eating cold fusilli.\n");
+ return; // don't eat
+ }
+
+ // and here comes the most important part:
+ // we execute consume() from food.c, so we
+ // actually inherit its behaviour.
+ ::consume();
+ }
+
+ADVANCED USAGE
+
+ * Doing multiple inheritance
+
+ While the Java(TM) language has so-called interfaces as a kludge,
+ LPC doesn't need them as it supports real multiple inheritance.
+ A very powerful feature, it lets you combine the behaviour of
+ several classes into a new one. Simply put several lines of
+ inherit declarations underneath each other. If you have name
+ collisions in the namespace of inherited methods, you will have
+ to address them explicitly with a "the/file"::method(args) syntax.
+
+ * Wildcarded multiple inheritance
+
+ LDMUD 3.2.1@117 introduces an advanced voodoo syntax which allows
+ you to call several methods in model classes at once, but for some
+ technical reasons it cannot pass any arguments. This works by
+ writing a glob type match ('*' and '?' wildcards) into the string
+ in front of the double colon, as in "*"::create(). I wouldn't
+ recommend you to use this, it's better to be clearly conscious of
+ what you inherit and do. But if you're desperate, there you go.
+
+ADVANCED EXAMPLE
+
+ inherit "foo";
+ inherit "bar";
+ inherit "baz";
+ inherit "ball";
+
+ reset() {
+ "ba?"::reset();
+ // calls bar::reset() and baz::reset()
+
+ "ba*"::reset();
+ // calls bar::reset(), baz::reset() and ball::reset()
+
+ "*"::reset();
+ // calls every inherited reset() function.
+
+ "ball"::rejoice("Listen to italectro today!");
+ // only explicit filename of model class allows
+ // passing arguments to the inherited method
+ }
+
+AUTHOR
+ symlynX of PSYC and Nemesis, with a little help from Someone
+
+SEE ALSO
+ functions(LPC), initialisation(LPC), modifiers(LPC), pragma(LPC),
+ overloading(C)
+ function_exists(efun), functionlist(efun), inherit_list(efun),
+ symbol_variable(efun), variable_exists(efun), variable_list(efun).
diff --git a/doc/concepts/intermud b/doc/concepts/intermud
new file mode 100644
index 0000000..abe2b4f
--- /dev/null
+++ b/doc/concepts/intermud
@@ -0,0 +1,224 @@
+CONCEPT
+ intermud
+
+DESCRIPTION
+ There are several intermud protocols which define how (players on)
+ different muds can communicate with each other. The protocols are
+ in general not muddriver or mudlib dependent, though the number of
+ implementations is limited.
+
+ This text is about the rather old, widespread 'Zebedee Intermud',
+ which is also called 'Intermud 2' although it differs quite a lot
+ from the real Intermud 2 protocol.
+
+ Full information on the newer Intermud 3 could be found on the
+ web at http://www.imaginary.com/protocols/intermud3.html so there
+ is no discussion here - the following is just about Zebedee Intermud
+ (aka Intermud 2).
+
+ Zebedee Intermud communication is handled by the /secure/inetd
+ object, originally written by Nostradamus for Zebedee with some
+ extensions that are discussed in inetd(C). How the data is
+ actually sent across the network is described in intermud.basic(C).
+
+SERVICES
+ Note that the fields "NAME" and "UDP_PORT" should be present in
+ every message. Very common are the fields "ID" (used whenever a
+ reply is expected) and "SND" (the sender: he should receive the
+ reply). These fields will not be mentioned in the list below.
+
+ Request types are listed in the leftmost column (e.g. "REQ=channel"),
+ associated headers are listed indented.
+
+ "channel"
+ The channel-request is used for sending a message on any
+ channel. The "CMD" field is optional and may be omitted for
+ normal messages. Note that you should not send a history or
+ list request to _all_ known muds!
+
+ "CHANNEL"
+ The channel on which a message is sent (the standard
+ channels are "intermud", "intercode", "interadm", "d-chat",
+ "d-code" and "d-adm"; on the d-channels German is spoken)
+
+ "DATA"
+ The message to be sent (not used with history/list request)
+
+ "CMD"
+ The body of this header may be:
+ "" for normal intermud messages,
+ "emote" if the message is an emote/gemote,
+ "history" for a history request: the last 20 lines of
+ this channel will be shown.
+ "list" to list all remote users listening to this channel
+
+ "EMOTE" (optional)
+ The body is 1 if the message is an emote.
+ The body is 2 if the message is a gemote.
+
+ "finger"
+ Retrieve information about a player or creator on a remote mud.
+
+ "DATA"
+ The player of whom information is requested
+
+ "locate"
+ Check whether a certain player is logged on at a remote mud.
+ This request is usually sent to all known muds at the same time.
+
+ "user"
+ Name of the person who requests the information.
+ This is used by the sending mud only and has to be included
+ in the reply.
+
+ "vbs"
+ The verbose option has only two pre-defined values:
+ 1 Even report when the result was negative
+ 2 Don't do timeouts, but keep waiting
+ This is used by the sending mud only and has to be included
+ in the reply.
+
+ "fnd"
+ The found option is only used in the reply and its value
+ is either 1 (success) or 0 (failure). The absence of a
+ found parameter indicates failure as well.
+
+ "DATA"
+ The player to find.
+
+ "man"
+ Retrieve a manual page from a remote mud. Many muds don't
+ support this feature...
+
+ "DATA"
+ The name of the requested manual page
+
+ "mail"
+ An extension to the standard protocol, by Alvin@Sushi. This is
+ used to send mails from one mud to another.
+
+ "udpm_status"
+ This field should only be used in the reply and indicates
+ how mail is handled. Currently there are four pre-defined
+ values for the status field:
+ 0 time out
+ 1 delivered ok
+ 2 unknown player
+ 3 in spool (will be delivered later)
+
+ "udpm_writer"
+ Name of the person who wrote this mail
+
+ "udpm_spool_name"
+ Should be returned as sent, this value is used to remove
+ the mail from the spool directory after it has been
+ delivered (or refused)
+
+ "udpm_subject"
+ Subject of the mail message
+
+ "DATA"
+ The body of the mail (the actual message)
+
+ "ping"
+ A ping request has only the standard fields, the reply is
+ usually a short string like " is alive."
+
+ "query"
+ Get standard information about another mud. This is the only
+ command of which the reply may not include a load of rubbish,
+ but should only hold the requested information, so that it can
+ be parsed by the server.
+
+ "DATA"
+ The following queries are pretty much standard:
+ "commands" List all commands that are supported by the inetd
+ "email" The email-address of the mud administrator(s)
+ "hosts" A listing of all hosts in a special format [t.b.d.]
+ "inetd" The version number of the inetd used
+ "list" The list of all items which can be queried
+ "info" A short human-readable string with practically
+ "query" information
+ "mud_port" The portnumber that players connect to on login
+ "time" The local time for this mud
+ "users" A list of the people that are active in this mud
+ "version" The version of the mud-driver (and library)
+ "www" The URL of the mud's web page (e.g.
+ http://mud.stack.nl/)
+
+ "reply"
+ This request method is used for _all_ replies.
+
+ "DATA"
+ A human-readable string, containing the reply to a given query
+
+ "RCPNT"
+ The same name as in the "SND" field or the query; Usually
+ this is the name of the player who initiated the query
+
+ "QUERY"
+ This field is only used in a response to a "query" request
+ and should be equal to the "DATA" field of that request
+
+ "vbs"
+ This field is only used in a response to a "locate" request
+ and should be equal to the "vbs" field of that request
+
+ "user"
+ This field is only used in a response to a "locate" request
+ and should be equal to the "user" field of that request
+
+ "fnd"
+ This field is only used in a response to a "locate" request
+ and should be 1 if the player was located and 0 otherwise
+
+ "tell"
+ Say something to a player on another mud.
+
+ "RCPNT"
+ Name of the player to whom you are talking
+
+ "DATA"
+ Whatever you wish to say to this person
+
+ Optional emote-tos are also handled as tells, so muds
+ without emote-to support display them as a reasonably readable
+ tell message.
+
+ "RCPNT"
+ Name of the player to whom you are talking
+
+ "METHOD"
+ The body of this header may be:
+ "emote" if the message is an emote
+ "gemote" if the message is a genitive emote
+
+ "DATA"
+ The text to be emoted prepended with "*" and appended
+ with "* ". If you display the emote you have to cut the
+ stars off. Muds that do not process emote-tos display the
+ emote as tell message with the stars as indication of
+ the message's emote meaning.
+
+ "who"
+ List the people that are active on a remote mud. The answer
+ usually contains some active information about the players,
+ like titles, levels or age.
+
+ "DATA"
+ Not supported by many muds. Introduced August 1997.
+ Additional switch(es) (blank-separated) that change the
+ appearance of the resulting list. The switches normally
+ resemble the switches used inside of that mud for the 'who'
+ command. Typical values include:
+ "short" "s" "-short" "-s" "kurz":
+ Return a concise listing.
+ "alpha" "a" "alphabetisch" "-alpha" "-a"
+ Sort the players alphabetically.
+
+AUTHOR
+ Information taken from Outerspace's documentation to be found
+ on http://mud.stack.nl/intermud/
+
+SEE ALSO
+ inetd(C), intermud.basic(C), imp(C)
diff --git a/doc/concepts/intermud.basic b/doc/concepts/intermud.basic
new file mode 100644
index 0000000..901967f
--- /dev/null
+++ b/doc/concepts/intermud.basic
@@ -0,0 +1,151 @@
+CONCEPT
+ intermud.basic
+
+DESCRIPTION
+ Here is how intermud data is sent across the internet - specific
+ for Zebedee Intermud (aka Intermud 2).
+
+ADVANCED PROTOCOL
+ This file was originally written as a brief outline of the intermud
+ protocol for use by developers interested in incorporating similar,
+ compatible intermud protocols into their own mud systems. It is
+ included here as it provides a much more detailed description of the
+ intermud protocol than that provided by the original PROTOCOL file,
+ and hence may be of use to LpMud developers.
+
+PACKET PROTOCOL / FORMAT
+ All information is transferred as a string via a UDP port (each mud
+ has 1 send and 1 receive port). This kind of transfer is inherently
+ unreliable, but it's fast and doesn't use up file descriptors.
+ The format of the strings (packets) is as follows:
+
+ header1:body1|headerN:bodyN|DATA:body-data
+
+ In other words, a header name, followed by a : and then the data
+ associated with this header. Each header/body pair is separated by
+ the | character. This means that headers and their body cannot
+ contain the | character. You should check for this in outgoing
+ packets to avoid decoding errors at the receiving end. The exception
+ to this is the DATA field. If it is present, it is ALWAYS positioned
+ at the end of the packet. Once a DATA header is found, everything
+ following it is interpreted as the body of the DATA field. This
+ means it can contain special characters without error and it is
+ used to carry the main body or data of all packets.
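+
+ The format described above can be sketched as follows. The function
+ name build_packet() is made up for this example and this is not the
+ inetd's own code; note also that the order in which a mapping
+ yields its keys is not fixed, only DATA is guaranteed to come last.
+
+   string build_packet(mapping headers, string data)
+   {
+       string *parts = ({});
+
+       foreach (string key : m_indices(headers)) {
+           string body = to_string(headers[key]);
+
+           // '|' separates header/body pairs, so neither may contain it
+           if (strstr(key, "|") != -1 || strstr(body, "|") != -1)
+               raise_error("header or body contains '|'\n");
+           parts += ({ key + ":" + body });
+       }
+
+       // DATA always goes last and may contain any characters
+       if (data)
+           parts += ({ "DATA:" + data });
+
+       return implode(parts, "|");
+   }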
+
+ By convention, predefined system fields will use capital letters for
+ field headers and custom headers used by specific applications will
+ use lowercase names to avoid clashes. The defined system fields are
+ generally referred to by a set of macros which are defined in a
+ common header file for clarity.
+
+ There is one exception to this header format; If the data is too
+ large to be transmitted in one single packet, it will be split into
+ packets of convenient size, each with a special unique packet header
+ to enable them to be reassembled at the receiving end. These
+ headers are of the format:
+
+ PKT:mudname:packet-id:packet-number/total-packets|rest-of-packet
+
+ In this case, the mudname and packet-id combine to form a unique id
+ for the packet. The packet-number and total-packets information is
+ used to determine when all buffered packets have been received. The
+ rest-of-packet part is not parsed, but is stored while the receiver
+ awaits the other parts of the packet. When/if all parts have been
+ received they are concatenated and decoded as a normal packet.
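+
+ Recognising such a split-packet header could look like the sketch
+ below. The function name is hypothetical, and treating packet-id as
+ an integer is an assumption; the text above does not state its type.
+
+   // returns ({ mudname, packet-id, number, total, rest }) or 0
+   mixed *parse_split_header(string packet)
+   {
+       string mudname, rest;
+       int id, number, total;
+
+       if (sscanf(packet, "PKT:%s:%d:%d/%d|%s",
+                  mudname, id, number, total, rest) != 5)
+           return 0;   // not a split packet
+
+       return ({ mudname, id, number, total, rest });
+   }
+
+ The receiver would store rest under the key (mudname, packet-id)
+ until all total parts have arrived.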
+
+PACKET ENCODING / DECODING
+ Only 2 generic data types are fully supported within the inetd code
+ itself (namely strings and integers), though others can easily be
+ used by converting them to one of the supported data types before
+ transfer and converting back again on receipt. The LpMud "object"
+ data type is converted to a string automatically by the inetd on
+ encoding, but no such conversion is carried out on decoding.
+
+ On encoding, integers are simply converted to a corresponding string.
+ Strings are left untouched as long as there is no ambiguity as to
+ whether they should be decoded as a string or an integer. In this
+ case of ambiguity, the string is prepended with a $ character. If
+ the first character of a string is the $ character, it is escaped
+ by prepending another $ character. On decoding, any string with a $
+ as its first character will have it removed and will then be treated
+ as a string. Any remaining strings that can be converted to an
+ integer and then back to a string with no loss of information are
+ considered to be integers. Any remaining strings are treated as
+ such and are left unaltered.
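+
+ The encoding rules above can be sketched like this. The function
+ names are made up for the example, and only integers and strings
+ are handled; this is an illustration, not the inetd's own code.
+
+   string encode_value(mixed value)
+   {
+       if (intp(value))
+           return to_string(value);
+       if (!stringp(value))
+           raise_error("only integers and strings are handled here\n");
+
+       // ambiguous: starts with '$' or would be decoded as an integer
+       if (value != "" && (value[0] == '$'
+                           || to_string(to_int(value)) == value))
+           return "$" + value;
+       return value;
+   }
+
+   mixed decode_value(string body)
+   {
+       // a leading '$' marks a string; strip it
+       if (body != "" && body[0] == '$')
+           return body[1..];
+       // survives the round trip string -> int -> string: an integer
+       if (to_string(to_int(body)) == body)
+           return to_int(body);
+       return body;
+   }
+
+ Under these rules the integer 42 travels as "42", the string "42"
+ as "$42", and the string "$x" as "$$x".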
+
+DEFINED SYSTEM HEADERS
+ "RCPNT" (RECIPIENT)
+ The body of this field should contain the recipient the message
+ is to be sent to if applicable.
+ "REQ" (REQUEST)
+ The name of the intermud request that is being made of the
+ receiving mud. Standard requests that should be supported by
+ all systems are "ping" (PING), "query" (QUERY), and "reply"
+ (REPLY). The PING request is used to determine whether or not a
+ mud is active. The QUERY request is used to query a remote mud
+ for information about itself (look at the udp/query module for
+ details of what information can be requested). The REPLY request
+ is special in that it is the request name used for all replies
+ made by mud B to an initial request made by a mud A. It is
+ mud A's responsibility to keep track of the original request
+ type so that the reply can be handled appropriately.
+ "SND" (SENDER)
+ The name of the person or object which sent the request or to
+ whom replies should be directed. This is essential if a reply
+ is expected.
+ "DATA" (DATA)
+ This field should contain the main body of any packet. It is
+ the only field that can contain special delimiting characters
+ without error.
+
+ The following headers are used internally by the inetd and should
+ not be used by external objects:
+ "HST" (HOST)
+ The IP address of the host from which a request was received.
+ This is set by the receiving mud and is not contained in
+ outgoing packets.
+ "ID" (ID)
+ The packet id. This field is simply an integer which is set by
+ the sending inetd. The number is incremented each time a packet
+ is sent (zero is never used). This field is only needed if a
+ reply is expected. REPLY packets _must_ include the original
+ request id. This is _not_ done by the inetd.
+ "NAME" (NAME)
+ The name of the local mud. Used for security checking and to
+ update host list information.
+ "PKT" (PACKET)
+ A special header reserved for packets which have been split.
+ See PACKET PROTOCOL / FORMAT.
+ "UDP" (UDP_PORT)
+ The UDP port the local mud is receiving on. Used for security
+ checking and updating host list information.
+ "SYS" (SYSTEM)
+ Contains special system flags. The only system flag used at
+ present is TIME_OUT. This is included in packets returned due
+ to an expected reply timing out to differentiate it from an
+ actual reply.
+
+UDP REQUESTS / MODULES
+ The following are standard request types that must be supported
+ by all systems:
+ "ping" (PING)
+ This module should return a REPLY packet that contains the
+ original request's ID in its ID field and the SENDER in its
+ RECIPIENT field. It should also include an appropriate string
+ in the DATA field, e.g. "Mud-Name is alive.\n"
+ "query" (QUERY)
+ This module expects the type of query requested to appear in the
+ received DATA field. It should return a REPLY packet containing
+ the original ID in the ID field, the SENDER in its RECIPIENT
+ field, and the query type in a QUERY field. The DATA field should
+ contain the information requested.
+
+ For details of how other intermud requests operate, look at the
+ relevant module code.
+
+AUTHOR
+ Information taken from Outerspace's documentation, to be found
+ at http://mud.stack.nl/intermud/
+
+SEE ALSO
+ inetd(C), intermud(C)
diff --git a/doc/concepts/lpc b/doc/concepts/lpc
new file mode 100644
index 0000000..1e5911e
--- /dev/null
+++ b/doc/concepts/lpc
@@ -0,0 +1,64 @@
+* What is LPC?
+
+LPC is the language in which LPmud objects are written.
+LPC stands for Lars Pensjö C. As one might surmise from the name,
+LPC is based on the syntax of C. LPC provides the C while loop, for loop,
+if statement, switch statement, a variant of sscanf, and the integer data type
+(LPC also provides other data types not in C, such as the object and the
+mapping). LPC uses C's syntax for defining and calling functions and for
+declaring variables. Note that LPC's version of the string datatype is
+much different from that provided by C. See the LPC tutorial on syntax
+and language constructs for more information.
+
+Here are some differences between LPC and C:
+
+There is no need for a function named "main" in LPC objects (although there
+is one called "create").
+
+The efuns (or system calls) provided by the gamedriver are different from
+those typically found in the C library (libc.a).
+
+There is no malloc(). However, there is an allocate(int value) efun that
+lets space be allocated for arrays. Note that the argument to 'allocate'
+is not in units of bytes, but rather in units of elements.
+
+Memory is never explicitly deallocated. The gamedriver keeps track of
+how many times a given piece of data has been referenced. When the
+reference count goes to zero (when no object has a copy of that variable),
+then the space used by the variable is reclaimed (garbage collected).
+
+The string data type in LPC is closer to that provided by BASIC than that
+provided by C. Strings are not declared as arrays of characters but rather
+as a basic intrinsic type. Strings may be concatenated using the '+' operator.
+
+For example, the LPC statements:
+
+string ack;
+
+ack = foo + bar;
+
+are equivalent to the C statements:
+
+char *ack;
+
+ack = (char *)malloc(strlen(foo) + 1);
+strcpy(ack,foo);
+ack = (char *)realloc(ack, strlen(ack) + strlen(bar) + 1);
+strcat(ack,bar);
+
+Note: ack[i] may not appear as an lvalue (i.e. ack[i] = 'a'; will not
+work as expected).
+
+LPC is an interpreted language (however it is compiled into an internal
+compact tokenized form before being interpreted).
+
+LPC has no structures or unions. In fact, the -> operator is used to
+indicate a call to another object. The mapping datatype can serve
+as an effective substitute for structures in some situations.
+
+sscanf does not work in the same way as in C. Arguments to sscanf need not
+be pointers (since LPC does not have the explicit pointer data type). Also,
+sscanf(arg,"%s %s",str1,str2) does not operate as the C programmer would
+expect. In C, the first word of arg would be copied into str1 and the
+second word of arg into str2. In LPC, the first word is copied into str1
+and the _remainder_ of arg is copied into str2.
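+
+As a minimal sketch of the sscanf behaviour described above (the variable
+names and the input string are made up for illustration):
+
+    string verb, rest;
+
+    /* In LPC the last %s takes everything that is left over. */
+    sscanf("give sword to bob", "%s %s", verb, rest);
+    /* verb now holds "give", rest holds "sword to bob" */
+
+A C programmer would instead expect the second variable to hold only "sword".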
diff --git a/doc/concepts/mail b/doc/concepts/mail
new file mode 100644
index 0000000..c08508c
--- /dev/null
+++ b/doc/concepts/mail
@@ -0,0 +1,73 @@
+CONCEPT:
+ mail
+
+
+DESCRIPTION:
+ This document describes the mail system used in Nightfall.
+ The idea is to make a central mail handling object which
+ accepts and distributes mail between users. Mail is stored in
+ the /mail directory. save_object is used to save mail
+ information in mail files in this directory. Only the mail
+ demon object and the owner of the mail file can access it.
+
+ A number of mail readers will probably be available which access
+ the mail files. A typical mail user agent has commands to
+ read mail contained in the user's mail file, to reply to
+ messages, to forward, delete, save them. A folder structure
+ can be implemented. Outgoing mail is given to the mail demon
+ object by the user agent for distribution. The mailreader
+ should implement multiple recipients: carbon copy (cc) and
+ blind carbon copy (bcc). Carbon copy means additional recipients
+ to which the message should be sent. Blind carbon copy is the
+ same, but the recipients won't be listed in the received
+ message.
+
+ Save file format (sort of formal notation):
+
+ mixed *folders = ({
+ ({ string name1; string name2; ... string nameN; })
+ ({ mixed *msgs1; mixed *msgs2; ... mixed *msgsN; })
+ })
+
+ The array variable <folders> contains a number of folder
+ structures containing the actual messages. There are special
+ folders which are reserved: mail, newmail. New mail will
+ be delivered into the newmail folder. This is the only hard
+ coded requirement (the mail demon will simply deposit new
+ mail there). The folder name 'mail' should be used for read
+ mail. Other folders can be dynamically created by the user
+ agent.
+
+ Each msgs field is an array of messages:
+
+ mixed *msgs = ({ mixed *message1; ... mixed *messageM })
+
+ A message is represented as an array with the following fields:
+
+ mixed *message = ({
+ string from;
+ string sender;
+ string recipient;
+ string *cc;
+ string *bcc;
+ string subject;
+ string date;
+ string id;
+ string body;
+ })
+
+ The mailer demon (/secure/mailer, or /obj/mailer) provides
+ the following functions:
+
+ DeliverMail(mixed *message)
+ Hand a mail message over to the mailer demon. The mailer
+ demon extracts recipients from the recipient, cc and bcc
+ fields and removes the bcc information. It then deposits
+ the message to the mail files of all recipients. A valid
+ message is shown above.
+
+ int FingerMail(string user)
+ Gives the number of unread messages a user has.
+
+
+SEE ALSO:
diff --git a/doc/concepts/mccp b/doc/concepts/mccp
new file mode 100644
index 0000000..f3e31fb
--- /dev/null
+++ b/doc/concepts/mccp
@@ -0,0 +1,100 @@
+CONCEPT
+ mccp - The Mud Client Compression Protocol
+
+DESCRIPTION
+ Information and code taken from the MCCP Homepage
+ http://www.randomly.org/projects/MCCP/
+
+ MCCP is implemented as a Telnet option [RFC854, RFC855]. The server
+ and client negotiate the use of MCCP as they would any other telnet
+ option. Once agreement has been reached on the use of the option,
+ option subnegotiation is used to determine acceptable compression
+ methods to use, and to indicate the start of a compressed data stream.
+
+ If the driver is compiled with MCCP Support there is a
+ define __MCCP__.
+
+ The driver currently supports both versions of mccp. If your mud
+ has an H_NOECHO hook you have to find out if the client supports
+ mccp. Without this hook you still have to start negotiation.
+
+ All sub-negotiation is done by the efuns start_mccp_compress() and
+ end_mccp_compress() whether you have this hook or not.
+
+ Notice: when the client uses compression, all binary_message calls
+ are executed with flag=3. This is because writing to the
+ socket would disturb the zlib stream.
+
+ mccp-efuns:
+
+ start_mccp_compress(int telopt) (only needed with H_NOECHO)
+ end_mccp_compress(int telopt) (only needed with H_NOECHO)
+ query_mccp(object player)
+ query_mccp_stats(object player)
+
+ Initiating MCCP without H_NOECHO hook:
+
+ if(!query_mccp()){
+ binary_message(({ IAC, WILL, TELOPT_COMPRESS2 }),1);
+ binary_message(({ IAC, WILL, TELOPT_COMPRESS }),1);
+ }
+
+ The driver will parse the client's answers and start compression.
+ (The connection might already be compressed, because although the
+ documentation says clients should not negotiate by themselves,
+ zMUD, for example, does.)
+
+ You can start and stop compression manually with the efuns
+ when you are sure the client supports compression :)
+
+
+ Initiating MCCP compression with H_NOECHO hook:
+
+ If your mudlib uses the H_NOECHO driver hook, you have decided to do
+ all the negotiation yourself:
+
+ Server Commands
+ IAC WILL COMPRESS indicates the sender supports version 1 of the
+ protocol, and is willing to compress data it sends.
+
+ IAC WILL COMPRESS2 indicates the sender supports version 2, and is
+ willing to compress data it sends.
+
+ IAC WONT COMPRESS indicates the sender refuses to compress data using
+ version 1.
+
+ IAC WONT COMPRESS2 indicates the sender refuses to compress data
+ using version 2.
+
+ Client Commands
+ IAC DO COMPRESS indicates the sender supports version 1 of the
+ protocol, and is willing to decompress data received.
+
+ IAC DO COMPRESS2 indicates the sender supports version 2 or above,
+ and is willing to decompress data received.
+
+ IAC DONT COMPRESS indicates the sender refuses to support version 1.
+ If compression was previously negotiated and is
+ currently being used, the server should terminate
+ compression.
+
+ IAC DONT COMPRESS2 indicates the sender refuses to support version 2.
+ If compression was previously negotiated and is
+ currently being used, the server should terminate
+ compression.
+
+ After you have found out whether the client supports mccp or not, you can
+ start compression with start_mccp_compress(TELOPT_COMPRESS2) or
+ start_mccp_compress(TELOPT_COMPRESS). (You could start it without
+ checking, but some players would protest :) )
+
+AUTHOR
+ Bastian Hoyer (dafire@ff.mud.de) (some text taken from project page)
+
+HISTORY
+ Added in LDMud 3.3.447, backported to LDMud 3.2.10.
+
+SEE ALSO
+ start_mccp_compress(E), end_mccp_compress(E), query_mccp(E),
+ query_mccp_stats(E)
+
diff --git a/doc/concepts/memory b/doc/concepts/memory
new file mode 100644
index 0000000..9524eb3
--- /dev/null
+++ b/doc/concepts/memory
@@ -0,0 +1,56 @@
+CONCEPT
+ memory
+ swapping
+
+DESCRIPTION
+
+ TODO: This is out of date. Also document the relation with reset
+
+ (Collected from the Changelogs of the driver source)
+
+ The swapping algorithm has been changed. A test is done for
+ every object, comparing to a time stamp. If the object hasn't
+ been touched for a while, it could be subject for swapping.
+ Here comes the new thing: the function 'clean_up()' will be
+ called in the object. If the object still remains, the old
+ swapping algorithm will continue. That means that objects that
+ would never be subject to swapping (cloned objects) now have a
+ chance to self-destruct. It also means that rooms that
+ contain no important data can self-destruct. Self-destruction
+ saves more memory than swapping, as swapping only frees the
+ program code, while self-destruction also frees the internal
+ object representation.
+
+ The call of clean_up() has been modified. There is a constant
+ in config.h that defines how much time passes before clean_up() is
+ called in an object. This call is independent of reset() and
+ swapping. It is recommended that the swapping time is
+ something short, like 10 minutes to 30 minutes, while the time
+ to clean_up is longer.
+
+ Fixed several bugs in the swap/reset/clean_up logic.
+ Recommended values are that the swap time is short (less than
+ 30 minutes), and that reset time is medium (approx. 60 minutes),
+ and that time to clean_up is long (greater than 1.5 hours).
+ Any feedback on how best to tune these values is welcome. The
+ call of reset will be done once, and not yet again until the
+ object has been touched. This enables reset'ed objects to stay
+ swapped out. If you have a mudlib that has no objects that
+ define 'clean_up', then you had better define this time as 0,
+ which means never call clean_up (and thus never swap the
+ object in needlessly). A well implemented usage of clean_up is
+ better than the swap algorithm, as even cloned objects can be
+ cleaned up and a self destruction is more efficient than
+ swapping (memory wise).
+
+ Changed mechanism of calling clean_up() slightly. clean_up() will
+ only be called in objects that define the function, and it will only
+ be called again if it returned non-zero. This minimizes calls of
+ clean_up(), while still costing very little to maintain.
+
+ clean_up() now gets a flag as argument, which will be non-zero
+ if the program of this object is used for inheritance by
+ other objects.
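+
+ As a minimal sketch (not taken from any particular mudlib), an
+ object that wants to self-destruct when idle could define
+ clean_up() along these lines. Treating a non-empty inventory as
+ "still important" is an assumption made for this example only:
+
+     int clean_up(int flag)
+     {
+         /* Program is inherited by others: keep it, ask again later. */
+         if (flag)
+             return 1;
+         /* Something is inside this object: keep it for now. */
+         if (first_inventory(this_object()))
+             return 1;
+         destruct(this_object());
+         return 0;  /* no further clean_up() calls wanted */
+     }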
+
+SEE ALSO
+ clean_up(A), slow_shut_down(M), quota_demon(M), malloc(D)
diff --git a/doc/concepts/mysql b/doc/concepts/mysql
new file mode 100644
index 0000000..4fb6dee
--- /dev/null
+++ b/doc/concepts/mysql
@@ -0,0 +1,205 @@
+CONCEPT
+ mysql - mySQL support
+
+DESCRIPTION
+ On hosts with the mySQL package installed, the driver can be
+ configured to interface with the mySQL database. If that is done,
+ the driver defines the macro __MYSQL__ for LPC programs and
+ activates a number of efuns.
+
+ -- Configuration --
+
+ Create a dedicated user in the mySQL database for the driver.
+ Enter this username and password in the file pkg-mysql.c, function
+ mysql_real_connect(), and compile the driver (the username and
+ password are built into the driver for security reasons).
+ If you choose not to create a username and/or a password,
+ leave the corresponding entry at 0.
+
+ Use mysqladmin to create any databases you want to provide - the
+ names are later used in the efun db_connect() to connect to
+ the databases.
+
+
+ -- Usage --
+
+ The idea behind SQL-support is that you can swap large amounts of
+ data into a database where it can be accessed very easily.
+ As mySQL "limits" the number of connections to 100 and as every
+ connection to the mySQL-server takes time, you should use
+ database server objects in your MUD which keep the
+ connection to the mySQL-server open permanently.
+
+ To connect to your mySQL-server, use the efun db_connect(). It
+ takes only one argument which is the name of the database (which
+ must exist). The return-value of db_connect() is an integer
+ representing the unique handle to the database with which you will
+ identify your connection later.
+
+ To send or retrieve data from this connection, use db_exec(). As
+ with all efuns dealing with an open connection, the first argument
+ is always the handle; the second argument is the command you want
+ to issue. The return value is 0 if there was an error in your
+ command (this can have various reasons); otherwise your handle is
+ returned again. A typical SQL statement to retrieve data would
+ look like this:
+
+ select aliases.command from aliases where (name = 'mario' AND
+ alias regexp 'l.*')
+
+ As you know, mySQL accepts either " or ' to delimit strings for
+ parameters. Most likely, you will pass variables and don't know
+ whether they contain one or more of these key-chars (or even other
+ chars that need to be converted). mySQL provides a function for
+ converting just any string into an acceptable argument and this is
+ implemented in db_conv_string().
+
+ So the above example with variables looks like this:
+
+ select aliases.command from aliases where (name ='"+
+ db_conv_string(name)+"' AND alias regexp '"+
+ db_conv_string(mask)+"')
+
+ I left out the db_exec()-stuff, more complete examples will follow.
+
+ After you initiated a statement that should return rows from the
+ database, use db_fetch() to retrieve the data. db_fetch() returns
+ the data row by row and not all at once. You need to call it until
+ it returns 0. THIS IS IMPORTANT! If you stop calling db_fetch() before
+ it reaches the end of data, serious inconsistencies can happen.
+
+ If you used a DELETE- or UPDATE-statement, you cannot call db_fetch(),
+ but you might be interested in the number of deleted/changed rows
+ which can be queried with db_affected_rows().
+
+ After all operations are done in the database, you should use
+ db_close() to close the connection again. If you are using a
+ database-server-concept, place it in the remove()-function.
+
+ The SQL-efuns have some built-in optimization-features to speed up
+ often used connections. To get a list of all open connections to the
+ mySQL-server, use db_handles() which returns an array of integers
+ with all open handles.
+
+
+ -- Security --
+
+ Most SQL efuns (unless executed by the master or the simul-efun object)
+ trigger a privilege_violation ("mysql", "<efun_name>"). If a more
+ fine-grained control is desired, overload the individual efuns with a
+ nomask simul-efun.
+
+ The unprivileged efuns are:
+
+ db_conv_string()
+
+
+EXAMPLE
+ A simple server to store aliases could be implemented like this:
+
+ /*
+ ** CREATION:
+ **
+ ** create table aliases (
+ ** name varchar(15) not NULL,
+ ** alias varchar(20) not NULL,
+ ** command varchar(255) not NULL,
+ ** primary key (name, alias));
+ */
+
+ #define DATABASE "mud"
+
+ private int handle;
+
+ public void create()
+ {
+ handle = db_connect(DATABASE);
+ }
+
+ public int remove()
+ {
+ if ( handle )
+ db_close(handle);
+ destruct(this_object());
+ return !this_object();
+ }
+
+ public int AddAlias(string alias, string command, object ob)
+ {
+ if ( !handle )
+ handle = db_connect(DATABASE);
+ if ( !db_exec(handle,
+ "insert into aliases (name, alias, command) values "
+ "('" + getuid(ob) + "','" + db_conv_string(alias)
+ + "','"+
+ db_conv_string(command) + "')") )
+ return -1;
+ return 1;
+ }
+
+ public int RemoveAlias(string alias, object ob)
+ {
+ int res;
+
+ if ( !handle )
+ handle = db_connect(DATABASE);
+ res = db_exec(handle,
+ "delete from aliases where (name = '"+
+ getuid(ob) + "' AND alias = '"
+ + db_conv_string(alias)+
+ "')");
+ if ( !res )
+ return 0;
+ res = db_affected_rows(handle);
+ return (res > 0)?1:-1;
+ }
+
+ public mixed *QueryAliases(string mask, object ob)
+ {
+ mixed *result;
+ string *tmp;
+
+ if ( !handle )
+ handle = db_connect(DATABASE);
+ if ( !db_exec(handle,
+ "select aliases.alias, aliases.command from aliases where "
+ "(name = '" + getuid(ob)+
+ "' AND alias regexp '" + db_conv_string(mask) + "')") )
+ return ({ });
+ result = ({ });
+ while ( sizeof(tmp = db_fetch(handle)) )
+ result += ({ tmp });
+ return result;
+ }
+
+ public string QueryAlias(string alias, object ob)
+ {
+ mixed *result;
+ string *tmp;
+
+ if ( !handle )
+ handle = db_connect(DATABASE);
+ if ( !db_exec(handle,
+ "select aliases.command from aliases where "
+ "(name = '" + getuid(ob)+
+ "' AND alias = '" + db_conv_string(alias) + "')") )
+ return 0;
+ result = ({ });
+ while ( sizeof(tmp = db_fetch(handle)) )
+ result += tmp;
+ return sizeof(result)?result[0]:0;
+ }
+
+
+AUTHOR
+ Mark Daniel Reidel and others.
+
+HISTORY
+ mySQL support was added as a package in 3.2.8 and became an
+ integral part of the driver in 3.2.9.
+ LDMud 3.2.11 added a privilege_violation() call for each efun.
+
+SEE ALSO
+ pgsql(C), db_affected_rows(E), db_conv_string(E), db_close(E),
+ db_connect(E), db_exec(E), db_fetch(E), db_handles(E),
+ db_insert_id(E), db_coldefs(E), db_error(E), privilege_violation(A)
diff --git a/doc/concepts/native b/doc/concepts/native
new file mode 100644
index 0000000..13cf008
--- /dev/null
+++ b/doc/concepts/native
@@ -0,0 +1,170 @@
+CONCEPT
+ driver modes / native driver mode
+
+DESCRIPTION
+ During the evolution of LPMud there has been a break, as the
+ old driver became too restrictive for the demands of modern
+ muds: it did a lot of things the mudlib could do better or
+ in a completely different way. Removing these things from the
+ driver wasn't a problem, but to stay compatible with the existing
+ mudlibs (namely the well-known 2.4.5 lib), it was possible to
+ undo these changes: first by setting a runtime option, then
+ by compiling the driver either in 'compat' or in 'native' mode.
+
+ Starting with 3.2.1, the distinction between compat and native
+ mode is more and more transferred into the mudlib, with the
+ future goal of having a modeless driver.
+
+ Starting with 3.2.7, native mode no longer exists as such,
+ only 'plain' (quasi a superset of 'native' and 'compat')
+ and 'compat' mode, and since 3.2.9 the mode selection can be
+ made via commandline option.
+
+ The main mode of the driver is determined at compile time
+ by preprocessor symbols to be defined/undefined in config.h:
+
+ COMPAT_MODE: when defined, the compat mode specifics are activated
+ by default.
+
+ Additional modifications can be achieved by the specification
+ of commandline arguments (most of them have a default setting
+ entry in config.h as well):
+
+ strict-euids: when active, euid usage is enforced.
+ compat: when active, the compat mode is used.
+
+ Following is the description of the changes (de)activated by
+ these defines. A shorthand notation is used: 'compat' means
+ 'if compat mode is active' and '!compat' means 'if
+ compat mode is not active', etc.
+
+
+ Predefined Preprocessor Symbols
+ If compat, the symbols COMPAT_FLAG and __COMPAT_MODE__ are
+ defined for all LPC programs.
+ If strict-euids, the symbol __STRICT_EUIDS__ is defined
+ for all LPC programs.
+ For compatibility reasons, the symbol __EUIDS__ is defined
+ for all LPC programs all the time.
+
+
+ Preloading Of Objects
+ The driver has the possibility to preload objects before the
+ game is actually opened to the world. This is done by
+ calling master->epilog(), which has to return 0 or an array.
+ If it's an array, its elements (as long as they are strings)
+ are given one by one as argument to master->preload() which
+ may now preload the objects (or do anything else).
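+
+ A minimal sketch of the two master functions follows. The file
+ names are made up for this example, and the use of load_object()
+ assumes a driver that provides this efun:
+
+     string *epilog(int eflag)
+     {
+         /* Returning 0 instead would preload nothing. */
+         return ({ "/room/church", "/obj/mailer" });
+     }
+
+     void preload(string file)
+     {
+         /* catch() keeps one broken file from aborting the rest. */
+         catch(load_object(file));
+     }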
+
+
+ Initialisation Of Objects
+ It is the task of the mudlib (through the driver hooks) to call
+ the initialisation lfuns in newly created objects. The
+ following table shows the traditional calls:
+
+      mode              : init call : reset call
+      --------------------------------------------
+      !compat & !native : create()  : reset(1)
+      !compat &  native : create()  : reset()
+       compat & !native : reset(0)  : reset(1)
+       compat &  native : reset(0)  : reset(1)
+
+ If INITIALIZATION_BY___INIT was defined, the lfun __INIT()
+ is called first on creation to initialize the object's
+ variables.
+
+
+ Movement Of Objects
+ The efun move_object() is implemented in the mudlib through
+ driver hooks and the set_environment() efun.
+ move_object() itself exists just for convenience and
+ compatibility.
+
+ In original native mode, move_object() could be applied only to
+ this_object() as the object to move, and it called the lfun
+ exit() in the old environment if in compat mode. As a side
+ effect, the lfun exit() may not be target of add_action()s
+ in compat mode.
+
+ In compat mode, objects may be moved using the transfer()
+ efun. It does make assumptions about the design of the
+ mudlib, though, as it calls the lfuns query_weight(),
+ can_put_and_get(), get(), prevent_insert() and add_weight().
+
+
+ Efuns In General
+ creator(), transfer()
+ These exist only in compat mode (creator() is
+ identical with getuid()).
+
+ object_name(), function_exists()
+ In !compat mode, the returned filenames start with a
+ leading '/', in compat mode they don't.
+
+ parse_command()
+ This command exists in two versions: the old is used with
+ compat, the new with !compat. However,
+ SUPPLY_PARSE_COMMAND must be defined in config.h in both
+ cases (this efun is not very useful at all).
+
+ process_string()
+ If this_object() doesn't exist, it defaults to this_player()
+ and receives the backbone uid (returned by master->get_bb_uid())
+ as euid. If strict-euids, this uid must not be 0.
+
+ Userids and Effective Userids
+ This is probably the most important difference between the
+ modes.
+
+ LPMud always had userids (uids) attributing the objects,
+ though they were called 'creator names' in compat mode.
+ Internally, the compat mode uses the 'creator names' as
+ (e)uid.
+
+ With the introduction of native/plain mode, additionally
+ 'effective userids' (euids) were introduced to improve
+ security handling (which was only a partial success).
+ The hardcoded handling of euids and uids was quite complex
+ and too mudlib-insensitive, so most of it got moved from the
+ driver into the mudlib with 3.2.1.
+
+ In strict-euids mode, only objects with a non-zero euid may load
+ or create new objects.
+
+ --- In Detail ---
+
+ Userids of the Master
+ The master's (e)uid is determined by a call to
+ master->get_master_uid().
+ In strict-euids mode, the result has to be a string,
+ otherwise the driver won't start up at all. If the result is
+ valid it is set as the master's uid and euid.
+ In !strict-euids mode, the result may be any value: 0 or a
+ string are treated as the uid to set, a non-zero integer
+ leads to the use of the uid set in the default 'global'
+ wizlist entry, and any other value defaults to 0.
+ The euid is either set to the returned string (if any),
+ or to 0.
+ The master's uid is determined this way only on startup;
+ at runtime the uids of a reloaded master are determined as
+ for every object by a call to the appropriate driver
+ hooks.
+
+ Userids of New Objects
+ To determine the (e)uids for a new object (loaded or
+ inherited, or cloned), the appropriate driver hook is
+ evaluated (H_LOAD_UIDS, H_CLONE_UIDS) and the result set
+ as (e)uid. The result may be a single value, in which case the
+ euid is set to 0, or an array ({ uid, euid }).
+ In strict-euids mode, both uid and euid must be 0 or a string,
+ any other value causes the load/clone to fail.
+ In !strict-euids mode, the uid (however returned) may also be
+ a non-zero integer to use the uid of the global
+ wizlist entry as uid. The euid is then
+ set to either 0 or the second entry of the returned
+ array if it's a string.
+
+ --- ---
+
+SEE ALSO
+ hooks(C), uids(C), move_object(E), initialisation(LPC)
diff --git a/doc/concepts/negotiation b/doc/concepts/negotiation
new file mode 100644
index 0000000..ed2318f
--- /dev/null
+++ b/doc/concepts/negotiation
@@ -0,0 +1,316 @@
+CONCEPT
+ Telnet Negotiations
+
+DESCRIPTION
+ The telnet protocol is used to control text-based connections
+ between a client (the 'telnet' program or a mud client) and a
+ server (the game driver). Most of the options offered by the
+ protocol are optional and need to be negotiated between the
+ client and the server. Consequently, and due to their
+ specialized nature, mud clients don't have to support the full
+ telnet option feature set.
+
+ For the server to find out if a client supports the telnet
+ protocol at all, one good approach is to send a simple, commonly
+ used telnet command to the client. If the client reacts
+ in conformance with the protocol (or sends telnet commands itself), the
+ mud can continue to negotiate further options. If the client
+ does not react, the mud can safely refrain from further
+ negotiations.
+
+ The following list is a more or less comprehensive overview of
+ the telnet related RFCs (available for example on
+ http://www.faqs.org/rfcs):
+
+ RFC Title rel. Code
+
+ 495 TELNET Protocol Specification
+ 513 Comments on the new TELNET specifications
+ 559 Comments on the new TELNET Protocol and its Implem
+ 595 Some Thoughts in Defense of the TELNET Go-Ahead
+ 596 Second Thoughts on Telnet Go-Ahead
+ 652 Telnet Output Carriage-Return Disposition Option NAOCRD 10
+ 653 Telnet Output Horizontal Tabstops Option NAOHTS 11
+ 654 Telnet Output Horizontal Tab Disposition Option NAOHTD 12
+ 655 Telnet Output Formfeed Disposition Option NAOFFD 13
+ 656 Telnet Output Vertical Tabstops Option NAOVTS 14
+ 657 Telnet Output Vertical Tab Disposition Option NAOVTD 15
+ 658 Telnet Output Linefeed Disposition NAOLFD 16
+ 698 Telnet Extended Ascii Option X-ASCII 17
+ 727 Telnet Logout Option LOGOUT 18
+ 728 A Minor Pitfall in the Telnet Protocol
+ 735 Revised TELNET Byte Macro Option BM 19
+ 749 Telnet SUPDUP-OUTPUT Option SUPDUP 22
+ 764 Telnet Protocol Specification
+ 779 Telnet SEND-LOCATION Option SENDLOC 23
+ 818 The Remote User Telnet Service
+ 854 Telnet Protocol Specification
+ 855 Telnet Option Specifications
+ 856 Telnet Binary Transmission BINARY 0
+ 857 Telnet Echo Option ECHO 1
+ 858 Telnet Suppress Go Ahead Option SGA 3
+ 859 Telnet Status Option STATUS 5
+ 860 Telnet Timing Mark Option TM 6
+ 861 Telnet Extended Options - List Option EXOPL 255
+ 884 Telnet Terminal Type Option TTYPE 24
+ 885 Telnet End of Record Option EOR 25
+ 930 Telnet Terminal Type Option TTYPE 24
+ 933 Output Marking Telnet Option OUTMRK 27
+ 946 Telnet Terminal Location Number Option TTYLOC 28
+ 1043 Telnet Data Entry Terminal Option DODIIS Implement DET 20
+ 1053 Telnet X.3 PAD Option X.3-PAD 30
+ 1073 Telnet Window Size Option NAWS 31
+ 1079 Telnet Terminal Speed Option TSPEED 32
+ 1080 Telnet Remote Flow Control Option FLOWCTRL 33
+ 1091 Telnet Terminal-Type Option TTYPE 24
+ 1096 Telnet X Display Location Option XDISPLOC 35
+ 1116 Telnet Linemode Option LINEMODE 34
+ 1143 The Q Method of Implementing TELNET Option Negotia
+ 1184 Telnet Linemode Option LINEMODE 34
+ 1372 Telnet Remote Flow Control Option FLOWCTRL 33
+ 1408 Telnet Environment Option ENVIRON 36
+ 1571 Telnet Environment Option Interoperability Issues
+ 1572 Telnet Environment Option NEWENV 39
+ 2066 Telnet Charset Option CHARSET 42
+ 2217 Telnet Com Port Control Option COMPORT 44
+ 2877 5250 Telnet Enhancements
+
+ All negotiations start with the special character IAC which is
+ defined in /usr/include/arpa/telnet.h (or in
+ src/driver/telnet.h for 3.2(.1)) and has the decimal value of
+ 255. Negotiations are based on different telnet options (their
+ values are defined in telnet.h too). Before a negotiation can
+ start the client and the server have to agree that they
+ support the option.
+ This works in the following way:
+
+ If a client wants to send something to the server it has to
+ send 'IAC WILL option' (for terminal type negotiation this would
+ be the 3 bytes 255,251,24; again, check telnet.h) to confirm
+ that it is able to do that. If the server is supporting that
+ option and wants to receive something it sends 'IAC DO option'
+ (255,253,option).
+
+ If one side is receiving an 'IAC WILL option' and has not yet
+ sent DO or DONT it has to respond with either 'IAC DO
+ option' if it will support this negotiation or 'IAC DONT
+ option' if it won't.
+
+ If one side is receiving an 'IAC DO option' and has not yet
+ sent a WILL or WONT it has to reply with either 'IAC WILL
+ option' if it supports the option or 'IAC WONT option' if not.
+
+ A small example: Let's assume we want to negotiate the
+ terminal type (TELOPT_TTYPE with value 24). The client is the
+ telnet executable on the player's side, the server is the
+ gamedriver.
+
+      client                  server
+      IAC WILL TTYPE
+                              IAC DO TTYPE
+
+ Or:
+                              IAC DO TTYPE
+      IAC WILL TTYPE
+
+ After this we are ready to transfer the terminaltype from the
+ client to the server as explained below.
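+
+ On the mudlib side, the server's half of this exchange can be
+ sketched with binary_message(). The byte values are the ones
+ given above; the function name is made up, and the sketch assumes
+ it is called in the interactive object whose connection is meant:
+
+     #define IAC          255
+     #define DO           253
+     #define TELOPT_TTYPE  24
+
+     void request_ttype()
+     {
+         /* Ask the client to negotiate its terminal type. */
+         binary_message(({ IAC, DO, TELOPT_TTYPE }), 1);
+     }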
+
+ Now we are ready to start the real negotiations. I will explain the
+ three options I have currently implemented.
+
+ First TerminalType aka TTYPE aka 24 aka TELOPT_TTYPE assuming
+ the client and the server have exchanged WILL/DO.
+
+ The server is now free to send 'IAC SB TELOPT_TTYPE
+ TELQUAL_SEND IAC SE', to which the client replies with 'IAC SB
+ TELOPT_TTYPE TELQUAL_IS terminaltype IAC SE', where
+ terminaltype is a string that is not zero-terminated (it is
+ terminated by the IAC; for values look up telnet.h), AND switches
+ its terminal emulation to 'terminaltype'. terminaltype is
+ case-insensitive and may be UNKNOWN. The server may
+ repeat the SEND request and the client will respond with the
+ next preferred terminaltype. If this is the same as the
+ previously received one, it marks the end of the list of
+ terminaltypes. The next SEND request will start the list of
+ terminaltypes from the beginning.
+
+ Example: (we have exchanged WILL/DO already)
+ client                          server
+                                 IAC SB TTYPE SEND IAC SE
+ IAC SB TTYPE IS VT200 IAC SE
+                                 IAC SB TTYPE SEND IAC SE
+ IAC SB TTYPE IS VT100 IAC SE
+                                 IAC SB TTYPE SEND IAC SE
+ IAC SB TTYPE IS VT52 IAC SE
+                                 IAC SB TTYPE SEND IAC SE
+ IAC SB TTYPE IS VT52 IAC SE
+ /* this marks that we have all terminaltypes. We decide to use the
+  * vt200 mode so we have to skip to VT200
+  */
+                                 IAC SB TTYPE SEND IAC SE
+ IAC SB TTYPE IS VT200 IAC SE
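+
+ The SEND/IS cycle above can be sketched as follows. This is a
+ minimal illustration in Python under stated assumptions, not
+ driver code: 'exchange' is a hypothetical callable that sends
+ the request bytes and returns the client's reply frame, and the
+ function names and the 'limit' parameter are made up for this
+ sketch. The byte values come from telnet.h.
+
```python
IAC, SB, SE = 255, 250, 240
TELOPT_TTYPE = 24
TELQUAL_IS, TELQUAL_SEND = 0, 1

# The request the server sends: IAC SB TTYPE SEND IAC SE.
TTYPE_SEND = bytes([IAC, SB, TELOPT_TTYPE, TELQUAL_SEND, IAC, SE])


def parse_ttype_is(frame):
    """Extract the lower-cased terminal type from IAC SB TTYPE IS ... IAC SE."""
    head = bytes([IAC, SB, TELOPT_TTYPE, TELQUAL_IS])
    tail = bytes([IAC, SE])
    if not (frame.startswith(head) and frame.endswith(tail)):
        raise ValueError("not a TTYPE IS subnegotiation")
    return frame[len(head):-len(tail)].decode("ascii").lower()


def collect_terminal_types(exchange, limit=32):
    """Repeat the SEND request until the client gives the same answer twice.

    exchange is a hypothetical callable: it sends the given request bytes
    and returns the client's reply frame. limit guards against clients
    that never repeat themselves.
    """
    types = []
    for _ in range(limit):
        name = parse_ttype_is(exchange(TTYPE_SEND))
        if types and name == types[-1]:
            break  # same answer twice: end of the client's list
        types.append(name)
    return types
```
+
+ With the client from the example this collects vt200, vt100 and
+ vt52, in the client's order of preference.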
+
+
+ Next important option is NAWS (31) or WindowSizeNegotiation.
+
+ This one is a bit easier than terminaltype. After having
+ received an IAC DO NAWS from the server, the client will reply
+ with IAC WILL NAWS and immediately after that send IAC SB NAWS
+ columns_high columns_low lines_high lines_low IAC SE where
+ xx_low refers to the lowbyte of xx and xx_high refers to the
+ highbyte of xx. This will be automagically resent at every
+ windowresize (when the client gets a SIGWINCH for example) or
+ at your request with 'IAC SB NAWS SEND IAC SE'.
+
+ Example: (WILL/DO exchanged)
+ client                          server
+ IAC SB NAWS 0 80 0 24 IAC SE    /* the standard vt100 windowsize */
+                                 /* no reply */
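+
+ Decoding the four payload bytes can be sketched as follows. This
+ is a minimal illustration in Python; parse_naws() is a
+ hypothetical helper, and treating a reported 0 as "use the
+ default of 80x24" is an assumption of this sketch.
+
```python
def parse_naws(optargs):
    """Decode the four NAWS payload bytes into (columns, lines).

    optargs holds the bytes between 'IAC SB NAWS' and 'IAC SE':
    columns_high, columns_low, lines_high, lines_low.
    """
    if len(optargs) != 4:
        raise ValueError("NAWS carries exactly four bytes")
    cols_high, cols_low, lines_high, lines_low = optargs
    columns = (cols_high << 8) | cols_low
    lines = (lines_high << 8) | lines_low
    # Assumption of this sketch: a zero value falls back to 80 columns
    # or 24 lines instead of being passed on as-is.
    return (columns or 80, lines or 24)
```
+
+ For the example above, parse_naws([0, 80, 0, 24]) gives 80
+ columns and 24 lines.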
+
+ And, a bit less important but most complex, the LINEMODE (34)
+ option. It was implemented because some weird DOS telnets
+ would not work otherwise. Only the absolute basic feature is
+ implemented, which is actually switching the telnet to
+ linemode. After exchanging WILL/DO the
+ server sends a modechange request to the client using IAC SB
+ LINEMODE LM_MODE MODE_EDIT IAC SE, which should turn on local
+ commandline-editing for the client. If a client supports
+ LINEMODE it HAS to support this modechange. The client will
+ reply with IAC SB LINEMODE LM_MODE MODE_EDIT|MODE_ACK IAC SE
+ (x|y is bitwise or). That's it for linemode. (You will perhaps
+ receive other IAC SB LINEMODEs with other LM_xxx; you may
+ ignore them. At least IRIX 5.x sends IAC SB LINEMODE LM_SLC
+ .... IAC SE, which declares the local character set.)
+
+ Example: (WILL/DO negotiated)
+
+ client                          server
+                                 IAC SB LINEMODE LM_MODE
+                                 MODE_EDIT IAC SE
+ IAC SB LINEMODE LM_MODE
+ MODE_EDIT|MODE_ACK IAC SE
+
+ Note: The option is much more fun than it looks here; for
+ example, it supports a mixed mode between linemode and
+ charactermode, flushing the input at certain characters (at
+ ESC or TAB for shell-like commandline completion). We suggest
+ reading RFC 1184.
+
+ You might be interested in TELOPT_XDISPLAYLOC and TELOPT_ENVIRON too.
+
+ Now, how to implement this using LDMud?
+
+ 0. Patch src/driver/comm1.c, function init_telopts() to include
+ telopts_do[TELOPT_XXX] = reply_h_telnet_neg;
+ telopts_dont[TELOPT_XXX] = reply_h_telnet_neg;
+ telopts_will[TELOPT_XXX] = reply_h_telnet_neg;
+ telopts_wont[TELOPT_XXX] = reply_h_telnet_neg;
+ for every telnet negotiation you want to use.
+ Do not overwrite the TELOPT_ECHO and TELOPT_SGA hooks.
+
+ Alternatively, set the driver hook H_NOECHO in master.c:
+ this diverts _all_ telnet data into the mudlib.
+
+ 1. Add a new driver hook to master.c just below the others.
+ set_driver_hook(H_TELNET_NEG,"telnet_neg"),
+ 2. Make a telnet.h for your mudlib... just change the arrays in
+ src/driver/telnet.h.
+ 3. define a function
+
+ void telnet_neg(int cmd, int option, int * optargs)
+
+ in your interactive objects (login.c , shells, player.c or
+ wherever). And note, in ALL objects through which a
+ player is handed (in TAPPMud these are login.c and
+ player.c). [Ok, master.c is interactive for a very short
+ time too, but it won't accept input, will it?]
+ 'cmd' will be TELCMD_xxxx (see telnet.h), 'option' one of
+ TELOPT_xxxx and 'optargs' will be an array of ints (bytes in
+ fact) when 'cmd' is SB.
+ Parse 'cmd'/'option' and reply with appropriate answers
+ using binary_message() (appropriate meaning sending the
+ right DO/DONT/WILL/WONT if not sent before and using the SB
+ return values).
+ 3.1. Send IAC DO TTYPE IAC DO NAWS IAC DO LINEMODE at the
+ first time you can do it (before cat()ing /WELCOME perhaps).
+ 3.2. Note all sent and received WILL/WONT/DO/DONT options for
+ conforming to the standard, avoiding endless loops and for
+ easy debugging :)
+ 3.3. Pass those received/sent data and other data when the
+ interactive object is changed (from login.c to player.c or
+ at other bodychanges). Clear the data when the player goes
+ linkdead or quits. You won't need to save this data.
+ 3.4. Lower_case() terminaltypes... ;)
+ 3.5. Use reasonable defaultvalues if the client does not
+ support one of the options. (columns 80,lines 24 if not
+ NAWS, unknown or vt100 for no terminaltype)
+
+ The WILL/WONT/DO/DONT data is best saved in a mapping looking
+ like this:
+ ([ "received": ([ option1: DO_DONT_OR_0;WILL_WONT_OR_0, ... ])
+ , "sent" : ([ option1: DO_DONT_OR_0;WILL_WONT_OR_0, ... ])
+ ])
+
+ (Ok, it can be done better. But not without confusing *me*
+ more.)
+
+ Before sending anything check
+ TN["sent"][option,0_if_do_dont_or_1_if_will_wont]
+ so you don't enter endless loops, save network traffic and the
+ like.
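+
+ The bookkeeping described above can be sketched as follows. This
+ is a minimal illustration in Python rather than LPC, with a
+ two-element list standing in for the two values of the mapping;
+ the helper names new_state(), slot_for(), note() and
+ should_send() are hypothetical.
+
```python
DONT, DO, WONT, WILL = 254, 253, 252, 251


def new_state():
    # Mirrors the mapping above: per option, slot 0 holds DO/DONT (or 0),
    # slot 1 holds WILL/WONT (or 0).
    return {"received": {}, "sent": {}}


def slot_for(cmd):
    """0 for DO/DONT, 1 for WILL/WONT."""
    return 0 if cmd in (DO, DONT) else 1


def note(state, direction, cmd, option):
    """Record a command; direction is "sent" or "received"."""
    state[direction].setdefault(option, [0, 0])[slot_for(cmd)] = cmd


def should_send(state, cmd, option):
    """True unless the same command was already sent for this option."""
    return state["sent"].get(option, [0, 0])[slot_for(cmd)] != cmd
```
+
+ Checking should_send() before every binary_message() and calling
+ note() afterwards is one way to keep the "sent" half of the
+ mapping accurate.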
+
+ The windowsize is best saved in the player's environment
+ variables so that he can modify them later on. (Or in two
+ integers in the player object...). Use for these values is
+ clear I think.
+
+ The terminaltypes received using the above-mentioned method are
+ best stored in an array. The terminaltype actually set is best
+ stored in an environment variable where the player can modify
+ it. Upon modifying it the IAC SB TTYPE SEND IAC SE cycle
+ should be started to match the emulation to the newly entered
+ terminaltype. You may then use data retrieved from
+ /etc/termcap (man 5 termcap) or /usr/lib/terminfo/*/* (SysVID,
+ man 5 terminfo) to implement terminal control codes dependent
+ on the terminaltype. /etc/termcap may prove to be the easiest
+ way, though /usr/lib/terminfo/*/* is the newer (and better) SysV
+ way of doing it.
+
+ [Anyone got a description of the internal terminfo format for
+ me? -Marcus]
+
+ LINEMODE replies may be left alone if you only use the mode
+ change to MODE_EDIT.
+
+ Some statistics about what clients support telnet negotiations:
+
+ Tinyfugue (TF) and some other mudclients usually do not support
+ negotiations. The exception is that TF supports the Telnet
+ End-Of-Record option as a marker for the end of the prompt. So
+ if you send IAC EOR after every prompt, it will always print
+ the prompt in the input window. (Do not forget to negotiate
+ that: first send IAC WILL TELOPT_EOR, then wait for IAC DO
+ TELOPT_EOR.) Newer versions of
+ TF will support NAWS and there will be a patch for TTYPE
+ negotiation available soon.
+
+ All telnets able to do negotiations I've encountered support
+ the TTYPE option.
+ HP9.x,Irix5.x,Linux,EP/IX,CUTELNET/NCSATELNET (Novell) and
+ perhaps more support NAWS.
+ At least Irix5.x,Linux,CU/NCSATELNET support LINEMODE.
+ SUN supports neither NAWS nor LINEMODE, in SunOS 4.1.3 as well
+ as in Solaris 2.3.
+
+ For getting RFCs you can for example use
+ ftp://ftp.uni-erlangen.de/pub/doc/rfc/
+
+
+BUGS
+ Not all aspects of the options are mentioned to keep this doc
+ at a reasonable size. Refer to the RFCs to get more confused.
+
+CREDITS
+ Provided by Marcus@TAPPMud (Marcus Meissner,
+ <msmeissn@cip.informatik.uni-erlangen.de>).
diff --git a/doc/concepts/news b/doc/concepts/news
new file mode 100644
index 0000000..cacefb4
--- /dev/null
+++ b/doc/concepts/news
@@ -0,0 +1,88 @@
+CONCEPT:
+ news
+
+
+DESCRIPTION:
+ This document describes the news system used in Nightfall.
+ News is intended to provide a general system for bulletin
+ boards and similar objects. It is similar to the Usenet
+ news system. Articles (messages) are stored as files in a
+ central directory, /news. Only the news demon object is
+ allowed to read and write in the /news directory. All other
+ objects access the news through interface functions in the
+ news demon.
+
+ Typically news is read and written by a bulletin
+ board object or by a newsreader. Player bulletin boards
+ should of course be limited to specific news groups.
+ A newsreader might be intelligent in that it automatically
+ shows new messages. Groups may be moderated; then only the
+ moderator can write there. There are also flags for whether a
+ board is a wiz_only board and who may remove messages.
+
+ Security has several levels. At level 0 every
+ effective userid may read, write and delete articles,
+ although the news demon will still check the sender field
+ against certain group requirements. This level
+ of security makes it possible to create groups
+ accessible by bulletin boards which have no euid.
+
+ Security level 1 means that an euid check is done for reading
+ and writing. This still allows for bulletin boards, but those
+ must have root euid and thus the ability to seteuid to the
+ euid of the object using the bulletin board (normally the
+ player). This feature requires native gamedriver mode.
+
+ Saved news file format (formal notation):
+
+ int security; /* whether euid check is done, 0 or 1 */
+ mixed *accesslist = ({
+ mixed *readaccess; /* who can read messages */
+ mixed *writeaccess; /* who can write messages */
+ mixed *deleteaccess; /* who can delete messages */
+ mixed *controlaccess; /* who can control the accesslist */
+ })
+ mixed *messages = ({
+ mixed *msg1; mixed *msg2; ... mixed *msgN;
+ })
+
+ An access entry can be one of the following:
+ - an effective userid (string)
+ - one of the keywords "wizard", "all", "author"
+ - a minimum required wizard level (integer)
+ - an array of the plain types above (as alternatives).
+
+ A message looks as follows:
+
+ mixed *message = ({
+ string author; /* who wrote the message */
+ string subject; /* the subject */
+ string *groups; /* news groups where this was posted */
+ int date; /* date of message written */
+ string messageid; /* a unique string */
+ string *referenceid; /* list of references */
+ int expire; /* when message expires */
+ string body; /* the contents of the message */
+ })
+
+ The news demon (/secure/news, or /obj/news) provides the
+ following functions:
+
+ int success = PostMessage(mixed *message)
+ Tells the demon to insert the message in the news database.
+ Depending on security level, euid or message.author is
+ checked if this is allowed.
+
+ int success = DeleteMessage(string *groups, string messageid)
+ Remove a message from the database.
+
+ mixed *messagelist = ReadMessageHeaders(string group)
+ Get all message headers (complete info, but without
+ body) of newsgroup <group>.
+
+ mixed *message = ReadMessage(string *groups, string messageid)
+ Retrieve a message from the database.
+
+
+SEE ALSO:
diff --git a/doc/concepts/objects b/doc/concepts/objects
new file mode 100644
index 0000000..b9c7eb9
--- /dev/null
+++ b/doc/concepts/objects
@@ -0,0 +1,25 @@
+CONCEPT
+ objects
+
+LAST UPDATED
+ never
+
+DESCRIPTION
+ An object consists of a collection of functions (also called
+ 'methods') and data (variables) on which the functions operate.
+ The only way to manipulate the data contained in an object is
+ via one of the functions defined by the object.
+
+ Every single thing in a mud is an object. Rooms are objects.
+ Weapons are objects. Even your character is an object (a special
+ kind of object called "interactive" but still an object in most
+ every respect). Each object (except possibly virtual objects) in
+ the mud is associated with some file written in LPC (in the mud's
+ directory structure) that describes how the object is to interact
+ with the gamedriver and the rest of the objects in the mud.
+
+AUTHOR
+ Someone
+
+SEE ALSO
+ files(C), inheritance(C), create(A), reset(A)
diff --git a/doc/concepts/oop b/doc/concepts/oop
new file mode 100644
index 0000000..5146a7e
--- /dev/null
+++ b/doc/concepts/oop
@@ -0,0 +1,76 @@
+OOP
+ DESCRIPTION:
+ OOP stands for "object-oriented programming":
+
+ If you know how to program in a procedural language
+ (C, PASCAL, BASIC), you already have many of the skills
+ needed to program effectively in LPC. The main skill
+ you will need is the ability to break your ideas down into a
+ sequence of steps that the computer can carry out for
+ you.
+
+ LPC is, however, also an (imperative, structured) object-oriented
+ language. OOP encourages you to think about the data first and
+ then about the methods used to manipulate that data:
+ A player is first of all a collection of attributes, points
+ and a name; how these can be acted upon is settled
+ afterwards. A player object communicates and cooperates with many
+ other objects here.
+
+ Some of the criteria of object-oriented programming are listed
+ below:
+
+ Classes:
+ A class describes the behaviour of a group of similar
+ objects. For example, living beings can be regarded as a class,
+ because all objects can be divided into living beings and
+ non-living things. A concrete monster and a concrete player in
+ a room are objects (instances). A set of similar objects can be
+ said to belong to one and the same
+ class.
+
+ Abstraction:
+ Every object in the system represents, as an abstract model, a
+ "worker" that can carry out tasks, report and change its state,
+ and communicate with the other objects in the system
+ without revealing how its abilities are
+ implemented.
+ [For example, an object can be given the task Defend(1000, DT_FIRE, ...).
+ In MG, living objects fulfil this task by changing their
+ state: a loss of LP or a magical reaction.]
+
+ Encapsulation
+ Also called "information hiding", encapsulation ensures that
+ objects cannot change the internal state of other objects in
+ unexpected ways; only an object's own methods should be
+ allowed to access its internal state directly. All
+ kinds of objects present interfaces to the outside (man
+ properties) that determine how other objects can interact
+ with them.
+ [See especially man properties]
+
+ Polymorphism
+ References to objects can mean that, when a concrete object is
+ selected, its class (its type) is not obvious.
+ Nevertheless, messages to objects selected in this way are
+ correctly dispatched to the actual class. If this dispatch is
+ resolved only at run time, the behaviour is called polymorphism
+ (also: late binding or dynamic binding).
+ [In LPC, dynamic binding is the default. So if you override
+ die() in an NPC, the new method is always the one called
+ when the NPC calls die() on itself from within do_damage().]
+
+ Inheritance
+ Organizes and facilitates polymorphism by allowing new objects
+ to be defined and created that are specializations of existing
+ objects. Such new objects can take over and extend the existing
+ behaviour without that original behaviour having to be
+ reimplemented. Typically this is achieved by
+ grouping objects into classes and hierarchies of classes
+ that express the commonalities in behaviour.
+ [See man vererbung]
+
+ SEE ALSO:
+ objekte, inheritance, goodstyle
+
+ 22 March 2004 Gloinson
diff --git a/doc/concepts/overloading b/doc/concepts/overloading
new file mode 100644
index 0000000..06209e9
--- /dev/null
+++ b/doc/concepts/overloading
@@ -0,0 +1,48 @@
+CONCEPT
+ overloading
+
+DESCRIPTION
+ This concept is strongly connected with the concept of inheritance.
+ A function is called 'overloaded' if it is defined more than once
+ in an object. This can happen if the object inherits other objects
+ which have defined a function with the same name.
+ Usually the overloading is wanted and intended by the inheriting
+ object to change the behaviour of the function it overloads.
+ To call the overloaded functions from the overloading object the
+ ::-operator is used.
+ From outside the object only one of the functions can be called
+ via call_other() or the like; this will be the topmost of all
+ overloaded functions.
+
+ Normally an overloading function is declared the same way as the
+ overloaded function, that is, it has the same number and types
+ of arguments. If an object wants to change the function so
+ that it accepts more arguments than the original
+ function, it has to use the modifier 'varargs' or a compiler error
+ will be raised.
+
+EXAMPLE
+ File /players/alfe/a.c:
+
+ foo() { write("A"); }
+
+ File /players/alfe/b.c:
+
+ foo() { write("B"); }
+
+ File /players/alfe/c.c:
+
+ inherit "players/alfe/a";
+ inherit "players/alfe/b";
+
+ foo() {
+ a::foo();
+ b::foo();
+ write("C");
+ }
+
+ To call "players/alfe/c"->foo() will now result in the output of
+ ABC.
+
+SEE ALSO
+ modifiers(LPC), inheritance(C), functions(LPC)
diff --git a/doc/concepts/pcre b/doc/concepts/pcre
new file mode 100644
index 0000000..f863ffd
--- /dev/null
+++ b/doc/concepts/pcre
@@ -0,0 +1,1422 @@
+SYNOPSIS
+ PCRE - Perl-compatible regular expressions
+
+
+DESCRIPTION
+ This document describes the regular expressions supported by the
+ PCRE package. When the package is compiled into the driver, the
+ macro __PCRE__ is defined.
+
+ Most of this manpage is lifted directly from the original PCRE
+ manpage (dated January 2003).
+
+ The PCRE library is a set of functions that implement regular
+ expression pattern matching using the same syntax and semantics
+ as Perl 5, with just a few differences (see below). The
+ current implementation corresponds to Perl 5.005, with some
+ additional features from later versions. This includes some
+ experimental, incomplete support for UTF-8 encoded strings.
+ Details of exactly what is and what is not supported are given
+ below.
+
+
+PCRE REGULAR EXPRESSION DETAILS
+
+ The syntax and semantics of the regular expressions supported by PCRE
+ are described below. Regular expressions are also described in the Perl
+ documentation and in a number of other books, some of which have copi-
+ ous examples. Jeffrey Friedl's "Mastering Regular Expressions", pub-
+ lished by O'Reilly, covers them in great detail. The description here
+ is intended as reference documentation.
+
+ The basic operation of PCRE is on strings of bytes. However, there is
+ also support for UTF-8 character strings. To use this support you must
+ build PCRE to include UTF-8 support, and then call pcre_compile() with
+ the PCRE_UTF8 option. How this affects the pattern matching is men-
+ tioned in several places below. There is also a summary of UTF-8 fea-
+ tures in the section on UTF-8 support in the main pcre page.
+
+ A regular expression is a pattern that is matched against a subject
+ string from left to right. Most characters stand for themselves in a
+ pattern, and match the corresponding characters in the subject. As a
+ trivial example, the pattern
+
+ The quick brown fox
+
+ matches a portion of a subject string that is identical to itself. The
+ power of regular expressions comes from the ability to include alterna-
+ tives and repetitions in the pattern. These are encoded in the pattern
+ by the use of meta-characters, which do not stand for themselves but
+ instead are interpreted in some special way.
+
+ There are two different sets of meta-characters: those that are recog-
+ nized anywhere in the pattern except within square brackets, and those
+ that are recognized in square brackets. Outside square brackets, the
+ meta-characters are as follows:
+
+ \ general escape character with several uses
+ ^ assert start of string (or line, in multiline mode)
+ $ assert end of string (or line, in multiline mode)
+ . match any character except newline (by default)
+ [ start character class definition
+ | start of alternative branch
+ ( start subpattern
+ ) end subpattern
+ ? extends the meaning of (
+ also 0 or 1 quantifier
+ also quantifier minimizer
+ * 0 or more quantifier
+ + 1 or more quantifier
+ also "possessive quantifier"
+ { start min/max quantifier
+
+ Part of a pattern that is in square brackets is called a "character
+ class". In a character class the only meta-characters are:
+
+ \ general escape character
+ ^ negate the class, but only if the first character
+ - indicates character range
+ [ POSIX character class (only if followed by POSIX
+ syntax)
+ ] terminates the character class
+
+ The following sections describe the use of each of the meta-characters.
+
+
+BACKSLASH
+
+ The backslash character has several uses. Firstly, if it is followed by
+ a non-alphameric character, it takes away any special meaning that
+ character may have. This use of backslash as an escape character
+ applies both inside and outside character classes.
+
+ For example, if you want to match a * character, you write \* in the
+ pattern. This escaping action applies whether or not the following
+ character would otherwise be interpreted as a meta-character, so it is
+ always safe to precede a non-alphameric with backslash to specify that
+ it stands for itself. In particular, if you want to match a backslash,
+ you write \\.
+
+ If a pattern is compiled with the PCRE_EXTENDED option, whitespace in
+ the pattern (other than in a character class) and characters between a
+ # outside a character class and the next newline character are ignored.
+ An escaping backslash can be used to include a whitespace or # charac-
+ ter as part of the pattern.
+
+ If you want to remove the special meaning from a sequence of charac-
+ ters, you can do so by putting them between \Q and \E. This is differ-
+ ent from Perl in that $ and @ are handled as literals in \Q...\E
+ sequences in PCRE, whereas in Perl, $ and @ cause variable interpola-
+ tion. Note the following examples:
+
+ Pattern PCRE matches Perl matches
+
+ \Qabc$xyz\E abc$xyz abc followed by the
+ contents of $xyz
+ \Qabc\$xyz\E abc\$xyz abc\$xyz
+ \Qabc\E\$\Qxyz\E abc$xyz abc$xyz
+
+ The \Q...\E sequence is recognized both inside and outside character
+ classes.
+
+ A second use of backslash provides a way of encoding non-printing char-
+ acters in patterns in a visible manner. There is no restriction on the
+ appearance of non-printing characters, apart from the binary zero that
+ terminates a pattern, but when a pattern is being prepared by text
+ editing, it is usually easier to use one of the following escape
+ sequences than the binary character it represents:
+
+ \a alarm, that is, the BEL character (hex 07)
+ \cx "control-x", where x is any character
+ \e escape (hex 1B)
+ \f formfeed (hex 0C)
+ \n newline (hex 0A)
+ \r carriage return (hex 0D)
+ \t tab (hex 09)
+ \ddd character with octal code ddd, or backreference
+ \xhh character with hex code hh
+ \x{hhh..} character with hex code hhh... (UTF-8 mode only)
+
+ The precise effect of \cx is as follows: if x is a lower case letter,
+ it is converted to upper case. Then bit 6 of the character (hex 40) is
+ inverted. Thus \cz becomes hex 1A, but \c{ becomes hex 3B, while \c;
+ becomes hex 7B.
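+
+ The \cx rule above is plain arithmetic and can be sketched as
+ follows. This is a minimal illustration in Python, not PCRE's
+ own code; the function name control_escape() is hypothetical.
+
```python
def control_escape(x):
    """Return the character code that \\cx stands for, per the rule above."""
    code = ord(x)
    if "a" <= x <= "z":
        code = ord(x.upper())  # lower case letters are upper-cased first
    return code ^ 0x40  # invert bit 6 (hex 40)
```
+
+ The three cases from the text come out as stated: "z" gives hex
+ 1A, "{" gives hex 3B and ";" gives hex 7B.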
+
+ After \x, from zero to two hexadecimal digits are read (letters can be
+ in upper or lower case). In UTF-8 mode, any number of hexadecimal dig-
+ its may appear between \x{ and }, but the value of the character code
+ must be less than 2**31 (that is, the maximum hexadecimal value is
+ 7FFFFFFF). If characters other than hexadecimal digits appear between
+ \x{ and }, or if there is no terminating }, this form of escape is not
+ recognized. Instead, the initial \x will be interpreted as a basic hex-
+ adecimal escape, with no following digits, giving a byte whose value is
+ zero.
+
+ Characters whose value is less than 256 can be defined by either of the
+ two syntaxes for \x when PCRE is in UTF-8 mode. There is no difference
+ in the way they are handled. For example, \xdc is exactly the same as
+ \x{dc}.
+
+ After \0 up to two further octal digits are read. In both cases, if
+ there are fewer than two digits, just those that are present are used.
+ Thus the sequence \0\x\07 specifies two binary zeros followed by a BEL
+ character (code value 7). Make sure you supply two digits after the
+ initial zero if the character that follows is itself an octal digit.
+
+ The handling of a backslash followed by a digit other than 0 is compli-
+ cated. Outside a character class, PCRE reads it and any following dig-
+ its as a decimal number. If the number is less than 10, or if there
+ have been at least that many previous capturing left parentheses in the
+ expression, the entire sequence is taken as a back reference. A
+ description of how this works is given later, following the discussion
+ of parenthesized subpatterns.
+
+ Inside a character class, or if the decimal number is greater than 9
+ and there have not been that many capturing subpatterns, PCRE re-reads
+ up to three octal digits following the backslash, and generates a sin-
+ gle byte from the least significant 8 bits of the value. Any subsequent
+ digits stand for themselves. For example:
+
+ \040 is another way of writing a space
+ \40 is the same, provided there are fewer than 40
+ previous capturing subpatterns
+ \7 is always a back reference
+ \11 might be a back reference, or another way of
+ writing a tab
+ \011 is always a tab
+ \0113 is a tab followed by the character "3"
+ \113 might be a back reference, otherwise the
+ character with octal code 113
+ \377 might be a back reference, otherwise
+ the byte consisting entirely of 1 bits
+ \81 is either a back reference, or a binary zero
+ followed by the two characters "8" and "1"
+
+ Note that octal values of 100 or greater must not be introduced by a
+ leading zero, because no more than three octal digits are ever read.
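+
+ The decision between back reference and octal byte described
+ above can be sketched as follows. This is a minimal
+ illustration in Python of the rule as stated in this section,
+ for use outside a character class only; it is not PCRE's
+ implementation, and interpret_backslash_digits() is a
+ hypothetical name.
+
```python
def interpret_backslash_digits(digits, capturing_groups):
    """Classify a backslash followed by decimal digits, outside a class.

    digits:           the digit string after the backslash, first digit not 0
    capturing_groups: number of capturing left parentheses seen so far
    Returns ("backref", number, "") or ("byte", value, literal_rest).
    """
    if not digits or digits[0] == "0" or any(c not in "0123456789" for c in digits):
        raise ValueError("expected decimal digits starting with 1-9")
    number = int(digits)
    if number < 10 or number <= capturing_groups:
        return ("backref", number, "")
    # Otherwise re-read up to three octal digits; the rest stand for themselves.
    used = 0
    value = 0
    while used < 3 and used < len(digits) and digits[used] in "01234567":
        value = value * 8 + int(digits[used])
        used += 1
    return ("byte", value & 0xFF, digits[used:])
```
+
+ With no capturing groups, "11" is read as a tab (byte 9) and
+ "81" as a binary zero followed by the literal characters "8"
+ and "1", matching the table above.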
+
+ All the sequences that define a single byte value or a single UTF-8
+ character (in UTF-8 mode) can be used both inside and outside character
+ classes. In addition, inside a character class, the sequence \b is
+ interpreted as the backspace character (hex 08). Outside a character
+ class it has a different meaning (see below).
+
+ The third use of backslash is for specifying generic character types:
+
+ \d any decimal digit
+ \D any character that is not a decimal digit
+ \s any whitespace character
+ \S any character that is not a whitespace character
+ \w any "word" character
+ \W any "non-word" character
+
+ Each pair of escape sequences partitions the complete set of characters
+ into two disjoint sets. Any given character matches one, and only one,
+ of each pair.
+
+ In UTF-8 mode, characters with values greater than 255 never match \d,
+ \s, or \w, and always match \D, \S, and \W.
+
+ For compatibility with Perl, \s does not match the VT character (code
+ 11). This makes it different from the POSIX "space" class. The \s
+ characters are HT (9), LF (10), FF (12), CR (13), and space (32).
+
+ A "word" character is any letter or digit or the underscore character,
+ that is, any character which can be part of a Perl "word". The defini-
+ tion of letters and digits is controlled by PCRE's character tables,
+ and may vary if locale-specific matching is taking place (see "Locale
+ support" in the pcreapi page). For example, in the "fr" (French)
+ locale, some character codes greater than 128 are used for accented
+ letters, and these are matched by \w.
+
+ These character type sequences can appear both inside and outside char-
+ acter classes. They each match one character of the appropriate type.
+ If the current matching point is at the end of the subject string, all
+ of them fail, since there is no character to match.
+
+ The fourth use of backslash is for certain simple assertions. An asser-
+ tion specifies a condition that has to be met at a particular point in
+ a match, without consuming any characters from the subject string. The
+ use of subpatterns for more complicated assertions is described below.
+ The backslashed assertions are
+
+ \b matches at a word boundary
+ \B matches when not at a word boundary
+ \A matches at start of subject
+ \Z matches at end of subject or before newline at end
+ \z matches at end of subject
+ \G matches at first matching position in subject
+
+ These assertions may not appear in character classes (but note that \b
+ has a different meaning, namely the backspace character, inside a char-
+ acter class).
+
+ A word boundary is a position in the subject string where the current
+ character and the previous character do not both match \w or \W (i.e.
+ one matches \w and the other matches \W), or the start or end of the
+ string if the first or last character matches \w, respectively.
+
+ The \A, \Z, and \z assertions differ from the traditional circumflex
+ and dollar (described below) in that they only ever match at the very
+ start and end of the subject string, whatever options are set. Thus,
+ they are independent of multiline mode.
+
+ They are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options. If the
+ startoffset argument of pcre_exec() is non-zero, indicating that match-
+ ing is to start at a point other than the beginning of the subject, \A
+ can never match. The difference between \Z and \z is that \Z matches
+ before a newline that is the last character of the string as well as at
+ the end of the string, whereas \z matches only at the end.
+
+ The \G assertion is true only when the current matching position is at
+ the start point of the match, as specified by the startoffset argument
+ of pcre_exec(). It differs from \A when the value of startoffset is
+ non-zero. By calling pcre_exec() multiple times with appropriate argu-
+ ments, you can mimic Perl's /g option, and it is in this kind of imple-
+ mentation where \G can be useful.
+
+ Note, however, that PCRE's interpretation of \G, as the start of the
+ current match, is subtly different from Perl's, which defines it as the
+ end of the previous match. In Perl, these can be different when the
+ previously matched string was empty. Because PCRE does just one match
+ at a time, it cannot reproduce this behaviour.
+
+ If all the alternatives of a pattern begin with \G, the expression is
+ anchored to the starting match position, and the "anchored" flag is set
+ in the compiled regular expression.
+
+
+CIRCUMFLEX AND DOLLAR
+
+ Outside a character class, in the default matching mode, the circumflex
+ character is an assertion which is true only if the current matching
+ point is at the start of the subject string. If the startoffset argu-
+ ment of pcre_exec() is non-zero, circumflex can never match if the
+ PCRE_MULTILINE option is unset. Inside a character class, circumflex
+ has an entirely different meaning (see below).
+
+ Circumflex need not be the first character of the pattern if a number
+ of alternatives are involved, but it should be the first thing in each
+ alternative in which it appears if the pattern is ever to match that
+ branch. If all possible alternatives start with a circumflex, that is,
+ if the pattern is constrained to match only at the start of the sub-
+ ject, it is said to be an "anchored" pattern. (There are also other
+ constructs that can cause a pattern to be anchored.)
+
+ A dollar character is an assertion which is true only if the current
+ matching point is at the end of the subject string, or immediately
+ before a newline character that is the last character in the string (by
+ default). Dollar need not be the last character of the pattern if a
+ number of alternatives are involved, but it should be the last item in
+ any branch in which it appears. Dollar has no special meaning in a
+ character class.
+
+ The meaning of dollar can be changed so that it matches only at the
+ very end of the string, by setting the PCRE_DOLLAR_ENDONLY option at
+ compile time. This does not affect the \Z assertion.
+
+ The meanings of the circumflex and dollar characters are changed if the
+ PCRE_MULTILINE option is set. When this is the case, they match immedi-
+ ately after and immediately before an internal newline character,
+ respectively, in addition to matching at the start and end of the sub-
+ ject string. For example, the pattern /^abc$/ matches the subject
+ string "def\nabc" in multiline mode, but not otherwise. Consequently,
+ patterns that are anchored in single line mode because all branches
+ start with ^ are not anchored in multiline mode, and a match for cir-
+ cumflex is possible when the startoffset argument of pcre_exec() is
+ non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE
+ is set.
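+
+ The multiline example above can be tried out with a short sketch. It
+ uses Python's re module, which is a different engine from PCRE but
+ shares this part of the syntax; re.MULTILINE is assumed here to play
+ the role of PCRE_MULTILINE.

```python
import re

subject = "def\nabc"

# Default mode: ^ matches only at the start of the subject and $ only at
# the end (or just before a final newline), so "abc" on line two is missed.
single_line = re.search(r"^abc$", subject)

# Multiline mode: ^ and $ also match just after and just before the
# internal newline, so the second line now matches.
multi_line = re.search(r"^abc$", subject, re.MULTILINE)

print(single_line)         # None
print(multi_line.group())  # abc
```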
+
+ Note that the sequences \A, \Z, and \z can be used to match the start
+ and end of the subject in both modes, and if all branches of a pattern
+ start with \A it is always anchored, whether PCRE_MULTILINE is set or
+ not.
+
+
+FULL STOP (PERIOD, DOT)
+
+ Outside a character class, a dot in the pattern matches any one charac-
+ ter in the subject, including a non-printing character, but not (by
+ default) newline. In UTF-8 mode, a dot matches any UTF-8 character,
+ which might be more than one byte long, except (by default) for new-
+ line. If the PCRE_DOTALL option is set, dots match newlines as well.
+ The handling of dot is entirely independent of the handling of circum-
+ flex and dollar, the only relationship being that they both involve
+ newline characters. Dot has no special meaning in a character class.
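+
+ As a rough illustration of the dot rules, the following sketch uses
+ Python's re module rather than PCRE itself; re.DOTALL is assumed to
+ correspond to PCRE_DOTALL.

```python
import re

# A dot matches a non-printing character such as a tab ...
tab_match = re.search(r"a.c", "a\tc")

# ... but not a newline by default.
newline_default = re.search(r"a.c", "a\nc")

# With the "dot matches all" option, the newline is matched too.
newline_dotall = re.search(r"a.c", "a\nc", re.DOTALL)

# Inside a character class the dot is an ordinary character.
literal_dot = re.fullmatch(r"[.]", "x")
```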
+
+
+MATCHING A SINGLE BYTE
+
+ Outside a character class, the escape sequence \C matches any one byte,
+ both in and out of UTF-8 mode. Unlike a dot, it always matches a new-
+ line. The feature is provided in Perl in order to match individual
+ bytes in UTF-8 mode. Because it breaks up UTF-8 characters into indi-
+ vidual bytes, what remains in the string may be a malformed UTF-8
+ string. For this reason it is best avoided.
+
+ PCRE does not allow \C to appear in lookbehind assertions (see below),
+ because in UTF-8 mode it makes it impossible to calculate the length of
+ the lookbehind.
+
+
+SQUARE BRACKETS
+
+ An opening square bracket introduces a character class, terminated by a
+ closing square bracket. A closing square bracket on its own is not spe-
+ cial. If a closing square bracket is required as a member of the class,
+ it should be the first data character in the class (after an initial
+ circumflex, if present) or escaped with a backslash.
+
+ A character class matches a single character in the subject. In UTF-8
+ mode, the character may occupy more than one byte. A matched character
+ must be in the set of characters defined by the class, unless the first
+ character in the class definition is a circumflex, in which case the
+ subject character must not be in the set defined by the class. If a
+ circumflex is actually required as a member of the class, ensure it is
+ not the first character, or escape it with a backslash.
+
+ For example, the character class [aeiou] matches any lower case vowel,
+ while [^aeiou] matches any character that is not a lower case vowel.
+ Note that a circumflex is just a convenient notation for specifying the
+ characters which are in the class by enumerating those that are not. It
+ is not an assertion: it still consumes a character from the subject
+ string, and fails if the current pointer is at the end of the string.
+
+ In UTF-8 mode, characters with values greater than 255 can be included
+ in a class as a literal string of bytes, or by using the \x{ escaping
+ mechanism.
+
+ When caseless matching is set, any letters in a class represent both
+ their upper case and lower case versions, so for example, a caseless
+ [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not
+ match "A", whereas a caseful version would. PCRE does not support the
+ concept of case for characters with values greater than 255.
+
+ The newline character is never treated in any special way in character
+ classes, whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE
+ options is. A class such as [^a] will always match a newline.
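+
+ The class rules described so far can be checked with a small sketch.
+ It is written against Python's re module, not PCRE, on the assumption
+ that both behave alike for these simple classes.

```python
import re

# A positive class and its negation.
vowel = re.fullmatch(r"[aeiou]", "e")
non_vowel = re.fullmatch(r"[^aeiou]", "x")

# A negated class is not an assertion: it needs a character to consume,
# so it fails at the end of the subject.
at_end = re.search(r"x[^aeiou]", "x")

# Caseless matching applies to the class members before negation.
caseless_negated = re.fullmatch(r"[^aeiou]", "A", re.IGNORECASE)
caseful_negated = re.fullmatch(r"[^aeiou]", "A")

# Newline gets no special treatment inside a class.
newline_in_class = re.fullmatch(r"[^a]", "\n")
```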
+
+ The minus (hyphen) character can be used to specify a range of charac-
+ ters in a character class. For example, [d-m] matches any letter
+ between d and m, inclusive. If a minus character is required in a
+ class, it must be escaped with a backslash or appear in a position
+ where it cannot be interpreted as indicating a range, typically as the
+ first or last character in the class.
+
+ It is not possible to have the literal character "]" as the end charac-
+ ter of a range. A pattern such as [W-]46] is interpreted as a class of
+ two characters ("W" and "-") followed by a literal string "46]", so it
+ would match "W46]" or "-46]". However, if the "]" is escaped with a
+ backslash it is interpreted as the end of range, so [W-\]46] is inter-
+ preted as a single class containing a range followed by two separate
+ characters. The octal or hexadecimal representation of "]" can also be
+ used to end a range.
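+
+ A brief sketch of the two bracket patterns above, again using
+ Python's re module as a stand-in for PCRE (an assumption; the two
+ engines are believed to agree on this point):

```python
import re

# [W-]46] is the class [W-] followed by the literal text "46]".
unescaped = r"[W-]46]"
matches_w = re.fullmatch(unescaped, "W46]")
matches_hyphen = re.fullmatch(unescaped, "-46]")

# [W-\]46] is one class: the range W..] plus the characters 4 and 6.
escaped = r"[W-\]46]"
in_range = re.fullmatch(escaped, "Z")      # Z lies between W and ]
single_digit = re.fullmatch(escaped, "4")
whole_text = re.fullmatch(escaped, "W46]")  # a class matches one character
```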
+
+ Ranges operate in the collating sequence of character values. They can
+ also be used for characters specified numerically, for example
+ [\000-\037]. In UTF-8 mode, ranges can include characters whose values
+ are greater than 255, for example [\x{100}-\x{2ff}].
+
+ If a range that includes letters is used when caseless matching is set,
+ it matches the letters in either case. For example, [W-c] is equivalent
+ to [][\\^_`wxyzabc], matched caselessly, and if character tables for the
+ "fr" locale are in use, [\xc8-\xcb] matches accented E characters in
+ both cases.
+
+ The character types \d, \D, \s, \S, \w, and \W may also appear in a
+ character class, and add the characters that they match to the class.
+ For example, [\dABCDEF] matches any hexadecimal digit. A circumflex can
+ conveniently be used with the upper case character types to specify a
+ more restricted set of characters than the matching lower case type.
+ For example, the class [^\W_] matches any letter or digit, but not
+ underscore.
+
+ All non-alphameric characters other than \, -, ^ (at the start) and the
+ terminating ] are non-special in character classes, but it does no harm
+ if they are escaped.
+
+
+POSIX CHARACTER CLASSES
+
+ Perl supports the POSIX notation for character classes, which uses
+ names enclosed by [: and :] within the enclosing square brackets. PCRE
+ also supports this notation. For example,
+
+ [01[:alpha:]%]
+
+ matches "0", "1", any alphabetic character, or "%". The supported class
+ names are
+
+ alnum letters and digits
+ alpha letters
+ ascii character codes 0 - 127
+ blank space or tab only
+ cntrl control characters
+ digit decimal digits (same as \d)
+ graph printing characters, excluding space
+ lower lower case letters
+ print printing characters, including space
+ punct printing characters, excluding letters and digits
+ space white space (not quite the same as \s)
+ upper upper case letters
+ word "word" characters (same as \w)
+ xdigit hexadecimal digits
+
+ The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
+ and space (32). Notice that this list includes the VT character (code
+ 11). This makes "space" different to \s, which does not include VT (for
+ Perl compatibility).
+
+ The name "word" is a Perl extension, and "blank" is a GNU extension
+ from Perl 5.8. Another Perl extension is negation, which is indicated
+ by a ^ character after the colon. For example,
+
+ [12[:^digit:]]
+
+ matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the
+ POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
+ these are not supported, and an error is given if they are encountered.
+
+ In UTF-8 mode, characters with values greater than 255 do not match any
+ of the POSIX character classes.
+
+
+VERTICAL BAR
+
+ Vertical bar characters are used to separate alternative patterns. For
+ example, the pattern
+
+ gilbert|sullivan
+
+ matches either "gilbert" or "sullivan". Any number of alternatives may
+ appear, and an empty alternative is permitted (matching the empty
+ string). The matching process tries each alternative in turn, from
+ left to right, and the first one that succeeds is used. If the alterna-
+ tives are within a subpattern (defined below), "succeeds" means match-
+ ing the rest of the main pattern as well as the alternative in the sub-
+ pattern.
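+
+ The left-to-right rule is easy to observe in a sketch. Python's re
+ module is used here instead of PCRE; the patterns a|ab and (a|ab)c
+ are made up for the illustration.

```python
import re

# The first alternative that succeeds is used, even if a later one
# would match more text.
first_wins = re.match(r"a|ab", "ab").group()

# Inside a subpattern, "succeeds" includes matching the rest of the
# pattern: "a" alone cannot be followed by "c" here, so "ab" is used.
needs_rest = re.fullmatch(r"(a|ab)c", "abc").group(1)

# An empty alternative matches the empty string.
empty_alternative = re.fullmatch(r"gilbert|", "")

print(first_wins)   # a
print(needs_rest)   # ab
```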
+
+
+INTERNAL OPTION SETTING
+
+ The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
+ PCRE_EXTENDED options can be changed from within the pattern by a
+ sequence of Perl option letters enclosed between "(?" and ")". The
+ option letters are
+
+ i for PCRE_CASELESS
+ m for PCRE_MULTILINE
+ s for PCRE_DOTALL
+ x for PCRE_EXTENDED
+
+ For example, (?im) sets caseless, multiline matching. It is also possi-
+ ble to unset these options by preceding the letter with a hyphen, and a
+ combined setting and unsetting such as (?im-sx), which sets PCRE_CASE-
+ LESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED,
+ is also permitted. If a letter appears both before and after the
+ hyphen, the option is unset.
+
+ When an option change occurs at top level (that is, not inside subpat-
+ tern parentheses), the change applies to the remainder of the pattern
+ that follows. If the change is placed right at the start of a pattern,
+ PCRE extracts it into the global options (and it will therefore show up
+ in data extracted by the pcre_fullinfo() function).
+
+ An option change within a subpattern affects only that part of the cur-
+ rent pattern that follows it, so
+
+ (a(?i)b)c
+
+ matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
+ used). By this means, options can be made to have different settings
+ in different parts of the pattern. Any changes made in one alternative
+ do carry on into subsequent branches within the same subpattern. For
+ example,
+
+ (a(?i)b|c)
+
+ matches "ab", "aB", "c", and "C", even though when matching "C" the
+ first branch is abandoned before the option setting. This is because
+ the effects of option settings happen at compile time. There would be
+ some very weird behaviour otherwise.
+
+ The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can be changed
+ in the same way as the Perl-compatible options by using the characters
+ U and X respectively. The (?X) flag setting is special in that it must
+ always occur earlier in the pattern than any of the additional features
+ it turns on, even when it is at top level. It is best put at the start.
+
+
+SUBPATTERNS
+
+ Subpatterns are delimited by parentheses (round brackets), which can be
+ nested. Marking part of a pattern as a subpattern does two things:
+
+ 1. It localizes a set of alternatives. For example, the pattern
+
+ cat(aract|erpillar|)
+
+ matches one of the words "cat", "cataract", or "caterpillar". Without
+ the parentheses, it would match "cataract", "erpillar" or the empty
+ string.
+
+ 2. It sets up the subpattern as a capturing subpattern (as defined
+ above). When the whole pattern matches, that portion of the subject
+ string that matched the subpattern is passed back to the caller via the
+ ovector argument of pcre_exec(). Opening parentheses are counted from
+ left to right (starting from 1) to obtain the numbers of the capturing
+ subpatterns.
+
+ For example, if the string "the red king" is matched against the pat-
+ tern
+
+ the ((red|white) (king|queen))
+
+ the captured substrings are "red king", "red", and "king", and are num-
+ bered 1, 2, and 3, respectively.
+
+ The fact that plain parentheses fulfil two functions is not always
+ helpful. There are often times when a grouping subpattern is required
+ without a capturing requirement. If an opening parenthesis is followed
+ by a question mark and a colon, the subpattern does not do any captur-
+ ing, and is not counted when computing the number of any subsequent
+ capturing subpatterns. For example, if the string "the white queen" is
+ matched against the pattern
+
+ the ((?:red|white) (king|queen))
+
+ the captured substrings are "white queen" and "queen", and are numbered
+ 1 and 2. The maximum number of capturing subpatterns is 65535, and the
+ maximum depth of nesting of all subpatterns, both capturing and non-
+ capturing, is 200.
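+
+ The numbering of captured substrings can be seen in the following
+ sketch. It uses Python's re module, whose groups() method is assumed
+ to stand in for the ovector returned by pcre_exec().

```python
import re

# Every opening parenthesis captures; groups are numbered left to right.
capturing = re.fullmatch(r"the ((red|white) (king|queen))", "the red king")
print(capturing.groups())      # ('red king', 'red', 'king')

# (?: ... ) groups without capturing and is skipped when numbering.
non_capturing = re.fullmatch(
    r"the ((?:red|white) (king|queen))", "the white queen"
)
print(non_capturing.groups())  # ('white queen', 'queen')
```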
+
+ As a convenient shorthand, if any option settings are required at the
+ start of a non-capturing subpattern, the option letters may appear
+ between the "?" and the ":". Thus the two patterns
+
+ (?i:saturday|sunday)
+ (?:(?i)saturday|sunday)
+
+ match exactly the same set of strings. Because alternative branches are
+ tried from left to right, and options are not reset until the end of
+ the subpattern is reached, an option setting in one branch does affect
+ subsequent branches, so the above patterns match "SUNDAY" as well as
+ "Saturday".
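+
+ A sketch of the shorthand form follows, using Python's re module.
+ Only the (?i:...) spelling is shown, because some versions of that
+ module reject a bare (?i) that is not at the start of the pattern.
+ The trailing " morning" is a made-up addition to show where the
+ option stops applying.

```python
import re

scoped = r"(?i:saturday|sunday)"

# The option set at the start of the group covers every branch.
upper_sunday = re.fullmatch(scoped, "SUNDAY")
mixed_saturday = re.fullmatch(scoped, "Saturday")

# Outside the group, matching is caseful again.
with_tail = r"(?i:saturday|sunday) morning"
tail_lower = re.fullmatch(with_tail, "SUNDAY morning")
tail_upper = re.fullmatch(with_tail, "SUNDAY MORNING")
```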
+
+
+NAMED SUBPATTERNS
+
+ Identifying capturing parentheses by number is simple, but it can be
+ very hard to keep track of the numbers in complicated regular expres-
+ sions. Furthermore, if an expression is modified, the numbers may
+ change. To help with the difficulty, PCRE supports the naming of sub-
+ patterns, something that Perl does not provide. The Python syntax
+ (?P<name>...) is used. Names consist of alphanumeric characters and
+ underscores, and must be unique within a pattern.
+
+ Named capturing parentheses are still allocated numbers as well as
+ names. The PCRE API provides function calls for extracting the name-to-
+ number translation table from a compiled pattern. For further details
+ see the pcreapi documentation.
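+
+ Since the syntax is borrowed from Python, a Python sketch shows it
+ directly. The names "year" and "month" are invented for the example,
+ and groupindex is Python's own name-to-number table, not the PCRE
+ API call mentioned above.

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
match = pattern.fullmatch("2004-09")

# A named group can be read by name or by its number.
by_name = match.group("year")
by_number = match.group(1)

# The name-to-number translation table of the compiled pattern.
table = dict(pattern.groupindex)

print(by_name, by_number)  # 2004 2004
print(table)               # {'year': 1, 'month': 2}
```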
+
+
+REPETITION
+
+ Repetition is specified by quantifiers, which can follow any of the
+ following items:
+
+ a literal data character
+ the . metacharacter
+ the \C escape sequence
+ escapes such as \d that match single characters
+ a character class
+ a back reference (see next section)
+ a parenthesized subpattern (unless it is an assertion)
+
+ The general repetition quantifier specifies a minimum and maximum num-
+ ber of permitted matches, by giving the two numbers in curly brackets
+ (braces), separated by a comma. The numbers must be less than 65536,
+ and the first must be less than or equal to the second. For example:
+
+ z{2,4}
+
+ matches "zz", "zzz", or "zzzz". A closing brace on its own is not a
+ special character. If the second number is omitted, but the comma is
+ present, there is no upper limit; if the second number and the comma
+ are both omitted, the quantifier specifies an exact number of required
+ matches. Thus
+
+ [aeiou]{3,}
+
+ matches at least 3 successive vowels, but may match many more, while
+
+ \d{8}
+
+ matches exactly 8 digits. An opening curly bracket that appears in a
+ position where a quantifier is not allowed, or one that does not match
+ the syntax of a quantifier, is taken as a literal character. For exam-
+ ple, {,6} is not a quantifier, but a literal string of four characters.
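+
+ The three quantifier examples can be exercised with a short sketch
+ in Python's re module (used here in place of PCRE). The helper
+ matches() is a hypothetical convenience function, and the treatment
+ of {,6} is left out because it differs between engines.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# {2,4}: between two and four repetitions.
two_to_four = [matches(r"z{2,4}", "z" * n) for n in range(1, 6)]
print(two_to_four)  # [False, True, True, True, False]

# {3,}: at least three, no upper limit.
many_vowels = matches(r"[aeiou]{3,}", "aeiouaeiou")
two_vowels = matches(r"[aeiou]{3,}", "ae")

# {8}: exactly eight.
eight_digits = matches(r"\d{8}", "20040915")
seven_digits = matches(r"\d{8}", "2004091")
```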
+
+ In UTF-8 mode, quantifiers apply to UTF-8 characters rather than to
+ individual bytes. Thus, for example, \x{100}{2} matches two UTF-8 char-
+ acters, each of which is represented by a two-byte sequence.
+
+ The quantifier {0} is permitted, causing the expression to behave as if
+ the previous item and the quantifier were not present.
+
+ For convenience (and historical compatibility) the three most common
+ quantifiers have single-character abbreviations:
+
+ * is equivalent to {0,}
+ + is equivalent to {1,}
+ ? is equivalent to {0,1}
+
+ It is possible to construct infinite loops by following a subpattern
+ that can match no characters with a quantifier that has no upper limit,
+ for example:
+
+ (a?)*
+
+ Earlier versions of Perl and PCRE used to give an error at compile time
+ for such patterns. However, because there are cases where this can be
+ useful, such patterns are now accepted, but if any repetition of the
+ subpattern does in fact match no characters, the loop is forcibly bro-
+ ken.
+
+ By default, the quantifiers are "greedy", that is, they match as much
+ as possible (up to the maximum number of permitted times), without
+ causing the rest of the pattern to fail. The classic example of where
+ this gives problems is in trying to match comments in C programs. These
+ appear between the sequences /* and */ and within the sequence, indi-
+ vidual * and / characters may appear. An attempt to match C comments by
+ applying the pattern
+
+ /\*.*\*/
+
+ to the string
+
+ /* first comment */ not comment /* second comment */
+
+ fails, because it matches the entire string owing to the greediness of
+ the .* item.
+
+ However, if a quantifier is followed by a question mark, it ceases to
+ be greedy, and instead matches the minimum number of times possible, so
+ the pattern
+
+ /\*.*?\*/
+
+ does the right thing with the C comments. The meaning of the various
+ quantifiers is not otherwise changed, just the preferred number of
+ matches. Do not confuse this use of question mark with its use as a
+ quantifier in its own right. Because it has two uses, it can sometimes
+ appear doubled, as in
+
+ \d??\d
+
+ which matches one digit by preference, but can match two if that is the
+ only way the rest of the pattern matches.
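+
+ The difference between greedy and minimizing quantifiers shows up
+ clearly in a sketch. Python's re module is used here as a stand-in
+ for PCRE, on the assumption that both treat these quantifiers alike.

```python
import re

text = "/* first comment */ not comment /* second comment */"

# Greedy .* runs to the last */ and swallows the whole string.
greedy = re.findall(r"/\*.*\*/", text)

# Minimizing .*? stops at the first */ each time.
lazy = re.findall(r"/\*.*?\*/", text)

print(len(greedy))  # 1
print(lazy)         # ['/* first comment */', '/* second comment */']

# \d??\d prefers one digit, but takes two if the rest requires it.
prefers_one = re.match(r"\d??\d", "12").group()
forced_two = re.fullmatch(r"\d??\d", "12").group()
```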
+
+ If the PCRE_UNGREEDY option is set (an option which is not available in
+ Perl), the quantifiers are not greedy by default, but individual ones
+ can be made greedy by following them with a question mark. In other
+ words, it inverts the default behaviour.
+
+ When a parenthesized subpattern is quantified with a minimum repeat
+ count that is greater than 1 or with a limited maximum, more store is
+ required for the compiled pattern, in proportion to the size of the
+ minimum or maximum.
+
+ If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv-
+ alent to Perl's /s) is set, thus allowing the . to match newlines, the
+ pattern is implicitly anchored, because whatever follows will be tried
+ against every character position in the subject string, so there is no
+ point in retrying the overall match at any position after the first.
+ PCRE normally treats such a pattern as though it were preceded by \A.
+
+ In cases where it is known that the subject string contains no new-
+ lines, it is worth setting PCRE_DOTALL in order to obtain this opti-
+ mization, or alternatively using ^ to indicate anchoring explicitly.
+
+ However, there is one situation where the optimization cannot be used.
+ When .* is inside capturing parentheses that are the subject of a
+ backreference elsewhere in the pattern, a match at the start may fail,
+ and a later one succeed. Consider, for example:
+
+ (.*)abc\1
+
+ If the subject is "xyz123abc123" the match point is the fourth charac-
+ ter. For this reason, such a pattern is not implicitly anchored.
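+
+ The match point can be confirmed with a sketch in Python's re module
+ (not PCRE itself; offsets there count from zero, so the fourth
+ character is offset 3):

```python
import re

match = re.search(r"(.*)abc\1", "xyz123abc123")

# The attempt at offset 0 fails because "xyz123" does not recur after
# "abc"; the first success starts at the fourth character.
print(match.start())   # 3
print(match.group(1))  # 123
print(match.group())   # 123abc123
```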
+
+ When a capturing subpattern is repeated, the value captured is the sub-
+ string that matched the final iteration. For example, after
+
+ (tweedle[dume]{3}\s*)+
+
+ has matched "tweedledum tweedledee" the value of the captured substring
+ is "tweedledee". However, if there are nested capturing subpatterns,
+ the corresponding captured values may have been set in previous itera-
+ tions. For example, after
+
+ /(a|(b))+/
+
+ matches "aba" the value of the second captured substring is "b".
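+
+ Both captured values can be inspected with a sketch in Python's re
+ module, which is assumed to follow the same rule as PCRE for
+ repeated groups:

```python
import re

# A repeated group reports the text of its final iteration.
tweedle = re.fullmatch(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
print(tweedle.group(1))  # tweedledee

# The nested group (b) took part in the second iteration only, and its
# value is kept even though the last iteration matched "a".
nested = re.fullmatch(r"(a|(b))+", "aba")
print(nested.group(1))   # a
print(nested.group(2))   # b
```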
+
+
+ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS
+
+ With both maximizing and minimizing repetition, failure of what follows
+ normally causes the repeated item to be re-evaluated to see if a dif-
+ ferent number of repeats allows the rest of the pattern to match. Some-
+ times it is useful to prevent this, either to change the nature of the
+ match, or to cause it to fail earlier than it otherwise might, when the
+ author of the pattern knows there is no point in carrying on.
+
+ Consider, for example, the pattern \d+foo when applied to the subject
+ line
+
+ 123456bar
+
+ After matching all 6 digits and then failing to match "foo", the normal
+ action of the matcher is to try again with only 5 digits matching the
+ \d+ item, and then with 4, and so on, before ultimately failing.
+ "Atomic grouping" (a term taken from Jeffrey Friedl's book) provides
+ the means for specifying that once a subpattern has matched, it is not
+ to be re-evaluated in this way.
+
+ If we use atomic grouping for the previous example, the matcher would
+ give up immediately on failing to match "foo" the first time. The nota-
+ tion is a kind of special parenthesis, starting with (?> as in this
+ example:
+
+ (?>\d+)foo
+
+ This kind of parenthesis "locks up" the part of the pattern it con-
+ tains once it has matched, and a failure further into the pattern is
+ prevented from backtracking into it. Backtracking past it to previous
+ items, however, works as normal.
+
+ An alternative description is that a subpattern of this type matches
+ the string of characters that an identical standalone pattern would
+ match, if anchored at the current point in the subject string.
+
+ Atomic grouping subpatterns are not capturing subpatterns. Simple cases
+ such as the above example can be thought of as a maximizing repeat that
+ must swallow everything it can. So, while both \d+ and \d+? are pre-
+ pared to adjust the number of digits they match in order to make the
+ rest of the pattern match, (?>\d+) can only match an entire sequence of
+ digits.
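+
+ The effect of an atomic group can be sketched in Python's re module.
+ To keep the sketch usable on older Python versions, which lack the
+ (?> notation, it emulates the group with a lookahead followed by a
+ back reference: (?=(\d+))\1. This emulation is an assumption of the
+ example and is not how PCRE implements atomic groups.

```python
import re

# Ordinary repeat: \d+ gives back one digit so the final \d can match.
backtracking = re.search(r"\d+\d", "12345")

# Emulated atomic group: the lookahead captures the full run of digits,
# \1 consumes exactly that run, and nothing can be given back.
ATOMIC_DIGITS = r"(?=(\d+))\1"
atomic_then_digit = re.search(ATOMIC_DIGITS + r"\d", "12345")

# With "foo" after the digits it behaves like (?>\d+)foo.
atomic_foo = re.search(ATOMIC_DIGITS + "foo", "123456foo")
atomic_bar = re.search(ATOMIC_DIGITS + "foo", "123456bar")
```

+ Unlike a real atomic group, the emulation adds a capturing group, so
+ later group numbers shift by one.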
+
+ Atomic groups in general can of course contain arbitrarily complicated
+ subpatterns, and can be nested. However, when the subpattern for an
+ atomic group is just a single repeated item, as in the example above, a
+ simpler notation, called a "possessive quantifier" can be used. This
+ consists of an additional + character following a quantifier. Using
+ this notation, the previous example can be rewritten as
+
+ \d++foo
+
+ Possessive quantifiers are always greedy; the setting of the
+ PCRE_UNGREEDY option is ignored. They are a convenient notation for the
+ simpler forms of atomic group. However, there is no difference in the
+ meaning or processing of a possessive quantifier and the equivalent
+ atomic group.
+
+ The possessive quantifier syntax is an extension to the Perl syntax. It
+ originates in Sun's Java package.
+
+ When a pattern contains an unlimited repeat inside a subpattern that
+ can itself be repeated an unlimited number of times, the use of an
+ atomic group is the only way to avoid some failing matches taking a
+ very long time indeed. The pattern
+
+ (\D+|<\d+>)*[!?]
+
+ matches an unlimited number of substrings that either consist of non-
+ digits, or digits enclosed in <>, followed by either ! or ?. When it
+ matches, it runs quickly. However, if it is applied to
+
+ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+ it takes a long time before reporting failure. This is because the
+ string can be divided between the two repeats in a large number of
+ ways, and all have to be tried. (The example used [!?] rather than a
+ single character at the end, because both PCRE and Perl have an opti-
+ mization that allows for fast failure when a single character is used.
+ They remember the last single character that is required for a match,
+ and fail early if it is not present in the string.) If the pattern is
+ changed to
+
+ ((?>\D+)|<\d+>)*[!?]
+
+ sequences of non-digits cannot be broken, and failure happens quickly.
+
+
+BACK REFERENCES
+
+ Outside a character class, a backslash followed by a digit greater than
+ 0 (and possibly further digits) is a back reference to a capturing sub-
+ pattern earlier (that is, to its left) in the pattern, provided there
+ have been that many previous capturing left parentheses.
+
+ However, if the decimal number following the backslash is less than 10,
+ it is always taken as a back reference, and causes an error only if
+ there are not that many capturing left parentheses in the entire pat-
+ tern. In other words, the parentheses that are referenced need not be
+ to the left of the reference for numbers less than 10. See the section
+ entitled "Backslash" above for further details of the handling of dig-
+ its following a backslash.
+
+ A back reference matches whatever actually matched the capturing sub-
+ pattern in the current subject string, rather than anything matching
+ the subpattern itself (see "Subpatterns as subroutines" below for a way
+ of doing that). So the pattern
+
+ (sens|respons)e and \1ibility
+
+ matches "sense and sensibility" and "response and responsibility", but
+ not "sense and responsibility". If caseful matching is in force at the
+ time of the back reference, the case of letters is relevant. For exam-
+ ple,
+
+ ((?i)rah)\s+\1
+
+ matches "rah rah" and "RAH RAH", but not "RAH rah", even though the
+ original capturing subpattern is matched caselessly.
+
+ Back references to named subpatterns use the Python syntax (?P=name).
+ We could rewrite the above example as follows:
+
+ (?P<p1>(?i)rah)\s+(?P=p1)
+
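+ Numbered and named back references can be tried out in Python's re
+ module, which is where the (?P<name>...) and (?P=name) syntax comes
+ from. The sketch uses a simplified "rah" pattern without the inline
+ (?i) setting, since some versions of that module reject option
+ changes in the middle of a pattern.

```python
import re

numbered = r"(sens|respons)e and \1ibility"

# The back reference must repeat the text that was actually captured.
same_word_1 = re.fullmatch(numbered, "sense and sensibility")
same_word_2 = re.fullmatch(numbered, "response and responsibility")
mixed_words = re.fullmatch(numbered, "sense and responsibility")

# A named group, referred to by name.
named = r"(?P<p1>rah)\s+(?P=p1)"
repeated = re.fullmatch(named, "rah rah")
wrong_case = re.fullmatch(named, "rah RAH")
```
+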
+ There may be more than one back reference to the same subpattern. If a
+ subpattern has not actually been used in a particular match, any back
+ references to it always fail. For example, the pattern
+
+ (a|(bc))\2
+
+ always fails if it starts to match "a" rather than "bc". Because there
+ may be many capturing parentheses in a pattern, all digits following
+ the backslash are taken as part of a potential back reference number.
+ If the pattern continues with a digit character, some delimiter must be
+ used to terminate the back reference. If the PCRE_EXTENDED option is
+ set, this can be whitespace. Otherwise an empty comment can be used.
+
+ A back reference that occurs inside the parentheses to which it refers
+ fails when the subpattern is first used, so, for example, (a\1) never
+ matches. However, such references can be useful inside repeated sub-
+ patterns. For example, the pattern
+
+ (a|b\1)+
+
+ matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
+ ation of the subpattern, the back reference matches the character
+ string corresponding to the previous iteration. In order for this to
+ work, the pattern must be such that the first iteration does not need
+ to match the back reference. This can be done using alternation, as in
+ the example above, or by a quantifier with a minimum of zero.
+
+
+ASSERTIONS
+
+ An assertion is a test on the characters following or preceding the
+ current matching point that does not actually consume any characters.
+ The simple assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are
+ described above. More complicated assertions are coded as subpatterns.
+ There are two kinds: those that look ahead of the current position in
+ the subject string, and those that look behind it.
+
+ An assertion subpattern is matched in the normal way, except that it
+ does not cause the current matching position to be changed. Lookahead
+ assertions start with (?= for positive assertions and (?! for negative
+ assertions. For example,
+
+ \w+(?=;)
+
+ matches a word followed by a semicolon, but does not include the semi-
+ colon in the match, and
+
+ foo(?!bar)
+
+ matches any occurrence of "foo" that is not followed by "bar". Note
+ that the apparently similar pattern
+
+ (?!foo)bar
+
+ does not find an occurrence of "bar" that is preceded by something
+ other than "foo"; it finds any occurrence of "bar" whatsoever, because
+ the assertion (?!foo) is always true when the next three characters are
+ "bar". A lookbehind assertion is needed to achieve this effect.
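+
+ The three lookahead patterns behave as follows in a short sketch.
+ Python's re module is used in place of PCRE, and the subject strings
+ are invented for the illustration.

```python
import re

# Positive lookahead: the semicolon is required but not consumed.
word = re.search(r"\w+(?=;)", "foo; bar").group()

# Negative lookahead: skip the "foo" that is followed by "bar".
foo_at = re.search(r"foo(?!bar)", "foobar foobaz").start()

# (?!foo)bar only checks that "bar" is not itself "foo", so it still
# finds the "bar" that comes right after "foo".
bar_at = re.search(r"(?!foo)bar", "foobar").start()

print(word)    # foo
print(foo_at)  # 7
print(bar_at)  # 3
```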
+
+ If you want to force a matching failure at some point in a pattern, the
+ most convenient way to do it is with (?!) because an empty string
+ always matches, so an assertion that requires there not to be an empty
+ string must always fail.
+
+ Lookbehind assertions start with (?<= for positive assertions and (?<!
+ for negative assertions. For example,
+
+ (?<!foo)bar
+
+ does find an occurrence of "bar" that is not preceded by "foo". The
+ contents of a lookbehind assertion are restricted such that all the
+ strings it matches must have a fixed length. However, if there are sev-
+ eral alternatives, they do not all have to have the same fixed length.
+ Thus
+
+ (?<=bullock|donkey)
+
+ is permitted, but
+
+ (?<!dogs?|cats?)
+
+ causes an error at compile time. Branches that match different length
+ strings are permitted only at the top level of a lookbehind assertion.
+ This is an extension compared with Perl (at least for 5.8), which
+ requires all branches to match the same length of string. An assertion
+ such as
+
+ (?<=ab(c|de))
+
+ is not permitted, because its single top-level branch can match two
+ different lengths, but it is acceptable if rewritten to use two top-
+ level branches:
+
+ (?<=abc|abde)
+
+ The implementation of lookbehind assertions is, for each alternative,
+ to temporarily move the current position back by the fixed width and
+ then try to match. If there are insufficient characters before the cur-
+ rent position, the match is deemed to fail.
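+
+ A minimal sketch of lookbehind, using Python's re module. Only
+ single-width assertions are shown, because that module is stricter
+ than PCRE and does not accept top-level branches of different
+ lengths. The subject strings are made up for the example.

```python
import re

# "bar" directly after "foo" is rejected; a later "bar" is found.
rejected = re.search(r"(?<!foo)bar", "foobar")
later_bar = re.search(r"(?<!foo)bar", "foobar bazbar").start()

# With too few characters before the current position, a positive
# lookbehind fails ...
too_short = re.search(r"(?<=abcd)e", "bcde")

# ... and so the corresponding negative lookbehind succeeds.
at_start = re.search(r"(?<!foo)bar", "bar").start()

print(later_bar)  # 10
print(at_start)   # 0
```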
+
+ PCRE does not allow the \C escape (which matches a single byte in UTF-8
+ mode) to appear in lookbehind assertions, because it makes it impossi-
+ ble to calculate the length of the lookbehind.
+
+ Atomic groups can be used in conjunction with lookbehind assertions to
+ specify efficient matching at the end of the subject string. Consider a
+ simple pattern such as
+
+ abcd$
+
+ when applied to a long string that does not match. Because matching
+ proceeds from left to right, PCRE will look for each "a" in the subject
+ and then see if what follows matches the rest of the pattern. If the
+ pattern is specified as
+
+ ^.*abcd$
+
+ the initial .* matches the entire string at first, but when this fails
+ (because there is no following "a"), it backtracks to match all but the
+ last character, then all but the last two characters, and so on. Once
+ again the search for "a" covers the entire string, from right to left,
+ so we are no better off. However, if the pattern is written as
+
+ ^(?>.*)(?<=abcd)
+
+ or, equivalently,
+
+ ^.*+(?<=abcd)
+
+ there can be no backtracking for the .* item; it can match only the
+ entire string. The subsequent lookbehind assertion does a single test
+ on the last four characters. If it fails, the match fails immediately.
+ For long strings, this approach makes a significant difference to the
+ processing time.
+
+ Several assertions (of any sort) may occur in succession. For example,
+
+ (?<=\d{3})(?<!999)foo
+
+ matches "foo" preceded by three digits that are not "999". Notice that
+ each of the assertions is applied independently at the same point in
+ the subject string. First there is a check that the previous three
+ characters are all digits, and then there is a check that the same
+ three characters are not "999". This pattern does not match "foo" pre-
+ ceded by six characters, the first three of which are digits and the
+ last three of which are not "999". For example, it doesn't match
+ "123abcfoo". A pattern to do that is
+
+ (?<=\d{3}...)(?<!999)foo
+
+ This time the first assertion looks at the preceding six characters,
+ checking that the first three are digits, and then the second assertion
+ checks that the preceding three characters are not "999".
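+
+ The two patterns can be compared side by side in a sketch written
+ for Python's re module (a stand-in for PCRE; the helper found() is
+ hypothetical):

```python
import re

three_back = r"(?<=\d{3})(?<!999)foo"
six_back = r"(?<=\d{3}...)(?<!999)foo"

def found(pattern: str, text: str) -> bool:
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

# Both assertions of the first pattern look at the same three characters.
print(found(three_back, "123foo"))     # True
print(found(three_back, "999foo"))     # False
print(found(three_back, "123abcfoo"))  # False

# The second pattern looks six characters back for the digits.
print(found(six_back, "123abcfoo"))    # True
print(found(six_back, "123999foo"))    # False
```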
+
+ Assertions can be nested in any combination. For example,
+
+ (?<=(?<!foo)bar)baz
+
+ matches an occurrence of "baz" that is preceded by "bar" which in turn
+ is not preceded by "foo", while
+
+ (?<=\d{3}(?!999)...)foo
+
+ is another pattern which matches "foo" preceded by three digits and any
+ three characters that are not "999".
+
+ Assertion subpatterns are not capturing subpatterns, and may not be
+ repeated, because it makes no sense to assert the same thing several
+ times. If any kind of assertion contains capturing subpatterns within
+ it, these are counted for the purposes of numbering the capturing sub-
+ patterns in the whole pattern. However, substring capturing is carried
+ out only for positive assertions, because it does not make sense for
+ negative assertions.
+
+
+CONDITIONAL SUBPATTERNS
+
+ It is possible to cause the matching process to obey a subpattern con-
+ ditionally or to choose between two alternative subpatterns, depending
+ on the result of an assertion, or whether a previous capturing
+ subpattern matched or not. The two possible forms of conditional sub-
+ pattern are
+
+ (?(condition)yes-pattern)
+ (?(condition)yes-pattern|no-pattern)
+
+ If the condition is satisfied, the yes-pattern is used; otherwise the
+ no-pattern (if present) is used. If there are more than two alterna-
+ tives in the subpattern, a compile-time error occurs.
+
+ There are three kinds of condition. If the text between the parentheses
+ consists of a sequence of digits, the condition is satisfied if the
+ capturing subpattern of that number has previously matched. The number
+ must be greater than zero. Consider the following pattern, which con-
+ tains non-significant white space to make it more readable (assume the
+ PCRE_EXTENDED option) and to divide it into three parts for ease of
+ discussion:
+
+ ( \( )? [^()]+ (?(1) \) )
+
+ The first part matches an optional opening parenthesis, and if that
+ character is present, sets it as the first captured substring. The sec-
+ ond part matches one or more characters that are not parentheses. The
+ third part is a conditional subpattern that tests whether the first set
+ of parentheses matched or not. If they did, that is, if the subject
+ started with an opening parenthesis, the condition is true, and so the
+ yes-pattern is executed and a closing parenthesis is required. Otherwise,
+ since no-pattern is not present, the subpattern matches nothing. In
+ other words, this pattern matches a sequence of non-parentheses,
+ optionally enclosed in parentheses.
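+
+ The behaviour just described can be sketched in Python, whose re
+ module also accepts the (?(1)...) form. This is an assumption made
+ for demonstration only: re is not PCRE, re.VERBOSE merely stands in
+ for PCRE_EXTENDED, and whole_match() is a hypothetical helper.
+
```python
import re

# re.VERBOSE plays the role of PCRE_EXTENDED: white space is ignored.
optional_parens = re.compile(r"( \( )? [^()]+ (?(1) \) )", re.VERBOSE)


def whole_match(subject):
    """Return True if the entire subject matches the pattern."""
    return optional_parens.fullmatch(subject) is not None


for subject in ["abc", "(abc)", "(abc", "abc)"]:
    print(subject, whole_match(subject))
```
+
+ A closing parenthesis is demanded only when the opening one was
+ captured, so the two unbalanced subjects are rejected as a whole.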
+
+ If the condition is the string (R), it is satisfied if a recursive call
+ to the pattern or subpattern has been made. At "top level", the condi-
+ tion is false. This is a PCRE extension. Recursive patterns are
+ described in the next section.
+
+ If the condition is not a sequence of digits or (R), it must be an
+ assertion. This may be a positive or negative lookahead or lookbehind
+ assertion. Consider this pattern, again containing non-significant
+ white space, and with the two alternatives on the second line:
+
+ (?(?=[^a-z]*[a-z])
+ \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} )
+
+ The condition is a positive lookahead assertion that matches an
+ optional sequence of non-letters followed by a letter. In other words,
+ it tests for the presence of at least one letter in the subject. If a
+ letter is found, the subject is matched against the first alternative;
+ otherwise it is matched against the second. This pattern matches
+ strings in one of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
+ letters and dd are digits.
+
+
+COMMENTS
+
+ The sequence (?# marks the start of a comment which continues up to the
+ next closing parenthesis. Nested parentheses are not permitted. The
+ characters that make up a comment play no part in the pattern matching
+ at all.
+
+ If the PCRE_EXTENDED option is set, an unescaped # character outside a
+ character class introduces a comment that continues up to the next new-
+ line character in the pattern.
+
+
+RECURSIVE PATTERNS
+
+ Consider the problem of matching a string in parentheses, allowing for
+ unlimited nested parentheses. Without the use of recursion, the best
+ that can be done is to use a pattern that matches up to some fixed
+ depth of nesting. It is not possible to handle an arbitrary nesting
+ depth. Perl has provided an experimental facility that allows regular
+ expressions to recurse (amongst other things). It does this by interpo-
+ lating Perl code in the expression at run time, and the code can refer
+ to the expression itself. A Perl pattern to solve the parentheses prob-
+ lem can be created like this:
+
+ $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;
+
+ The (?p{...}) item interpolates Perl code at run time, and in this case
+ refers recursively to the pattern in which it appears. Obviously, PCRE
+ cannot support the interpolation of Perl code. Instead, it supports
+ some special syntax for recursion of the entire pattern, and also for
+ individual subpattern recursion.
+
+ The special item that consists of (? followed by a number greater than
+ zero and a closing parenthesis is a recursive call of the subpattern of
+ the given number, provided that it occurs inside that subpattern. (If
+ not, it is a "subroutine" call, which is described in the next sec-
+ tion.) The special item (?R) is a recursive call of the entire regular
+ expression.
+
+ For example, this PCRE pattern solves the nested parentheses problem
+ (assume the PCRE_EXTENDED option is set so that white space is
+ ignored):
+
+ \( ( (?>[^()]+) | (?R) )* \)
+
+ First it matches an opening parenthesis. Then it matches any number of
+ substrings which can either be a sequence of non-parentheses, or a
+ recursive match of the pattern itself (that is, a correctly
+ parenthesized substring). Finally there is a closing parenthesis.
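+
+ To make the three steps concrete, here is a hand-written Python
+ function that mirrors them. It is not a regular expression and not
+ part of PCRE; Python's re module has no (?R) item, so the recursion
+ is spelled out explicitly. The name match_parens() is hypothetical.
+
```python
def match_parens(subject, start=0):
    """Return the index just past a balanced group at start, or -1."""
    if start >= len(subject) or subject[start] != "(":
        return -1                      # the leading \( did not match
    pos = start + 1
    while pos < len(subject) and subject[pos] != ")":
        if subject[pos] == "(":
            pos = match_parens(subject, pos)   # plays the role of (?R)
            if pos == -1:
                return -1
        else:
            pos += 1                   # plays the role of [^()]+
    # The closing \) must be present, otherwise the match fails.
    return pos + 1 if pos < len(subject) else -1


for subject in ["(ab(cd)ef)", "()", "(a(b)", "abc"]:
    print(subject, match_parens(subject))
```
+
+ Each nested opening parenthesis triggers one recursive call, which
+ is what allows an arbitrary nesting depth.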
+
+ If this were part of a larger pattern, you would not want to recurse
+ the entire pattern, so instead you could use this:
+
+ ( \( ( (?>[^()]+) | (?1) )* \) )
+
+ We have put the pattern into parentheses, and caused the recursion to
+ refer to them instead of the whole pattern. In a larger pattern, keep-
+ ing track of parenthesis numbers can be tricky. It may be more conve-
+ nient to use named parentheses instead. For this, PCRE uses (?P>name),
+ which is an extension to the Python syntax that PCRE uses for named
+ parentheses (Perl does not provide named parentheses). We could rewrite
+ the above example as follows:
+
+ (?P<pn> \( ( (?>[^()]+) | (?P>pn) )* \) )
+
+ This particular example pattern contains nested unlimited repeats, and
+ so the use of atomic grouping for matching strings of non-parentheses
+ is important when applying the pattern to strings that do not match.
+ For example, when this pattern is applied to
+
+ (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
+
+ it yields "no match" quickly. However, if atomic grouping is not used,
+ the match runs for a very long time indeed because there are so many
+ different ways the + and * repeats can carve up the subject, and all
+ have to be tested before failure can be reported.
+
+ At the end of a match, the values set for any capturing subpatterns are
+ those from the outermost level of the recursion at which the subpattern
+ value is set. If you want to obtain intermediate values, a callout
+ function can be used (see below and the pcrecallout documentation). If
+ the pattern above is matched against
+
+ (ab(cd)ef)
+
+ the value for the capturing parentheses is "ef", which is the last
+ value taken on at the top level. If additional parentheses are added,
+ giving
+
+ \( ( ( (?>[^()]+) | (?R) )* ) \)
+    ^                        ^
+    ^                        ^
+
+ the string they capture is "ab(cd)ef", the contents of the top level
+ parentheses. If there are more than 15 capturing parentheses in a pat-
+ tern, PCRE has to obtain extra memory to store data during a recursion,
+ which it does by using pcre_malloc, freeing it via pcre_free after-
+ wards. If no memory can be obtained, the match fails with the
+ PCRE_ERROR_NOMEMORY error.
+
+ Do not confuse the (?R) item with the condition (R), which tests for
+ recursion. Consider this pattern, which matches text in angle brack-
+ ets, allowing for arbitrary nesting. Only digits are allowed in nested
+ brackets (that is, when recursing), whereas any characters are permit-
+ ted at the outer level.
+
+ < (?: (?(R) \d++ | [^<>]*+) | (?R)) * >
+
+ In this pattern, (?(R) is the start of a conditional subpattern, with
+ two different alternatives for the recursive and non-recursive cases.
+ The (?R) item is the actual recursive call.
+
+
+SUBPATTERNS AS SUBROUTINES
+
+ If the syntax for a recursive subpattern reference (either by number or
+ by name) is used outside the parentheses to which it refers, it oper-
+ ates like a subroutine in a programming language. An earlier example
+ pointed out that the pattern
+
+ (sens|respons)e and \1ibility
+
+ matches "sense and sensibility" and "response and responsibility", but
+ not "sense and responsibility". If instead the pattern
+
+ (sens|respons)e and (?1)ibility
+
+ is used, it does match "sense and responsibility" as well as the other
+ two strings. Such references must, however, follow the subpattern to
+ which they refer.
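+
+ The difference between the two patterns can be imitated in Python.
+ This is only an approximation under a stated assumption: Python's re
+ module has no (?1) item, so the subroutine call is emulated by
+ repeating the source text of the subpattern (the variable STEM is
+ invented for this sketch).
+
```python
import re

STEM = r"sens|respons"  # source text of the first subpattern

# Back reference: the second part must repeat the text that was matched.
backref = re.compile(r"(" + STEM + r")e and \1ibility")
# Emulated subroutine call: the subpattern itself is applied again.
subroutine = re.compile(r"(" + STEM + r")e and (?:" + STEM + r")ibility")

subjects = [
    "sense and sensibility",
    "response and responsibility",
    "sense and responsibility",
]
for subject in subjects:
    print(subject,
          backref.fullmatch(subject) is not None,
          subroutine.fullmatch(subject) is not None)
```
+
+ Only the emulated subroutine form accepts the mixed third subject.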
+
+
+CALLOUTS
+
+ Perl has a feature whereby using the sequence (?{...}) causes arbitrary
+ Perl code to be obeyed in the middle of matching a regular expression.
+ This makes it possible, amongst other things, to extract different sub-
+ strings that match the same pair of parentheses when there is a repeti-
+ tion.
+
+ PCRE provides a similar feature, but of course it cannot obey arbitrary
+ Perl code. The feature is called "callout". The caller of PCRE provides
+ an external function by putting its entry point in the global variable
+ pcre_callout. By default, this variable contains NULL, which disables
+ all calling out.
+
+ Within a regular expression, (?C) indicates the points at which the
+ external function is to be called. If you want to identify different
+ callout points, you can put a number less than 256 after the letter C.
+ The default value is zero. For example, this pattern has two callout
+ points:
+
+ (?C1)abc(?C2)def
+
+ During matching, when PCRE reaches a callout point (and pcre_callout is
+ set), the external function is called. It is provided with the number
+ of the callout, and, optionally, one item of data originally supplied
+ by the caller of pcre_exec(). The callout function may cause matching
+ to backtrack, or to fail altogether. A complete description of the
+ interface to the callout function is given in the pcrecallout documen-
+ tation.
+
+
+DIFFERENCES FROM PERL
+ This section describes the differences in the ways that PCRE and Perl
+ handle regular expressions. The differences described here are with
+ respect to Perl 5.8.
+
+ 1. PCRE does not have full UTF-8 support. Details of what it does have
+ are given in the section on UTF-8 support in the main pcre page.
+
+ 2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl
+ permits them, but they do not mean what you might think. For example,
+ (?!a){3} does not assert that the next three characters are not "a". It
+ just asserts that the next character is not "a" three times.
+
+ 3. Capturing subpatterns that occur inside negative lookahead asser-
+ tions are counted, but their entries in the offsets vector are never
+ set. Perl sets its numerical variables from any such patterns that are
+ matched before the assertion fails to match something (thereby succeed-
+ ing), but only if the negative lookahead assertion contains just one
+ branch.
+
+ 4. Though binary zero characters are supported in the subject string,
+ they are not allowed in a pattern string because it is passed as a nor-
+ mal C string, terminated by zero. The escape sequence "\0" can be used
+ in the pattern to represent a binary zero.
+
+ 5. The following Perl escape sequences are not supported: \l, \u, \L,
+ \U, \P, \p, \N, and \X. In fact these are implemented by Perl's general
+ string-handling and are not part of its pattern matching engine. If any
+ of these are encountered by PCRE, an error is generated.
+
+ 6. PCRE does support the \Q...\E escape for quoting substrings. Charac-
+ ters in between are treated as literals. This is slightly different
+ from Perl in that $ and @ are also handled as literals inside the
+ quotes. In Perl, they cause variable interpolation (but of course PCRE
+ does not have variables). Note the following examples:
+
+ Pattern            PCRE matches      Perl matches
+
+ \Qabc$xyz\E        abc$xyz           abc followed by the
+                                        contents of $xyz
+ \Qabc\$xyz\E       abc\$xyz          abc\$xyz
+ \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
+
+ The \Q...\E sequence is recognized both inside and outside character
+ classes.
+
+ 7. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
+ constructions. However, there is some experimental support for recur-
+ sive patterns using the non-Perl items (?R), (?number) and (?P>name).
+ Also, the PCRE "callout" feature allows an external function to be
+ called during pattern matching.
+
+ 8. There are some differences that are concerned with the settings of
+ captured strings when part of a pattern is repeated. For example,
+ matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2
+ unset, but in PCRE it is set to "b".
+
+ 9. PCRE provides some extensions to the Perl regular expression
+ facilities:
+
+ (a) Although lookbehind assertions must match fixed length strings,
+ each alternative branch of a lookbehind assertion can match a different
+ length of string. Perl requires them all to have the same length.
+
+ (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
+ meta-character matches only at the very end of the string.
+
+ (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
+ cial meaning is faulted.
+
+ (d) If PCRE_UNGREEDY is set, the greediness of the repetition quanti-
+ fiers is inverted, that is, by default they are not greedy, but if fol-
+ lowed by a question mark they are.
+
+ (e) PCRE_ANCHORED can be used to force a pattern to be tried only at
+ the first matching position in the subject string.
+
+ (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAP-
+ TURE options for pcre_exec() have no Perl equivalents.
+
+ (g) The (?R), (?number), and (?P>name) constructs allow for recursive
+ pattern matching (Perl can do this using the (?p{code}) construct,
+ which PCRE cannot support.)
+
+ (h) PCRE supports named capturing substrings, using the Python syntax.
+
+ (i) PCRE supports the possessive quantifier "++" syntax, taken from
+ Sun's Java package.
+
+ (j) The (R) condition, for testing recursion, is a PCRE extension.
+
+ (k) The callout facility is PCRE-specific.
+
+
+
+NOTES
+ The \< and \> metacharacters from Henry Spencer's package
+ are not available in PCRE, but can be emulated with \b,
+ as required, also in conjunction with \W or \w.
+
+ In LDMud, backtracks are limited by the EVAL_COST runtime
+ limit, to avoid freezing the driver with a match
+ like regexp(({"=XX==================="}), "X(.+)+X").
+
+ LDMud doesn't support PCRE callouts.
+
+
+LIMITATIONS
+ There are some size limitations in PCRE but it is hoped that
+ they will never in practice be relevant. The maximum length
+ of a compiled pattern is 65539 (sic) bytes. All values in
+ repeating quantifiers must be less than 65536. The maximum
+ number of capturing subpatterns is 65535. There is no
+ limit to the number of non-capturing subpatterns, but the
+ maximum depth of nesting of all kinds of parenthesized sub-
+ pattern, including capturing subpatterns, assertions, and
+ other types of subpattern, is 200.
+
+ The maximum length of a subject string is the largest posi-
+ tive number that an integer variable can hold. However, PCRE
+ uses recursion to handle subpatterns and indefinite repeti-
+ tion. This means that the available stack space may limit
+ the size of a subject string that can be processed by cer-
+ tain patterns.
+
+
+AUTHOR
+ Philip Hazel <ph10@cam.ac.uk>
+ University Computing Service,
+ New Museums Site,
+ Cambridge CB2 3QG, England.
+ Phone: +44 1223 334714
+
+SEE ALSO
+ regexp(C), hsregexp(C)
diff --git a/doc/concepts/pgsql b/doc/concepts/pgsql
new file mode 100644
index 0000000..12f2cc4
--- /dev/null
+++ b/doc/concepts/pgsql
@@ -0,0 +1,88 @@
+CONCEPT
+ pgsql - PostgreSQL support
+
+DESCRIPTION
+ On hosts with the PostgreSQL package installed, the driver can be
+ configured to interface with the PostgreSQL database. If that is done,
+ the driver defines the macro __PGSQL__ for LPC programs and
+ activates a number of related efuns.
+
+ -- Usage --
+
+ The interface to the PostgreSQL database is implemented
+ through the concept of a controlling object: when opening a
+ database connection, the LPC code has to provide a callback
+ function. The object this function is bound to is the
+ controlling object: all queries to the database will be issued
+ by this object, and the responses will be sent to the callback
+ function.
+
+ The interface is also asynchronous: the pg_query() efun just
+ queues the query with the database connection, and returns
+ immediately. When the database has finished working the query,
+ the callback function is called with the results.
+
+ The callback function can be defined by name or by closure,
+ and can be defined with extra parameters:
+
+
+ #include <pgsql.h>
+
+ void <callback>(int type, mixed ret, int id [, mixed extra...])
+
+ <type> is the type of the call, <id> identifies the query
+ for which this call is executed:
+
+ PGRES_TUPLES_OK: <ret> is the result from a query.
+ It is either a mapping (field name as
+ key, indexing <n> values for n returned
+ tuples), or an array of arrays (one per
+ row).
+
+ PGRES_COMMAND_OK: <ret> is a string which contains the
+ server response (e.g. on INSERT or DELETE)
+
+ PGRES_BAD_RESPONSE,
+ PGRES_NONFATAL_ERROR,
+ PGRES_FATAL_ERROR: <ret> is the error string
+
+
+ void <callback>(int type, mixed ret [, mixed extra...])
+
+ <type> is the type of the call, which is not related to a
+ specific query:
+
+ PGCONN_SUCCESS: The database-connection was established,
+ <ret> is a dummy string.
+ PGCONN_FAILED: The database-connection failed, <ret> is
+ the error message.
+ The first message to the callback after a call to
+ pg_connect() is always one of these two.
+
+ PGRES_NOTICE: <ret> is an informational text.
+
+ PGCONN_ABORTED: If the connection to the backend fails
+ we try to re-establish (reset) it. If the
+ reset fails, the connection is closed and
+ this value is returned. Consider the
+ connection gone and don't try to close or
+ otherwise operate further on it.
+ <ret> is a dummy string.
+
+ -- Security --
+
+ All SQL efuns (unless executed by the master or the simul-efun object)
+ trigger a privilege_violation ("pgsql", "<efun_name>"). If more
+ fine-grained control is desired, overload the individual efuns with a
+ nomask simul-efun.
+
+AUTHOR
+ Florian Heinz and others.
+
+HISTORY
+ Added as package in LDMud 3.3.445.
+ LDMud 3.3.640 added a privilege_violation() call for each efun.
+
+SEE ALSO
+ mysql(C), pg_connect(E), pg_conv_string(E), pg_query(E), pg_pending(E),
+ pg_close(E), privilege_violation(M)
diff --git a/doc/concepts/properties b/doc/concepts/properties
new file mode 100644
index 0000000..1ab418a
--- /dev/null
+++ b/doc/concepts/properties
@@ -0,0 +1,109 @@
+Properties
+ DESCRIPTION:
+ Unlike variables inside an object, properties can be changed from
+ outside without having to write a dedicated function.
+
+ 1. The underlying principle
+ ===========================
+ The basic concept of the MUDlib is that important, object-related
+ information is stored in so-called properties (in the sense of an
+ attribute or possession of the object).
+
+ This information can be simple values such as numbers, characters or
+ objects, but also more complicated structures.
+ Every object can have any number of such properties, and their names
+ are not limited to the standard properties provided by the MUDlib.
+ That is, the set of properties of an object can be extended at will
+ for your own applications.
+ This already touches on the two main characteristics of a
+ property:
+
+ a) a name or identifier, and
+ b) a value that is represented by the name.
+
+ Merely managing a property as a name and a value is not very useful,
+ however, so every property has two further important parts: two
+ operations (methods) were introduced for each property, which process
+ the passed value before it is stored or queried.
+
+ In summary, the concept of a property can be depicted by the
+ following diagram:
+
+ +-------------------------------------------+
+ | Property                                  |
+ +-------------------------------------------+
+ | private data area (property values)       |
+ +-------------------------------------------+
+ | direct access to the data area            |
+ +-------------------------------------+     |
+ |     ^       methods       v         | ^ v |
+ |    set         |        query       |     |
+ +-------------------------------------+-----+
+       ^                     |
+       |                     V
+   SetProp()            QueryProp()
+
+ The diagram shows the following:
+ - when setting and querying, the value is passed to a method, which
+ returns the value or, where applicable, applies the changes
+ - direct access to the value is also possible, but should not be the
+ normal case, because the methods produce side effects
+ - in certain cases, modifying access from outside can be prevented
+ completely (NOSETMETHOD, PROTECTED)
+ (CAUTION with mappings/arrays: QueryProp() returns these by
+ reference, so they can still be modified that way)
+
+ 2. Implementation
+ =================
+
+ The class /std/thing/properties.c provides the following functions for
+ handling properties:
+
+ Normal access:  mixed SetProp(<name>, <value>)
+                 - sets the value of <name> to <value>
+                 mixed QueryProp(<name>)
+                 - returns the value of <name>
+
+ Direct access:  mixed Set(<name>, <value>, <interpretation>)
+                 - sets a <value> for <name>:
+                   - the normal value
+                     <interpretation>: F_VALUE (==0)
+                   - a method
+                     <value>: closure
+                     <interpretation>: F_SET_METHOD, F_QUERY_METHOD
+                   - a flag
+                     <value>: SAVE, SECURED, PROTECTED, NOSETMETHOD
+                     <interpretation>: F_MODE, F_MODE_AS, F_MODE_AD
+                 mixed Query(<name>, <interpretation>)
+                 - queries a <value> for <name>
+                   - F_SET_METHOD, F_QUERY_METHOD: the closure/0
+                   - F_MODE: the (ORed!) flags
+ Global:         void SetProperties(<mapping>)
+                 - sets the complete mapping, respects >= PROTECTED
+                 mapping QueryProperties()
+                 - queries the complete mapping as a copy
+
+ 3. Special cases/built-in properties:
+
+ If a function exists whose name is that of a property prefixed with
+ "_set_" or "_query_", the property mapping is not accessed; instead,
+ the arguments are passed to this function and its return value is
+ returned.
+ Advantage:
+ - data that has to be available quickly (where efficiency argues
+ against SetProp/QueryProp) can still be made accessible from outside
+ in a uniform way
+ Disadvantages:
+ - not really clean
+ - you have to take care of saving yourself
+ - Set/Query bypass _set_*/_query_*, just as they bypass methods
+ - this behaviour should be reserved for the mudlib; for your own
+ checking functions (is something being set/queried?) or changes, use
+ methods (F_SET_METHOD/F_QUERY_METHOD)
+
+ SEE ALSO:
+ SetProp(L), QueryProp(L), Set(L), Query(L), SetProperties(L),
+ QueryProperties(L)
+ objekte, effizienz, closures
+
+ 21 March 2004 Gloinson
diff --git a/doc/concepts/regexp b/doc/concepts/regexp
new file mode 100644
index 0000000..9a9d70e
--- /dev/null
+++ b/doc/concepts/regexp
@@ -0,0 +1,121 @@
+SYNOPSIS
+ Regular Expressions
+
+
+DESCRIPTION
+ LDMud supports both the traditional regular expressions as
+ implemented by Henry Spencer ("HS" or "traditional"), and the
+ Perl-compatible regular expressions by Philip Hazel ("PCRE").
+ Both packages can be used concurrently, with the selection
+ being made through an extra option-flags argument to the efuns.
+ One of the two packages can be selected at compile time, by
+ commandline argument, and by driver hook to be the default
+ package.
+
+ The packages differ in the expressivity of their expressions
+ (PCRE offering more options than Henry Spencer's package),
+ though they both implement the common subset outlined below.
+
+ All regular expression efuns take an additional options
+ parameter, which is a number composed of bitflags, and is
+ used to modify the exact behaviour of the expression
+ evaluation. In addition, certain efuns may accept additional
+ specific options.
+
+ For details, refer to the detailed manpages: hsregexp(C) for
+ the Henry Spencer package, pcre(C) for the PCRE package.
+
+
+REGULAR EXPRESSION DETAILS
+ A regular expression is a pattern that is matched against a
+ subject string from left to right. Most characters stand for
+ themselves in a pattern, and match the corresponding charac-
+ ters in the subject. As a trivial example, the pattern
+
+ The quick brown fox
+
+ matches a portion of a subject string that is identical to
+ itself. The power of regular expressions comes from the
+ ability to include alternatives and repetitions in the pat-
+ tern. These are encoded in the pattern by the use of meta-
+ characters, which do not stand for themselves but instead
+ are interpreted in some special way.
+
+ The following metacharacters are 'universal' in that both regexp
+ packages understand them in the same way:
+
+ . Match any character.
+
+ ^ Match begin of line.
+
+ $ Match end of line.
+
+ x|y Match regexp x or regexp y.
+
+ () Match enclosed regexp like a 'simple' one.
+
+ x* Match any number (0 or more) of regexp x.
+
+ x+ Match any number (1 or more) of regexp x.
+
+ [..] Match one of the characters enclosed.
+
+ [^ ..] Match none of the characters enclosed. The .. are to
+ be replaced by single characters or character ranges:
+
+ [abc] matches a, b or c.
+
+ [ab0-9] matches a, b or any digit.
+
+ [^a-z] does not match any lowercase character.
+
+ \B not a word boundary
+
+ \c match character c even if it's one of the special
+ characters.
+
+ The following metacharacters or metacharacter combinations implement
+ similar functions in the two regexp packages:
+
+ \b PCRE: word boundary, also used in conjunction with
+ \w (any "word" character) and \W (any "non-word"
+ character).
+
+ \< HS: Match begin of word.
+ \> HS: Match end of word.
+
+
+OPTIONS
+ The package is selected with these option flags:
+
+ RE_PCRE
+ RE_TRADITIONAL
+
+ These flags are also used for the H_REGEXP_PACKAGE driver
+ hook.
+
+
+ Traditional regular expressions understand one option:
+
+ RE_EXCOMPATIBLE
+
+
+ PCRE understands these options:
+
+ RE_ANCHORED
+ RE_CASELESS
+ RE_DOLLAR_ENDONLY
+ RE_DOTALL
+ RE_EXTENDED
+ RE_MULTILINE
+ RE_UNGREEDY
+ RE_NOTBOL
+ RE_NOTEOL
+ RE_NOTEMPTY
+
+HISTORY
+ LDMud 3.3.596 implemented the concurrent use of both packages.
+
+SEE ALSO
+ hsregexp(C), pcre(C), regexp_package(H), regexp(E), regexplode(E),
+ regmatch(E), regreplace(E), regexp_package(E), invocation(D)
diff --git a/doc/concepts/rtfm b/doc/concepts/rtfm
new file mode 100644
index 0000000..0c566fe
--- /dev/null
+++ b/doc/concepts/rtfm
@@ -0,0 +1,33 @@
+CONCEPT
+ rtfm - read the fucking manual
+
+UPDATE
+ Mateese, 15-Jun-93, 03:15 MET
+
+SYNOPSIS
+ rtfm
+
+ OPTIONS None, you have to read the manual for an answer.
+
+DESCRIPTION
+ Used when lazy people ask stupid questions. Normally cried
+ out in vain.
+
+FILES
+ /dev/null
+
+ENVIRONMENT
+ Any.
+
+CREDITS
+ Bert Nase, who else?
+
+SEE ALSO
+ man(H)
+
+DIAGNOSTICS
+ Is a diagnostic. Since you are reading this you are getting
+ the idea.
+
+BUGS
+ Ha!
diff --git a/doc/concepts/secure b/doc/concepts/secure
new file mode 100644
index 0000000..a294a5e
--- /dev/null
+++ b/doc/concepts/secure
@@ -0,0 +1,52 @@
+TOPIC:
+ secure directories
+
+FUNCTION:
+ Wizards can create /secure directories in their guild or region
+ directories, in which data is then protected against read access by
+ other wizards. As a rule, only those who also have write access to
+ these directories have read access to them. These directories are
+ intended for (quest) puzzles or difficult NPCs worth many level
+ points. They are explicitly _not_ intended for storing entire areas
+ or quests there, because other wizards can then help only with great
+ difficulty when problems arise. For this reason the region wizards
+ are asked to make sure that these directories are used only with
+ due care.
+ Should this get out of hand and be overused, the protection of the
+ secure directories in the affected areas/regions will be lifted
+ again!
+
+NOTE:
+ It is _not_ possible to protect entire directory trees by placing them
+ in a secure/ directory. In other words:
+ Subdirectories of secure/ do _not_ enjoy any special protection.
+
+EXAMPLE:
+
+ o correct use in a fictitious standard include file...
+
+ #define HOME(x) "/d/region/magiername/meingebiet/"+x
+ #define NPC(x) HOME("npc/"+x)
+ -> /d/region/magiername/meingebiet/npc/
+ #define OBJ(x) HOME("obj/"+x)
+ -> /d/region/magiername/meingebiet/obj/
+ #define ROOM(x) HOME("room/"+x)
+ -> /d/region/magiername/meingebiet/room/
+ #define SECURE(x) HOME("secure/"+x)
+ -> /d/region/magiername/meingebiet/secure/
+
+
+ o wrong (ineffective) use with a directory tree:
+
+ #define HOME(x) "/d/region/magiername/meingebiet/secure/"+x
+ #define NPC(x) HOME("npc/"+x)
+ -> /d/region/magiername/meingebiet/secure/npc/
+ #define OBJ(x) HOME("obj/"+x)
+ -> /d/region/magiername/meingebiet/secure/obj/
+ #define ROOM(x) HOME("room/"+x)
+ -> /d/region/magiername/meingebiet/secure/room/
+
+
+LAST CHANGE:
+ 04.09.2011 Zesstra
+
diff --git a/doc/concepts/simul_efun b/doc/concepts/simul_efun
new file mode 100644
index 0000000..8c4a2a0
--- /dev/null
+++ b/doc/concepts/simul_efun
@@ -0,0 +1,14 @@
+CONCEPT
+ simul_efun
+
+DESCRIPTION
+ The simul_efun object is automagically sort-of inherited by
+ every object. The functions that are defined in it can be
+ accessed just like efuns or inherited functions by every
+ object (except the master object). To get access to efuns that
+ are overloaded by the simul_efun object, you can use
+ efun::function() to bypass the simul_efun (unless the
+ simul_efun object has defined the function as ``nomask'').
+
+SEE ALSO
+ get_simul_efun(M), inheritance(LPC), operators(LPC)
diff --git a/doc/concepts/terminals b/doc/concepts/terminals
new file mode 100644
index 0000000..3dd3171
--- /dev/null
+++ b/doc/concepts/terminals
@@ -0,0 +1,18 @@
+terminals
+ DESCRIPTION:
+ A player can switch their terminal into different modes with "stty".
+ ANSI escape sequences can then be used accordingly to put coloured,
+ underlined or blinking text on that player's screen, provided the
+ terminal supports it.
+
+ The currently supported terminals are:
+ dumb: please do not use escape sequences ...
+ vt100: understands reverse, bold, blinking
+ ansi: understands vt100 and colours
+
+ The usable colour codes are defined in /sys/ansi.h.
+
+ SEE ALSO:
+ P_TTY, stty
+
+ 21 March 2004 Gloinson
diff --git a/doc/concepts/ticks b/doc/concepts/ticks
new file mode 100644
index 0000000..a30f18f
--- /dev/null
+++ b/doc/concepts/ticks
@@ -0,0 +1,66 @@
+* What are ticks?
+A tick is a unit of measurement that was introduced long ago to prevent
+an object/wizard from keeping the driver busy for an arbitrary length of
+time so that nobody else gets a turn.
+
+* So are they units of time? Or computing operations?
+A tick corresponds most closely to a computing operation. More precisely:
+every LPC operator, every LPC keyword and every efun that is called reduces
+the available ticks by at least 1: things like +, -, if(), else, switch(),
+etc.
+The ticks a piece of code needs and the time its execution takes are not in
+any constant ratio. A piece of code always needs the same defined number of
+ticks, but may take different amounts of time for it (e.g. if something has
+to be swapped in first).
+Likewise, two pieces of code that need the same number of ticks often take
+different amounts of time.
+
+* How do I measure ticks?
+The function get_eval_cost() returns how many ticks may still be
+used up in the current execution thread before execution is
+aborted. To find out how much something costs, call
+get_eval_cost() beforehand, remember the value, and compare it with the
+result of get_eval_cost() afterwards.
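+A minimal sketch of such a measurement (the variable names are only
+illustrative; the two get_eval_cost() calls themselves cost a few ticks,
+so treat the result as approximate):
+  int before = get_eval_cost();
+  // code to be measured
+  int used = before - get_eval_cost();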
+
+* How many ticks does a particular operation/function use?
+Every elementary operation uses one tick to begin with. But: there are efuns
+and operators that (can) manipulate very large amounts of data. So that
+megabytes of data cannot be manipulated for a single tick, LDMud
+has so-called dynamic eval costs, where ticks are deducted according to the
+amount of data manipulated. Example: str1 + str2 costs more the
+larger str1 and str2 are.
+To know exactly, measure with get_eval_cost(); this is
+of course harder for the dynamic eval costs. If need be, ask the archwizard
+(EM) for driver/mudlib whether he knows.
+
+* How many are available to the driver per heartbeat?
+That cannot be answered globally. But: per execution thread,
+1500000 ticks are currently available to us. Such an execution thread starts,
+for example, when the driver calls heart_beat() in an object, likewise reset()
+or clean_up(), or when the driver evaluates a player command. Callouts are
+a special case: the callouts executed "simultaneously" (in the same
+backend cycle of the driver) and under the same UID
+ultimately share the ticks.
+
+* What happens if that is not enough after all?
+In that case you get the notorious 'too long evaluation' error
+(TLE), and execution is aborted at the point where the number of
+available ticks drops to 0.
+
+* How do I find out how much run time a particular operation needs?
+The ticks consumed do not help here. To determine the run time
+of a piece of code, measure the current time beforehand with
+sufficient precision, run the code, measure the time again and
+take the difference. The most precise way to measure time in the mud is
+the efun utime(), which can determine the time in microseconds.
+Example:
+ int *zeit1 = utime();
+ // run the code to be measured
+ int *zeit2 = utime();
+ int usec = (zeit2[0] - zeit1[0]) * 1000000 - zeit1[1] + zeit2[1];
+
+SEE ALSO:
+ effizienz, memory, goodstyle
+
+04.09.2008, Zesstra
+
diff --git a/doc/concepts/tls b/doc/concepts/tls
new file mode 100644
index 0000000..25946a2
--- /dev/null
+++ b/doc/concepts/tls
@@ -0,0 +1,73 @@
+PRELIMINARY
+CONCEPT
+ tls (transport layer security)
+
+DESCRIPTION
+ TLS stands for Transport Layer Security, the successor
+ of the well-known SSL (Secure Sockets Layer). Both techniques
+ provide a way to authenticate and encrypt the data sent through
+ a network connection.
+ By enabling TLS during compilation of the driver you can provide
+ a secure channel into the mud to your players.
+ Unlike other solutions such as "sslwrap" or "stunnel", the
+ driver-integrated approach has the advantage that the mud sees
+ the real IP of the player, not the IP of the local mud host.
+
+USAGE
+ To use TLS, configure your driver with the --enable-tls option.
+ After starting your driver you have five new efuns
+ (tls_init_connection(), tls_deinit_connection(), tls_error(),
+ tls_query_connection_info(), tls_query_connection_state()).
+
+ You can switch on TLS by calling tls_init_connection().
+ This can happen in three ways:
+
+ 1) in telnet_neg()
+
+ The advantage of this method is that you can offer TLS on a normal
+ mud port. If you have a limited number of ports this can
+ become important. The TLS connection will be started by
+ the client with the help of the telnet option STARTTLS. Currently
+ there are no mud clients that support this method.
+
+ You will have to implement the telnet option STARTTLS (46) for
+ this method. The draft for this can be found here:
+ http://www.ietf.org/proceedings/99mar/I-D/draft-ietf-tn3270e-telnet-tls-01.txt
+ Call tls_init_connection() to initiate the TLS handshake.
+
+
+ 2) in master_ob->connect()
+
+ The advantage of this method is that your users can connect with
+ any program that supports TLS/SSL. Examples are telnet-ssl,
+ sslwrap or stunnel. The disadvantage is that you have to reserve
+ a dedicated port for this.
+
+ You have to call tls_init_connection() as the first action
+ after the player has connected (normally in master_ob->connect()).
+
+ 3) in an interactive object using a callback.
+
+ This method is similar to method (1), but not limited to
+ telnet: it is useful for implementing protocols that use
+ STARTTLS like SMTP or IMAP. tls_init_connection() can be
+ called at any time by the interactive object.
+
+ You must not write to the connection after calling this
+ efun until the callback is executed (the prompt will
+ be suppressed automatically during this time).
+
+ To test your code, you can use the openssl binary.
+ `openssl s_client -connect host:port' should display your certificate
+ and anything you write after the callback is executed. If you
+ encounter the error message `SSL3_GET_RECORD: wrong version number'
+ you're probably writing to the connection while you should not.
+
+BUG
+ This manpage might not be quite up to date with the implementation.
+
+HISTORY
+ Introduced in LDMud 3.3.474 and following, backported to 3.2.11.
+
+SEE ALSO
+ tls_* efuns
diff --git a/doc/concepts/uids b/doc/concepts/uids
new file mode 100644
index 0000000..724a7af
--- /dev/null
+++ b/doc/concepts/uids
@@ -0,0 +1,96 @@
+CONCEPT
+ uids (userids)
+
+DESCRIPTION
+ Every object in the mud is attributed with a user-id 'uid': a string
+ which associates the object with a certain 'user' (aka 'wizard' or
+ 'creator', though it is not limited to that). The uid can be 0, which
+ internally is the default-uid.
+
+ The uid serves a dual purpose: on the one hand it is used to gather
+ statistics about the various groups of objects (in the famous
+ 'wizlist'), on the other hand the uid can come in handy in the
+ implementation of security systems.
+
+ The uid of an object is assigned at its creation through the
+ driver hooks H_LOAD_UIDS for loaded objects, and H_CLONE_UIDS
+ for cloned objects, and can't be changed afterwards.
+
+ The uid of an object can be queried with the efun getuid() (resp.
+ creator() in compat-mode).
+
+
+ Every object also has a second string attribute, the 'effective
+ userid' or 'euid', which also may be 0. This value was intended to
+ implement a security system based on difference between theoretical
+ and effective permissions. Since the effectiveness of this system is
+ doubtful, the driver enforces such a use only as an option.
+
+ Like uids, euids are assigned at an object's creation through
+ the two aforementioned driver hooks. They can be queried with
+ the efun geteuid() and changed with the efun seteuid(). Calls
+ to the latter are verified by the master lfun valid_seteuid().
+
+ Additionally, objects can impose their uid onto another object's
+ euid with the efun export_uid().
+
+
+ If the driver is run in 'strict euids' mode, euids are taken
+ more seriously than being just another attribute:
+ - all objects must have a non-0 uid.
+ - objects with a 0 euid can't load or clone other objects.
+ - the backbone uid as returned by master::get_bb_uid() must
+ not be 0.
+
+
+ Userids are assigned at the time of the creation of an object
+ by calling the driver hooks H_LOAD_UIDS and H_CLONE_UIDS:
+
+ mixed <load_uids closure> (string objectname)
+ mixed <clone_uids closure>(object blueprint, string objectname)
+
+ When an object is newly loaded, the H_LOAD_UIDS hook is
+ called with the object name as argument.
+ When an object is cloned, the H_CLONE_UIDS hook is called
+ with the blueprint object as first and the clone's designated
+ name as second argument.
+ In both cases the new object already exists, but has 0 uids.
+
+ For the result, the following possibilities exist (<num> is
+ a non-zero number, <no-string> is anything but a string):
+
+ "<uid>" -> uid = "<uid>", euid = "<uid>"
+ ({ "<uid>", "<euid>" }) -> uid = "<uid>", euid = "<euid>"
+ ({ "<uid>", <no-string> }) -> uid = "<uid>", euid = 0
+
+ If strict-euids is not active, the following results are
+ possible, too:
+
+ <num> -> uid = 'default', euid = 0
+ ({ <num>, "<euid>" }) -> uid = 'default', euid = "<euid>"
+ ({ <num>, <no-string> }) -> uid = 'default', euid = 0
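+
+ As a sketch only, a load-uids hook could look like this. The
+ directory layout, the function name and the fallback uid "backbone"
+ are assumptions for illustration, not part of any particular mudlib:
+
+   // H_LOAD_UIDS is defined in the driver's driver_hook.h include file.
+   // Assumed policy: objects below players/<name>/ get uid "<name>".
+   mixed load_uids_fun(string objectname) {
+     string *parts = explode(objectname, "/") - ({ "" });
+     if (sizeof(parts) > 2 && parts[0] == "players")
+       return parts[1];             // uid = "<name>", euid = "<name>"
+     return ({ "backbone", 0 });    // uid = "backbone", euid = 0
+   }
+
+   // Installed by the master, e.g. from inaugurate_master():
+   set_driver_hook(H_LOAD_UIDS, #'load_uids_fun);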
+
+
+
+ Slightly different rules apply to the (e)uid of the master.
+ The master's (e)uid is determined by a call to
+ master->get_master_uid():
+
+ "<uid>" -> uid = "<uid>", euid = "<uid>"
+
+ In non-strict-euids mode, more results are possible:
+
+ 0 -> uid = 0, euid = 0
+ <num> -> uid = 'default', euid = 0
+
+ If your uids are in general based on filenames, it is wise to return a
+ value here which can not be legally generated from any filename. OSB
+ for example uses 'ze/us'.
+
+ The master's uid is determined this way only on startup; at runtime the
+ uids of a reloaded master are determined as for every object by a call to
+ the appropriate driver hooks.
+
+SEE ALSO
+ native(C), get_root_uid(M), valid_seteuid(M),
+ objects(C), clone_object(E), geteuid(E), getuid(E), seteuid(E)