| SYNOPSIS |
| Henry Spencer Regular Expressions |
| |
| |
| DESCRIPTION |
| This document describes the regular expressions supported by the |
| implementation by Henry Spencer (the traditional package for |
| LPMud). |
| |
| |
| OPTIONS |
| The following bitflag options modify the behaviour of the |
| regular expressions - both interpretation and actual matching. |
| |
| The efuns may understand additional options. |
| |
| RE_EXCOMPATIBLE |
| |
| If this bit is set, the pattern is interpreted the way the UNIX |
| ed editor would interpret it: ( and ) match literally, and |
| \( and \) group expressions. |
| |
| |
| REGULAR EXPRESSION DETAILS |
| A regular expression is a pattern that is matched against a |
| subject string from left to right. Most characters stand for |
| themselves in a pattern, and match the corresponding |
| characters in the subject. As a trivial example, the pattern |
| |
| The quick brown fox |
| |
| matches a portion of a subject string that is identical to |
| itself. The power of regular expressions comes from the |
| ability to include alternatives and repetitions in the |
| pattern. These are encoded in the pattern by the use of |
| meta-characters, which do not stand for themselves but |
| instead are interpreted in some special way. |
| |
| There are two different sets of meta-characters: those that |
| are recognized anywhere in the pattern except within square |
| brackets, and those that are recognized in square brackets. |
| Outside square brackets, the meta-characters are as follows: |
| |
| . Match any character. |
| |
| ^ Match beginning of line. |
| |
| $ Match end of line. |
| |
| \< Match beginning of word. |
| |
| \> Match end of word. |
| |
| \B Match where there is no word edge (intended to behave like |
| the Emacs compatibility operator in GNU egrep). |
| |
| x|y Match regexp x or regexp y. |
| |
| () Match enclosed regexp like a 'simple' one (unless |
| RE_EXCOMPATIBLE is set). |
| |
| x* Match any number (0 or more) of regexp x. |
| |
| x+ Match any number (1 or more) of regexp x. |
| |
| [..] Match one of the characters enclosed. |
| |
| [^ ..] Match any one character not among those enclosed. The .. |
| are to be replaced by single characters or character ranges: |
| |
| [abc] matches a, b or c. |
| |
| [ab0-9] matches a, b or any digit. |
| |
| [^a-z] matches any character except a lowercase letter. |
| |
| \c match character c even if it's one of the special |
| characters. |
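| |
| The constructs listed above can be tried out in a short sketch. |
| It uses Python's re module, which is NOT Henry Spencer's |
| package; it is assumed here only because it happens to share |
| the syntax of the constructs shown. The helper find_all is a |
| hypothetical name introduced for this illustration. |

```python
import re

def find_all(pattern, subject):
    """Return every non-overlapping match of pattern in subject."""
    return [m.group(0) for m in re.finditer(pattern, subject)]

# [ab0-9]: a, b or any digit
classes = find_all(r"[ab0-9]", "x1a-b9z")
# [^a-z]: any character except a lowercase letter
negated = find_all(r"[^a-z]", "ab-C3d")
# x|y: either alternative
alternation = find_all(r"cat|dog", "cat, dog, cow")
# x*: zero or more repetitions of the preceding item
star = find_all(r"ab*", "a ab abbb")
# x+: one or more repetitions of the preceding item
plus = find_all(r"ab+", "a ab abbb")
# \c: a backslash makes the special character . literal
escaped = find_all(r"\.", "a.b.c")

print(classes, negated, alternation, star, plus, escaped)
```

| The word-edge operators \< and \> are left out of this sketch |
| because Python's re module does not provide them. |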
| |
| |
| NOTES |
| The \< and \> metacharacters from Henry Spencer's package |
| are not available in PCRE, but can be emulated with \b, |
| if required in conjunction with \W or \w. |
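| |
| One possible emulation, sketched in Python's re module (used |
| here as a stand-in for a PCRE-style engine), combines \b with |
| a lookaround on \w. The names WORD_START, WORD_END and |
| whole_word are hypothetical and exist only for this example. |

```python
import re

# Assumption: \b plus a lookaround on \w approximates the two operators.
WORD_START = r"\b(?=\w)"   # boundary with a word character after it, like \<
WORD_END = r"\b(?<=\w)"    # boundary with a word character before it, like \>

def whole_word(word):
    """Compile a pattern matching word only as a complete word."""
    return re.compile(WORD_START + re.escape(word) + WORD_END)

subject = "the cat concatenated catalogs"

# Offsets at which a word begins or ends in the subject.
starts = [m.start() for m in re.finditer(WORD_START, subject)]
ends = [m.start() for m in re.finditer(WORD_END, subject)]

# "cat" inside "concatenated" or "catalogs" is not a whole word.
cat_offsets = [m.start() for m in whole_word("cat").finditer(subject)]

print(starts, ends, cat_offsets)
```

| Plain \b alone matches at both word edges, so the lookaround |
| is what distinguishes the beginning of a word from its end. |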
| |
| In LDMud, backtracks are limited by the EVAL_COST runtime |
| limit, to avoid freezing the driver with a match |
| like regexp(({"=XX==================="}), "X(.+)+X"). |
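| |
| A rough way to see why such a pattern is expensive: the nested |
| repetition (.+)+ can divide a run of n characters into pieces |
| in many different ways, and a backtracking matcher may have to |
| try them all before it reports a failure. The Python sketch |
| below only counts those divisions; it is an illustration and |
| not a model of the driver's actual step count. The name |
| count_splits is hypothetical. |

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_splits(n):
    """Count the ways to cut n characters into consecutive non-empty pieces."""
    if n == 0:
        return 1  # nothing left to cut: one completed division
    # The first piece takes k characters; the remainder is divided recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))

for n in (5, 10, 20):
    print(n, count_splits(n))
```

| The count doubles with every additional character, which is |
| why a limit on backtracking is needed. |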
| |
| |
| AUTHOR |
| Mark H. Colburn, NAPS International (mark@jhereg.mn.org) |
| Henry Spencer, University of Toronto (henry@utzoo.edu) |
| Joern Rennecke |
| Ian Phillipps |
| |
| |
| SEE ALSO |
| regexp(C), pcre(C) |