SYNOPSIS
        Henry Spencer Regular Expressions

DESCRIPTION
        This document describes the regular expressions supported by
        Henry Spencer's implementation (the traditional regular
        expression package for LPMud).

OPTIONS
        The following bitflag options modify the behaviour of the
        regular expressions, both their interpretation and the actual
        matching.

        The efuns may understand additional options.

          RE_EXCOMPATIBLE

        If this bit is set, the pattern is interpreted as the UNIX ed
        editor would interpret it: ( and ) match literally, and \( and
        \) group expressions.
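
The role swap described for RE_EXCOMPATIBLE can be sketched as a pattern
rewrite. The following is only an illustration in Python, not part of the
driver: ed_to_default() is a hypothetical helper, Python's re module merely
stands in for the default (non-ed) syntax, and square brackets are not
treated specially.

```python
import re


def ed_to_default(pattern: str) -> str:
    # Hypothetical helper: rewrite an ed-style pattern so that the
    # escaped parentheses group and the bare parentheses are literal.
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Escaped parentheses group in ed style: emit them bare.
            # Any other escape sequence is copied through unchanged.
            out.append(nxt if nxt in "()" else ch + nxt)
            i += 2
        elif ch in "()":
            # Bare parentheses are literal in ed style: escape them.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


converted = ed_to_default(r"\(ab\)+(c)")
print(converted)
print(re.fullmatch(converted, "abab(c)") is not None)
```

Here the ed-style group \(ab\) becomes a plain group, while the bare (c)
becomes a pair of literal parentheses around c.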

REGULAR EXPRESSION DETAILS
        A regular expression is a pattern that is matched against a
        subject string from left to right. Most characters stand for
        themselves in a pattern, and match the corresponding
        characters in the subject. As a trivial example, the pattern

          The quick brown fox

        matches a portion of a subject string that is identical to
        itself. The power of regular expressions comes from the
        ability to include alternatives and repetitions in the
        pattern. These are encoded in the pattern by the use of
        meta-characters, which do not stand for themselves but instead
        are interpreted in some special way.

        There are two different sets of meta-characters: those that
        are recognized anywhere in the pattern except within square
        brackets, and those that are recognized in square brackets.
        Outside square brackets, the meta-characters are as follows:

        .       Match any character.

        ^       Match beginning of line.

        $       Match end of line.

        \<      Match beginning of word.

        \>      Match end of word.

        \B      Match anywhere that is not at the edge of a word
                (supposed to be like the emacs compatibility one in
                GNU egrep).

        x|y     Match regexp x or regexp y.

        ()      Match the enclosed regexp like a 'simple' one (unless
                RE_EXCOMPATIBLE is set).

        x*      Match any number (0 or more) of regexp x.

        x+      Match any number (1 or more) of regexp x.

        [..]    Match one of the characters enclosed.

        [^ ..]  Match any character except those enclosed. The .. are
                to be replaced by single characters or character
                ranges:

                [abc]    matches a, b or c.

                [ab0-9]  matches a, b or any digit.

                [^a-z]   matches any character that is not a
                         lowercase letter.

        \c      Match character c even if it's one of the special
                characters.
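
Several of the meta-characters listed above can be tried out in a small
table of patterns and subjects. The sketch below uses Python's re module
as a stand-in, on the assumption that these particular constructs behave
the same way there; it is not the Henry Spencer package itself. \< and \>
are left out because Python's re does not support them, and \B is left
out to avoid claiming identical behaviour.

```python
import re

# (pattern, subject, whether a match is expected somewhere in the subject)
checks = [
    (r"b.g", "big", True),         # .     any character
    (r"^fox", "fox hunt", True),   # ^     beginning of line
    (r"^fox", "a fox", False),
    (r"fox$", "a fox", True),      # $     end of line
    (r"cat|dog", "hotdog", True),  # x|y   alternatives
    (r"(ab)+c", "ababc", True),    # ()    group treated as one unit
    (r"ab*c", "ac", True),         # x*    zero or more
    (r"ab+c", "ac", False),        # x+    one or more
    (r"[ab0-9]", "x7", True),      # [..]  a, b or any digit
    (r"[^a-z]", "abc", False),     # [^..] anything but a lowercase letter
    (r"a\.b", "a.b", True),        # \c    literal special character
    (r"a\.b", "axb", False),
]

results = [re.search(pattern, subject) is not None
           for pattern, subject, _ in checks]

for (pattern, subject, expected), found in zip(checks, results):
    status = "ok" if found == expected else "MISMATCH"
    print(f"{pattern!r:12} {subject!r:12} matched={found!s:5} {status}")
```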

NOTES
        The \< and \> metacharacters from Henry Spencer's package
        are not available in PCRE, but can be emulated with \b,
        combined where required with \W or \w.
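
One possible way to do this emulation is to pair \b with a lookaround on
\w. The sketch below is an assumption-laden illustration: it uses Python's
re module, which is not PCRE but shares \b, \w and lookaround syntax, and
the names WORD_START and WORD_END are made up for this example.

```python
import re

# \b alone matches at both edges of a word, so a lookaround on \w
# picks the side that is wanted.
WORD_START = r"\b(?=\w)"   # stands in for \<
WORD_END = r"\b(?<=\w)"    # stands in for \>

text = "cat concat catalog"

# Offsets at which "cat" begins a word, ends a word, or is a whole word.
starts = [m.start() for m in re.finditer(WORD_START + "cat", text)]
ends = [m.start() for m in re.finditer("cat" + WORD_END, text)]
whole = [m.start() for m in re.finditer(WORD_START + "cat" + WORD_END, text)]

print(starts)  # [0, 11]
print(ends)    # [0, 7]
print(whole)   # [0]
```

The "cat" inside "concat" only ends a word, and the "cat" in "catalog"
only begins one, so neither counts as a whole word.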

        In LDMud, backtracking is limited by the EVAL_COST runtime
        limit, to avoid freezing the driver with a match
        like regexp(({"=XX==================="}), "X(.+)+X").
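
To see why a pattern like X(.+)+X can be so expensive, it helps to count
how many ways the nested repetition (.+)+ can carve up a run of
characters before the final X fails to match. The Python sketch below is
a simplified counting model of a naive backtracking matcher, not a
description of the driver's actual implementation; the function name
attempts() is made up for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def attempts(n: int) -> int:
    # Model: the inner .+ takes 1..n characters as the first group.
    # After each choice the matcher tries the trailing X once (and
    # fails), then tries again with further groups on the remainder.
    total = 0
    for first in range(1, n + 1):
        total += 1 + attempts(n - first)
    return total


for n in (5, 10, 19):
    print(n, attempts(n))
# 5 31
# 10 1023
# 19 524287
```

In this model the count is 2^n - 1, so each additional character in the
subject roughly doubles the work, which is why a runtime limit is needed.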

AUTHOR
        Mark H. Colburn, NAPS International (mark@jhereg.mn.org)
        Henry Spencer, University of Toronto (henry@utzoo.edu)
        Joern Rennecke
        Ian Phillipps

SEE ALSO
        regexp(C), pcre(C)