MG Mud User | 88f1247 | 2016-06-24 23:31:02 +0200 | 1 | SYNOPSIS |
| 2 | Henry Spencer Regular Expressions |
| 3 | |
| 4 | |
| 5 | DESCRIPTION |
| 6 | This document describes the regular expressions supported by |
| 7 | Henry Spencer's implementation (the traditional package for |
| 8 | LPMud). |
| 9 | |
| 10 | |
| 11 | OPTIONS |
| 12 | The following bitflag options modify the behaviour of the |
| 13 | regular expressions - both interpretation and actual matching. |
| 14 | |
| 15 | The efuns may understand additional options. |
| 16 | |
| 17 | RE_EXCOMPATIBLE |
| 18 | |
| 19 | If this bit is set, the pattern is interpreted as the UNIX ed |
| 20 | editor would interpret it: ( and ) match literally, and \( and \) |
| 21 | group expressions. |
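
The swap of roles described for RE_EXCOMPATIBLE can be sketched outside the driver. The following is only an illustration in Python, whose re module is used as a stand-in and is not the Henry Spencer package; the helper name ed_to_standard is hypothetical and it does not treat square brackets specially.

```python
import re

def ed_to_standard(pattern: str) -> str:
    """Swap the roles of ( ) and \\( \\) in a pattern (hypothetical helper)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( and \) group in ed style: emit bare parentheses.
            # Any other escape is passed through unchanged.
            out.append(nxt if nxt in "()" else ch + nxt)
            i += 2
        elif ch in "()":
            # Bare parentheses are literal in ed style: escape them.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# ed-style pattern: literal "f(", a group capturing the argument, literal ")".
converted = ed_to_standard(r"f(\([a-z]+\))")
match = re.search(converted, "call f(abc) now")
argument = match.group(1) if match else None
```

Here the ed-style pattern is rewritten so that the bare parentheses become escaped literals and the escaped ones become a capturing group.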
| 22 | |
| 23 | |
| 24 | REGULAR EXPRESSION DETAILS |
| 25 | A regular expression is a pattern that is matched against a |
| 26 | subject string from left to right. Most characters stand for |
| 27 | themselves in a pattern, and match the corresponding |
| 28 | characters in the subject. As a trivial example, the pattern |
| 29 | |
| 30 | The quick brown fox |
| 31 | |
| 32 | matches a portion of a subject string that is identical to |
| 33 | itself. The power of regular expressions comes from the |
| 34 | ability to include alternatives and repetitions in the |
| 35 | pattern. These are encoded in the pattern by the use of |
| 36 | meta-characters, which do not stand for themselves but instead |
| 37 | are interpreted in some special way. |
| 38 | |
| 39 | There are two different sets of meta-characters: those that |
| 40 | are recognized anywhere in the pattern except within square |
| 41 | brackets, and those that are recognized in square brackets. |
| 42 | Outside square brackets, the meta-characters are as follows: |
| 43 | |
| 44 | . Match any character. |
| 45 | |
| 46 | ^ Match beginning of line. |
| 47 | |
| 48 | $ Match end of line. |
| 49 | |
| 50 | \< Match beginning of word. |
| 51 | |
| 52 | \> Match end of word. |
| 53 | |
| 54 | \B Match where not at the edge of a word (intended to behave |
| 55 | like the Emacs-compatible \B in GNU egrep). |
| 56 | |
| 57 | x|y Match regexp x or regexp y. |
| 58 | |
| 59 | () Match enclosed regexp like a 'simple' one (unless |
| 60 | RE_EXCOMPATIBLE is set). |
| 61 | |
| 62 | x* Match any number (0 or more) of regexp x. |
| 63 | |
| 64 | x+ Match any number (1 or more) of regexp x. |
| 65 | |
| 66 | [..] Match one of the characters enclosed. |
| 67 | |
| 68 | [^ ..] Match any character not among those enclosed. The .. are to |
| 69 | be replaced by single characters or character ranges: |
| 70 | |
| 71 | [abc] matches a, b or c. |
| 72 | |
| 73 | [ab0-9] matches a, b or any digit. |
| 74 | |
| 75 | [^a-z] matches any character that is not a lowercase letter. |
| 76 | |
| 77 | \c match character c even if it's one of the special |
| 78 | characters. |
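
Most of the meta-characters listed above behave the same way in other regular expression engines, so they can be tried out quickly. The sketch below uses Python's re module as a stand-in (it is not the Henry Spencer package, and it does not know \< and \>); the subjects and the helper first_match are made up for illustration.

```python
import re

# Each entry: (pattern, subject, expected matched text).
# Only constructs that Python's re shares with the list above are used.
cases = [
    (r"b.t", "a bat", "bat"),        # .    matches any character
    (r"^fox", "fox hunt", "fox"),    # ^    anchors at the beginning
    (r"fox$", "brown fox", "fox"),   # $    anchors at the end
    (r"cat|dog", "hot dog", "dog"),  # x|y  alternation
    (r"(ab)+", "xababy", "abab"),    # ()   groups, x+ repeats 1 or more times
    (r"ab*", "a", "a"),              # x*   repeats 0 or more times
    (r"[ab0-9]+", "xx7a9", "7a9"),   # [..] class with a character range
    (r"[^a-z]", "abcD", "D"),        # [^ ..] negated class
    (r"\.", "a.b", "."),             # \c   matches c literally
]

def first_match(pattern, subject):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None

results = [first_match(pattern, subject) for pattern, subject, _ in cases]

# \B only matches away from a word edge: the "at" inside "bat" is found,
# not the one that starts the subject.
inner_at = re.search(r"\Bat", "at bat").start()
```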
| 79 | |
| 80 | |
| 81 | NOTES |
| 82 | The \< and \> metacharacters from Henry Spencer's package |
| 83 | are not available in PCRE, but can be emulated with \b, |
| 84 | as required, also in conjunction with \W or \w. |
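
One possible way to carry out this emulation is to combine \b with a look-ahead or look-behind on \w. The sketch below assumes that translation and uses Python's re module, which shares \b and \w with PCRE; the helper translate_word_anchors is hypothetical, and its plain text substitution would mistranslate an escaped backslash followed by < or >.

```python
import re

# Assumed emulation: a boundary followed by a word character is a word
# start, a boundary preceded by a word character is a word end.
WORD_START = r"\b(?=\w)"
WORD_END = r"\b(?<=\w)"

def translate_word_anchors(pattern: str) -> str:
    """Replace \\< and \\> with \\b-based equivalents (hypothetical helper)."""
    return pattern.replace(r"\<", WORD_START).replace(r"\>", WORD_END)

subject = "the cat"
word_starts = [m.start() for m in re.finditer(translate_word_anchors(r"\<"), subject)]
word_ends = [m.start() for m in re.finditer(translate_word_anchors(r"\>"), subject)]

# Whole words beginning with "c"; the "cat" inside "concat" is not reported
# on its own because it does not begin at a word start.
c_words = re.findall(translate_word_anchors(r"\<c[a-z]*\>"), "the cat concat")
```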
| 85 | |
| 86 | In LDMud, backtracks are limited by the EVAL_COST runtime |
| 87 | limit, to avoid freezing the driver with a match |
| 88 | like regexp(({"=XX==================="}), "X(.+)+X"). |
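
To see why such a limit is needed, the following sketch counts the steps a naive backtracking matcher takes for X(.+)+X and aborts once a budget is used up. It is written in Python for illustration only: the matcher, the names match_nested_plus and BudgetExceeded, and the budget value are assumptions, and the driver's actual EVAL_COST accounting is not modelled.

```python
class BudgetExceeded(Exception):
    """Raised when the matcher has used more steps than allowed."""

def match_nested_plus(subject, budget):
    """Naive backtracking match of X(.+)+X; returns (matched, steps)."""
    steps = 0

    def tick():
        nonlocal steps
        steps += 1
        if steps > budget:
            raise BudgetExceeded(steps)

    def groups_then_x(pos):
        # Try every length for one (.+) repetition, longest first, then
        # either match the closing X or start another repetition.
        for end in range(len(subject), pos, -1):
            tick()
            if subject.startswith("X", end) or groups_then_x(end):
                return True
        return False

    for start in range(len(subject)):
        if subject[start] == "X" and groups_then_x(start + 1):
            return True, steps
    return False, steps

# A short failing subject is cheap, but the cost roughly doubles with
# every additional trailing character.
small = match_nested_plus("=XX" + "=" * 4, budget=10_000)
larger = match_nested_plus("=XX" + "=" * 8, budget=10_000)

# The subject from the note above exhausts the same budget.
try:
    match_nested_plus("=XX===================", budget=10_000)
    aborted = False
except BudgetExceeded:
    aborted = True
```

A subject that does match, such as "aXbcX", finishes after a handful of steps; only the failing case explores every way of splitting the tail among the nested repetitions.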
| 89 | |
| 90 | |
| 91 | AUTHOR |
| 92 | Mark H. Colburn, NAPS International (mark@jhereg.mn.org) |
| 93 | Henry Spencer, University of Toronto (henry@utzoo.edu) |
| 94 | Joern Rennecke |
| 95 | Ian Phillipps |
| 96 | |
| 97 | |
| 98 | SEE ALSO |
| 99 | regexp(C), pcre(C) |