Update docs from driver sources

Change-Id: I455f0813b970151089b3dc1b8d9407eea323cdd1

commit: 7ea4a03dfec8fd917fa08d5e21c0519a02a8e7a2
author: Zesstra <zesstra@zesstra.de> Tue Nov 26 20:11:40 2019 +0100
committer: MG Mud User <mud@mg.mud.de> Tue Nov 26 20:11:40 2019 +0100
tree: 0f2606527a1910c08f1d19a8f8f1c192943cd418
parent: c70bf58803d2d854a1bd7e255031a562db0d94e6
diff --git a/doc/concepts/pcre b/doc/concepts/pcre
index f863ffd..4c4cc02 100644
--- a/doc/concepts/pcre
+++ b/doc/concepts/pcre

@@ -1,58 +1,54 @@
 SYNOPSIS
         PCRE - Perl-compatible regular expressions
 
-
 DESCRIPTION
-        This document describes the regular expressions supported by the
-        PCRE package. When the package is compiled into the driver, the
-        macro __PCRE__ is defined.
+        This document describes the regular expressions supported by the PCRE
+        package. When the package is compiled into the driver, the macro
+        __PCRE__ is defined.
 
-        Most of this manpage is lifted directly from the original PCRE
-        manpage (dated January 2003).
+        Most of this manpage is lifted directly from the original PCRE manpage
+        (dated January 2003).
 
-        The PCRE library is a set of functions that implement  regular
-        expression  pattern  matching using the same syntax and semantics
-        as Perl  5,  with  just  a  few  differences  (see below).  The
-        current  implementation  corresponds  to  Perl 5.005, with some
-        additional features  from  later  versions.  This  includes  some
-        experimental,  incomplete  support for UTF-8 encoded strings.
-        Details of exactly what is  and  what is not supported are given
-        below.
-
+        The PCRE library is a set of functions that implement regular
+        expression pattern matching using the same syntax and semantics as
+        Perl 5, with just a few differences (see below). The current
+        implementation corresponds to Perl 5.005, with some additional features
+        from later versions. This includes some experimental, incomplete
+        support for UTF-8 encoded strings. Details of exactly what is and what
+        is not supported are given below.
 
 PCRE REGULAR EXPRESSION DETAILS
+        The syntax and semantics of the regular expressions supported by PCRE
+        are described below. Regular expressions are also described in the Perl
+        documentation and in a number of other books, some of which have
+        copious examples. Jeffrey Friedl's "Mastering Regular Expressions",
+        published by O'Reilly, covers them in great detail. The description
+        here is intended as reference documentation.
 
-       The  syntax  and semantics of the regular expressions supported by PCRE
-       are described below. Regular expressions are also described in the Perl
-       documentation  and in a number of other books, some of which have copi-
-       ous examples. Jeffrey Friedl's "Mastering  Regular  Expressions",  pub-
-       lished  by  O'Reilly, covers them in great detail. The description here
-       is intended as reference documentation.
+        The basic operation of PCRE is on strings of bytes. However, there is
+        also support for UTF-8 character strings. To use this support you must
+        build PCRE to include UTF-8 support, and then call pcre_compile() with
+        the PCRE_UTF8 option. How this affects the pattern matching is
+        mentioned in several places below. There is also a summary of UTF-8
+        features in the section on UTF-8 support in the main pcre page.
 
-       The basic operation of PCRE is on strings of bytes. However,  there  is
-       also  support for UTF-8 character strings. To use this support you must
-       build PCRE to include UTF-8 support, and then call pcre_compile()  with
-       the  PCRE_UTF8  option.  How  this affects the pattern matching is men-
-       tioned in several places below. There is also a summary of  UTF-8  fea-
-       tures in the section on UTF-8 support in the main pcre page.
-
-       A  regular  expression  is  a pattern that is matched against a subject
-       string from left to right. Most characters stand for  themselves  in  a
-       pattern,  and  match  the corresponding characters in the subject. As a
-       trivial example, the pattern
+        A regular expression is a pattern that is matched against a subject
+        string from left to right. Most characters stand for themselves in a
+        pattern, and match the corresponding characters in the subject. As a
+        trivial example, the pattern
 
          The quick brown fox
 
-       matches a portion of a subject string that is identical to itself.  The
-       power of regular expressions comes from the ability to include alterna-
-       tives and repetitions in the pattern. These are encoded in the  pattern
-       by  the  use  of meta-characters, which do not stand for themselves but
-       instead are interpreted in some special way.
+        matches a portion of a subject string that is identical to itself. The
+        power of regular expressions comes from the ability to include
+        alternatives and repetitions in the pattern. These are encoded in the
+        pattern by the use of meta-characters, which do not stand for
+        themselves but instead are interpreted in some special way.
 
-       There are two different sets of meta-characters: those that are  recog-
-       nized  anywhere in the pattern except within square brackets, and those
-       that are recognized in square brackets. Outside  square  brackets,  the
-       meta-characters are as follows:
+        There are two different sets of meta-characters: those that are
+        recognized anywhere in the pattern except within square brackets, and
+        those that are recognized in square brackets. Outside square brackets,
+        the meta-characters are as follows:
 
          \      general escape character with several uses
          ^      assert start of string (or line, in multiline mode)
@@ -70,8 +66,8 @@
                 also "possessive quantifier"
          {      start min/max quantifier
 
-       Part  of  a  pattern  that is in square brackets is called a "character
-       class". In a character class the only meta-characters are:
+        Part of a pattern that is in square brackets is called a "character
+        class". In a character class the only meta-characters are:
 
          \      general escape character
          ^      negate the class, but only if the first character
@@ -80,34 +76,32 @@
                   syntax)
          ]      terminates the character class
 
-       The following sections describe the use of each of the meta-characters.
-
+        The following sections describe the use of each of the meta-characters.
 
 BACKSLASH
+        The backslash character has several uses. Firstly, if it is followed by
+        a non-alphameric character, it takes away any special meaning that
+        character may have. This use of backslash as an escape character
+        applies both inside and outside character classes.
 
-       The backslash character has several uses. Firstly, if it is followed by
-       a non-alphameric character, it takes  away  any  special  meaning  that
-       character  may  have.  This  use  of  backslash  as an escape character
-       applies both inside and outside character classes.
+        For example, if you want to match a * character, you write \* in the
+        pattern. This escaping action applies whether or not the following
+        character would otherwise be interpreted as a meta-character, so it is
+        always safe to precede a non-alphameric with backslash to specify that
+        it stands for itself. In particular, if you want to match a backslash,
+        you write \\.
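
The escaping rule above can be sketched as follows. Python is used here only as a convenient stand-in: its re module is not PCRE, but it handles a backslash before a non-alphanumeric character in the same way. The subject strings are made up for illustration.

```python
import re

# \* matches a literal asterisk instead of acting as a quantifier.
star = re.search(r"\*", "2*3")

# Escaping a character that is not a meta-character is harmless:
# \@ still matches a plain "@".
at_sign = re.search(r"\@", "user@host")

# \\ in the pattern matches a single backslash in the subject.
backslash = re.search(r"\\", "C:\\temp")

print(star.group(), at_sign.group(), backslash.group())
```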
 
-       For example, if you want to match a * character, you write  \*  in  the
-       pattern.   This  escaping  action  applies whether or not the following
-       character would otherwise be interpreted as a meta-character, so it  is
-       always  safe to precede a non-alphameric with backslash to specify that
-       it stands for itself. In particular, if you want to match a  backslash,
-       you write \\.
+        If a pattern is compiled with the PCRE_EXTENDED option, whitespace in
+        the pattern (other than in a character class) and characters between a
+        # outside a character class and the next newline character are ignored.
+        An escaping backslash can be used to include a whitespace or #
+        character as part of the pattern.
 
-       If  a  pattern is compiled with the PCRE_EXTENDED option, whitespace in
-       the pattern (other than in a character class) and characters between  a
-       # outside a character class and the next newline character are ignored.
-       An escaping backslash can be used to include a whitespace or #  charac-
-       ter as part of the pattern.
-
-       If  you  want  to remove the special meaning from a sequence of charac-
-       ters, you can do so by putting them between \Q and \E. This is  differ-
-       ent  from  Perl  in  that  $  and  @ are handled as literals in \Q...\E
-       sequences in PCRE, whereas in Perl, $ and @ cause  variable  interpola-
-       tion. Note the following examples:
+        If you want to remove the special meaning from a sequence of
+        characters, you can do so by putting them between \Q and \E. This is
+        different from Perl in that $ and @ are handled as literals in \Q...\E
+        sequences in PCRE, whereas in Perl, $ and @ cause variable
+        interpolation. Note the following examples:
 
          Pattern            PCRE matches   Perl matches
 
@@ -116,15 +110,15 @@
          \Qabc\$xyz\E       abc\$xyz       abc\$xyz
          \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
 
-       The  \Q...\E  sequence  is recognized both inside and outside character
-       classes.
+        The \Q...\E sequence is recognized both inside and outside character
+        classes.
 
-       A second use of backslash provides a way of encoding non-printing char-
-       acters  in patterns in a visible manner. There is no restriction on the
-       appearance of non-printing characters, apart from the binary zero  that
-       terminates  a  pattern,  but  when  a pattern is being prepared by text
-       editing, it is usually easier  to  use  one  of  the  following  escape
-       sequences than the binary character it represents:
+        A second use of backslash provides a way of encoding non-printing
+        characters in patterns in a visible manner. There is no restriction on
+        the appearance of non-printing characters, apart from the binary zero
+        that terminates a pattern, but when a pattern is being prepared by text
+        editing, it is usually easier to use one of the following escape
+        sequences than the binary character it represents:
 
          \a        alarm, that is, the BEL character (hex 07)
          \cx       "control-x", where x is any character
@@ -137,45 +131,45 @@
          \xhh      character with hex code hh
          \x{hhh..} character with hex code hhh... (UTF-8 mode only)
 
-       The  precise  effect of \cx is as follows: if x is a lower case letter,
-       it is converted to upper case. Then bit 6 of the character (hex 40)  is
-       inverted.   Thus  \cz becomes hex 1A, but \c{ becomes hex 3B, while \c;
-       becomes hex 7B.
+        The precise effect of \cx is as follows: if x is a lower case letter,
+        it is converted to upper case. Then bit 6 of the character (hex 40) is
+        inverted. Thus \cz becomes hex 1A, but \c{ becomes hex 3B, while \c;
+        becomes hex 7B.
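
The \cx rule is plain bit arithmetic, so it can be checked with a few lines of Python. This is a sketch of the rule as described, not PCRE's own code; the function name control_escape is hypothetical.

```python
def control_escape(x):
    """Return the byte value that \\cx stands for, following the rule above."""
    # Step 1: a lower case letter is converted to upper case.
    if "a" <= x <= "z":
        x = x.upper()
    # Step 2: bit 6 (hex 40) of the character code is inverted.
    return ord(x) ^ 0x40


# The three cases named in the text: \cz, \c{ and \c;
for ch in "z{;":
    print(f"\\c{ch} -> hex {control_escape(ch):02X}")
```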
 
-       After \x, from zero to two hexadecimal digits are read (letters can  be
-       in  upper or lower case). In UTF-8 mode, any number of hexadecimal dig-
-       its may appear between \x{ and }, but the value of the  character  code
-       must  be  less  than  2**31  (that is, the maximum hexadecimal value is
-       7FFFFFFF). If characters other than hexadecimal digits  appear  between
-       \x{  and }, or if there is no terminating }, this form of escape is not
-       recognized. Instead, the initial \x will be interpreted as a basic hex-
-       adecimal escape, with no following digits, giving a byte whose value is
-       zero.
+        After \x, from zero to two hexadecimal digits are read (letters can be
+        in upper or lower case). In UTF-8 mode, any number of hexadecimal
+        digits may appear between \x{ and }, but the value of the character
+        code must be less than 2**31 (that is, the maximum hexadecimal value is
+        7FFFFFFF). If characters other than hexadecimal digits appear between
+        \x{ and }, or if there is no terminating }, this form of escape is not
+        recognized. Instead, the initial \x will be interpreted as a basic
+        hexadecimal escape, with no following digits, giving a byte whose value
+        is zero.
 
-       Characters whose value is less than 256 can be defined by either of the
-       two  syntaxes for \x when PCRE is in UTF-8 mode. There is no difference
-       in the way they are handled. For example, \xdc is exactly the  same  as
-       \x{dc}.
+        Characters whose value is less than 256 can be defined by either of the
+        two syntaxes for \x when PCRE is in UTF-8 mode. There is no difference
+        in the way they are handled. For example, \xdc is exactly the same as
+        \x{dc}.
 
-       After  \0  up  to  two further octal digits are read. In both cases, if
-       there are fewer than two digits, just those that are present are  used.
-       Thus  the sequence \0\x\07 specifies two binary zeros followed by a BEL
-       character (code value 7). Make sure you supply  two  digits  after  the
-       initial zero if the character that follows is itself an octal digit.
+        After \0 up to two further octal digits are read. In both cases, if
+        there are fewer than two digits, just those that are present are used.
+        Thus the sequence \0\x\07 specifies two binary zeros followed by a BEL
+        character (code value 7). Make sure you supply two digits after the
+        initial zero if the character that follows is itself an octal digit.
 
-       The handling of a backslash followed by a digit other than 0 is compli-
-       cated.  Outside a character class, PCRE reads it and any following dig-
-       its  as  a  decimal  number. If the number is less than 10, or if there
-       have been at least that many previous capturing left parentheses in the
-       expression,  the  entire  sequence  is  taken  as  a  back reference. A
-       description of how this works is given later, following the  discussion
-       of parenthesized subpatterns.
+        The handling of a backslash followed by a digit other than 0 is
+        complicated. Outside a character class, PCRE reads it and any following
+        digits as a decimal number. If the number is less than 10, or if there
+        have been at least that many previous capturing left parentheses in the
+        expression, the entire sequence is taken as a back reference. A
+        description of how this works is given later, following the discussion
+        of parenthesized subpatterns.
 
-       Inside  a  character  class, or if the decimal number is greater than 9
-       and there have not been that many capturing subpatterns, PCRE  re-reads
-       up  to three octal digits following the backslash, and generates a sin-
-       gle byte from the least significant 8 bits of the value. Any subsequent
-       digits stand for themselves.  For example:
+        Inside a character class, or if the decimal number is greater than 9
+        and there have not been that many capturing subpatterns, PCRE re-reads
+        up to three octal digits following the backslash, and generates a
+        single byte from the least significant 8 bits of the value. Any
+        subsequent digits stand for themselves. For example:
 
          \040   is another way of writing a space
          \40    is the same, provided there are fewer than 40
@@ -192,16 +186,16 @@
          \81    is either a back reference, or a binary zero
                    followed by the two characters "8" and "1"
 
-       Note  that  octal  values of 100 or greater must not be introduced by a
-       leading zero, because no more than three octal digits are ever read.
+        Note that octal values of 100 or greater must not be introduced by a
+        leading zero, because no more than three octal digits are ever read.
 
-       All the sequences that define a single byte value  or  a  single  UTF-8
-       character (in UTF-8 mode) can be used both inside and outside character
-       classes. In addition, inside a character  class,  the  sequence  \b  is
-       interpreted  as  the  backspace character (hex 08). Outside a character
-       class it has a different meaning (see below).
+        All the sequences that define a single byte value or a single UTF-8
+        character (in UTF-8 mode) can be used both inside and outside character
+        classes. In addition, inside a character class, the sequence \b is
+        interpreted as the backspace character (hex 08). Outside a character
+        class it has a different meaning (see below).
 
-       The third use of backslash is for specifying generic character types:
+        The third use of backslash is for specifying generic character types:
 
          \d     any decimal digit
          \D     any character that is not a decimal digit
@@ -210,35 +204,35 @@
          \w     any "word" character
          \W     any "non-word" character
 
-       Each pair of escape sequences partitions the complete set of characters
-       into  two disjoint sets. Any given character matches one, and only one,
-       of each pair.
+        Each pair of escape sequences partitions the complete set of characters
+        into two disjoint sets. Any given character matches one, and only one,
+        of each pair.
 
-       In UTF-8 mode, characters with values greater than 255 never match  \d,
-       \s, or \w, and always match \D, \S, and \W.
+        In UTF-8 mode, characters with values greater than 255 never match \d,
+        \s, or \w, and always match \D, \S, and \W.
 
-       For  compatibility  with Perl, \s does not match the VT character (code
-       11).  This makes it different from the the POSIX "space" class. The  \s
-       characters are HT (9), LF (10), FF (12), CR (13), and space (32).
+        For compatibility with Perl, \s does not match the VT character (code
+        11). This makes it different from the POSIX "space" class. The \s
+        characters are HT (9), LF (10), FF (12), CR (13), and space (32).
 
-       A  "word" character is any letter or digit or the underscore character,
-       that is, any character which can be part of a Perl "word". The  defini-
-       tion  of  letters  and digits is controlled by PCRE's character tables,
-       and may vary if locale- specific matching is taking place (see  "Locale
-       support"  in  the  pcreapi  page).  For  example,  in the "fr" (French)
-       locale, some character codes greater than 128  are  used  for  accented
-       letters, and these are matched by \w.
+        A "word" character is any letter or digit or the underscore character,
+        that is, any character which can be part of a Perl "word". The
+        definition of letters and digits is controlled by PCRE's character
+        tables, and may vary if locale-specific matching is taking place (see
+        "Locale support" in the pcreapi page). For example, in the "fr"
+        (French) locale, some character codes greater than 128 are used for
+        accented letters, and these are matched by \w.
 
-       These character type sequences can appear both inside and outside char-
-       acter classes. They each match one character of the  appropriate  type.
-       If  the current matching point is at the end of the subject string, all
-       of them fail, since there is no character to match.
+        These character type sequences can appear both inside and outside
+        character classes. They each match one character of the appropriate
+        type. If the current matching point is at the end of the subject
+        string, all of them fail, since there is no character to match.
 
-       The fourth use of backslash is for certain simple assertions. An asser-
-       tion  specifies a condition that has to be met at a particular point in
-       a match, without consuming any characters from the subject string.  The
-       use  of subpatterns for more complicated assertions is described below.
-       The backslashed assertions are
+        The fourth use of backslash is for certain simple assertions. An
+        assertion specifies a condition that has to be met at a particular
+        point in a match, without consuming any characters from the subject
+        string. The use of subpatterns for more complicated assertions is
+        described below. The backslashed assertions are:
 
          \b     matches at a word boundary
          \B     matches when not at a word boundary
@@ -247,205 +241,196 @@
          \z     matches at end of subject
          \G     matches at first matching position in subject
 
-       These assertions may not appear in character classes (but note that  \b
-       has a different meaning, namely the backspace character, inside a char-
-       acter class).
+        These assertions may not appear in character classes (but note that \b
+        has a different meaning, namely the backspace character, inside a
+        character class).
 
-       A word boundary is a position in the subject string where  the  current
-       character  and  the previous character do not both match \w or \W (i.e.
-       one matches \w and the other matches \W), or the start or  end  of  the
-       string if the first or last character matches \w, respectively.
+        A word boundary is a position in the subject string where the current
+        character and the previous character do not both match \w or \W (i.e.
+        one matches \w and the other matches \W), or the start or end of the
+        string if the first or last character matches \w, respectively.
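
The word boundary definition can be spelled out by hand and compared against a regex engine. The sketch below uses Python's re module as a stand-in for PCRE and assumes an ASCII-only notion of \w; the helper names is_word_char and word_boundaries are hypothetical.

```python
import re


def is_word_char(ch):
    # ASCII-only approximation of \w: letter, digit, or underscore.
    return ch.isascii() and (ch.isalnum() or ch == "_")


def word_boundaries(subject):
    """Positions where exactly one neighbouring character is a word character."""
    positions = []
    for i in range(len(subject) + 1):
        # Off the ends of the string there is no word character.
        before = i > 0 and is_word_char(subject[i - 1])
        after = i < len(subject) and is_word_char(subject[i])
        if before != after:
            positions.append(i)
    return positions


subject = "ab cd"
by_hand = word_boundaries(subject)
by_regex = [m.start() for m in re.finditer(r"\b", subject)]
print(by_hand, by_regex)
```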
 
-       The  \A,  \Z,  and \z assertions differ from the traditional circumflex
-       and dollar (described below) in that they only ever match at  the  very
-       start  and  end  of the subject string, whatever options are set. Thus,
-       they are independent of multiline mode.
+        The \A, \Z, and \z assertions differ from the traditional circumflex
+        and dollar (described below) in that they only ever match at the very
+        start and end of the subject string, whatever options are set. Thus,
+        they are independent of multiline mode.
 
-       They are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options. If the
-       startoffset argument of pcre_exec() is non-zero, indicating that match-
-       ing is to start at a point other than the beginning of the subject,  \A
-       can  never  match.  The difference between \Z and \z is that \Z matches
-       before a newline that is the last character of the string as well as at
-       the end of the string, whereas \z matches only at the end.
+        They are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options. If the
+        startoffset argument of pcre_exec() is non-zero, indicating that
+        matching is to start at a point other than the beginning of the
+        subject, \A can never match. The difference between \Z and \z is that
+        \Z matches before a newline that is the last character of the string as
+        well as at the end of the string, whereas \z matches only at the end.
 
-       The  \G assertion is true only when the current matching position is at
-       the start point of the match, as specified by the startoffset  argument
-       of  pcre_exec().  It  differs  from \A when the value of startoffset is
-       non-zero. By calling pcre_exec() multiple times with appropriate  argu-
-       ments, you can mimic Perl's /g option, and it is in this kind of imple-
-       mentation where \G can be useful.
+        The \G assertion is true only when the current matching position is at
+        the start point of the match, as specified by the startoffset argument
+        of pcre_exec(). It differs from \A when the value of startoffset is
+        non-zero. By calling pcre_exec() multiple times with appropriate
+        arguments, you can mimic Perl's /g option, and it is in this kind of
+        implementation where \G can be useful.
 
-       Note, however, that PCRE's interpretation of \G, as the  start  of  the
-       current match, is subtly different from Perl's, which defines it as the
-       end of the previous match. In Perl, these can  be  different  when  the
-       previously  matched  string was empty. Because PCRE does just one match
-       at a time, it cannot reproduce this behaviour.
+        Note, however, that PCRE's interpretation of \G, as the start of the
+        current match, is subtly different from Perl's, which defines it as the
+        end of the previous match. In Perl, these can be different when the
+        previously matched string was empty. Because PCRE does just one match
+        at a time, it cannot reproduce this behaviour.
 
-       If all the alternatives of a pattern begin with \G, the  expression  is
-       anchored to the starting match position, and the "anchored" flag is set
-       in the compiled regular expression.
-
+        If all the alternatives of a pattern begin with \G, the expression is
+        anchored to the starting match position, and the "anchored" flag is set
+        in the compiled regular expression.
 
 CIRCUMFLEX AND DOLLAR
+        Outside a character class, in the default matching mode, the circumflex
+        character is an assertion which is true only if the current matching
+        point is at the start of the subject string. If the startoffset
+        argument of pcre_exec() is non-zero, circumflex can never match if the
+        PCRE_MULTILINE option is unset. Inside a character class, circumflex
+        has an entirely different meaning (see below).
 
-       Outside a character class, in the default matching mode, the circumflex
-       character  is  an  assertion which is true only if the current matching
-       point is at the start of the subject string. If the  startoffset  argu-
-       ment  of  pcre_exec()  is  non-zero,  circumflex can never match if the
-       PCRE_MULTILINE option is unset. Inside a  character  class,  circumflex
-       has an entirely different meaning (see below).
+        Circumflex need not be the first character of the pattern if a number
+        of alternatives are involved, but it should be the first thing in each
+        alternative in which it appears if the pattern is ever to match that
+        branch. If all possible alternatives start with a circumflex, that is,
+        if the pattern is constrained to match only at the start of the
+        subject, it is said to be an "anchored" pattern. (There are also other
+        constructs that can cause a pattern to be anchored.)
 
-       Circumflex  need  not be the first character of the pattern if a number
-       of alternatives are involved, but it should be the first thing in  each
-       alternative  in  which  it appears if the pattern is ever to match that
-       branch. If all possible alternatives start with a circumflex, that  is,
-       if  the  pattern  is constrained to match only at the start of the sub-
-       ject, it is said to be an "anchored" pattern.  (There  are  also  other
-       constructs that can cause a pattern to be anchored.)
+        A dollar character is an assertion which is true only if the current
+        matching point is at the end of the subject string, or immediately
+        before a newline character that is the last character in the string (by
+        default). Dollar need not be the last character of the pattern if a
+        number of alternatives are involved, but it should be the last item in
+        any branch in which it appears. Dollar has no special meaning in a
+        character class.
 
-       A  dollar  character  is an assertion which is true only if the current
-       matching point is at the end of  the  subject  string,  or  immediately
-       before a newline character that is the last character in the string (by
-       default). Dollar need not be the last character of  the  pattern  if  a
-       number  of alternatives are involved, but it should be the last item in
-       any branch in which it appears.  Dollar has no  special  meaning  in  a
-       character class.
+        The meaning of dollar can be changed so that it matches only at the
+        very end of the string, by setting the PCRE_DOLLAR_ENDONLY option at
+        compile time. This does not affect the \Z assertion.
 
-       The  meaning  of  dollar  can be changed so that it matches only at the
-       very end of the string, by setting the  PCRE_DOLLAR_ENDONLY  option  at
-       compile time. This does not affect the \Z assertion.
+        The meanings of the circumflex and dollar characters are changed if the
+        PCRE_MULTILINE option is set. When this is the case, they match
+        immediately after and immediately before an internal newline character,
+        respectively, in addition to matching at the start and end of the
+        subject string. For example, the pattern /^abc$/ matches the subject
+        string "def\nabc" in multiline mode, but not otherwise. Consequently,
+        patterns that are anchored in single line mode because all branches
+        start with ^ are not anchored in multiline mode, and a match for
+        circumflex is possible when the startoffset argument of pcre_exec() is
+        non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE
+        is set.
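
The /^abc$/ example can be reproduced with a short sketch. It uses Python's re module rather than PCRE, on the assumption that re.MULTILINE plays the role of PCRE_MULTILINE for this case.

```python
import re

subject = "def\nabc"

# Default mode: ^ matches only at the very start of the subject,
# so the "abc" on the second line is not found.
single_line = re.search(r"^abc$", subject)

# Multiline mode: ^ also matches immediately after the internal newline.
multi_line = re.search(r"^abc$", subject, re.MULTILINE)

print(single_line, multi_line.span())
```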
 
-       The meanings of the circumflex and dollar characters are changed if the
-       PCRE_MULTILINE option is set. When this is the case, they match immedi-
-       ately  after  and  immediately  before  an  internal newline character,
-       respectively, in addition to matching at the start and end of the  sub-
-       ject  string.  For  example,  the  pattern  /^abc$/ matches the subject
-       string "def\nabc" in multiline mode, but not  otherwise.  Consequently,
-       patterns  that  are  anchored  in single line mode because all branches
-       start with ^ are not anchored in multiline mode, and a match  for  cir-
-       cumflex  is  possible  when  the startoffset argument of pcre_exec() is
-       non-zero. The PCRE_DOLLAR_ENDONLY option is ignored  if  PCRE_MULTILINE
-       is set.
-
-       Note  that  the sequences \A, \Z, and \z can be used to match the start
-       and end of the subject in both modes, and if all branches of a  pattern
-       start  with  \A it is always anchored, whether PCRE_MULTILINE is set or
-       not.
-
+        Note that the sequences \A, \Z, and \z can be used to match the start
+        and end of the subject in both modes, and if all branches of a pattern
+        start with \A it is always anchored, whether PCRE_MULTILINE is set or
+        not.
 
 FULL STOP (PERIOD, DOT)
-
-       Outside a character class, a dot in the pattern matches any one charac-
-       ter  in  the  subject,  including a non-printing character, but not (by
-       default) newline.  In UTF-8 mode, a dot matches  any  UTF-8  character,
-       which  might  be  more than one byte long, except (by default) for new-
-       line. If the PCRE_DOTALL option is set, dots match  newlines  as  well.
-       The  handling of dot is entirely independent of the handling of circum-
-       flex and dollar, the only relationship being  that  they  both  involve
-       newline characters. Dot has no special meaning in a character class.
-
+        Outside a character class, a dot in the pattern matches any one
+        character in the subject, including a non-printing character, but not
+        (by default) newline. In UTF-8 mode, a dot matches any UTF-8 character,
+        which might be more than one byte long, except (by default) for
+        newline. If the PCRE_DOTALL option is set, dots match newlines as well.
+        The handling of dot is entirely independent of the handling of
+        circumflex and dollar, the only relationship being that they both
+        involve newline characters. Dot has no special meaning in a character
+        class.
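
A minimal sketch of the dot behaviour described above, again using Python's re module as a stand-in (re.DOTALL is assumed to correspond to PCRE_DOTALL):

```python
import re

# By default a dot does not match a newline.
default_dot = re.search(r"a.c", "a\nc")

# With DOTALL set, it does.
dotall_dot = re.search(r"a.c", "a\nc", re.DOTALL)

# Inside a character class the dot is an ordinary character.
class_vs_letter = re.search(r"a[.]c", "abc")
class_vs_dot = re.search(r"a[.]c", "a.c")

print(default_dot, dotall_dot.group() == "a\nc", class_vs_letter, class_vs_dot.group())
```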
 
 MATCHING A SINGLE BYTE
+        Outside a character class, the escape sequence \C matches any one byte,
+        both in and out of UTF-8 mode. Unlike a dot, it always matches a
+        newline. The feature is provided in Perl in order to match individual
+        bytes in UTF-8 mode. Because it breaks up UTF-8 characters into
+        individual bytes, what remains in the string may be a malformed UTF-8
+        string. For this reason it is best avoided.
 
-       Outside a character class, the escape sequence \C matches any one byte,
-       both in and out of UTF-8 mode. Unlike a dot, it always matches  a  new-
-       line.  The  feature  is  provided  in Perl in order to match individual
-       bytes in UTF-8 mode.  Because it breaks up UTF-8 characters into  indi-
-       vidual  bytes,  what  remains  in  the  string may be a malformed UTF-8
-       string. For this reason it is best avoided.
-
-       PCRE does not allow \C to appear in lookbehind assertions (see  below),
-       because in UTF-8 mode it makes it impossible to calculate the length of
-       the lookbehind.
-
+        PCRE does not allow \C to appear in lookbehind assertions (see below),
+        because in UTF-8 mode it makes it impossible to calculate the length of
+        the lookbehind.
 
 SQUARE BRACKETS
+        An opening square bracket introduces a character class, terminated by a
+        closing square bracket. A closing square bracket on its own is not
+        special. If a closing square bracket is required as a member of the
+        class, it should be the first data character in the class (after an
+        initial circumflex, if present) or escaped with a backslash.
 
-       An opening square bracket introduces a character class, terminated by a
-       closing square bracket. A closing square bracket on its own is not spe-
-       cial. If a closing square bracket is required as a member of the class,
-       it  should  be  the first data character in the class (after an initial
-       circumflex, if present) or escaped with a backslash.
+        A character class matches a single character in the subject. In UTF-8
+        mode, the character may occupy more than one byte. A matched character
+        must be in the set of characters defined by the class, unless the first
+        character in the class definition is a circumflex, in which case the
+        subject character must not be in the set defined by the class. If a
+        circumflex is actually required as a member of the class, ensure it is
+        not the first character, or escape it with a backslash.
 
-       A character class matches a single character in the subject.  In  UTF-8
-       mode,  the character may occupy more than one byte. A matched character
-       must be in the set of characters defined by the class, unless the first
-       character  in  the  class definition is a circumflex, in which case the
-       subject character must not be in the set defined by  the  class.  If  a
-       circumflex  is actually required as a member of the class, ensure it is
-       not the first character, or escape it with a backslash.
+        For example, the character class [aeiou] matches any lower case vowel,
+        while [^aeiou] matches any character that is not a lower case vowel.
+        Note that a circumflex is just a convenient notation for specifying the
+        characters which are in the class by enumerating those that are not. It
+        is not an assertion: it still consumes a character from the subject
+        string, and fails if the current pointer is at the end of the string.
 
-       For example, the character class [aeiou] matches any lower case  vowel,
-       while  [^aeiou]  matches  any character that is not a lower case vowel.
-       Note that a circumflex is just a convenient notation for specifying the
-       characters which are in the class by enumerating those that are not. It
-       is not an assertion: it still consumes a  character  from  the  subject
-       string, and fails if the current pointer is at the end of the string.
+        In UTF-8 mode, characters with values greater than 255 can be included
+        in a class as a literal string of bytes, or by using the \x{ escaping
+        mechanism.
 
-       In  UTF-8 mode, characters with values greater than 255 can be included
-       in a class as a literal string of bytes, or by using the  \x{  escaping
-       mechanism.
+        When caseless matching is set, any letters in a class represent both
+        their upper case and lower case versions, so for example, a caseless
+        [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not
+        match "A", whereas a caseful version would. PCRE does not support the
+        concept of case for characters with values greater than 255.
 
-       When  caseless  matching  is set, any letters in a class represent both
-       their upper case and lower case versions, so for  example,  a  caseless
-       [aeiou]  matches  "A"  as well as "a", and a caseless [^aeiou] does not
-       match "A", whereas a caseful version would. PCRE does not  support  the
-       concept of case for characters with values greater than 255.
+        The newline character is never treated in any special way in character
+        classes, whatever the setting of the PCRE_DOTALL or PCRE_MULTILINE
+        options is. A class such as [^a] will always match a newline.
 
-       The  newline character is never treated in any special way in character
-       classes, whatever the setting  of  the  PCRE_DOTALL  or  PCRE_MULTILINE
-       options is. A class such as [^a] will always match a newline.
+        The minus (hyphen) character can be used to specify a range of
+        characters in a character class. For example, [d-m] matches any letter
+        between d and m, inclusive. If a minus character is required in a
+        class, it must be escaped with a backslash or appear in a position
+        where it cannot be interpreted as indicating a range, typically as the
+        first or last character in the class.
 
-       The  minus (hyphen) character can be used to specify a range of charac-
-       ters in a character  class.  For  example,  [d-m]  matches  any  letter
-       between  d  and  m,  inclusive.  If  a minus character is required in a
-       class, it must be escaped with a backslash  or  appear  in  a  position
-       where  it cannot be interpreted as indicating a range, typically as the
-       first or last character in the class.
+        It is not possible to have the literal character "]" as the end
+        character of a range. A pattern such as [W-]46] is interpreted as a
+        class of two characters ("W" and "-") followed by a literal string
+        "46]", so it would match "W46]" or "-46]". However, if the "]" is
+        escaped with a backslash it is interpreted as the end of range, so
+        [W-\]46] is interpreted as a single class containing a range followed
+        by two separate characters. The octal or hexadecimal representation of
+        "]" can also be used to end a range.
 
-       It is not possible to have the literal character "]" as the end charac-
-       ter  of a range. A pattern such as [W-]46] is interpreted as a class of
-       two characters ("W" and "-") followed by a literal string "46]", so  it
-       would  match  "W46]"  or  "-46]". However, if the "]" is escaped with a
-       backslash it is interpreted as the end of range, so [W-\]46] is  inter-
-       preted  as  a  single class containing a range followed by two separate
-       characters. The octal or hexadecimal representation of "]" can also  be
-       used to end a range.
+        Ranges operate in the collating sequence of character values. They can
+        also be used for characters specified numerically, for example
+        [\000-\037]. In UTF-8 mode, ranges can include characters whose values
+        are greater than 255, for example [\x{100}-\x{2ff}].
 
-       Ranges  operate in the collating sequence of character values. They can
-       also  be  used  for  characters  specified  numerically,  for   example
-       [\000-\037].  In UTF-8 mode, ranges can include characters whose values
-       are greater than 255, for example [\x{100}-\x{2ff}].
+        If a range that includes letters is used when caseless matching is set,
+        it matches the letters in either case. For example, [W-c] is equivalent
+        to [][\^_`wxyzabc], matched caselessly, and if character tables for the
+        "fr" locale are in use, [\xc8-\xcb] matches accented E characters in
+        both cases.
 
-       If a range that includes letters is used when caseless matching is set,
-       it matches the letters in either case. For example, [W-c] is equivalent
-       to [][\^_`wxyzabc], matched caselessly, and if character tables for the
-       "fr"  locale  are  in use, [\xc8-\xcb] matches accented E characters in
-       both cases.
+        The character types \d, \D, \s, \S, \w, and \W may also appear in a
+        character class, and add the characters that they match to the class.
+        For example, [\dABCDEF] matches any hexadecimal digit. A circumflex can
+        conveniently be used with the upper case character types to specify a
+        more restricted set of characters than the matching lower case type.
+        For example, the class [^\W_] matches any letter or digit, but not
+        underscore.
 
-       The character types \d, \D, \s, \S, \w, and \W may  also  appear  in  a
-       character  class,  and add the characters that they match to the class.
-       For example, [\dABCDEF] matches any hexadecimal digit. A circumflex can
-       conveniently  be  used with the upper case character types to specify a
-       more restricted set of characters than the matching  lower  case  type.
-       For  example,  the  class  [^\W_]  matches any letter or digit, but not
-       underscore.
-
-       All non-alphameric characters other than \, -, ^ (at the start) and the
-       terminating ] are non-special in character classes, but it does no harm
-       if they are escaped.
-
+        All non-alphanumeric characters other than \, -, ^ (at the start) and the
+        terminating ] are non-special in character classes, but it does no harm
+        if they are escaped.
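
The character class rules in this section can be checked with a few one-liners. This is a sketch in Python's re module, which is assumed to agree with PCRE for these particular classes; it does not cover the locale or UTF-8 cases.

```python
import re

# A class, negated or not, consumes exactly one character.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")

# A negated class is not an assertion: it fails at the end of the subject.
at_end = re.search(r"x[^aeiou]", "regex")

# Newline gets no special treatment inside a class.
newline = re.fullmatch(r"[^a]", "\n")

# [W-]46] is the class [W-] followed by the literal string "46]" ...
literal_tail = re.fullmatch(r"[W-]46]", "-46]")

# ... while [W-\]46] is a single class: the range W..] plus "4" and "6".
one_class = [c for c in "WX]46-" if re.fullmatch(r"[W-\]46]", c)]

# [^\W_] matches letters and digits, but not underscore.
word_chars = re.findall(r"[^\W_]", "a_1")

print(vowels, non_vowels, at_end, one_class, word_chars)
```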
 
 POSIX CHARACTER CLASSES
-
-       Perl supports the POSIX notation  for  character  classes,  which  uses
-       names  enclosed by [: and :] within the enclosing square brackets. PCRE
-       also supports this notation. For example,
+        Perl supports the POSIX notation for character classes, which uses
+        names enclosed by [: and :] within the enclosing square brackets. PCRE
+        also supports this notation. For example,
 
          [01[:alpha:]%]
 
-       matches "0", "1", any alphabetic character, or "%". The supported class
-       names are
+        matches "0", "1", any alphabetic character, or "%". The supported class
+        names are
 
          alnum    letters and digits
          alpha    letters
@@ -462,168 +447,158 @@
          word     "word" characters (same as \w)
          xdigit   hexadecimal digits
 
-       The  "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
-       and space (32). Notice that this list includes the VT  character  (code
-       11). This makes "space" different to \s, which does not include VT (for
-       Perl compatibility).
+        The "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
+        and space (32). Notice that this list includes the VT character (code
+        11). This makes "space" different to \s, which does not include VT (for
+        Perl compatibility).
 
-       The name "word" is a Perl extension, and "blank"  is  a  GNU  extension
-       from  Perl  5.8. Another Perl extension is negation, which is indicated
-       by a ^ character after the colon. For example,
+        The name "word" is a Perl extension, and "blank" is a GNU extension
+        from Perl 5.8. Another Perl extension is negation, which is indicated
+        by a ^ character after the colon. For example,
 
          [12[:^digit:]]
 
-       matches "1", "2", or any non-digit. PCRE (and Perl) also recognize  the
-       POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
-       these are not supported, and an error is given if they are encountered.
+        matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the
+        POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but
+        these are not supported, and an error is given if they are encountered.
 
-       In UTF-8 mode, characters with values greater than 255 do not match any
-       of the POSIX character classes.
-
+        In UTF-8 mode, characters with values greater than 255 do not match any
+        of the POSIX character classes.
 
 VERTICAL BAR
-
-       Vertical bar characters are used to separate alternative patterns.  For
-       example, the pattern
+        Vertical bar characters are used to separate alternative patterns. For
+        example, the pattern
 
          gilbert|sullivan
 
-       matches  either "gilbert" or "sullivan". Any number of alternatives may
-       appear, and an empty  alternative  is  permitted  (matching  the  empty
-       string).   The  matching  process  tries each alternative in turn, from
-       left to right, and the first one that succeeds is used. If the alterna-
-       tives  are within a subpattern (defined below), "succeeds" means match-
-       ing the rest of the main pattern as well as the alternative in the sub-
-       pattern.
-
+        matches either "gilbert" or "sullivan". Any number of alternatives may
+        appear, and an empty alternative is permitted (matching the empty
+        string). The matching process tries each alternative in turn, from
+        left to right, and the first one that succeeds is used. If the
+        alternatives are within a subpattern (defined below), "succeeds" means
+        matching the rest of the main pattern as well as the alternative in the
+        subpattern.
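
The left-to-right rule can be illustrated as follows. The sketch uses Python's re module, assumed to try alternatives in the same order as PCRE; the patterns a|ab and (a|ab)c are made up for the illustration.

```python
import re

# At a given position the first alternative that succeeds is used.
first_wins = re.match(r"a|ab", "ab").group()

# Inside a subpattern, "succeeds" includes matching the rest of the pattern,
# so the matcher falls back to the second alternative here.
needs_rest = re.fullmatch(r"(a|ab)c", "abc").group(1)

# An empty alternative matches the empty string.
empty_alt = re.fullmatch(r"cat(aract|erpillar|)", "cat").group(1)

print(repr(first_wins), repr(needs_rest), repr(empty_alt))
```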
 
 INTERNAL OPTION SETTING
-
-       The  settings  of  the  PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
-       PCRE_EXTENDED options can be changed  from  within  the  pattern  by  a
-       sequence  of  Perl  option  letters  enclosed between "(?" and ")". The
-       option letters are
+        The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
+        PCRE_EXTENDED options can be changed from within the pattern by a
+        sequence of Perl option letters enclosed between "(?" and ")". The
+        option letters are
 
          i  for PCRE_CASELESS
          m  for PCRE_MULTILINE
          s  for PCRE_DOTALL
          x  for PCRE_EXTENDED
 
-       For example, (?im) sets caseless, multiline matching. It is also possi-
-       ble to unset these options by preceding the letter with a hyphen, and a
-       combined setting and unsetting such as (?im-sx), which sets  PCRE_CASE-
-       LESS  and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED,
-       is also permitted. If a  letter  appears  both  before  and  after  the
-       hyphen, the option is unset.
+        For example, (?im) sets caseless, multiline matching. It is also
+        possible to unset these options by preceding the letter with a hyphen,
+        and a combined setting and unsetting such as (?im-sx), which sets
+        PCRE_CASELESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and
+        PCRE_EXTENDED, is also permitted. If a letter appears both before and
+        after the hyphen, the option is unset.
 
-       When  an option change occurs at top level (that is, not inside subpat-
-       tern parentheses), the change applies to the remainder of  the  pattern
-       that follows.  If the change is placed right at the start of a pattern,
-       PCRE extracts it into the global options (and it will therefore show up
-       in data extracted by the pcre_fullinfo() function).
+        When an option change occurs at top level (that is, not inside
+        subpattern parentheses), the change applies to the remainder of the
+        pattern that follows. If the change is placed right at the start of a
+        pattern, PCRE extracts it into the global options (and it will
+        therefore show up in data extracted by the pcre_fullinfo() function).
 
-       An option change within a subpattern affects only that part of the cur-
-       rent pattern that follows it, so
+        An option change within a subpattern affects only that part of the
+        current pattern that follows it, so
 
          (a(?i)b)c
 
-       matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
-       used).   By  this means, options can be made to have different settings
-       in different parts of the pattern. Any changes made in one  alternative
-       do  carry  on  into subsequent branches within the same subpattern. For
-       example,
+        matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
+        used). By this means, options can be made to have different settings
+        in different parts of the pattern. Any changes made in one alternative
+        do carry on into subsequent branches within the same subpattern. For
+        example,
 
          (a(?i)b|c)
 
-       matches "ab", "aB", "c", and "C", even though  when  matching  "C"  the
-       first  branch  is  abandoned before the option setting. This is because
-       the effects of option settings happen at compile time. There  would  be
-       some very weird behaviour otherwise.
+        matches "ab", "aB", "c", and "C", even though when matching "C" the
+        first branch is abandoned before the option setting. This is because
+        the effects of option settings happen at compile time. There would be
+        some very weird behaviour otherwise.
 
-       The  PCRE-specific  options PCRE_UNGREEDY and PCRE_EXTRA can be changed
-       in the same way as the Perl-compatible options by using the  characters
-       U  and X respectively. The (?X) flag setting is special in that it must
-       always occur earlier in the pattern than any of the additional features
-       it turns on, even when it is at top level. It is best put at the start.
-
+        The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can be changed
+        in the same way as the Perl-compatible options by using the characters
+        U and X respectively. The (?X) flag setting is special in that it must
+        always occur earlier in the pattern than any of the additional features
+        it turns on, even when it is at top level. It is best put at the start.
 
 SUBPATTERNS
+        Subpatterns are delimited by parentheses (round brackets), which can be
+        nested. Marking part of a pattern as a subpattern does two things:
 
-       Subpatterns are delimited by parentheses (round brackets), which can be
-       nested.  Marking part of a pattern as a subpattern does two things:
-
-       1. It localizes a set of alternatives. For example, the pattern
+        1. It localizes a set of alternatives. For example, the pattern
 
          cat(aract|erpillar|)
 
-       matches  one  of the words "cat", "cataract", or "caterpillar". Without
-       the parentheses, it would match "cataract",  "erpillar"  or  the  empty
-       string.
+        matches one of the words "cat", "cataract", or "caterpillar". Without
+        the parentheses, it would match "cataract", "erpillar" or the empty
+        string.
 
-       2.  It  sets  up  the  subpattern as a capturing subpattern (as defined
-       above).  When the whole pattern matches, that portion  of  the  subject
-       string that matched the subpattern is passed back to the caller via the
-       ovector argument of pcre_exec(). Opening parentheses are  counted  from
-       left  to right (starting from 1) to obtain the numbers of the capturing
-       subpatterns.
+        2. It sets up the subpattern as a capturing subpattern (as defined
+        above). When the whole pattern matches, that portion of the subject
+        string that matched the subpattern is passed back to the caller via the
+        ovector argument of pcre_exec(). Opening parentheses are counted from
+        left to right (starting from 1) to obtain the numbers of the capturing
+        subpatterns.
 
-       For example, if the string "the red king" is matched against  the  pat-
-       tern
+        For example, if the string "the red king" is matched against the
+        pattern
 
          the ((red|white) (king|queen))
 
-       the captured substrings are "red king", "red", and "king", and are num-
-       bered 1, 2, and 3, respectively.
+        the captured substrings are "red king", "red", and "king", and are
+        numbered 1, 2, and 3, respectively.
 
-       The fact that plain parentheses fulfil  two  functions  is  not  always
-       helpful.   There are often times when a grouping subpattern is required
-       without a capturing requirement. If an opening parenthesis is  followed
-       by  a question mark and a colon, the subpattern does not do any captur-
-       ing, and is not counted when computing the  number  of  any  subsequent
-       capturing  subpatterns. For example, if the string "the white queen" is
-       matched against the pattern
+        The fact that plain parentheses fulfil two functions is not always
+        helpful. There are often times when a grouping subpattern is required
+        without a capturing requirement. If an opening parenthesis is followed
+        by a question mark and a colon, the subpattern does not do any
+        capturing, and is not counted when computing the number of any
+        subsequent capturing subpatterns. For example, if the string "the white
+        queen" is matched against the pattern
 
          the ((?:red|white) (king|queen))
 
-       the captured substrings are "white queen" and "queen", and are numbered
-       1  and 2. The maximum number of capturing subpatterns is 65535, and the
-       maximum depth of nesting of all subpatterns, both  capturing  and  non-
-       capturing, is 200.
+        the captured substrings are "white queen" and "queen", and are numbered
+        1 and 2. The maximum number of capturing subpatterns is 65535, and the
+        maximum depth of nesting of all subpatterns, both capturing and
+        non-capturing, is 200.
 
-       As  a  convenient shorthand, if any option settings are required at the
-       start of a non-capturing subpattern,  the  option  letters  may  appear
-       between the "?" and the ":". Thus the two patterns
+        As a convenient shorthand, if any option settings are required at the
+        start of a non-capturing subpattern, the option letters may appear
+        between the "?" and the ":". Thus the two patterns
 
          (?i:saturday|sunday)
          (?:(?i)saturday|sunday)
 
-       match exactly the same set of strings. Because alternative branches are
-       tried from left to right, and options are not reset until  the  end  of
-       the  subpattern is reached, an option setting in one branch does affect
-       subsequent branches, so the above patterns match "SUNDAY"  as  well  as
-       "Saturday".
-
+        match exactly the same set of strings. Because alternative branches are
+        tried from left to right, and options are not reset until the end of
+        the subpattern is reached, an option setting in one branch does affect
+        subsequent branches, so the above patterns match "SUNDAY" as well as
+        "Saturday".
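
The numbering of capturing subpatterns, and the effect of (?: and (?i:, can be seen in a short sketch. It uses Python's re module, where groups() plays the role that the ovector argument of pcre_exec() plays in the C API; that mapping is an assumption made for illustration.

```python
import re

# Opening parentheses are counted from left to right, starting at 1.
m = re.fullmatch(r"the ((red|white) (king|queen))", "the red king")
numbered = m.groups()

# (?: groups without capturing and is skipped when numbering.
n = re.fullmatch(r"the ((?:red|white) (king|queen))", "the white queen")
fewer = n.groups()

# The option set at the start of the subpattern applies to both branches.
days = re.compile(r"(?i:saturday|sunday)")
both_branches = [d for d in ("SUNDAY", "Saturday", "monday") if days.fullmatch(d)]

print(numbered, fewer, both_branches)
```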
 
 NAMED SUBPATTERNS
+        Identifying capturing parentheses by number is simple, but it can be
+        very hard to keep track of the numbers in complicated regular
+        expressions. Furthermore, if an expression is modified, the numbers may
+        change. To help with the difficulty, PCRE supports the naming of
+        subpatterns, something that Perl does not provide. The Python syntax
+        (?P<name>...) is used. Names consist of alphanumeric characters and
+        underscores, and must be unique within a pattern.
 
-       Identifying  capturing  parentheses  by number is simple, but it can be
-       very hard to keep track of the numbers in complicated  regular  expres-
-       sions.  Furthermore,  if  an  expression  is  modified, the numbers may
-       change. To help with the difficulty, PCRE supports the naming  of  sub-
-       patterns,  something  that  Perl  does  not  provide. The Python syntax
-       (?P<name>...) is used. Names consist  of  alphanumeric  characters  and
-       underscores, and must be unique within a pattern.
-
-       Named  capturing  parentheses  are  still  allocated numbers as well as
-       names. The PCRE API provides function calls for extracting the name-to-
-       number  translation  table from a compiled pattern. For further details
-       see the pcreapi documentation.
-
+        Named capturing parentheses are still allocated numbers as well as
+        names. The PCRE API provides function calls for extracting the name-to-
+        number translation table from a compiled pattern. For further details
+        see the pcreapi documentation.
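
Since the (?P<name>...) syntax comes from Python, the idea is easy to sketch there. In the example below the pattern and the names "year" and "month" are made up for illustration; groupindex is Python's own name-to-number table and is not part of the PCRE API.

```python
import re

date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
m = date.fullmatch("2019-11")

# A named group can be read by name or by its allocated number.
by_name = m.group("year")
by_number = m.group(1)

# The name-to-number translation table of the compiled pattern.
table = dict(date.groupindex)

print(by_name, by_number, table)
```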
 
 REPETITION
-
-       Repetition is specified by quantifiers, which can  follow  any  of  the
-       following items:
+        Repetition is specified by quantifiers, which can follow any of the
+        following items:
 
          a literal data character
          the . metacharacter
@@ -633,693 +608,681 @@
          a back reference (see next section)
          a parenthesized subpattern (unless it is an assertion)
 
-       The  general repetition quantifier specifies a minimum and maximum num-
-       ber of permitted matches, by giving the two numbers in  curly  brackets
-       (braces),  separated  by  a comma. The numbers must be less than 65536,
-       and the first must be less than or equal to the second. For example:
+        The general repetition quantifier specifies a minimum and maximum
+        number of permitted matches, by giving the two numbers in curly
+        brackets (braces), separated by a comma. The numbers must be less than
+        65536, and the first must be less than or equal to the second. For
+        example:
 
          z{2,4}
 
-       matches "zz", "zzz", or "zzzz". A closing brace on its  own  is  not  a
-       special  character.  If  the second number is omitted, but the comma is
-       present, there is no upper limit; if the second number  and  the  comma
-       are  both omitted, the quantifier specifies an exact number of required
-       matches. Thus
+        matches "zz", "zzz", or "zzzz". A closing brace on its own is not a
+        special character. If the second number is omitted, but the comma is
+        present, there is no upper limit; if the second number and the comma
+        are both omitted, the quantifier specifies an exact number of required
+        matches. Thus
 
          [aeiou]{3,}
 
-       matches at least 3 successive vowels, but may match many more, while
+        matches at least 3 successive vowels, but may match many more, while
 
          \d{8}
 
-       matches exactly 8 digits. An opening curly bracket that  appears  in  a
-       position  where a quantifier is not allowed, or one that does not match
-       the syntax of a quantifier, is taken as a literal character. For  exam-
-       ple, {,6} is not a quantifier, but a literal string of four characters.
+        matches exactly 8 digits. An opening curly bracket that appears in a
+        position where a quantifier is not allowed, or one that does not match
+        the syntax of a quantifier, is taken as a literal character. For
+        example, {,6} is not a quantifier, but a literal string of four
+        characters.
 
-       In UTF-8 mode, quantifiers apply to UTF-8  characters  rather  than  to
-       individual bytes. Thus, for example, \x{100}{2} matches two UTF-8 char-
-       acters, each of which is represented by a two-byte sequence.
+        In UTF-8 mode, quantifiers apply to UTF-8 characters rather than to
+        individual bytes. Thus, for example, \x{100}{2} matches two UTF-8
+        characters, each of which is represented by a two-byte sequence.
 
-       The quantifier {0} is permitted, causing the expression to behave as if
-       the previous item and the quantifier were not present.
+        The quantifier {0} is permitted, causing the expression to behave as if
+        the previous item and the quantifier were not present.
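
The quantifier forms above can be exercised with a small sketch in Python's re module. The helper full() is hypothetical and only keeps the lines short. The {,6} case is deliberately left out, because Python reads it as a quantifier (zero to six repetitions) rather than as a literal string.

```python
import re

def full(pattern, subject):
    """Return True if the whole subject matches the pattern."""
    return re.fullmatch(pattern, subject) is not None

# {2,4}: between two and four repetitions.
in_range = [s for s in ("z", "zz", "zzz", "zzzz", "zzzzz") if full(r"z{2,4}", s)]

# {3,}: three or more, with no upper limit.
open_ended = full(r"[aeiou]{3,}", "aeiouaeiou")

# {8}: exactly eight.
exact = (full(r"\d{8}", "20191126"), full(r"\d{8}", "2019112"))

# {0}: behaves as if the item and its quantifier were absent.
zero = full(r"ab{0}c", "ac")

print(in_range, open_ended, exact, zero)
```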
 
-       For  convenience  (and  historical compatibility) the three most common
-       quantifiers have single-character abbreviations:
+        For convenience (and historical compatibility) the three most common
+        quantifiers have single-character abbreviations:
 
          *    is equivalent to {0,}
          +    is equivalent to {1,}
          ?    is equivalent to {0,1}
 
-       It is possible to construct infinite loops by  following  a  subpattern
-       that can match no characters with a quantifier that has no upper limit,
-       for example:
+        It is possible to construct infinite loops by following a subpattern
+        that can match no characters with a quantifier that has no upper limit,
+        for example:
 
          (a?)*
 
-       Earlier versions of Perl and PCRE used to give an error at compile time
-       for  such  patterns. However, because there are cases where this can be
-       useful, such patterns are now accepted, but if any  repetition  of  the
-       subpattern  does in fact match no characters, the loop is forcibly bro-
-       ken.
+        Earlier versions of Perl and PCRE used to give an error at compile time
+        for such patterns. However, because there are cases where this can be
+        useful, such patterns are now accepted, but if any repetition of the
+        subpattern does in fact match no characters, the loop is forcibly
+        broken.
 
-       By default, the quantifiers are "greedy", that is, they match  as  much
-       as  possible  (up  to  the  maximum number of permitted times), without
-       causing the rest of the pattern to fail. The classic example  of  where
-       this gives problems is in trying to match comments in C programs. These
-       appear between the sequences /* and */ and within the  sequence,  indi-
-       vidual * and / characters may appear. An attempt to match C comments by
-       applying the pattern
+        By default, the quantifiers are "greedy", that is, they match as much
+        as possible (up to the maximum number of permitted times), without
+        causing the rest of the pattern to fail. The classic example of where
+        this gives problems is in trying to match comments in C programs. These
+        appear between the sequences /* and */ and within the sequence,
+        individual * and / characters may appear. An attempt to match C
+        comments by applying the pattern
 
          /\*.*\*/
 
-       to the string
+        to the string
 
-         /* first command */  not comment  /* second comment */
+         /* first comment */ not comment /* second comment */
 
-       fails, because it matches the entire string owing to the greediness  of
-       the .*  item.
+        fails, because it matches the entire string owing to the greediness of
+        the .* item.
 
-       However,  if  a quantifier is followed by a question mark, it ceases to
-       be greedy, and instead matches the minimum number of times possible, so
-       the pattern
+        However, if a quantifier is followed by a question mark, it ceases to
+        be greedy, and instead matches the minimum number of times possible, so
+        the pattern
 
          /\*.*?\*/
 
-       does  the  right  thing with the C comments. The meaning of the various
-       quantifiers is not otherwise changed,  just  the  preferred  number  of
-       matches.   Do  not  confuse this use of question mark with its use as a
-       quantifier in its own right. Because it has two uses, it can  sometimes
-       appear doubled, as in
+        does the right thing with the C comments. The meaning of the various
+        quantifiers is not otherwise changed, just the preferred number of
+        matches. Do not confuse this use of question mark with its use as a
+        quantifier in its own right. Because it has two uses, it can sometimes
+        appear doubled, as in
 
          \d??\d
 
-       which matches one digit by preference, but can match two if that is the
-       only way the rest of the pattern matches.
+        which matches one digit by preference, but can match two if that is the
+        only way the rest of the pattern matches.
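
The greedy and non-greedy behaviour described above can be sketched as follows. This is only an illustration, written for Python's re module rather than the driver's PCRE package (an assumption made for convenience); the two engines should agree on these particular constructs.

```python
import re

subject = "/* first comment */ not comment /* second comment */"

# Greedy .* extends to the last "*/", so one match covers the whole string.
greedy = re.findall(r"/\*.*\*/", subject)

# Non-greedy .*? stops at the first "*/", so each comment matches on its own.
lazy = re.findall(r"/\*.*?\*/", subject)

# \d??\d prefers one digit but takes two when the rest of the match needs it.
one_digit = re.match(r"\d??\d", "12").group()
two_digits = re.fullmatch(r"\d??\d", "12").group()

print(greedy, lazy, one_digit, two_digits)
```

Here re.match() is free to stop after one digit, while re.fullmatch() requires the whole subject to be consumed, which forces the optional first \d to take part.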
 
-       If the PCRE_UNGREEDY option is set (an option which is not available in
-       Perl),  the  quantifiers are not greedy by default, but individual ones
-       can be made greedy by following them with a  question  mark.  In  other
-       words, it inverts the default behaviour.
+        If the PCRE_UNGREEDY option is set (an option which is not available in
+        Perl), the quantifiers are not greedy by default, but individual ones
+        can be made greedy by following them with a question mark. In other
+        words, it inverts the default behaviour.
 
-       When  a  parenthesized  subpattern  is quantified with a minimum repeat
-       count that is greater than 1 or with a limited maximum, more  store  is
-       required  for  the  compiled  pattern, in proportion to the size of the
-       minimum or maximum.
+        When a parenthesized subpattern is quantified with a minimum repeat
+        count that is greater than 1 or with a limited maximum, more memory is
+        required for the compiled pattern, in proportion to the size of the
+        minimum or maximum.
 
-       If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv-
-       alent  to Perl's /s) is set, thus allowing the . to match newlines, the
-       pattern is implicitly anchored, because whatever follows will be  tried
-       against  every character position in the subject string, so there is no
-       point in retrying the overall match at any position  after  the  first.
-       PCRE normally treats such a pattern as though it were preceded by \A.
+        If a pattern starts with .* or .{0,} and the PCRE_DOTALL option
+        (equivalent to Perl's /s) is set, thus allowing the . to match
+        newlines, the pattern is implicitly anchored, because whatever follows
+        will be tried against every character position in the subject string,
+        so there is no point in retrying the overall match at any position
+        after the first. PCRE normally treats such a pattern as though it were
+        preceded by \A.
 
-       In  cases  where  it  is known that the subject string contains no new-
-       lines, it is worth setting PCRE_DOTALL in order to  obtain  this  opti-
-       mization, or alternatively using ^ to indicate anchoring explicitly.
+        In cases where it is known that the subject string contains no
+        newlines, it is worth setting PCRE_DOTALL in order to obtain this
+        optimization, or alternatively using ^ to indicate anchoring
+        explicitly.
 
-       However,  there is one situation where the optimization cannot be used.
-       When .*  is inside capturing parentheses that  are  the  subject  of  a
-       backreference  elsewhere in the pattern, a match at the start may fail,
-       and a later one succeed. Consider, for example:
+        However, there is one situation where the optimization cannot be used.
+        When .* is inside capturing parentheses that are the subject of a
+        backreference elsewhere in the pattern, a match at the start may fail,
+        and a later one succeed. Consider, for example:
 
          (.*)abc\1
 
-       If the subject is "xyz123abc123" the match point is the fourth  charac-
-       ter. For this reason, such a pattern is not implicitly anchored.
+        If the subject is "xyz123abc123" the match point is the fourth
+        character. For this reason, such a pattern is not implicitly anchored.
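
A small sketch of why such a pattern cannot be anchored, again using Python's re module as a stand-in for PCRE (an assumption; re.DOTALL plays the role of PCRE_DOTALL here).

```python
import re

# At offsets 0, 1 and 2 the captured text cannot recur after "abc",
# so the first successful attempt starts at offset 3 (the fourth character).
m = re.search(r"(.*)abc\1", "xyz123abc123", re.DOTALL)
start, captured = m.start(), m.group(1)
print(start, captured)
```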
 
-       When a capturing subpattern is repeated, the value captured is the sub-
-       string that matched the final iteration. For example, after
+        When a capturing subpattern is repeated, the value captured is the
+        substring that matched the final iteration. For example, after
 
          (tweedle[dume]{3}\s*)+
 
-       has matched "tweedledum tweedledee" the value of the captured substring
-       is  "tweedledee".  However,  if there are nested capturing subpatterns,
-       the corresponding captured values may have been set in previous  itera-
-       tions. For example, after
+        has matched "tweedledum tweedledee" the value of the captured substring
+        is "tweedledee". However, if there are nested capturing subpatterns,
+        the corresponding captured values may have been set in previous
+        iterations. For example, after
 
          /(a|(b))+/
 
-       matches "aba" the value of the second captured substring is "b".
-
+        matches "aba" the value of the second captured substring is "b".
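
The two capture rules above can be checked with a short sketch. It uses Python's re module instead of the driver's PCRE package (an assumption for illustration); Python follows the same convention for repeated groups in these two cases.

```python
import re

# Group 1 holds only the text matched by the final iteration.
last = re.match(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee").group(1)

# Group 2 did not take part in the final iteration (which matched "a"),
# yet it still holds the "b" set by the previous iteration.
nested = re.match(r"(a|(b))+", "aba").groups()
print(last, nested)
```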
 
 ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS
+        With both maximizing and minimizing repetition, failure of what follows
+        normally causes the repeated item to be re-evaluated to see if a
+        different number of repeats allows the rest of the pattern to match.
+        Sometimes it is useful to prevent this, either to change the nature of
+        the match, or to cause it to fail earlier than it otherwise might, when
+        the author of the pattern knows there is no point in carrying on.
 
-       With both maximizing and minimizing repetition, failure of what follows
-       normally causes the repeated item to be re-evaluated to see if  a  dif-
-       ferent number of repeats allows the rest of the pattern to match. Some-
-       times it is useful to prevent this, either to change the nature of  the
-       match,  or  to  cause it fail earlier than it otherwise might, when the
-       author of the pattern knows there is no point in carrying on.
-
-       Consider, for example, the pattern \d+foo when applied to  the  subject
-       line
+        Consider, for example, the pattern \d+foo when applied to the subject
+        line
 
          123456bar
 
-       After matching all 6 digits and then failing to match "foo", the normal
-       action of the matcher is to try again with only 5 digits  matching  the
-       \d+  item,  and  then  with  4,  and  so on, before ultimately failing.
-       "Atomic grouping" (a term taken from Jeffrey  Friedl's  book)  provides
-       the  means for specifying that once a subpattern has matched, it is not
-       to be re-evaluated in this way.
+        After matching all 6 digits and then failing to match "foo", the normal
+        action of the matcher is to try again with only 5 digits matching the
+        \d+ item, and then with 4, and so on, before ultimately failing.
+        "Atomic grouping" (a term taken from Jeffrey Friedl's book) provides
+        the means for specifying that once a subpattern has matched, it is not
+        to be re-evaluated in this way.
 
-       If we use atomic grouping for the previous example, the  matcher  would
-       give up immediately on failing to match "foo" the first time. The nota-
-       tion is a kind of special parenthesis, starting with  (?>  as  in  this
-       example:
+        If we use atomic grouping for the previous example, the matcher would
+        give up immediately on failing to match "foo" the first time. The
+        notation is a kind of special parenthesis, starting with (?> as in this
+        example:
 
          (?>\d+)foo
 
-       This  kind  of  parenthesis "locks up" the  part of the pattern it con-
-       tains once it has matched, and a failure further into  the  pattern  is
-       prevented  from  backtracking into it. Backtracking past it to previous
-       items, however, works as normal.
+        This kind of parenthesis "locks up" the part of the pattern it
+        contains once it has matched, and a failure further into the pattern is
+        prevented from backtracking into it. Backtracking past it to previous
+        items, however, works as normal.
 
-       An alternative description is that a subpattern of  this  type  matches
-       the  string  of  characters  that an identical standalone pattern would
-       match, if anchored at the current point in the subject string.
+        An alternative description is that a subpattern of this type matches
+        the string of characters that an identical standalone pattern would
+        match, if anchored at the current point in the subject string.
 
-       Atomic grouping subpatterns are not capturing subpatterns. Simple cases
-       such as the above example can be thought of as a maximizing repeat that
-       must swallow everything it can. So, while both \d+ and  \d+?  are  pre-
-       pared  to  adjust  the number of digits they match in order to make the
-       rest of the pattern match, (?>\d+) can only match an entire sequence of
-       digits.
+        Atomic grouping subpatterns are not capturing subpatterns. Simple cases
+        such as the above example can be thought of as a maximizing repeat that
+        must swallow everything it can. So, while both \d+ and \d+? are
+        prepared to adjust the number of digits they match in order to make the
+        rest of the pattern match, (?>\d+) can only match an entire sequence of
+        digits.
 
-       Atomic  groups in general can of course contain arbitrarily complicated
-       subpatterns, and can be nested. However, when  the  subpattern  for  an
-       atomic group is just a single repeated item, as in the example above, a
-       simpler notation, called a "possessive quantifier" can  be  used.  This
-       consists  of  an  additional  + character following a quantifier. Using
-       this notation, the previous example can be rewritten as
+        Atomic groups in general can of course contain arbitrarily complicated
+        subpatterns, and can be nested. However, when the subpattern for an
+        atomic group is just a single repeated item, as in the example above, a
+        simpler notation, called a "possessive quantifier" can be used. This
+        consists of an additional + character following a quantifier. Using
+        this notation, the previous example can be rewritten as
 
          \d++foo
 
-       Possessive  quantifiers  are  always  greedy;  the   setting   of   the
-       PCRE_UNGREEDY option is ignored. They are a convenient notation for the
-       simpler forms of atomic group. However, there is no difference  in  the
-       meaning  or  processing  of  a possessive quantifier and the equivalent
-       atomic group.
+        Possessive quantifiers are always greedy; the setting of the
+        PCRE_UNGREEDY option is ignored. They are a convenient notation for the
+        simpler forms of atomic group. However, there is no difference in the
+        meaning or processing of a possessive quantifier and the equivalent
+        atomic group.
 
-       The possessive quantifier syntax is an extension to the Perl syntax. It
-       originates in Sun's Java package.
+        The possessive quantifier syntax is an extension to the Perl syntax. It
+        originates in Sun's Java package.
 
-       When  a  pattern  contains an unlimited repeat inside a subpattern that
-       can itself be repeated an unlimited number of  times,  the  use  of  an
-       atomic  group  is  the  only way to avoid some failing matches taking a
-       very long time indeed. The pattern
+        When a pattern contains an unlimited repeat inside a subpattern that
+        can itself be repeated an unlimited number of times, the use of an
+        atomic group is the only way to avoid some failing matches taking a
+        very long time indeed. The pattern
 
          (\D+|<\d+>)*[!?]
 
-       matches an unlimited number of substrings that either consist  of  non-
-       digits,  or  digits  enclosed in <>, followed by either ! or ?. When it
-       matches, it runs quickly. However, if it is applied to
+        matches an unlimited number of substrings that either consist of
+        non-digits, or digits enclosed in <>, followed by either ! or ?. When
+        it matches, it runs quickly. However, if it is applied to
 
          aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
-       it takes a long time before reporting  failure.  This  is  because  the
-       string  can  be  divided  between  the two repeats in a large number of
-       ways, and all have to be tried. (The example used [!?]  rather  than  a
-       single  character  at the end, because both PCRE and Perl have an opti-
-       mization that allows for fast failure when a single character is  used.
-       They  remember  the last single character that is required for a match,
-       and fail early if it is not present in the string.)  If the pattern  is
-       changed to
+        it takes a long time before reporting failure. This is because the
+        string can be divided between the two repeats in a large number of
+        ways, and all have to be tried. (The example used [!?] rather than a
+        single character at the end, because both PCRE and Perl have an
+        optimization that allows for fast failure when a single character is
+        used. They remember the last single character that is required for a
+        match, and fail early if it is not present in the string.) If the
+        pattern is changed to
 
          ((?>\D+)|<\d+>)*[!?]
 
-       sequences  of non-digits cannot be broken, and failure happens quickly.
-
+        sequences of non-digits cannot be broken, and failure happens quickly.
 
 BACK REFERENCES
+        Outside a character class, a backslash followed by a digit greater than
+        0 (and possibly further digits) is a back reference to a capturing
+        subpattern earlier (that is, to its left) in the pattern, provided
+        there have been that many previous capturing left parentheses.
 
-       Outside a character class, a backslash followed by a digit greater than
-       0 (and possibly further digits) is a back reference to a capturing sub-
-       pattern earlier (that is, to its left) in the pattern,  provided  there
-       have been that many previous capturing left parentheses.
+        However, if the decimal number following the backslash is less than 10,
+        it is always taken as a back reference, and causes an error only if
+        there are not that many capturing left parentheses in the entire
+        pattern. In other words, the parentheses that are referenced need not
+        be to the left of the reference for numbers less than 10. See the
+        section entitled "Backslash" above for further details of the handling
+        of digits following a backslash.
 
-       However, if the decimal number following the backslash is less than 10,
-       it is always taken as a back reference, and causes  an  error  only  if
-       there  are  not that many capturing left parentheses in the entire pat-
-       tern. In other words, the parentheses that are referenced need  not  be
-       to  the left of the reference for numbers less than 10. See the section
-       entitled "Backslash" above for further details of the handling of  dig-
-       its following a backslash.
-
-       A  back  reference matches whatever actually matched the capturing sub-
-       pattern in the current subject string, rather  than  anything  matching
-       the subpattern itself (see "Subpatterns as subroutines" below for a way
-       of doing that). So the pattern
+        A back reference matches whatever actually matched the capturing
+        subpattern in the current subject string, rather than anything matching
+        the subpattern itself (see "Subpatterns as subroutines" below for a way
+        of doing that). So the pattern
 
          (sens|respons)e and \1ibility
 
-       matches "sense and sensibility" and "response and responsibility",  but
-       not  "sense and responsibility". If caseful matching is in force at the
-       time of the back reference, the case of letters is relevant. For  exam-
-       ple,
+        matches "sense and sensibility" and "response and responsibility", but
+        not "sense and responsibility". If caseful matching is in force at the
+        time of the back reference, the case of letters is relevant. For
+        example,
 
          ((?i)rah)\s+\1
 
-       matches  "rah  rah"  and  "RAH RAH", but not "RAH rah", even though the
-       original capturing subpattern is matched caselessly.
+        matches "rah rah" and "RAH RAH", but not "RAH rah", even though the
+        original capturing subpattern is matched caselessly.
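
A brief sketch of both back reference examples, using Python's re module as a stand-in for PCRE (an assumption). Python does not accept an option setting in the middle of a pattern, so the caseless group is written with the scoped form (?i:...) instead.

```python
import re

pattern = re.compile(r"(sens|respons)e and \1ibility")
same = bool(pattern.fullmatch("sense and sensibility"))
mixed = bool(pattern.fullmatch("sense and responsibility"))

# Python spells the scoped caseless option (?i:...) instead of ((?i)...).
# The group matches caselessly, but \1 compares the captured text casefully.
rah = re.compile(r"((?i:rah))\s+\1")
results = [bool(rah.fullmatch(s)) for s in ("rah rah", "RAH RAH", "RAH rah")]
print(same, mixed, results)
```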
 
-       Back references to named subpatterns use the Python  syntax  (?P=name).
-       We could rewrite the above example as follows:
+        Back references to named subpatterns use the Python syntax (?P=name).
+        We could rewrite the above example as follows:
 
          (?P<p1>(?i)rah)\s+(?P=p1)
 
-       There  may be more than one back reference to the same subpattern. If a
-       subpattern has not actually been used in a particular match,  any  back
-       references to it always fail. For example, the pattern
+        There may be more than one back reference to the same subpattern. If a
+        subpattern has not actually been used in a particular match, any back
+        references to it always fail. For example, the pattern
 
          (a|(bc))\2
 
-       always  fails if it starts to match "a" rather than "bc". Because there
-       may be many capturing parentheses in a pattern,  all  digits  following
-       the  backslash  are taken as part of a potential back reference number.
-       If the pattern continues with a digit character, some delimiter must be
-       used  to  terminate  the back reference. If the PCRE_EXTENDED option is
-       set, this can be whitespace.  Otherwise an empty comment can be used.
+        always fails if it starts to match "a" rather than "bc". Because there
+        may be many capturing parentheses in a pattern, all digits following
+        the backslash are taken as part of a potential back reference number.
+        If the pattern continues with a digit character, some delimiter must be
+        used to terminate the back reference. If the PCRE_EXTENDED option is
+        set, this can be whitespace. Otherwise an empty comment can be used.
 
-       A back reference that occurs inside the parentheses to which it  refers
-       fails  when  the subpattern is first used, so, for example, (a\1) never
-       matches.  However, such references can be useful inside  repeated  sub-
-       patterns. For example, the pattern
+        A back reference that occurs inside the parentheses to which it refers
+        fails when the subpattern is first used, so, for example, (a\1) never
+        matches. However, such references can be useful inside repeated
+        subpatterns. For example, the pattern
 
          (a|b\1)+
 
-       matches any number of "a"s and also "aba", "ababbaa" etc. At each iter-
-       ation of the subpattern,  the  back  reference  matches  the  character
-       string  corresponding  to  the previous iteration. In order for this to
-       work, the pattern must be such that the first iteration does  not  need
-       to  match the back reference. This can be done using alternation, as in
-       the example above, or by a quantifier with a minimum of zero.
-
+        matches any number of "a"s and also "aba", "ababbaa" etc. At each
+        iteration of the subpattern, the back reference matches the character
+        string corresponding to the previous iteration. In order for this to
+        work, the pattern must be such that the first iteration does not need
+        to match the back reference. This can be done using alternation, as in
+        the example above, or by a quantifier with a minimum of zero.
 
 ASSERTIONS
+        An assertion is a test on the characters following or preceding the
+        current matching point that does not actually consume any characters.
+        The simple assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are
+        described above. More complicated assertions are coded as subpatterns.
+        There are two kinds: those that look ahead of the current position in
+        the subject string, and those that look behind it.
 
-       An assertion is a test on the characters  following  or  preceding  the
-       current  matching  point that does not actually consume any characters.
-       The simple assertions coded as \b, \B, \A, \G, \Z,  \z,  ^  and  $  are
-       described above.  More complicated assertions are coded as subpatterns.
-       There are two kinds: those that look ahead of the current  position  in
-       the subject string, and those that look behind it.
-
-       An  assertion  subpattern  is matched in the normal way, except that it
-       does not cause the current matching position to be  changed.  Lookahead
-       assertions  start with (?= for positive assertions and (?! for negative
-       assertions. For example,
+        An assertion subpattern is matched in the normal way, except that it
+        does not cause the current matching position to be changed. Lookahead
+        assertions start with (?= for positive assertions and (?! for negative
+        assertions. For example,
 
          \w+(?=;)
 
-       matches a word followed by a semicolon, but does not include the  semi-
-       colon in the match, and
+        matches a word followed by a semicolon, but does not include the
+        semicolon in the match, and
 
          foo(?!bar)
 
-       matches  any  occurrence  of  "foo" that is not followed by "bar". Note
-       that the apparently similar pattern
+        matches any occurrence of "foo" that is not followed by "bar". Note
+        that the apparently similar pattern
 
          (?!foo)bar
 
-       does not find an occurrence of "bar"  that  is  preceded  by  something
-       other  than "foo"; it finds any occurrence of "bar" whatsoever, because
-       the assertion (?!foo) is always true when the next three characters are
-       "bar". A lookbehind assertion is needed to achieve this effect.
+        does not find an occurrence of "bar" that is preceded by something
+        other than "foo"; it finds any occurrence of "bar" whatsoever, because
+        the assertion (?!foo) is always true when the next three characters are
+        "bar". A lookbehind assertion is needed to achieve this effect.
 
-       If you want to force a matching failure at some point in a pattern, the
-       most convenient way to do it is  with  (?!)  because  an  empty  string
-       always  matches, so an assertion that requires there not to be an empty
-       string must always fail.
+        If you want to force a matching failure at some point in a pattern, the
+        most convenient way to do it is with (?!) because an empty string
+        always matches, so an assertion that requires there not to be an empty
+        string must always fail.
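
The lookahead examples above can be tried out with a short sketch. It uses Python's re module rather than the driver's PCRE package (an assumption for illustration), and the subject strings are made up.

```python
import re

# \w+(?=;) matches the word before the semicolon without consuming the ";".
word = re.search(r"\w+(?=;)", "total = count;").group()

# foo(?!bar) skips the "foo" in "foobar" and finds the one in "foobaz".
starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?!foo)bar still matches inside "foobar": at the "b" the next three
# characters are "bar", not "foo", so the assertion succeeds.
still_matches = re.search(r"(?!foo)bar", "foobar").start()
print(word, starts, still_matches)
```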
 
-       Lookbehind assertions start with (?<= for positive assertions and  (?<!
-       for negative assertions. For example,
+        Lookbehind assertions start with (?<= for positive assertions and (?<!
+        for negative assertions. For example,
 
          (?<!foo)bar
 
-       does  find  an  occurrence  of "bar" that is not preceded by "foo". The
-       contents of a lookbehind assertion are restricted  such  that  all  the
-       strings it matches must have a fixed length. However, if there are sev-
-       eral alternatives, they do not all have to have the same fixed  length.
-       Thus
+        does find an occurrence of "bar" that is not preceded by "foo". The
+        contents of a lookbehind assertion are restricted such that all the
+        strings it matches must have a fixed length. However, if there are
+        several alternatives, they do not all have to have the same fixed
+        length. Thus
 
          (?<=bullock|donkey)
 
-       is permitted, but
+        is permitted, but
 
          (?<!dogs?|cats?)
 
-       causes  an  error at compile time. Branches that match different length
-       strings are permitted only at the top level of a lookbehind  assertion.
-       This  is  an  extension  compared  with  Perl (at least for 5.8), which
-       requires all branches to match the same length of string. An  assertion
-       such as
+        causes an error at compile time. Branches that match different length
+        strings are permitted only at the top level of a lookbehind assertion.
+        This is an extension compared with Perl (at least for 5.8), which
+        requires all branches to match the same length of string. An assertion
+        such as
 
          (?<=ab(c|de))
 
-       is  not  permitted,  because  its single top-level branch can match two
-       different lengths, but it is acceptable if rewritten to  use  two  top-
-       level branches:
+        is not permitted, because its single top-level branch can match two
+        different lengths, but it is acceptable if rewritten to use two
+        top-level branches:
 
          (?<=abc|abde)
 
-       The  implementation  of lookbehind assertions is, for each alternative,
-       to temporarily move the current position back by the  fixed  width  and
-       then try to match. If there are insufficient characters before the cur-
-       rent position, the match is deemed to fail.
+        The implementation of lookbehind assertions is, for each alternative,
+        to temporarily move the current position back by the fixed width and
+        then try to match. If there are insufficient characters before the
+        current position, the match is deemed to fail.
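
A sketch of the lookbehind rules, with Python's re module standing in for PCRE (an assumption). Python is stricter than PCRE here: it requires the whole lookbehind to have a single fixed width, so it also rejects top-level alternatives of different lengths such as (?<=bullock|donkey), which PCRE permits.

```python
import re

# (?<!foo)bar rejects a "bar" that directly follows "foo".
after_foo = re.search(r"(?<!foo)bar", "foobar")
after_baz = re.search(r"(?<!foo)bar", "bazbar").start()

# A branch that can match two different lengths is rejected at compile time.
try:
    re.compile(r"(?<!dogs?|cats?)")
    rejected = False
except re.error:
    rejected = True
print(after_foo, after_baz, rejected)
```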
 
-       PCRE does not allow the \C escape (which matches a single byte in UTF-8
-       mode)  to appear in lookbehind assertions, because it makes it impossi-
-       ble to calculate the length of the lookbehind.
+        PCRE does not allow the \C escape (which matches a single byte in UTF-8
+        mode) to appear in lookbehind assertions, because it makes it
+        impossible to calculate the length of the lookbehind.
 
-       Atomic groups can be used in conjunction with lookbehind assertions  to
-       specify efficient matching at the end of the subject string. Consider a
-       simple pattern such as
+        Atomic groups can be used in conjunction with lookbehind assertions to
+        specify efficient matching at the end of the subject string. Consider a
+        simple pattern such as
 
          abcd$
 
-       when applied to a long string that does  not  match.  Because  matching
-       proceeds from left to right, PCRE will look for each "a" in the subject
-       and then see if what follows matches the rest of the  pattern.  If  the
-       pattern is specified as
+        when applied to a long string that does not match. Because matching
+        proceeds from left to right, PCRE will look for each "a" in the subject
+        and then see if what follows matches the rest of the pattern. If the
+        pattern is specified as
 
          ^.*abcd$
 
-       the  initial .* matches the entire string at first, but when this fails
-       (because there is no following "a"), it backtracks to match all but the
-       last  character,  then all but the last two characters, and so on. Once
-       again the search for "a" covers the entire string, from right to  left,
-       so we are no better off. However, if the pattern is written as
+        the initial .* matches the entire string at first, but when this fails
+        (because there is no following "a"), it backtracks to match all but the
+        last character, then all but the last two characters, and so on. Once
+        again the search for "a" covers the entire string, from right to left,
+        so we are no better off. However, if the pattern is written as
 
          ^(?>.*)(?<=abcd)
 
-       or, equivalently,
+        or, equivalently,
 
          ^.*+(?<=abcd)
 
-       there  can  be  no  backtracking for the .* item; it can match only the
-       entire string. The subsequent lookbehind assertion does a  single  test
-       on  the last four characters. If it fails, the match fails immediately.
-       For long strings, this approach makes a significant difference  to  the
-       processing time.
+        there can be no backtracking for the .* item; it can match only the
+        entire string. The subsequent lookbehind assertion does a single test
+        on the last four characters. If it fails, the match fails immediately.
+        For long strings, this approach makes a significant difference to the
+        processing time.
 
-       Several assertions (of any sort) may occur in succession. For example,
+        Several assertions (of any sort) may occur in succession. For example,
 
          (?<=\d{3})(?<!999)foo
 
-       matches  "foo" preceded by three digits that are not "999". Notice that
-       each of the assertions is applied independently at the  same  point  in
-       the  subject  string.  First  there  is a check that the previous three
-       characters are all digits, and then there is  a  check  that  the  same
-       three characters are not "999".  This pattern does not match "foo" pre-
-       ceded by six characters, the first of which are  digits  and  the  last
-       three  of  which  are not "999". For example, it doesn't match "123abc-
-       foo". A pattern to do that is
+        matches "foo" preceded by three digits that are not "999". Notice that
+        each of the assertions is applied independently at the same point in
+        the subject string. First there is a check that the previous three
+        characters are all digits, and then there is a check that the same
+        three characters are not "999". This pattern does not match "foo"
+        preceded by six characters, the first three of which are digits and
+        the last three of which are not "999". For example, it doesn't match
+        "123abcfoo". A pattern to do that is
 
          (?<=\d{3}...)(?<!999)foo
 
-       This time the first assertion looks at the  preceding  six  characters,
-       checking that the first three are digits, and then the second assertion
-       checks that the preceding three characters are not "999".
+        This time the first assertion looks at the preceding six characters,
+        checking that the first three are digits, and then the second assertion
+        checks that the preceding three characters are not "999".
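The two lookbehind patterns above can be tried out with a short sketch. It uses Python's re module as a stand-in, on the assumption that it treats fixed-width lookbehinds the same way for these particular patterns; it is not the PCRE library or the LDMud driver. The names ADJACENT, SIX_BACK and finds are made up for this illustration.

```python
import re

# Both lookbehinds are tested at the same position, just before "foo":
# the previous three characters must be digits and must not be "999".
ADJACENT = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first lookbehind now spans six characters (three digits, then any
# three characters); the second still inspects only the last three.
SIX_BACK = re.compile(r"(?<=\d{3}...)(?<!999)foo")


def finds(pattern, subject):
    """Return True if the pattern matches anywhere in the subject."""
    return pattern.search(subject) is not None
```

Here finds(ADJACENT, "123abcfoo") is False because "abc" is not three digits, while finds(SIX_BACK, "123abcfoo") is True.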
 
-       Assertions can be nested in any combination. For example,
+        Assertions can be nested in any combination. For example,
 
          (?<=(?<!foo)bar)baz
 
-       matches an occurrence of "baz" that is preceded by "bar" which in  turn
-       is not preceded by "foo", while
+        matches an occurrence of "baz" that is preceded by "bar" which in turn
+        is not preceded by "foo", while
 
          (?<=\d{3}(?!999)...)foo
 
-       is another pattern which matches "foo" preceded by three digits and any
-       three characters that are not "999".
+        is another pattern which matches "foo" preceded by three digits and any
+        three characters that are not "999".
 
-       Assertion subpatterns are not capturing subpatterns,  and  may  not  be
-       repeated,  because  it  makes no sense to assert the same thing several
-       times. If any kind of assertion contains capturing  subpatterns  within
-       it,  these are counted for the purposes of numbering the capturing sub-
-       patterns in the whole pattern.  However, substring capturing is carried
-       out  only  for  positive assertions, because it does not make sense for
-       negative assertions.
-
+        Assertion subpatterns are not capturing subpatterns, and may not be
+        repeated, because it makes no sense to assert the same thing several
+        times. If any kind of assertion contains capturing subpatterns within
+        it, these are counted for the purposes of numbering the capturing
+        subpatterns in the whole pattern. However, substring capturing is
+        carried out only for positive assertions, because it does not make
+        sense for negative assertions.
 
 CONDITIONAL SUBPATTERNS
-
-       It is possible to cause the matching process to obey a subpattern  con-
-       ditionally  or to choose between two alternative subpatterns, depending
-       on the  result  of  an  assertion,  or  whether  a  previous  capturing
-       subpattern  matched  or not. The two possible forms of conditional sub-
-       pattern are
+        It is possible to cause the matching process to obey a subpattern
+        conditionally or to choose between two alternative subpatterns,
+        depending on the result of an assertion, or whether a previous
+        capturing subpattern matched or not. The two possible forms of
+        conditional subpattern are
 
          (?(condition)yes-pattern)
          (?(condition)yes-pattern|no-pattern)
 
-       If the condition is satisfied, the yes-pattern is used;  otherwise  the
-       no-pattern  (if  present)  is used. If there are more than two alterna-
-       tives in the subpattern, a compile-time error occurs.
+        If the condition is satisfied, the yes-pattern is used; otherwise the
+        no-pattern (if present) is used. If there are more than two
+        alternatives in the subpattern, a compile-time error occurs.
 
-       There are three kinds of condition. If the text between the parentheses
-       consists  of  a  sequence  of digits, the condition is satisfied if the
-       capturing subpattern of that number has previously matched. The  number
-       must  be  greater than zero. Consider the following pattern, which con-
-       tains non-significant white space to make it more readable (assume  the
-       PCRE_EXTENDED  option)  and  to  divide it into three parts for ease of
-       discussion:
+        There are three kinds of condition. If the text between the parentheses
+        consists of a sequence of digits, the condition is satisfied if the
+        capturing subpattern of that number has previously matched. The number
+        must be greater than zero. Consider the following pattern, which
+        contains non-significant white space to make it more readable (assume
+        the PCRE_EXTENDED option) and to divide it into three parts for ease of
+        discussion:
 
-         ( \( )?    [^()]+    (?(1) \) )
+         ( \( )?    [^()]+   (?(1) \) )
 
-       The first part matches an optional opening  parenthesis,  and  if  that
-       character is present, sets it as the first captured substring. The sec-
-       ond part matches one or more characters that are not  parentheses.  The
-       third part is a conditional subpattern that tests whether the first set
-       of parentheses matched or not. If they did, that is, if subject started
-       with an opening parenthesis, the condition is true, and so the yes-pat-
-       tern is executed and a  closing  parenthesis  is  required.  Otherwise,
-       since  no-pattern  is  not  present, the subpattern matches nothing. In
-       other words,  this  pattern  matches  a  sequence  of  non-parentheses,
-       optionally enclosed in parentheses.
+        The first part matches an optional opening parenthesis, and if that
+        character is present, sets it as the first captured substring. The
+        second part matches one or more characters that are not parentheses.
+        The third part is a conditional subpattern that tests whether the first
+        set of parentheses matched or not. If they did, that is, if the subject
+        started with an opening parenthesis, the condition is true, and so the
+        yes-pattern is executed and a closing parenthesis is required.
+        Otherwise, since no-pattern is not present, the subpattern matches
+        nothing. In other words, this pattern matches a sequence of
+        non-parentheses, optionally enclosed in parentheses.
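The numbered-condition example can be sketched as follows. This assumes Python's re module, which also accepts the (?(1)...) form and has a VERBOSE flag playing the role of PCRE_EXTENDED; it is an illustration, not the driver's PCRE package. The names OPTIONAL_PARENS and is_whole_match are hypothetical.

```python
import re

# Same three parts as in the text: optional "(", a run of
# non-parentheses, and ")" required only if group 1 matched.
OPTIONAL_PARENS = re.compile(r"( \( )?    [^()]+   (?(1) \) )", re.VERBOSE)


def is_whole_match(subject):
    """Return True if the whole subject matches the pattern."""
    return OPTIONAL_PARENS.fullmatch(subject) is not None
```

With this sketch, "abc" and "(abc)" are accepted as whole matches, while "(abc" and "abc)" are rejected.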
 
-       If the condition is the string (R), it is satisfied if a recursive call
-       to the pattern or subpattern has been made. At "top level", the  condi-
-       tion  is  false.   This  is  a  PCRE  extension. Recursive patterns are
-       described in the next section.
+        If the condition is the string (R), it is satisfied if a recursive call
+        to the pattern or subpattern has been made. At "top level", the
+        condition is false. This is a PCRE extension. Recursive patterns are
+        described in the next section.
 
-       If the condition is not a sequence of digits or  (R),  it  must  be  an
-       assertion.   This may be a positive or negative lookahead or lookbehind
-       assertion. Consider  this  pattern,  again  containing  non-significant
-       white space, and with the two alternatives on the second line:
+        If the condition is not a sequence of digits or (R), it must be an
+        assertion. This may be a positive or negative lookahead or lookbehind
+        assertion. Consider this pattern, again containing non-significant
+        white space, and with the two alternatives on the second line:
 
          (?(?=[^a-z]*[a-z])
          \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
 
-       The  condition  is  a  positive  lookahead  assertion  that  matches an
-       optional sequence of non-letters followed by a letter. In other  words,
-       it  tests  for the presence of at least one letter in the subject. If a
-       letter is found, the subject is matched against the first  alternative;
-       otherwise  it  is  matched  against  the  second.  This pattern matches
-       strings in one of the two forms dd-aaa-dd or dd-dd-dd,  where  aaa  are
-       letters and dd are digits.
-
+        The condition is a positive lookahead assertion that matches an
+        optional sequence of non-letters followed by a letter. In other words,
+        it tests for the presence of at least one letter in the subject. If a
+        letter is found, the subject is matched against the first alternative;
+        otherwise it is matched against the second. This pattern matches
+        strings in one of the two forms dd-aaa-dd or dd-dd-dd, where aaa are
+        letters and dd are digits.
 
 COMMENTS
+        The sequence (?# marks the start of a comment which continues up to the
+        next closing parenthesis. Nested parentheses are not permitted. The
+        characters that make up a comment play no part in the pattern matching
+        at all.
 
-       The sequence (?# marks the start of a comment which continues up to the
-       next closing parenthesis. Nested parentheses  are  not  permitted.  The
-       characters  that make up a comment play no part in the pattern matching
-       at all.
-
-       If the PCRE_EXTENDED option is set, an unescaped # character outside  a
-       character class introduces a comment that continues up to the next new-
-       line character in the pattern.
-
+        If the PCRE_EXTENDED option is set, an unescaped # character outside a
+        character class introduces a comment that continues up to the next
+        newline character in the pattern.
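Both comment styles can be illustrated with a small sketch. It assumes Python's re module, where (?#...) is supported and the VERBOSE flag behaves like PCRE_EXTENDED with respect to # comments; the names INLINE_COMMENT and EXTENDED_COMMENT are made up here.

```python
import re

# An inline comment: everything from "(?#" to the next ")" is ignored.
INLINE_COMMENT = re.compile(r"abc(?# this text is not matched)def")

# In extended mode an unescaped "#" outside a character class starts a
# comment that runs to the end of the line; inside a class it is literal.
EXTENDED_COMMENT = re.compile(
    r"""
    \d+   # one or more digits
    [#]   # a literal hash, because it is inside a character class
    """,
    re.VERBOSE,
)
```

In this sketch INLINE_COMMENT matches exactly "abcdef", and EXTENDED_COMMENT matches a string such as "12#".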
 
 RECURSIVE PATTERNS
-
-       Consider the problem of matching a string in parentheses, allowing  for
-       unlimited  nested  parentheses.  Without the use of recursion, the best
-       that can be done is to use a pattern that  matches  up  to  some  fixed
-       depth  of  nesting.  It  is not possible to handle an arbitrary nesting
-       depth. Perl has provided an experimental facility that  allows  regular
-       expressions to recurse (amongst other things). It does this by interpo-
-       lating Perl code in the expression at run time, and the code can  refer
-       to the expression itself. A Perl pattern to solve the parentheses prob-
-       lem can be created like this:
+        Consider the problem of matching a string in parentheses, allowing for
+        unlimited nested parentheses. Without the use of recursion, the best
+        that can be done is to use a pattern that matches up to some fixed
+        depth of nesting. It is not possible to handle an arbitrary nesting
+        depth. Perl has provided an experimental facility that allows regular
+        expressions to recurse (amongst other things). It does this by
+        interpolating Perl code in the expression at run time, and the code can
+        refer to the expression itself. A Perl pattern to solve the parentheses
+        problem can be created like this:
 
          $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;
 
-       The (?p{...}) item interpolates Perl code at run time, and in this case
-       refers  recursively to the pattern in which it appears. Obviously, PCRE
-       cannot support the interpolation of Perl  code.  Instead,  it  supports
-       some  special  syntax for recursion of the entire pattern, and also for
-       individual subpattern recursion.
+        The (?p{...}) item interpolates Perl code at run time, and in this case
+        refers recursively to the pattern in which it appears. Obviously, PCRE
+        cannot support the interpolation of Perl code. Instead, it supports
+        some special syntax for recursion of the entire pattern, and also for
+        individual subpattern recursion.
 
-       The special item that consists of (? followed by a number greater  than
-       zero and a closing parenthesis is a recursive call of the subpattern of
-       the given number, provided that it occurs inside that  subpattern.  (If
-       not,  it  is  a  "subroutine" call, which is described in the next sec-
-       tion.) The special item (?R) is a recursive call of the entire  regular
-       expression.
+        The special item that consists of (? followed by a number greater than
+        zero and a closing parenthesis is a recursive call of the subpattern of
+        the given number, provided that it occurs inside that subpattern. (If
+        not, it is a "subroutine" call, which is described in the next
+        section.) The special item (?R) is a recursive call of the entire
+        regular expression.
 
-       For  example,  this  PCRE pattern solves the nested parentheses problem
-       (assume the  PCRE_EXTENDED  option  is  set  so  that  white  space  is
-       ignored):
+        For example, this PCRE pattern solves the nested parentheses problem
+        (assume the PCRE_EXTENDED option is set so that white space is
+        ignored):
 
          \( ( (?>[^()]+) | (?R) )* \)
 
-       First  it matches an opening parenthesis. Then it matches any number of
-       substrings which can either be a  sequence  of  non-parentheses,  or  a
-       recursive  match  of  the pattern itself (that is a correctly parenthe-
-       sized substring).  Finally there is a closing parenthesis.
+        First it matches an opening parenthesis. Then it matches any number of
+        substrings which can either be a sequence of non-parentheses, or a
+        recursive match of the pattern itself (that is, a correctly
+        parenthesized substring). Finally there is a closing parenthesis.
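To make the recursion concrete, here is a hand-written recursive matcher that accepts the same strings as the pattern above when the match starts at a given position. It is a plain Python function, not a regular expression and not PCRE; Python's re module has no (?R), so the recursive call is written out explicitly. The name match_parens is hypothetical.

```python
def match_parens(subject, pos=0):
    """Return the index just past a balanced group starting at pos, or -1.

    Mirrors  \\( ( (?>[^()]+) | (?R) )* \\)  applied at position pos.
    """
    if pos >= len(subject) or subject[pos] != "(":
        return -1                      # no opening parenthesis here
    pos += 1
    while pos < len(subject):
        ch = subject[pos]
        if ch == ")":
            return pos + 1             # closing parenthesis ends this level
        if ch == "(":
            pos = match_parens(subject, pos)   # the "(?R)" branch
            if pos < 0:
                return -1
        else:
            pos += 1                   # the "[^()]+" branch
    return -1                          # ran out of input before ")"
```

For "(ab(cd)ef)" the function returns 10, the length of the whole string. Because it never retries a different split of the non-parenthesis runs, it also fails quickly on unbalanced input, which is the effect the atomic group has in the pattern.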
 
-       If this were part of a larger pattern, you would not  want  to  recurse
-       the entire pattern, so instead you could use this:
+        If this were part of a larger pattern, you would not want to recurse
+        the entire pattern, so instead you could use this:
 
          ( \( ( (?>[^()]+) | (?1) )* \) )
 
-       We  have  put the pattern into parentheses, and caused the recursion to
-       refer to them instead of the whole pattern. In a larger pattern,  keep-
-       ing  track  of parenthesis numbers can be tricky. It may be more conve-
-       nient to use named parentheses instead. For this, PCRE uses  (?P>name),
-       which  is  an  extension  to the Python syntax that PCRE uses for named
-       parentheses (Perl does not provide named parentheses). We could rewrite
-       the above example as follows:
+        We have put the pattern into parentheses, and caused the recursion to
+        refer to them instead of the whole pattern. In a larger pattern,
+        keeping track of parenthesis numbers can be tricky. It may be more
+        convenient to use named parentheses instead. For this, PCRE uses
+        (?P>name), which is an extension to the Python syntax that PCRE uses
+        for named parentheses (Perl does not provide named parentheses). We
+        could rewrite the above example as follows:
 
          (?P<pn> \( ( (?>[^()]+) | (?P>pn) )* \) )
 
-       This  particular example pattern contains nested unlimited repeats, and
-       so the use of atomic grouping for matching strings  of  non-parentheses
-       is  important  when  applying the pattern to strings that do not match.
-       For example, when this pattern is applied to
+        This particular example pattern contains nested unlimited repeats, and
+        so the use of atomic grouping for matching strings of non-parentheses
+        is important when applying the pattern to strings that do not match.
+        For example, when this pattern is applied to
 
          (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
 
-       it yields "no match" quickly. However, if atomic grouping is not  used,
-       the  match  runs  for a very long time indeed because there are so many
-       different ways the + and * repeats can carve up the  subject,  and  all
-       have to be tested before failure can be reported.
+        it yields "no match" quickly. However, if atomic grouping is not used,
+        the match runs for a very long time indeed because there are so many
+        different ways the + and * repeats can carve up the subject, and all
+        have to be tested before failure can be reported.
 
-       At the end of a match, the values set for any capturing subpatterns are
-       those from the outermost level of the recursion at which the subpattern
-       value  is  set.   If  you want to obtain intermediate values, a callout
-       function can be used (see below and the pcrecallout documentation).  If
-       the pattern above is matched against
+        At the end of a match, the values set for any capturing subpatterns are
+        those from the outermost level of the recursion at which the subpattern
+        value is set. If you want to obtain intermediate values, a callout
+        function can be used (see below and the pcrecallout documentation). If
+        the pattern above is matched against
 
          (ab(cd)ef)
 
-       the  value  for  the  capturing  parentheses is "ef", which is the last
-       value taken on at the top level. If additional parentheses  are  added,
-       giving
+        the value for the capturing parentheses is "ef", which is the last
+        value taken on at the top level. If additional parentheses are added,
+        giving
 
          \( ( ( (?>[^()]+) | (?R) )* ) \)
             ^                        ^
             ^                        ^
 
-       the  string  they  capture is "ab(cd)ef", the contents of the top level
-       parentheses. If there are more than 15 capturing parentheses in a  pat-
-       tern, PCRE has to obtain extra memory to store data during a recursion,
-       which it does by using pcre_malloc, freeing  it  via  pcre_free  after-
-       wards.  If  no  memory  can  be  obtained,  the  match  fails  with the
-       PCRE_ERROR_NOMEMORY error.
+        the string they capture is "ab(cd)ef", the contents of the top level
+        parentheses. If there are more than 15 capturing parentheses in a
+        pattern, PCRE has to obtain extra memory to store data during a
+        recursion, which it does by using pcre_malloc, freeing it via pcre_free
+        afterwards. If no memory can be obtained, the match fails with the
+        PCRE_ERROR_NOMEMORY error.
 
-       Do not confuse the (?R) item with the condition (R),  which  tests  for
-       recursion.   Consider  this pattern, which matches text in angle brack-
-       ets, allowing for arbitrary nesting. Only digits are allowed in  nested
-       brackets  (that is, when recursing), whereas any characters are permit-
-       ted at the outer level.
+        Do not confuse the (?R) item with the condition (R), which tests for
+        recursion. Consider this pattern, which matches text in angle
+        brackets, allowing for arbitrary nesting. Only digits are allowed in
+        nested brackets (that is, when recursing), whereas any characters are
+        permitted at the outer level.
 
          < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
 
-       In this pattern, (?(R) is the start of a conditional  subpattern,  with
-       two  different  alternatives for the recursive and non-recursive cases.
-       The (?R) item is the actual recursive call.
-
+        In this pattern, (?(R) is the start of a conditional subpattern, with
+        two different alternatives for the recursive and non-recursive cases.
+        The (?R) item is the actual recursive call.
 
 SUBPATTERNS AS SUBROUTINES
-
-       If the syntax for a recursive subpattern reference (either by number or
-       by  name)  is used outside the parentheses to which it refers, it oper-
-       ates like a subroutine in a programming language.  An  earlier  example
-       pointed out that the pattern
+        If the syntax for a recursive subpattern reference (either by number or
+        by name) is used outside the parentheses to which it refers, it
+        operates like a subroutine in a programming language. An earlier
+        example pointed out that the pattern
 
          (sens|respons)e and \1ibility
 
-       matches  "sense and sensibility" and "response and responsibility", but
-       not "sense and responsibility". If instead the pattern
+        matches "sense and sensibility" and "response and responsibility", but
+        not "sense and responsibility". If instead the pattern
 
          (sens|respons)e and (?1)ibility
 
-       is used, it does match "sense and responsibility" as well as the  other
-       two  strings.  Such  references must, however, follow the subpattern to
-       which they refer.
-
+        is used, it does match "sense and responsibility" as well as the other
+        two strings. Such references must, however, follow the subpattern to
+        which they refer.
 
 CALLOUTS
+        Perl has a feature whereby using the sequence (?{...}) causes arbitrary
+        Perl code to be obeyed in the middle of matching a regular expression.
+        This makes it possible, amongst other things, to extract different
+        substrings that match the same pair of parentheses when there is a
+        repetition.
 
-       Perl has a feature whereby using the sequence (?{...}) causes arbitrary
-       Perl  code to be obeyed in the middle of matching a regular expression.
-       This makes it possible, amongst other things, to extract different sub-
-       strings that match the same pair of parentheses when there is a repeti-
-       tion.
+        PCRE provides a similar feature, but of course it cannot obey arbitrary
+        Perl code. The feature is called "callout". The caller of PCRE provides
+        an external function by putting its entry point in the global variable
+        pcre_callout. By default, this variable contains NULL, which disables
+        all calling out.
 
-       PCRE provides a similar feature, but of course it cannot obey arbitrary
-       Perl code. The feature is called "callout". The caller of PCRE provides
-       an external function by putting its entry point in the global  variable
-       pcre_callout.   By default, this variable contains NULL, which disables
-       all calling out.
-
-       Within a regular expression, (?C) indicates the  points  at  which  the
-       external  function  is  to be called. If you want to identify different
-       callout points, you can put a number less than 256 after the letter  C.
-       The  default  value is zero.  For example, this pattern has two callout
-       points:
+        Within a regular expression, (?C) indicates the points at which the
+        external function is to be called. If you want to identify different
+        callout points, you can put a number less than 256 after the letter C.
+        The default value is zero. For example, this pattern has two callout
+        points:
 
          (?C1)abc(?C2)def
 
-       During matching, when PCRE reaches a callout point (and pcre_callout is
-       set),  the  external function is called. It is provided with the number
-       of the callout, and, optionally, one item of data  originally  supplied
-       by  the  caller of pcre_exec(). The callout function may cause matching
-       to backtrack, or to fail altogether.  A  complete  description  of  the
-       interface  to the callout function is given in the pcrecallout documen-
-       tation.
-
+        During matching, when PCRE reaches a callout point (and pcre_callout is
+        set), the external function is called. It is provided with the number
+        of the callout, and, optionally, one item of data originally supplied
+        by the caller of pcre_exec(). The callout function may cause matching
+        to backtrack, or to fail altogether. A complete description of the
+        interface to the callout function is given in the pcrecallout
+        documentation.
 
 DIFFERENCES FROM PERL
-       This section escribes the differences in the ways that PCRE and Perl
-       handle regular expressions. The differences  described  here  are  with
-       respect to Perl 5.8.
+        This section describes the differences in the ways that PCRE and Perl
+        handle regular expressions. The differences described here are with
+        respect to Perl 5.8.
 
-       1.  PCRE does not have full UTF-8 support. Details of what it does have
-       are given in the section on UTF-8 support in the main pcre page.
+        1.  PCRE does not have full UTF-8 support. Details of what it does have
+        are given in the section on UTF-8 support in the main pcre page.
 
-       2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl
-       permits  them,  but they do not mean what you might think. For example,
-       (?!a){3} does not assert that the next three characters are not "a". It
-       just asserts that the next character is not "a" three times.
+        2.  PCRE does not allow repeat quantifiers on lookahead assertions.
+        Perl permits them, but they do not mean what you might think. For
+        example, (?!a){3} does not assert that the next three characters are
+        not "a". It just asserts that the next character is not "a" three
+        times.
 
-       3.  Capturing  subpatterns  that occur inside negative lookahead asser-
-       tions are counted, but their entries in the offsets  vector  are  never
-       set.  Perl sets its numerical variables from any such patterns that are
-       matched before the assertion fails to match something (thereby succeed-
-       ing),  but  only  if the negative lookahead assertion contains just one
-       branch.
+        3.  Capturing subpatterns that occur inside negative lookahead
+        assertions are counted, but their entries in the offsets vector are
+        never set. Perl sets its numerical variables from any such patterns
+        that are matched before the assertion fails to match something
+        (thereby succeeding), but only if the negative lookahead assertion
+        contains just one branch.
 
-       4. Though binary zero characters are supported in the  subject  string,
-       they are not allowed in a pattern string because it is passed as a nor-
-       mal C string, terminated by zero. The escape sequence "\0" can be  used
-       in the pattern to represent a binary zero.
+        4.  Though binary zero characters are supported in the subject string,
+        they are not allowed in a pattern string because it is passed as a
+        normal C string, terminated by zero. The escape sequence "\0" can be
+        used in the pattern to represent a binary zero.
 
-       5.  The  following Perl escape sequences are not supported: \l, \u, \L,
-       \U, \P, \p, \N, and \X. In fact these are implemented by Perl's general
-       string-handling and are not part of its pattern matching engine. If any
-       of these are encountered by PCRE, an error is generated.
+        5.  The following Perl escape sequences are not supported: \l, \u, \L,
+        \U, \P, \p, \N, and \X. In fact these are implemented by Perl's general
+        string-handling and are not part of its pattern matching engine. If any
+        of these are encountered by PCRE, an error is generated.
 
-       6. PCRE does support the \Q...\E escape for quoting substrings. Charac-
-       ters  in  between  are  treated as literals. This is slightly different
-       from Perl in that $ and @ are  also  handled  as  literals  inside  the
-       quotes.  In Perl, they cause variable interpolation (but of course PCRE
-       does not have variables). Note the following examples:
+        6.  PCRE does support the \Q...\E escape for quoting substrings.
+        Characters in between are treated as literals. This is slightly
+        different from Perl in that $ and @ are also handled as literals inside
+        the quotes. In Perl, they cause variable interpolation (but of course
+        PCRE does not have variables). Note the following examples:
 
            Pattern            PCRE matches      Perl matches
 
@@ -1328,61 +1291,59 @@
            \Qabc\$xyz\E       abc\$xyz          abc\$xyz
            \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
 
-       The \Q...\E sequence is recognized both inside  and  outside  character
-       classes.
+        The \Q...\E sequence is recognized both inside and outside character
+        classes.
 
-       7. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
-       constructions. However, there is some experimental support  for  recur-
-       sive  patterns  using the non-Perl items (?R), (?number) and (?P>name).
-       Also, the PCRE "callout" feature allows  an  external  function  to  be
-       called during pattern matching.
+        7.  Fairly obviously, PCRE does not support the (?{code}) and
+        (?p{code}) constructions. However, there is some experimental support
+        for recursive patterns using the non-Perl items (?R), (?number) and
+        (?P>name). Also, the PCRE "callout" feature allows an external function
+        to be called during pattern matching.
 
-       8.  There  are some differences that are concerned with the settings of
-       captured strings when part of  a  pattern  is  repeated.  For  example,
-       matching  "aba"  against  the  pattern  /^(a(b)?)+$/  in Perl leaves $2
-       unset, but in PCRE it is set to "b".
+        8.  There are some differences that are concerned with the settings of
+        captured strings when part of a pattern is repeated. For example,
+        matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2
+        unset, but in PCRE it is set to "b".
 
-       9. PCRE  provides  some  extensions  to  the  Perl  regular  expression
-       facilities:
+        9.  PCRE provides some extensions to the Perl regular expression
+        facilities:
 
-       (a)  Although  lookbehind  assertions  must match fixed length strings,
-       each alternative branch of a lookbehind assertion can match a different
-       length of string. Perl requires them all to have the same length.
+        (a)  Although lookbehind assertions must match fixed length strings,
+        each alternative branch of a lookbehind assertion can match a different
+        length of string. Perl requires them all to have the same length.
 
-       (b)  If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
-       meta-character matches only at the very end of the string.
+        (b)  If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
+        meta-character matches only at the very end of the string.
 
-       (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe-
-       cial meaning is faulted.
+        (c) If PCRE_EXTRA is set, a backslash followed by a letter with no
+        special meaning is faulted.
 
-       (d)  If  PCRE_UNGREEDY is set, the greediness of the repetition quanti-
-       fiers is inverted, that is, by default they are not greedy, but if fol-
-       lowed by a question mark they are.
+        (d)  If PCRE_UNGREEDY is set, the greediness of the repetition
+        quantifiers is inverted, that is, by default they are not greedy, but
+        if followed by a question mark they are.
 
-       (e)  PCRE_ANCHORED  can  be used to force a pattern to be tried only at
-       the first matching position in the subject string.
+        (e)  PCRE_ANCHORED can be used to force a pattern to be tried only at
+        the first matching position in the subject string.
 
-       (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and  PCRE_NO_AUTO_CAP-
-       TURE options for pcre_exec() have no Perl equivalents.
+        (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and
+        PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no Perl equivalents.
 
-       (g)  The (?R), (?number), and (?P>name) constructs allows for recursive
-       pattern matching (Perl can do  this  using  the  (?p{code})  construct,
-       which PCRE cannot support.)
+        (g)  The (?R), (?number), and (?P>name) constructs allow for recursive
+        pattern matching (Perl can do this using the (?p{code}) construct,
+        which PCRE cannot support.)
 
-       (h)  PCRE supports named capturing substrings, using the Python syntax.
+        (h)  PCRE supports named capturing substrings, using the Python syntax.
 
-       (i) PCRE supports the possessive quantifier  "++"  syntax,  taken  from
-       Sun's Java package.
+        (i)  PCRE supports the possessive quantifier "++" syntax, taken from
+        Sun's Java package.
 
-       (j) The (R) condition, for testing recursion, is a PCRE extension.
+        (j)  The (R) condition, for testing recursion, is a PCRE extension.
 
-       (k) The callout facility is PCRE-specific.
-
-
+        (k)  The callout facility is PCRE-specific.
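
Two of the items above, (d) and (h), can be illustrated with a short sketch. It is written in Python only because Python's re module shares the relevant syntax with PCRE; it is not LDMud code, and the subject string and variable names are hypothetical. Python has no counterpart to PCRE_UNGREEDY, so the sketch only shows the two behaviours that the option swaps.

```python
import re

subject = "<b>bold</b> and <i>italic</i>"

# Default (greedy) quantifier: ".*" runs to the last ">" in the subject.
greedy = re.search(r"<.*>", subject).group(0)

# Lazy form ".*?": stops at the first ">". With PCRE_UNGREEDY set, a plain
# ".*" would behave like this and ".*?" like the greedy case above.
lazy = re.search(r"<.*?>", subject).group(0)

# Named capturing group in the Python syntax that PCRE adopted, item (h).
m = re.search(r"<(?P<tag>\w+)>", subject)
tag = m.group("tag")
```

Here greedy covers the whole subject, lazy covers only the first tag, and the named group is read back by name rather than by number.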
 
 NOTES
         The \< and \> metacharacters from Henry Spencer's package
-        are not available in PCRE, but can be emulate with \b,
+        are not available in PCRE, but can be emulated with \b,
         as required, also in conjunction with \W or \w.
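
One possible way to write the emulation described above is sketched here in Python, whose re module treats \b, \w and lookarounds the same way as PCRE for this ASCII case. This is an illustration under that assumption, not LDMud code; WORD_START, WORD_END and the sample text are hypothetical names. A \b followed by a lookahead for \w matches only at the start of a word, and a \b preceded by a lookbehind for \w only at the end.

```python
import re

# Assumption: Python's re stands in for PCRE here; \b, \w and lookarounds
# behave the same way for this ASCII example.
WORD_START = r"\b(?=\w)"   # emulates \<  (boundary followed by a word character)
WORD_END = r"\b(?<=\w)"    # emulates \>  (boundary preceded by a word character)

text = "cat concat catalog"

# "cat" only where a word begins: not inside "concat".
starts = [m.start() for m in re.finditer(WORD_START + "cat", text)]

# "cat" only where a word ends: not at the front of "catalog".
ends = [m.start() for m in re.finditer("cat" + WORD_END, text)]
```

When the pattern already begins or ends with a word character, as it does here, a bare \b is enough; the lookarounds matter only when the neighbouring pattern element could also match a non-word character.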
 
         In LDMud, backtracks are limited by the EVAL_COST runtime
@@ -1391,25 +1352,23 @@
 
         LDMud doesn't support PCRE callouts.
 
-
 LIMITATIONS
         There are some size limitations in PCRE but it is hoped that
-        they will never in practice be relevant.  The maximum length
-        of a compiled pattern is 65539 (sic) bytes.  All  values  in
-        repeating  quantifiers  must be less than 65536.  There max-
-        imum number of capturing subpatterns is 65535.  There is  no
-        limit  to  the  number of non-capturing subpatterns, but the
-        maximum depth of nesting of all kinds of parenthesized  sub-
-        pattern,  including  capturing  subpatterns, assertions, and
-        other types of subpattern, is 200.
+        they will never in practice be relevant. The maximum length
+        of a compiled pattern is 65539 (sic) bytes. All values in
+        repeating quantifiers must be less than 65536. The
+        maximum number of capturing subpatterns is 65535. There is no
+        limit to the number of non-capturing subpatterns, but the
+        maximum depth of nesting of all kinds of parenthesized
+        subpattern, including capturing subpatterns, assertions,
+        and other types of subpattern, is 200.
 
-        The maximum length of a subject string is the largest  posi-
-        tive number that an integer variable can hold. However, PCRE
-        uses recursion to handle subpatterns and indefinite  repeti-
-        tion.  This  means  that the available stack space may limit
-        the size of a subject string that can be processed  by  cer-
-        tain patterns.
-
+        The maximum length of a subject string is the largest
+        positive number that an integer variable can hold. However,
+        PCRE uses recursion to handle subpatterns and indefinite
+        repetition. This means that the available stack space may
+        limit the size of a subject string that can be processed by
+        certain patterns.
 
 AUTHOR
         Philip Hazel <ph10@cam.ac.uk>