A regular expression describes a set of strings. The simplest regexp is one
that has no special characters in it. For example, the regexp hello matches
hello and nothing else.
Non-trivial regular expressions use certain special constructs so that they
can match more than one string. For example, the regexp hello|word matches
either the string hello or the string word.
As a more complex example, the regexp B[an]*s matches any of the strings
Bananas, Baaaaas, Bs, and any other string starting with a B, ending with
an s, and containing any number of a or n characters in between.
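
The three examples above can be checked with a short sketch. It uses Python's re module, which is an assumption made for illustration: it is a different engine from the one this section documents, but it treats literals, |, and [an]* the same way. The helper name in_language is hypothetical.

```python
import re

def in_language(pattern, text):
    """True if the whole of text is one of the strings the pattern describes."""
    return re.fullmatch(pattern, text) is not None

print(in_language(r"hello", "hello"))        # True
print(in_language(r"hello", "hello world"))  # False: only "hello" itself is in the set
print(in_language(r"hello|word", "word"))    # True
print(in_language(r"B[an]*s", "Bananas"))    # True
print(in_language(r"B[an]*s", "Bs"))         # True: zero characters between B and s
print(in_language(r"B[an]*s", "Bonus"))      # False: o and u are not in [an]
```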
A regular expression may use any of the following special characters/constructs:
^ Match the beginning of a string.
$ Match the end of a string.
. Match any character (including newline).
a* Match any sequence of zero or more a characters.
a+ Match any sequence of one or more a characters.
a? Match either zero or one a character.
de|abc Match either of the sequences de or abc.
(abc)* Match zero or more instances of the sequence abc.
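
As a rough illustration of the constructs listed so far, the sketch below searches for each pattern inside a longer string. It again assumes Python's re module rather than the library documented here; the helper name found is hypothetical. One known difference is shown at the end: in Python, . does not match a newline unless the re.DOTALL flag is given.

```python
import re

def found(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(found(r"^fo", "fofo"))          # fo   (anchored at the beginning)
print(found(r"^fo", "ofo"))           # None (fo is not at the beginning)
print(found(r"fo$", "fofo"))          # fo   (anchored at the end)
print(found(r"a+", "baaa"))           # aaa
print(found(r"ba?", "baa"))           # ba   (at most one a)
print(found(r"de|abc", "xabcx"))      # abc
print(found(r"(abc)*", "abcabcab"))   # abcabc

# Python-specific: "." skips newlines unless DOTALL is set.
print(re.search(r"a.b", "a\nb") is None)                 # True
print(re.search(r"a.b", "a\nb", re.DOTALL) is not None)  # True
```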
{i}, {i,}, {i,j} Match a bounded number of repetitions of the preceding atom.
a* can be written as a{0,}.
a+ can be written as a{1,}.
a? can be written as a{0,1}.
To be more precise, an atom followed by a bound containing one integer i
and no comma matches a sequence of exactly i matches of the atom. An atom
followed by a bound containing one integer i and a comma matches a sequence
of i or more matches of the atom. An atom followed by a bound containing
two integers i and j matches a sequence of i through j (inclusive) matches
of the atom. Both arguments must be in the range from 0 to RE_DUP_MAX
(default 255), inclusive. If there are two arguments, the second must be
greater than or equal to the first.
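
The three forms of a bound can be sketched as follows, assuming Python's re module for illustration (the helper name whole is hypothetical). The RE_DUP_MAX limit is specific to the library described here and is not demonstrated.

```python
import re

def whole(pattern, text):
    """True if pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"a{3}", "aaa"))      # True:  exactly 3
print(whole(r"a{3}", "aa"))       # False
print(whole(r"a{2,}", "aaaaa"))   # True:  2 or more
print(whole(r"a{2,}", "a"))       # False
print(whole(r"a{1,3}", "aaa"))    # True:  1 through 3, inclusive
print(whole(r"a{1,3}", "aaaa"))   # False

# The shorthand operators agree with their bound equivalents.
for text in ["", "a", "aa", "aaa"]:
    assert whole(r"a*", text) == whole(r"a{0,}", text)
    assert whole(r"a+", text) == whole(r"a{1,}", text)
    assert whole(r"a?", text) == whole(r"a{0,1}", text)
```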
[a-dX], [^a-dX]
Matches any character that is (or is not, if ^ is used) either a, b, c, d
or X. A - character between two other characters forms a range that
matches all characters from the first to the second, so [0-9] matches any
decimal digit. To include a literal ] character, it must immediately
follow the opening bracket [. To include a literal - character, it must be
written first or last. Any character that does not have a defined special
meaning inside a [] pair matches only itself.
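
The bracket-expression rules can be illustrated with a small sketch, assuming Python's re module (the helper name one_char is hypothetical). Python follows the same conventions for ranges, negation, and the placement of a literal ] or -, although it additionally treats a backslash as special inside brackets.

```python
import re

def one_char(pattern, ch):
    """True if the bracket expression matches the single character ch."""
    return re.fullmatch(pattern, ch) is not None

print(one_char(r"[a-dX]", "c"))    # True:  inside the range a-d
print(one_char(r"[a-dX]", "X"))    # True:  listed explicitly
print(one_char(r"[a-dX]", "x"))    # False: matching is case-sensitive here
print(one_char(r"[^a-dX]", "e"))   # True:  negated list
print(one_char(r"[^a-dX]", "a"))   # False
print(one_char(r"[]a]", "]"))      # True:  ] right after [ is literal
print(one_char(r"[a-]", "-"))      # True:  - written last is literal
print(one_char(r"[-a]", "-"))      # True:  - written first is literal
print(one_char(r"[.*]", "*"))      # True:  . and * lose their special meaning
print(one_char(r"[.*]", "x"))      # False
```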
[.characters.]
Within a bracket expression, a collating element enclosed in [. and .]
stands for the sequence of characters of that collating element. The
sequence is a single element of the bracket expression's list. A bracket
expression containing a multi-character collating element can thus match
more than one character. For example, if the collating sequence includes
a ch collating element, then the regular expression [[.ch.]]*c matches the
first five characters of chchcc.
[=character_class=]
An equivalence class, standing for the sequences of characters of all
collating elements equivalent to that one, including itself. For example,
if o and (+) are the members of an equivalence class, then [[=o=]],
[[=(+)=]], and [o(+)] are all synonymous. An equivalence class may not
be an endpoint of a range.
[:character_class:]
Within a bracket expression, the name of a character class enclosed in
[: and :] stands for the list of all characters belonging to that class.
Standard character class names are:
alnum digit punct
alpha graph space
blank lower upper
cntrl print xdigit
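
Not every regular-expression engine accepts the [:name:] syntax. As a rough sketch of what some of these classes stand for, the code below rewrites them into explicit ranges for Python's re module, which does not support them. The table POSIX_CLASSES and the helper expand_classes are hypothetical, the ranges are ASCII-only approximations (real class membership depends on the locale), and only a subset of the names is covered.

```python
import re

# Assumption: ASCII-only stand-ins for a subset of the standard class names.
POSIX_CLASSES = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "space": r" \t\n\r\f\v",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def expand_classes(pattern):
    """Rewrite each [:name:] into an explicit range; unknown names raise KeyError."""
    return re.sub(r"\[:([a-z]+):\]",
                  lambda m: POSIX_CLASSES[m.group(1)],
                  pattern)

identifier = expand_classes("[[:alpha:]_][[:alnum:]_]*")
print(identifier)                                         # [a-zA-Z_][a-zA-Z0-9_]*
print(re.fullmatch(identifier, "max_len2") is not None)   # True
print(re.fullmatch(identifier, "2fast") is not None)      # False
```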
[[:<:]]
[[:>:]]
These match the null string at the beginning and end of a word
respectively. A word is defined as a sequence of word characters
which is neither preceded nor followed by word characters. A word
character is an alnum character (as defined by ctype(3)) or an
underscore (_).
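
Engines that lack [[:<:]] and [[:>:]] can often approximate them. The sketch below assumes Python's re module, where \b matches at either edge of a word; combining it with a lookahead or lookbehind restricts it to the beginning or the end. The names WORD_START, WORD_END, and has_word are hypothetical, and re.ASCII is used so that \w means an ASCII alphanumeric or underscore, in line with the definition above.

```python
import re

WORD_START = r"\b(?=\w)"    # boundary with a word character after it
WORD_END = r"\b(?<=\w)"     # boundary with a word character before it

def has_word(word, text):
    """True if word occurs in text as a whole word."""
    pattern = WORD_START + re.escape(word) + WORD_END
    return re.search(pattern, text, re.ASCII) is not None

print(has_word("word", "a word a"))     # True
print(has_word("word", "a xword a"))    # False: preceded by a word character
print(has_word("word", "word_count"))   # False: underscore is a word character
```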