^ | Beginning of a string |
$ | End of a string |
. | Match any character (except newline) when used outside of a character set |
* | Match 0 or more times |
+ | Match 1 or more times |
? | Match 0 or 1 times |
| | Alternation |
( ) | Grouping; also captures ("stores") the matched text |
[ ] | Character set |
{ } | Repetition modifier |
\ | Quote the next metacharacter, or introduce a special sequence |
Note | To present a metacharacter as a data character standing for itself, precede it with a backslash '\'. For example, \. will match the full stop character '.' |
---|
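The escaping rule in the note above can be illustrated with a minimal sketch. It assumes Python's `re` module, which is not necessarily the engine this reference describes but treats these metacharacters the same way; the sample strings are made up for illustration.

```python
import re

# An unescaped '.' is a metacharacter: it matches any character except newline.
unescaped = re.findall(r"3.14", "3.14 3x14")

# A backslash turns '.' into a literal full stop.
escaped = re.findall(r"3\.14", "3.14 3x14")

# re.escape() backslash-escapes the metacharacters in a literal string,
# so 'a+b' is searched for as text rather than as "one or more 'a', then 'b'".
pattern = re.escape("a+b")
found = re.search(pattern, "sum: a+b") is not None

print(unescaped)  # ['3.14', '3x14']
print(escaped)    # ['3.14']
print(found)      # True
```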
Repetitions
a* | 0 or more 'a' |
a+ | 1 or more 'a' |
a? | 0 or 1 'a' (i.e., optional 'a') |
a{m} | exactly m 'a' |
a{m,} | at least m 'a' |
a{m,n} | at least m but at most n 'a' |
expr? | shortest match taken from the repetition expression expr (e.g., a*?) |
Note | By default, a repetition finds the longest match. Append ? to the repetition expression (expr?) to find the shortest match. |
---|
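The difference between the longest and shortest match, and the counted repetitions, can be sketched as follows. The example assumes Python's `re` module, whose repetition syntax matches the table above; the sample text is invented.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Default: the repetition takes the longest possible match.
greedy = re.findall(r"<.+>", text)

# With a trailing '?': the repetition takes the shortest possible match.
lazy = re.findall(r"<.+?>", text)

# Counted repetitions, tested against the whole string.
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None       # a{m}
at_least_two = re.fullmatch(r"a{2,}", "aaaaa") is not None     # a{m,}
two_to_four = re.fullmatch(r"a{2,4}", "aaaaa") is not None     # a{m,n}

print(greedy)         # ['<b>bold</b> and <i>italic</i>']
print(lazy)           # ['<b>', '</b>', '<i>', '</i>']
print(exactly_three)  # True
print(at_least_two)   # True
print(two_to_four)    # False (five 'a' exceeds the upper bound)
```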
Single characters
\t | tab |
\n | newline |
\r | return (CR) |
\xhh | character with hexadecimal code hh |
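A short sketch of the single-character escapes, assuming Python's `re` module (which accepts the same `\t`, `\n`, `\r`, and `\xhh` notations); the sample line is hypothetical.

```python
import re

line = "name\tAda\r\n"

# \t matches a tab: split a tab-separated record into fields.
fields = re.split(r"\t", line.rstrip("\r\n"))

# \r followed by \n matches a CR LF line ending.
has_crlf = re.search(r"\r\n", line) is not None

# \x41 is the character with hexadecimal code 41, i.e. 'A'.
hex_hits = re.findall(r"\x41", "ABA")

print(fields)    # ['name', 'Ada']
print(has_crlf)  # True
print(hex_hits)  # ['A', 'A']
```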
Zero-width assertions
\b | "word" boundary |
\B | not a "word" boundary |
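The word-boundary assertions match a position rather than a character. A minimal sketch, assuming Python's `re` module and an invented sample string:

```python
import re

text = "cat concatenate cat's bobcat"

# \bcat\b: 'cat' with a word boundary on both sides (a whole word).
whole_words = re.findall(r"\bcat\b", text)

# \Bcat\B: 'cat' with word characters on both sides (strictly inside a word).
inner = re.findall(r"\Bcat\B", text)

print(len(whole_words))  # 2  (the first 'cat' and the one in "cat's")
print(len(inner))        # 1  (the one inside 'concatenate')
```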
Matching
\w | matches any single character classified as a "word" character (alphanumeric or '_') |
\W | matches any non-"word" character |
\s | matches any whitespace character (space, tab, newline) |
\S | matches any non-whitespace character |
\d | matches any digit character (equivalent to [0-9]) |
\D | matches any non-digit character |
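The matching classes above can be tried out with a small sketch. It assumes Python's `re` module; the `re.ASCII` flag is passed so that `\d` and `\w` behave as the table describes ([0-9] and alphanumeric or '_'), since Python otherwise also accepts non-ASCII digits and letters. The sample strings are made up.

```python
import re

text = "Order 66: 3 items, total_cost = 42"

# \d+ : runs of digit characters.
digits = re.findall(r"\d+", text, flags=re.ASCII)

# \w+ : runs of "word" characters (alphanumeric or '_').
words = re.findall(r"\w+", text, flags=re.ASCII)

# \s+ : collapse any run of whitespace into a single space.
collapsed = re.sub(r"\s+", " ", "a \t b\n c")

# \D : remove every non-digit character.
only_digits = re.sub(r"\D", "", "+1 (555) 010-9999", flags=re.ASCII)

print(digits)       # ['66', '3', '42']
print(words)        # ['Order', '66', '3', 'items', 'total_cost', '42']
print(collapsed)    # 'a b c'
print(only_digits)  # '15550109999'
```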
Character sets
[characters] | Matches any of the characters in the sequence |
[x-y] | Matches any of the characters from x to y (inclusively) |
[\-] | Matches the hyphen character '-' |
[\n] | Matches the newline character |
[^expr] | Matches any character except those specified in the expression expr |
Note | Other single character denotations with the backslash '\' apply normally too. |
---|
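The character set forms listed above can be sketched as follows, assuming Python's `re` module (its bracket syntax agrees with the table for these cases); the inputs are invented.

```python
import re

# [characters]: any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regular expression")

# [x-y]: any character in the inclusive range.
a_to_c = re.findall(r"[a-c]", "abcdef")

# [\-]: an escaped hyphen is a literal '-', here combined with a digit range.
phone = re.findall(r"[\-0-9]+", "tel 555-0100")

# [\n]: backslash notations work inside a set too.
lines = re.split(r"[\n]", "x\ny")

# [^expr]: any character NOT in the set.
non_digits = re.findall(r"[^0-9]+", "a1b22c")

print(vowels)      # ['e', 'u', 'a', 'e', 'e', 'i', 'o']
print(a_to_c)      # ['a', 'b', 'c']
print(phone)       # ['555-0100']
print(lines)       # ['x', 'y']
print(non_digits)  # ['a', 'b', 'c']
```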
Character and equivalence classes
Character and equivalence classes can only be used inside character sets, with the following syntax:
[:class:] | Matches any character from the specified class, where class can be any of the character class names |
[=name=] | Matches any character with the same base character as name, where name can be any character or its symbolic name |
Note | A base character is the character that remains when case, accents, and unique regional tailoring are ignored. |
---|
Example | [[:digit:]] will match any digit character. [[=a=]] will match a, à, á, â, ã, ä, å, A, À, Á, Â, Ã, Ä and Å. |
---|
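Python's `re` module does not support the `[:class:]` or `[=name=]` syntax, so the sketch below is only an approximation of the idea, not the engine's implementation. It swaps in Unicode NFD decomposition plus lowercasing to compute a "base character"; the helper names `base_char` and `equivalent_to` are hypothetical, and regional tailoring is not modelled.

```python
import re
import unicodedata


def base_char(ch):
    """Approximate the base character: strip accents (via NFD) and ignore case."""
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0].lower()


def equivalent_to(name, text):
    """Hypothetical stand-in for [[=name=]]: keep characters sharing name's base."""
    target = base_char(name)
    return [ch for ch in text if base_char(ch) == target]


sample = "aàáâãäåAÀÁÂÃÄÅbB1"
matches = equivalent_to("a", sample)

# ASCII stand-in for [[:digit:]], since POSIX class names are unavailable here.
digits = re.findall(r"[0-9]", "a1b22")

print(len(matches))  # 14
print(digits)        # ['1', '2', '2']
```

With an engine that does support equivalence classes, the single pattern `[[=a=]]` replaces both helper functions.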
Operator precedence
1 | [==] [::] | Collation-related bracket symbols |
2 | \ | Escaped characters |
3 | [ ] | Character set (bracket expression) |
4 | ( ) | Grouping |
5 | * + ? {m,n} | Repetitions |
6 | | Concatenation |
7 | ^ $ | Anchoring |
8 | | | Alternation |
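Two consequences of the precedence order are worth checking by hand: repetition binds tighter than concatenation, and alternation binds loosest of all, below anchoring. A minimal sketch, assuming Python's `re` module, which follows the same order for these operators:

```python
import re

# Repetition before concatenation: ab* means a(b*), not (ab)*.
star_on_b = re.fullmatch(r"ab*", "abbb") is not None
star_not_on_ab = re.fullmatch(r"ab*", "abab") is not None
grouped_star = re.fullmatch(r"(ab)*", "abab") is not None

# Anchoring before alternation: ^ab|cd$ means (^ab)|(cd$).
starts_with_ab = re.search(r"^ab|cd$", "abxx") is not None
ends_with_cd = re.search(r"^ab|cd$", "xxcd") is not None

# Grouping changes the meaning: the whole string must be 'ab' or 'cd'.
whole_string = re.search(r"^(ab|cd)$", "abxx") is not None

print(star_on_b)       # True
print(star_not_on_ab)  # False
print(grouped_star)    # True
print(starts_with_ab)  # True
print(ends_with_cd)    # True
print(whole_string)    # False
```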