Regular Expressions
Regular expressions, also referred to as regex or regexp, provide a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. The following examples illustrate a few specifications that could be expressed in a regular expression:
- The sequence of characters 'car' in any context, such as 'car', 'cartoon', or 'bicarbonate'
- The word 'car' when it appears as an isolated word
- The word 'car' when preceded by the word 'blue' or 'red'
- A dollar sign immediately followed by one or more digits, and then optionally a period and exactly two more digits (for example '$10', or '$245.99')
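As a rough illustration, the four specifications above could be written as follows using Python's `re` module. The pattern names, the `matches` helper, and the sample strings are hypothetical, and the third pattern assumes the two words are separated by a single space; other regex engines may need slightly different syntax.

```python
import re

# Hypothetical patterns, one per specification in the list above.
CAR_ANYWHERE = re.compile(r"car")                  # 'car' in any context
CAR_WORD = re.compile(r"\bcar\b")                  # 'car' as an isolated word
COLORED_CAR = re.compile(r"\b(?:blue|red) car\b")  # 'car' preceded by 'blue' or 'red'
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")           # '$' + digits, optional '.NN'


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


print(matches(CAR_ANYWHERE, "bicarbonate"))   # True
print(matches(CAR_WORD, "bicarbonate"))       # False
print(matches(CAR_WORD, "my car is here"))    # True
print(matches(COLORED_CAR, "a red car"))      # True
print(matches(COLORED_CAR, "a green car"))    # False
print(PRICE.findall("$10 or $245.99"))        # ['$10', '$245.99']
```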
| Regular Expression | Functional Description |
|---|---|
| [\d] | Matches 1, 2, 3, 4, …; \d represents a single digit |
| \[[1-3]\] | Matches [1], or [2], or [3] |
| [1-3] | Represents a single digit between 1 and 3, inclusive |
| \[ and \] | Escaped characters, so they represent [ and ] themselves rather than delimiting a character class such as [1-3] |
| ^0 | Matches 0asdfsdfs but not dfsdfgfsdfg0 |
| ^ | Anchors the pattern at the beginning of the string for a single 0 character |
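The rows above can be checked with a short sketch. It assumes Python's `re` module, and the variable names and the "item …" sample strings are made up for illustration; `search` is used because these patterns are meant to match anywhere in the string unless anchored.

```python
import re

# \[ and \] match literal brackets; [1-3] between them is a character class.
bracketed = re.compile(r"\[[1-3]\]")
print(bool(bracketed.search("item [2]")))   # True
print(bool(bracketed.search("item [4]")))   # False: 4 is outside the range 1-3
print(bool(bracketed.search("item 2")))     # False: the literal brackets are missing

# ^0 only matches when the string begins with the character 0.
leading_zero = re.compile(r"^0")
print(bool(leading_zero.search("0asdfsdfs")))      # True
print(bool(leading_zero.search("dfsdfgfsdfg0")))   # False
```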
Example
The following steps build a regexp that matches a string consisting of an integer in the range 0 to 99.
- Start with [0-9]{1}, which means match a digit exactly once. This regexp alone will match integers in the range 0 to 9.
- To match one or two digits, increase the maximum number of occurrences so the regexp becomes [0-9]{1,2}, meaning match a digit at least once and at most twice. However, this regexp is not yet sufficient: it matches one or two digits anywhere within a string, so a string such as "abc123" also matches.
- To ensure that the match is against the whole string, you must use the anchor assertions. Use a ^ (caret), which when it is the first character in the regexp means that the regexp must match from the beginning of the string. You also need $ (dollar), which when it is the last character in the regexp means that the regexp must match until the end of the string. So now our regexp is ^[0-9]{1,2}$. Note that assertions, such as ^ and $, do not match any characters.
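The three steps can be compared side by side in a minimal sketch. It assumes Python's `re` module; the `is_match` helper, the `STEP` names, and the sample strings are hypothetical and chosen only to show the effect of the anchors.

```python
import re


def is_match(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


STEP1 = r"[0-9]{1}"        # a digit exactly once
STEP2 = r"[0-9]{1,2}"      # one or two digits, anywhere in the string
STEP3 = r"^[0-9]{1,2}$"    # one or two digits, and nothing else

print(is_match(STEP1, "7"))        # True
print(is_match(STEP2, "abc123"))   # True: unanchored, finds digits inside the string
print(is_match(STEP3, "abc123"))   # False: the anchors require the whole string to match
print(is_match(STEP3, "42"))       # True
print(is_match(STEP3, "100"))      # False: three digits exceed the {1,2} limit
```

In Python specifically, `$` also matches just before a trailing newline, so `re.fullmatch(r"[0-9]{1,2}", text)` is a stricter alternative when that matters.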
Examples of Regular Expressions and Expected Results
| Pattern | Expected Results |
|---|---|
| rhd | Matches strings containing "rhd" |
| ^rhd | Matches strings starting with "rhd", like "rhd is the beginning of this string" |
| \d\d\d\d | Matches strings containing a series of 4 digits, like "1234" but not "x34x33x" |
| ^\d\d\d\D | Matches strings starting with a series of 3 digits and a non-digit, like "123xaa" or "133fdffd" |
| abc$ | Matches strings which end with "abc", like "xyzabc" but not "abcxyz" |
| interim\|ideal | Matches strings which contain either "interim" or "ideal" |
| gr[ea]y | Using brackets allows matching against a list of characters, so this matches "grey" or "gray" |
| stringer[1-3] | Using brackets allows matching against a range of characters, so this matches "stringer1", "stringer2", or "stringer3" |
| [xX] | Allows matches against both lowercase and uppercase X, like "box" or "Xavier" |
| RR[x1-3abc] | With this bracketed pattern, matches will contain the string "RR" followed by one of several characters: x, 1, 2, 3, a, b, or c |
| \^\$ | Escaping a metacharacter with a backslash matches the metacharacter itself, so "xxx^$xxxx" would match |
| \[\d\] | Escaped square brackets match a single digit within square brackets, including "[1]", "[6]", and "[[9]]" |
| P[^V]T | Matches a three-character sequence whose second character is anything but a ‘V’. "PxT", "P2T", and "PTT" all match. |
| P.T | This expression uses the wildcard dot. It will match any single character between the ‘P’ and ‘T’, so valid matches include "PVT", "P@T", and "P$T". |
| P.*T | Uses the star to modify the wildcard dot. Now the pattern will match any number of characters between a ‘P’ and ‘T’, so valid matches include "PVT", "PT", and "PxxxxxxxxxxT". |
| (dog) | Grouping allows applying the other metacharacters to a pattern, rather than just a single character. Without a modifying metacharacter, the round brackets are ignored, so strings containing "dog" will match. |
| 11(34)?22 | Grouping in combination with the question mark makes the contents of the group optional. In this case, valid matches include "1122" and "113422". |
| 1+xx | The use of the plus sign matches one or more of the preceding character. In this case, "1xx" and "111111xx" are valid matches. |
| 1(xy)+ | The plus sign may be used on a group like (xy) here. In this case, "1xy", "1xyxyxyxyxy", and "1xyxy" will all match. |
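A few of the patterns above can be spot-checked with a small table-driven sketch. It assumes Python's `re` module; the `CASES` list and `check` helper are hypothetical, and some of the non-matching strings ("groy", "11322", "1x", and the second "rhd" sentence) are invented here for contrast rather than taken from the table.

```python
import re

# (pattern, string expected to match, string expected not to match)
CASES = [
    (r"^rhd", "rhd is the beginning of this string", "this string ends with rhd"),
    (r"\d\d\d\d", "1234", "x34x33x"),
    (r"gr[ea]y", "grey", "groy"),
    (r"P[^V]T", "PxT", "PVT"),
    (r"P.*T", "PT", "P"),
    (r"11(34)?22", "113422", "11322"),
    (r"1(xy)+", "1xyxy", "1x"),
]


def check(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, good, bad in CASES:
    # Each line prints the pattern, then True for `good` and False for `bad`.
    print(pattern, check(pattern, good), check(pattern, bad))
```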