Regular Expressions

Regular expressions, also referred to as regex or regexp, provide a concise and flexible means of matching strings of text, such as particular characters, words, or patterns of characters. The following examples illustrate a few specifications that can be expressed as a regular expression:

  • The sequence of characters 'car' in any context, such as 'car', 'cartoon', or 'bicarbonate'
  • The word 'car' when it appears as an isolated word
  • The word 'car' when preceded by the word 'blue' or 'red'
  • A dollar sign immediately followed by one or more digits, and then optionally a period and exactly two more digits (for example '$10', or '$245.99')
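
As a rough illustration, the four specifications above could be written as follows with Python's re module. The choice of Python and the names CAR_ANYWHERE, CAR_WORD, COLORED_CAR, DOLLAR_AMOUNT, and contains are assumptions made for this sketch; other regexp engines may differ in details such as the \b word-boundary assertion.

```python
import re

# Hypothetical names; each pattern corresponds to one bullet above.
CAR_ANYWHERE = re.compile(r"car")                    # 'car' in any context
CAR_WORD = re.compile(r"\bcar\b")                    # 'car' as an isolated word
COLORED_CAR = re.compile(r"\b(?:blue|red)\s+car\b")  # 'car' preceded by 'blue' or 'red'
DOLLAR_AMOUNT = re.compile(r"\$\d+(?:\.\d{2})?")     # '$', digits, optional '.dd'


def contains(pattern, text):
    """Return True if the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None


print(contains(CAR_ANYWHERE, "bicarbonate"))            # True
print(contains(CAR_WORD, "cartoon"))                    # False
print(contains(COLORED_CAR, "a red car"))               # True
print(DOLLAR_AMOUNT.fullmatch("$245.99") is not None)   # True
print(DOLLAR_AMOUNT.fullmatch("$10.5") is not None)     # False
```

The dollar amount is checked with fullmatch here so that a malformed amount such as '$10.5' is rejected rather than partially matched.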

You can use regular expressions (regexps) to perform complex text queries. The following are examples of regular expressions:

Regular Expression

Functional Description

[\d]

Matches a single digit, such as 1, 2, 3, or 4. Here \d represents a single digit.

\[[1-3]\]

Matches [1], [2], or [3]. Here [1-3] represents a single digit between 1 and 3, inclusive, and \[ and \] are escaped so that they represent the literal [ and ] characters rather than delimiting a character class, as they do in [1-3].

^0

Matches "0asdfsdfs" but not "dfsdfgfsdfg0". The ^ anchors the pattern at the beginning of the string, so the string must start with a 0 character.
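
The examples in this table can be tried out with a short script. The following is a minimal sketch that assumes Python's re module; the variable names and the sample strings other than "0asdfsdfs" and "dfsdfgfsdfg0" are illustrative only.

```python
import re

# [\d] matches any single digit (the brackets around \d are redundant here).
digits = re.findall(r"[\d]", "a1b22")
print(digits)  # ['1', '2', '2']

# \[[1-3]\] matches a digit from 1 to 3 enclosed in literal square brackets.
bracketed = re.findall(r"\[[1-3]\]", "[1] [4] [3] 2")
print(bracketed)  # ['[1]', '[3]']

# ^0 matches only when the string starts with 0.
samples = ["0asdfsdfs", "dfsdfgfsdfg0"]
starts_with_zero = [s for s in samples if re.search(r"^0", s)]
print(starts_with_zero)  # ['0asdfsdfs']
```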

Regular expressions are built up from expressions, quantifiers, and assertions. The simplest form of an expression is a single character, for example 'x' or '5'. An expression can also be a set of characters. For example, the expression '[ABCD]' matches an 'A', a 'B', a 'C', or a 'D'. As shorthand, this can be written as '[A-D]'. To match any of the capital letters in the English alphabet, write '[A-Z]'. A quantifier tells the regexp engine how many occurrences of the expression are required; for example, 'x{1,2}' matches an 'x' that occurs at least once and at most twice.
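
The set, range, and quantifier described above can be sketched as follows, assuming Python's re module. The helper names in_set, in_range, and x_once_or_twice are hypothetical, and fullmatch is used so that each test applies to the whole input string.

```python
import re


def in_set(ch):
    """True if ch is exactly one of A, B, C, or D (explicit set)."""
    return re.fullmatch(r"[ABCD]", ch) is not None


def in_range(ch):
    """True if ch is exactly one of A to D (range shorthand)."""
    return re.fullmatch(r"[A-D]", ch) is not None


def x_once_or_twice(text):
    """True if text consists of one or two 'x' characters."""
    return re.fullmatch(r"x{1,2}", text) is not None


print([ch for ch in "ABCDE" if in_set(ch)])    # ['A', 'B', 'C', 'D']
print([ch for ch in "ABCDE" if in_range(ch)])  # ['A', 'B', 'C', 'D']
print([s for s in ["", "x", "xx", "xxx"] if x_once_or_twice(s)])  # ['x', 'xx']
```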

Example

The following steps build a regexp that matches a string consisting of an integer in the range 0 to 99.

  1. Start with [0-9]{1}, which means match a digit exactly once. This regexp alone matches integers in the range 0 to 9.
  2. To match one or two digits, increase the maximum number of occurrences so that the regexp becomes [0-9]{1,2}, meaning match a digit at least once and at most twice. However, this regexp as it stands matches one or two digits anywhere within a string, so a string such as 'abc123' also matches.
  3. To ensure that the regexp matches against the whole string, use the anchor assertions. A ^ (caret) as the first character of the regexp means that the regexp must match from the beginning of the string, and a $ (dollar) as the last character means that the regexp must match up to the end of the string. The regexp is now ^[0-9]{1,2}$. Note that assertions, such as ^ and $, do not match any characters.
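
The three steps above can be compared side by side. This is a minimal sketch assuming Python's re module; STEP_1, STEP_2, STEP_3, and matches are names introduced only for this illustration.

```python
import re

STEP_1 = r"[0-9]{1}"       # a digit exactly once
STEP_2 = r"[0-9]{1,2}"     # one or two digits, anywhere in the string
STEP_3 = r"^[0-9]{1,2}$"   # one or two digits, anchored to the whole string


def matches(pattern, text):
    """Return True if pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


for text in ["7", "42", "100", "abc123"]:
    print(repr(text), matches(STEP_1, text), matches(STEP_2, text), matches(STEP_3, text))
# '7' True True True
# '42' True True True
# '100' True True False
# 'abc123' True True False
```

Only the anchored pattern rejects '100' and 'abc123'. In Python specifically, $ also matches just before a trailing newline, so re.fullmatch may be preferable when the input can end with one.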

Examples of Regular Expressions and Expected Results

Pattern

Expected Results

rhd

Matches strings containing "rhd"

^rhd

Matches strings starting with "rhd", like "rhd is the beginning of this string"

\d\d\d\d

Matches strings containing a series of 4 digits, like "1234" but not "x34x33x"

^\d\d\d\D

Matches strings starting with a series of 3 digits and a non-digit, like "123xaa" or "133fdffd"

c$

Matches strings which end with ‘c’, like "ironic" but not "interim"

interim|ideal

Matches strings which contain either "interim" or "ideal"

gr[ea]y

Using brackets allows matching against a list of characters so this matches "grey" or "gray"

stringer[1-3]

Using brackets allows matching against a range of characters so this matches "stringer1", "stringer2", or "stringer3"

[xX]

Allows matches against both lower and uppercase X, like "box" or "Xavier"

RR[x1-3abc]

With this bracketed pattern, matches will contain the string "RR" followed by one of several characters: x, 1, 2, 3, a, b, or c

\^\$

Escaping a metacharacter with a backslash matches the metacharacter itself, so "xxx^$xxxx" would match

\[\d\]

Escaped square brackets match the literal bracket characters, so this matches a single digit within square brackets, including "[1]", "[6]", and "[[9]]"

P[^V]T

Matches a three-character sequence that begins with ‘P’ and ends with ‘T’, where the second character is anything but a ‘V’. "PxT", "P2T", and "PTT" all match.

P.T

This expression uses the wildcard dot. It will match any single character between the ‘P’ and ‘T’, so valid matches include "PVT", "P@T", "P$T".

P.*T

Uses the star to modify the wildcard dot. Now the pattern will match any number of characters (including none) between a ‘P’ and ‘T’, so valid matches include "PVT", "PT", and "PxxxxxxxxxxT".

(dog)

Grouping allows applying the other metacharacters to a pattern, rather than just a single character. Without a modifying metacharacter, the round brackets are ignored, so strings containing "dog" will match.

11(34)?22

Grouping in combination with the question mark makes the contents of the group optional. In this case, valid matches include "1122" and "113422".

1+xx

The use of the plus sign matches one or more of the preceding character. In this case, "1xx" and "111111xx" are valid matches.

1(xy)+

The plus sign may be used on a group like (xy) here. In this case, "1xy", "1xyxyxyxyxy", and "1xyxy" will all match, because each contains a ‘1’ followed by one or more repetitions of "xy".
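
Several rows of the table above can be checked mechanically. The following is a minimal sketch assuming Python's re module, with re.search used for the "contains" behaviour the table describes. The names CASES and failures are hypothetical, and the non-matching strings other than "x34x33x" are assumptions added for this illustration.

```python
import re

# pattern -> (strings expected to match, strings expected not to match)
CASES = {
    r"^rhd": (["rhd is the beginning of this string"], ["the rhd"]),
    r"\d\d\d\d": (["1234"], ["x34x33x"]),
    r"^\d\d\d\D": (["123xaa", "133fdffd"], ["1234"]),
    r"gr[ea]y": (["grey", "gray"], ["groy"]),
    r"stringer[1-3]": (["stringer1", "stringer3"], ["stringer4"]),
    r"\^\$": (["xxx^$xxxx"], ["xxx$^xxxx"]),
    r"\[\d\]": (["[1]", "[6]", "[[9]]"], ["[12]"]),
    r"P[^V]T": (["PxT", "P2T", "PTT"], ["PVT"]),
    r"P.T": (["PVT", "P@T", "P$T"], ["PT"]),
    r"P.*T": (["PVT", "PT", "PxxxxxxxxxxT"], ["P"]),
    r"11(34)?22": (["1122", "113422"], ["11322"]),
    r"1+xx": (["1xx", "111111xx"], ["xx"]),
    r"1(xy)+": (["1xy", "1xyxyxyxyxy"], ["4xyxyxyxyxy", "1x"]),
}


def failures(cases):
    """Return (pattern, text, reason) for every expectation that does not hold."""
    bad = []
    for pattern, (should_match, should_not_match) in cases.items():
        for text in should_match:
            if re.search(pattern, text) is None:
                bad.append((pattern, text, "expected a match"))
        for text in should_not_match:
            if re.search(pattern, text) is not None:
                bad.append((pattern, text, "expected no match"))
    return bad


print(failures(CASES))  # [] when every expectation holds
```

Keeping the expectations in one table makes it easy to add a row and rerun the check when trying a new pattern.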