Regular Expressions

The following is taken from pages 39-41 of the user's manual for ListProc, the list management software supported by CREN, the Corporation for Research and Educational Networking. It describes ListProc's concept of a regular expression.


A regular expression is a group of symbols that describes a pattern of characters, and so the set of strings that fit that pattern. An example of a simple regular expression is the word "donkey". In a group of words, the regular expression "donkey" matches only instances of the word "donkey" and nothing else. So if we searched a text file for "donkey", every occurrence of that word would turn up in the search.

This definition can be expanded if we define the period "." as a wild card standing for any single character. Then the regular expression ".onkey" matches "honkey", "tonkey" and "bonkey" as well as "donkey". We can expand the definition further by following the period with an asterisk, ".*", to stand for any number of characters, including none. Then the regular expression "don.*" matches not only "donkey" but every string of characters beginning with "don", including "don" itself. A search of a text file for "don.*" will turn up dongle, donkey, donner, dondoodit, dondiddle, etc.

Regular expression matching, therefore, puts together a set of rules that let you test whether a string fits a specific syntactic shape. You can also search a string for a substring that fits a pattern and, just as importantly, replace one string with another.
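The "donkey" examples above can be tried out in a short sketch. It uses Python's standard `re` module, which is not the engine ListProc itself uses; the assumption is only that the literal, "." and ".*" constructs behave the same way in both. The word list and the replacement word "mule" are made up for illustration.

```python
import re

# Hypothetical word list, chosen only to exercise the three patterns.
words = ["donkey", "honkey", "tonkey", "don", "dongle", "monkey", "done"]

# A literal pattern matches only the exact word.
literal = [w for w in words if re.fullmatch(r"donkey", w)]

# '.' stands for any single character.
one_wild = [w for w in words if re.fullmatch(r".onkey", w)]

# '.*' stands for any run of characters, including an empty one,
# so "don" itself also matches.
prefix = [w for w in words if re.fullmatch(r"don.*", w)]

# Replacing every match of a pattern with another string.
replaced = re.sub(r"donkey", "mule", "a donkey and a donkey")

print(literal)
print(one_wild)
print(prefix)
print(replaced)
```

`re.fullmatch` is used so that each word must fit the pattern from start to end; `re.search` would instead look for the pattern anywhere inside the word.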

Regular expressions have a syntax in which a few characters are special constructs and the rest are "ordinary". An ordinary character is a simple regular expression which matches that character and nothing else. The special characters are `$', `^', `.', `*', `+', `?', `[', `]' and `\'. Any other character appearing in a regular expression is ordinary, unless a `\' precedes it.
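As a rough illustration of the difference between special and ordinary characters, the sketch below again uses Python's `re` module as a stand-in (an assumption; ListProc's own matcher may differ in details). It shows the common convention that a `\' placed before a special character makes it match literally. The sample strings are hypothetical.

```python
import re

# The special characters listed in the text above.
SPECIALS = "$^.*+?[]\\"

# Unescaped, '.' is a wild card, so the pattern a.c matches "abc".
wild = re.fullmatch(r"a.c", "abc") is not None

# Escaped, '\.' matches only a literal period.
escaped_hit = re.fullmatch(r"a\.c", "a.c") is not None
escaped_miss = re.fullmatch(r"a\.c", "abc") is not None

# Each special character, preceded by a backslash, matches itself.
escaped_each = {ch: re.fullmatch("\\" + ch, ch) is not None for ch in SPECIALS}

print(wild, escaped_hit, escaped_miss)
print(all(escaped_each.values()))
```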

For example, `f' is not a special character, so it is ordinary, and therefore `f' is a regular expression that matches the string `f' and no other string. (It does not match the string `ff'.) Likewise, `o' is a regular expression that matches only `o'.

Any two regular expressions A and B can be concatenated. The result is a regular expression which matches a string if A matches some amount of the beginning of that string and B matches the rest of the string.

As a simple example, we can concatenate the regular expressions `f' and `o' to get the regular expression `fo', which matches only the string `fo'. Still trivial.
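The concatenation rule can be checked with a few lines of Python. This is only a sketch using Python's `re` module rather than ListProc's matcher, and the helper name `matches_whole` is invented for the example.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

a = "f"     # regular expression A
b = "o"     # regular expression B
ab = a + b  # concatenating the two patterns gives "fo"

print(matches_whole(a, "f"))     # A alone matches "f"
print(matches_whole(a, "ff"))    # but not "ff"
print(matches_whole(ab, "fo"))   # A matches the beginning, B the rest
print(matches_whole(ab, "f"))    # nothing is left for B to match
print(matches_whole(ab, "foo"))  # one 'o' is left over after B
```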

The following are the characters and character sequences which have special meaning within regular expressions. Any character not mentioned here is not special; it stands for exactly itself for the purposes of searching and matching.