A regular expression is a set of rules that defines a pattern. Java's rules for writing a regular expression are largely borrowed from Perl. In Perl a pattern is written between the delimiters / ... /, whereas in Java a regular expression is written as an ordinary string " ... ".
The following are metacharacters in a regular expression:
. * + ? ^ $ [ ] ( ) { } | \
To escape the special significance of these characters in a regular expression, the backslash (\) is used.
The period (.) is a placeholder for any single character other than the newline character.
Square brackets denote a set of characters (a character class), any one of which may match. For example, [aeiou] will match any single lower case vowel. The hyphen is a shorthand for a range, as in [a-z], which represents any lower case letter. A caret (^) as the first character inside the brackets negates the class: [^0-9] will match any character that is not a digit.
The following escape sequences denote pre-defined ranges:
\d | Any digit [0-9] |
\D | Not a digit [^0-9] |
\w | Any alphanumeric character or the underscore character [a-zA-Z0-9_] |
\W | Not an alphanumeric character or underscore [^a-zA-Z0-9_] |
\s | Any white space character (space, tab, newline, and similar) |
\S | Any character that is not white space |
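The character classes above can be tried out with a minimal Java sketch. The class name CharClassDemo and the helper inClass are hypothetical names chosen for illustration. The doubled backslashes are Java string escaping, which the following paragraph explains.

```java
class CharClassDemo {
    // Hypothetical helper: true when the whole string s is matched by the regex.
    static boolean inClass(String charClass, String s) {
        return s.matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[aeiou]", "e"));  // true
        System.out.println(inClass("[aeiou]", "x"));  // false
        System.out.println(inClass("[a-z]", "q"));    // true
        System.out.println(inClass("[^0-9]", "7"));   // false: 7 is a digit
        System.out.println(inClass("\\d", "7"));      // true
        System.out.println(inClass("\\w", "_"));      // true: underscore counts
        System.out.println(inClass("\\s", "\t"));     // true: tab is white space
        System.out.println(inClass(".", "a"));        // true
        System.out.println(inClass(".", "\n"));       // false: newline excluded
    }
}
```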
The backslash is also an escape character in Java string literals. For example, \n denotes the newline character. Hence, to use a backslash in a regular expression you have to write two backslashes in the Java source: the first one tells the compiler that the second backslash is a literal character and not the start of an escape sequence. For example, the regular expression for a single white space character, \s, is written in Java as "\\s".
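A short Java sketch of the double-backslash rule, under the assumption that the patterns are written as string literals in source code. The class and constant names are hypothetical.

```java
class EscapeDemo {
    // In source this is four characters inside the quotes, but the resulting
    // string holds only two: a backslash followed by the letter s.
    static final String WHITESPACE = "\\s";

    // Regex a\.b : the dot is escaped, so it matches only a literal period.
    static final String LITERAL_DOT = "a\\.b";

    // Regex \\ : matches one literal backslash. Each regex backslash needs
    // its own Java escape, hence four backslashes in the source.
    static final String LITERAL_BACKSLASH = "\\\\";

    public static void main(String[] args) {
        System.out.println(WHITESPACE.length());            // 2
        System.out.println(" ".matches(WHITESPACE));        // true
        System.out.println("a.b".matches(LITERAL_DOT));     // true
        System.out.println("axb".matches(LITERAL_DOT));     // false
        System.out.println("axb".matches("a.b"));           // true: unescaped dot
        System.out.println("\\".matches(LITERAL_BACKSLASH)); // true
    }
}
```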
One can also anchor a pattern to a position within a string. Outside square brackets, the caret (^) denotes the beginning of the string: the pattern "^abc" will match at the start of a string beginning with abc. The dollar sign ($) denotes the end of the string: the pattern "xyz$" will match at the end of a string that ends with xyz. \b denotes a word boundary and \B denotes a position that is not a word boundary.
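The anchors can be illustrated with a small Java sketch. It uses Pattern and Matcher.find() from java.util.regex so that the pattern may match anywhere in the text; the class name AnchorDemo and the helper found are hypothetical.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: true if the regex matches somewhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^abc", "abcdef"));            // true
        System.out.println(found("^abc", "xabc"));              // false
        System.out.println(found("xyz$", "wxyz"));              // true
        System.out.println(found("xyz$", "xyzw"));              // false
        System.out.println(found("\\bcat\\b", "a cat sat"));    // true
        System.out.println(found("\\bcat\\b", "concatenate"));  // false
        System.out.println(found("\\Bcat\\B", "concatenate"));  // true
    }
}
```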
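There are several ways to denote a recurring pattern:
* | Zero or more occurrences of the preceding item |
+ | One or more occurrences of the preceding item |
? | Zero or one occurrence of the preceding item |
{n} | Exactly n occurrences of the preceding item |
{n,} | At least n occurrences of the preceding item |
{n,m} | At least n and at most m occurrences of the preceding item |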
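A minimal Java sketch of the standard quantifiers (*, +, ?, {n}, {n,} and {n,m}). The class name QuantifierDemo and the helper matchesAll are hypothetical, and the sample words are made up for illustration.

```java
class QuantifierDemo {
    // Hypothetical helper: true only if every input matches the regex in full.
    static boolean matchesAll(String regex, String... inputs) {
        for (String s : inputs) {
            if (!s.matches(regex)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // * : zero or more b's
        System.out.println(matchesAll("ab*c", "ac", "abc", "abbbc")); // true
        // + : one or more b's, so "ac" fails
        System.out.println(matchesAll("ab+c", "abc", "abbbc"));       // true
        System.out.println("ac".matches("ab+c"));                     // false
        // ? : zero or one u
        System.out.println(matchesAll("colou?r", "color", "colour")); // true
        // {n}, {n,m}, {n,} : counted repetition
        System.out.println("123".matches("\\d{3}"));                  // true
        System.out.println("12345".matches("\\d{2,4}"));              // false
        System.out.println("123456".matches("\\d{2,}"));              // true
    }
}
```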
Parentheses () are used to group patterns. To indicate a choice between patterns, separate the alternatives with a vertical bar, as in /(abc)|(def)|(ghi)/. This will match abc or def or ghi. In Java the same pattern is written as the string "(abc)|(def)|(ghi)".
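As a rough illustration of grouping and alternation, the following Java sketch reports which of the three parenthesized alternatives matched. The class name AlternationDemo and the helper whichGroup are hypothetical; Matcher.group(i) returns null for a group that did not take part in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    private static final Pattern CHOICE = Pattern.compile("(abc)|(def)|(ghi)");

    // Hypothetical helper: returns 1, 2, or 3 for the alternative that matched
    // the whole string, or 0 if the string matches none of them.
    static int whichGroup(String s) {
        Matcher m = CHOICE.matcher(s);
        if (!m.matches()) {
            return 0;
        }
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) {
                return i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(whichGroup("abc"));    // 1
        System.out.println(whichGroup("def"));    // 2
        System.out.println(whichGroup("ghi"));    // 3
        System.out.println(whichGroup("abcdef")); // 0: the whole string must match
        // Grouping also lets a quantifier apply to a whole sub-pattern.
        System.out.println("ababab".matches("(ab)+")); // true
    }
}
```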
There are several methods in the String class that take a regular expression as an input parameter. For a given String s, here is how you would use a regular expression regex in these methods:
s.matches ("regex") | Returns true if the whole string matches the regular expression |
s.split ("regex") | Returns an array of substrings of s divided at every occurrence of the regular expression. The text matched by the regular expression is not included in the substrings |
s.replaceAll ("regex", "replacement") | Returns a new string in which every substring that matches the regular expression is replaced with the given replacement string |
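The three String methods can be sketched in Java as follows. The class StringRegexDemo, its helper methods, and the sample inputs are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

class StringRegexDemo {
    // matches: the whole string must look like four digits, dash, two, dash, two.
    static boolean looksLikeDate(String s) {
        return s.matches("\\d{4}-\\d{2}-\\d{2}");
    }

    // split: divide at commas, swallowing any white space around them.
    static String[] splitOnCommas(String s) {
        return s.split("\\s*,\\s*");
    }

    // replaceAll: replace every run of digits with a single '#'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d+", "#");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("2024-01-15"));                // true
        System.out.println(looksLikeDate("on 2024-01-15"));             // false
        System.out.println(Arrays.toString(splitOnCommas("a, b ,c")));  // [a, b, c]
        System.out.println(maskDigits("a1b22c333"));                    // a#b#c#
    }
}
```

Because String objects are immutable, replaceAll leaves s unchanged and returns the result as a new string.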
For more advanced pattern matching Java provides two classes, Pattern and Matcher. Both are in the package java.util.regex. The regular expression is first specified as a String and then compiled into a Pattern object.
Pattern p = Pattern.compile("a*b");
Then a Matcher object is created for a target string. The Matcher object can match arbitrary character sequences against that regular expression.
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();
The matches() method attempts to match the entire input sequence against the pattern. There are two other methods in the Matcher class that you will find useful. The lookingAt() method attempts to match the input sequence, starting at the beginning, against the pattern. Unlike matches(), it does not require the whole string to match: it returns true if a prefix of the target string matches the pattern. The find() method scans the target string looking for the next subsequence that matches the pattern and returns true if it finds a match.
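The difference between matches(), lookingAt(), and find() can be seen in a short Java sketch. The class name MatcherDemo, the helper findAll, and the sample strings are hypothetical; the Pattern and Matcher calls are the standard java.util.regex ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
    // Hypothetical helper: collects every non-overlapping match, left to right,
    // by calling find() repeatedly on the same Matcher.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a*b");

        // matches(): the entire input must match.
        System.out.println(p.matcher("aaaaab").matches());   // true
        System.out.println(p.matcher("aabxyz").matches());   // false

        // lookingAt(): only a prefix of the input has to match.
        System.out.println(p.matcher("aabxyz").lookingAt()); // true
        System.out.println(p.matcher("xyzaab").lookingAt()); // false

        // find(): scans forward for the next match anywhere in the input.
        System.out.println(findAll("a*b", "ab xx aab b"));   // [ab, aab, b]
    }
}
```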
Here are two online Regular Expression testers that you can use to try out your regular expression skills: