Java Using regexes in Java: Test for a Pattern



Problem

You’re ready to get started using regular expression processing to beef up your Java
code by testing to see if a given pattern can match in a given string.

Solution

Use the Java Regular Expressions Package, java.util.regex.

Explained

The good news is that the Java API for regexes is actually easy to use. If all you need
is to find out whether a given regex matches a string, you can use the convenient
boolean matches( ) method of the String class, which accepts a regex pattern in
String form as its argument:

if (inputString.matches(stringRegexPattern)) {
    // it matched... do something with it...
}
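One detail worth seeing in action: String.matches( ) succeeds only when the regex matches the entire string, not just some part of it. The following is a minimal sketch; the class name WholeStringMatch and the helper wholeMatch are hypothetical, chosen for illustration.

```java
// Sketch: String.matches( ) tests the WHOLE string against the regex.
public class WholeStringMatch {
    // Hypothetical helper: true only if the entire input matches the regex.
    public static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String regex = "Q[^u]\\d+\\.";
        // The whole string is exactly one flight code, so this matches.
        System.out.println(wholeMatch("QA777.", regex));
        // Extra text before and after the code makes the whole-string test fail.
        System.out.println(wholeMatch("Flight QA777. today", regex));
        // Wrapping the regex in .* lets it absorb the surrounding text.
        System.out.println(wholeMatch("Flight QA777. today", ".*" + regex + ".*"));
    }
}
```

The three lines print true, false, and true, in that order.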

This is, however, a convenience routine, and convenience always comes at a price. If the regex is going to be used more than once or twice in a program, it is more efficient to construct and use a Pattern and its Matcher(s). A complete program constructing a Pattern and using it to match is shown here:

import java.util.regex.*;

/**
 * Simple example of using regex class.
 */
public class RESimple {
    public static void main(String[] argv) throws PatternSyntaxException {
        // Q, then any character except u, then one or more digits, then a literal period
        String pattern = "^Q[^u]\\d+\\.";
        String input = "QA777. is the next flight. It is on time.";
        // Pattern.compile( ) throws PatternSyntaxException if the regex is malformed
        Pattern p = Pattern.compile(pattern);
        // lookingAt( ) tries to match only at the start of the input
        boolean found = p.matcher(input).lookingAt( );
        System.out.println("'" + pattern + "'" +
            (found ? " matches '" : " doesn't match '") + input + "'");
    }
}
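To make the "compile once, use many times" advice concrete, here is a sketch that compiles the same regex a single time and then reuses the Pattern for several inputs, requesting a fresh Matcher for each one. The class name ReusePattern and the helper startsWithFlight are hypothetical. Pattern objects are immutable and safe to share between threads, while Matcher objects are not, which is why the Pattern is held in a static field and Matchers are created per call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: compile the regex once, then reuse the Pattern for many inputs.
public class ReusePattern {
    // Compiled a single time; Pattern instances are immutable and thread-safe.
    private static final Pattern FLIGHT = Pattern.compile("^Q[^u]\\d+\\.");

    // Hypothetical helper: true if the input starts with a flight code.
    public static boolean startsWithFlight(String input) {
        Matcher m = FLIGHT.matcher(input); // a new Matcher for each input
        return m.lookingAt();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "QA777. is the next flight.",
            "Queen Anne's lace",
            "QB12. departs at noon."
        };
        for (String s : inputs) {
            System.out.println(startsWithFlight(s) + ": " + s);
        }
    }
}
```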

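The public API of the java.util.regex package centers on two classes, Pattern and Matcher. A malformed regex causes Pattern.compile( ) to throw PatternSyntaxException, an unchecked exception that is also part of this package.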

Regex public API
/** The main public API of the java.util.regex package.
 * Prepared by javap and Ian Darwin.
 */
package java.util.regex;

public final class Pattern {
    // Flags values ('or' together)
    public static final int
        UNIX_LINES, CASE_INSENSITIVE, COMMENTS, MULTILINE,
        DOTALL, UNICODE_CASE, CANON_EQ;
    // Factory methods (no public constructors)
    public static Pattern compile(String patt);
    public static Pattern compile(String patt, int flags);
    // Method to get a Matcher for this Pattern
    public Matcher matcher(CharSequence input);
    // Information methods
    public String pattern( );
    public int flags( );
    // Convenience methods
    public static boolean matches(String pattern, CharSequence input);
    public String[] split(CharSequence input);
    public String[] split(CharSequence input, int max);
}

public final class Matcher {
    // Action: find or match methods
    public boolean matches( );
    public boolean find( );
    public boolean find(int start);
    public boolean lookingAt( );
    // "Information about the previous match" methods
    public int start( );
    public int start(int whichGroup);
    public int end( );
    public int end(int whichGroup);
    public int groupCount( );
    public String group( );
    public String group(int whichGroup);
    // Reset methods
    public Matcher reset( );
    public Matcher reset(CharSequence newInput);
    // Replacement methods
    public Matcher appendReplacement(StringBuffer where, String newText);
    public StringBuffer appendTail(StringBuffer where);
    public String replaceAll(String newText);
    public String replaceFirst(String newText);
    // Information methods
    public Pattern pattern( );
}

/* String, showing only the regex-related methods */
public final class String {
    public boolean matches(String regex);
    public String replaceFirst(String regex, String newStr);
    public String replaceAll(String regex, String newStr);
    public String[] split(String regex);
    public String[] split(String regex, int max);
}
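As a quick tour of a few Pattern members from this listing, the sketch below combines two flags with bitwise OR, reads them back with pattern( ) and flags( ), and uses the convenience methods split( ) and the static matches( ). The class name PatternApiDemo and its helper methods are hypothetical; the sample text is made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch exercising flags, information methods, and convenience methods of Pattern.
public class PatternApiDemo {
    // Flags are int constants combined with bitwise OR.
    private static final Pattern FLIGHT_LINE =
            Pattern.compile("^flight", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Hypothetical helper: counts lines that begin with "flight" in any letter case.
    public static int countFlightLines(String text) {
        Matcher m = FLIGHT_LINE.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Hypothetical helper: split( ) breaks the input around each match of the pattern.
    public static String[] splitCodes(String list) {
        return COMMA.split(list);
    }

    public static void main(String[] args) {
        String text = "Gate 4\nFLIGHT QA777\nflight QB12";
        // MULTILINE lets ^ match after each line terminator, so this prints 2.
        System.out.println(countFlightLines(text));
        System.out.println(FLIGHT_LINE.pattern()); // ^flight
        System.out.println((FLIGHT_LINE.flags() & Pattern.CASE_INSENSITIVE) != 0); // true
        System.out.println(Arrays.toString(splitCodes("QA777 , QB12,QC3"))); // [QA777, QB12, QC3]
        // Static convenience method: compiles the regex and tests the whole input.
        System.out.println(Pattern.matches("Q[^u]\\d+", "QT300")); // true
    }
}
```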

This API is large enough to require some explanation. The normal steps for regex matching in a production program are:

1. Create a Pattern by calling the static method Pattern.compile( ).
2. Request a Matcher from the pattern by calling pattern.matcher(CharSequence) for each String (or other CharSequence) you wish to look through.
3. Call (once or more) one of the finder methods (discussed later in this section) in the resulting Matcher.

The CharSequence interface, added to java.lang with JDK 1.4, provides simple read-only access to objects containing a collection of characters. The standard implementations are String and StringBuffer (described in Chapter 3), and the “new I/O” class java.nio.CharBuffer.

Of course, you can perform regex matching in other ways, such as using the convenience methods in Pattern or even in java.lang.String. For example:

// StringConvenience.java -- show String convenience routine for "match"
String pattern = ".*Q[^u]\\d+\\..*";
String line = "Order QT300. Now!";
if (line.matches(pattern)) {
    System.out.println(line + " matches \"" + pattern + "\"");
} else {
    System.out.println("NO MATCH");
}
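For comparison, the three steps listed earlier can be sketched as a small program. Because a Matcher accepts any CharSequence, the same compiled Pattern is applied here to a String, a StringBuffer, and a java.nio.CharBuffer. The class name ThreeSteps, the helper containsCode, and the sample inputs are hypothetical.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the three-step sequence applied to several CharSequence types.
public class ThreeSteps {
    // Step 1: compile the Pattern once.
    private static final Pattern CODE = Pattern.compile("Q[^u]\\d+");

    // Hypothetical helper: true if the pattern occurs anywhere in the input.
    public static boolean containsCode(CharSequence input) {
        // Step 2: request a Matcher for this particular input.
        Matcher m = CODE.matcher(input);
        // Step 3: call a finder method on the Matcher.
        return m.find();
    }

    public static void main(String[] args) {
        CharSequence[] inputs = {
            "Order QT300. Now!",                        // a String
            new StringBuffer("No code here"),           // a StringBuffer
            CharBuffer.wrap("Rebooked on QB12 instead") // a java.nio.CharBuffer
        };
        for (CharSequence cs : inputs) {
            System.out.println(containsCode(cs) + ": " + cs);
        }
    }
}
```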

But the three-step list just described is the “standard” pattern for matching. You’d likely use the String convenience routine in a program that used the regex only once; if the regex is used more than once, it is worth taking the time to “compile” it, because String.matches( ) recompiles the pattern on every call, while a compiled Pattern can be reused. As well, the Matcher has several finder methods, which provide more flexibility than the String convenience routine matches( ). The Matcher methods are:

matches( )
Used to compare the entire string against the pattern; this is the same as the routine in java.lang.String. Since it matches the entire String, I had to put .* before and after the pattern in the StringConvenience example.

lookingAt( )
Used to match the pattern only at the beginning of the string.

find( )
Used to match the pattern anywhere in the string (not necessarily at the first character), starting at the beginning of the string or, if the method was previously called and succeeded, at the first character not matched by the previous match.

Each of these methods returns boolean, with true meaning a match and false meaning no match. To check whether a given string matches a given pattern, you need only type something like the following:
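The difference between the three finder methods is easiest to see side by side. This sketch runs matches( ), lookingAt( ), and find( ) against the same pattern and three hypothetical inputs; the class and method names are illustrative only. Each call uses a new Matcher, because calling find( ) again on the same Matcher would resume after the previous match instead of starting over.

```java
import java.util.regex.Pattern;

// Sketch comparing the three Matcher finder methods on the same inputs.
public class FinderMethods {
    private static final Pattern CODE = Pattern.compile("Q[^u]\\d+\\.");

    // The entire input must match.
    public static boolean matchesAll(String s) { return CODE.matcher(s).matches(); }

    // The match must start at the first character, but may end early.
    public static boolean atStart(String s) { return CODE.matcher(s).lookingAt(); }

    // The match may occur anywhere in the input.
    public static boolean anywhere(String s) { return CODE.matcher(s).find(); }

    public static void main(String[] args) {
        String[] inputs = { "QT300.", "QT300. Now!", "Order QT300. Now!" };
        for (String s : inputs) {
            System.out.printf("%-20s matches=%b lookingAt=%b find=%b%n",
                    "\"" + s + "\"", matchesAll(s), atStart(s), anywhere(s));
        }
    }
}
```

For "QT300." all three succeed; for "QT300. Now!" only lookingAt( ) and find( ) succeed; for "Order QT300. Now!" only find( ) succeeds.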

Matcher m = Pattern.compile(patt).matcher(line);
if (m.find( )) {
    System.out.println(line + " matches " + patt);
}
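Because each successful find( ) resumes at the first character not matched by the previous match, calling it in a loop walks through every non-overlapping match. The sketch below uses that behavior to count matches; the class name CountMatches and the helper count are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: repeated find( ) calls step through non-overlapping matches.
public class CountMatches {
    // Hypothetical helper: counts non-overlapping matches of regex in input.
    public static int count(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        // Each successful find() resumes at the first character after the previous match.
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "QA777. and QB12. are both delayed; Quebec is not a flight.";
        System.out.println(count("Q[^u]\\d+\\.", line)); // 2
    }
}
```

Since matches never overlap, counting "aa" in "aaaa" yields 2, not 3.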

But you may also want to extract the text that matched, which is the subject of the next recipe. The following recipes cover uses of this API. Initially, the examples just use arguments of type String as the input source.
