Java Controlling Case in Regular Expressions



You want to find text regardless of case.


Compile the Pattern, passing Pattern.CASE_INSENSITIVE as the flags argument to
indicate that matching should be case-independent (that is, it should “fold” or
ignore differences in case). If your code might run in different locales (see
Chapter 15), add Pattern.UNICODE_CASE, because CASE_INSENSITIVE on its own folds
case only for US-ASCII characters. Without these flags, the default is normal,
case-sensitive matching. These flags (and others) are passed to the
Pattern.compile() method, as in:

Pattern reCaseInsens = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE |
    Pattern.UNICODE_CASE); // will match case-insensitively

Flags must be passed when you create the Pattern; because Pattern objects are immutable, they cannot be changed once constructed. The full source code for this example is online as
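The effect of the two flags can be seen in a small, self-contained sketch. This is not the book's online example: the class name CaseMatchSketch and the helper matchesIgnoringCase are hypothetical names chosen for illustration, and the test strings are arbitrary.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not the book's online source.
class CaseMatchSketch {

    /** Returns true if the whole input matches the regex, ignoring case. */
    static boolean matchesIgnoringCase(String regex, String input) {
        // Both flags are OR'd together and fixed at compile time.
        Pattern p = Pattern.compile(regex,
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Default behavior is case-sensitive.
        System.out.println(Pattern.compile("java").matcher("JAVA").matches()); // false

        // With the flags, case differences are ignored.
        System.out.println(matchesIgnoringCase("java", "JAVA")); // true

        // CASE_INSENSITIVE alone folds only US-ASCII letters, so
        // e-acute (\u00e9) does not match E-acute (\u00c9) ...
        Pattern asciiOnly = Pattern.compile("\u00e9", Pattern.CASE_INSENSITIVE);
        System.out.println(asciiOnly.matcher("\u00c9").matches()); // false

        // ... until UNICODE_CASE is added.
        System.out.println(matchesIgnoringCase("\u00e9", "\u00c9")); // true
    }
}
```

Because the flags are baked in when the Pattern is compiled, the sketch builds a new Pattern for each combination rather than trying to modify an existing one.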

Pattern.compile() Flags

Seven flags that can be passed as the second argument to Pattern.compile() are described here. If more than one is needed, they can be combined using the bitwise OR operator (|). In alphabetical order, the flags are:


CANON_EQ
Enables so-called “canonical equivalence”; that is, characters are matched by their base character, so that the character e followed by the “combining character mark” for the acute accent ( ́ ) can be matched either by the composite character é or by the letter e followed by the character mark for the accent (see Recipe 4.8).


CASE_INSENSITIVE
Turns on case-insensitive matching (see Recipe 4.7).


COMMENTS
Causes whitespace and comments (from # to end-of-line) to be ignored in the pattern.


DOTALL
Allows dot (.) to match any character, including a newline; by default, dot matches any character except a line terminator (see Recipe 4.9).


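MULTILINE
Specifies multiline mode, in which ^ and $ match at the start and end of each line rather than only at the start and end of the whole input (see Recipe 4.9).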


UNICODE_CASE
Enables Unicode-aware case folding when used together with CASE_INSENSITIVE (see Recipe 4.7).


UNIX_LINES
Makes \n the only valid “newline” sequence for MULTILINE mode (see Recipe 4.9).
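Several of the flags above can be compared side by side with a short sketch. The class name PatternFlagsSketch, the helper countMatches, and the sample strings are all hypothetical, chosen only to make each flag's effect visible; CANON_EQ is left out for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing the effect of individual flags.
class PatternFlagsSketch {

    /** Counts how many times the regex is found in the input under the given flags. */
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\nTwo\nthree";

        // MULTILINE: ^ matches at the start of every line, not just the input.
        System.out.println(countMatches("^t", 0, text));                 // 0
        System.out.println(countMatches("^t", Pattern.MULTILINE, text)); // 1

        // Flags are combined with the bitwise OR operator.
        int multiNoCase = Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;
        System.out.println(countMatches("^t", multiNoCase, text));       // 2

        // DOTALL: dot may match the newline between "one" and "Two".
        System.out.println(countMatches("one.Two", 0, text));              // 0
        System.out.println(countMatches("one.Two", Pattern.DOTALL, text)); // 1

        // COMMENTS: whitespace and #-comments in the pattern are ignored.
        String spaced = "t w o   # matches two";
        int commentsNoCase = Pattern.COMMENTS | Pattern.CASE_INSENSITIVE;
        System.out.println(countMatches(spaced, commentsNoCase, text));  // 1

        // UNIX_LINES: a lone \r no longer counts as a line terminator.
        String crText = "a\rb";
        System.out.println(countMatches("^b", Pattern.MULTILINE, crText)); // 1
        System.out.println(countMatches("^b",
            Pattern.MULTILINE | Pattern.UNIX_LINES, crText));              // 0
    }
}
```

Passing 0 as the flags argument is equivalent to calling the one-argument Pattern.compile(), which makes it convenient as the baseline in comparisons like these.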

