org.apache.xerces.impl.xpath.regex
Class RegularExpression

java.lang.Object
  |
  +--org.apache.xerces.impl.xpath.regex.RegularExpression
All Implemented Interfaces:
java.io.Serializable

public class RegularExpression
extends java.lang.Object
implements java.io.Serializable

A regular expression matching engine using a non-deterministic finite automaton (NFA). This engine does not conform to POSIX regular expressions.


How to use

A. Standard way
 RegularExpression re = new RegularExpression(regex);
 if (re.matches(text)) { ... }
 
B. Capturing groups
 RegularExpression re = new RegularExpression(regex);
 Match match = new Match();
 if (re.matches(text, match)) {
     ... // Use the methods of the Match class to access the captured text.
 }
 

Case-insensitive matching

 RegularExpression re = new RegularExpression(regex, "i");
 if (re.matches(text)) { ... }
 

Options

Options can be passed to RegularExpression(regex, options) or setPattern(regex, options). The options parameter is a string made up of any of the following characters.

"i"
This option indicates case-insensitive matching.
"m"
^ and $ also match at line boundaries (EOL characters) within the text, not only at the beginning and end of the whole text.
"s"
. matches any single character, including EOL characters.
"u"
Redefines \d \D \w \W \s \S \b \B \< \> to follow Unicode definitions.
"w"
With this option, \b \B \< \> are processed using the method described in 'Unicode Regular Expression Guidelines' Revision 4. When "w" and "u" are specified together, \b \B \< \> follow the "w" option.
","
The parser treats a comma in a character class as a separator between ranges. Without this option, [a,b] matches a, a comma, or b. With this option, [a,b] matches only a or b.
"X"
With this option, the engine conforms to XML Schema: Regular Expression. The matches() methods perform entire-string matching instead of substring matching.
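For readers who know java.util.regex, the following sketch approximates some of these options with standard JDK flags. It is an analogy, not the Xerces API: the helper names toJavaFlags and matches are hypothetical, the mapping is approximate (UNICODE_CHARACTER_CLASS is only a rough counterpart of "u"), and "w" and "," are not emulated. It also shows the effect of "X": without it a substring match is enough, with it the whole text must match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionsSketch {
    // Hypothetical helper: maps option characters to the closest
    // java.util.regex flags. An approximation, not the Xerces implementation.
    static int toJavaFlags(String options) {
        int flags = 0;
        for (char c : options.toCharArray()) {
            switch (c) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;        // case-insensitive
                case 'm': flags |= Pattern.MULTILINE; break;               // ^ and $ at line boundaries
                case 's': flags |= Pattern.DOTALL; break;                  // . also matches EOL characters
                case 'u': flags |= Pattern.UNICODE_CHARACTER_CLASS; break; // Unicode \d \w \s \b
                default: break; // "w", "," and "X" have no direct flag equivalent
            }
        }
        return flags;
    }

    // Hypothetical helper: with "X" the whole text must match,
    // otherwise a match anywhere in the text is enough.
    static boolean matches(String regex, String options, String text) {
        Matcher m = Pattern.compile(regex, toJavaFlags(options)).matcher(text);
        return options.indexOf('X') >= 0 ? m.matches() : m.find();
    }

    public static void main(String[] args) {
        System.out.println(matches("abc", "", "xxabcxx"));  // true: substring match
        System.out.println(matches("abc", "X", "xxabcxx")); // false: entire string required
        System.out.println(matches("ABC", "i", "abc"));     // true
        System.out.println(matches("^b$", "m", "a\nb\nc")); // true
        System.out.println(matches("a.c", "s", "a\nc"));    // true
    }
}
```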

Syntax

Differences from the Perl 5 regular expression

  • Supports a 6-digit hexadecimal character representation (\vHHHHHH).
  • Supports subtraction, union, and intersection operations for character classes.
  • Not supported: \ooo (octal character representations), \G, \C, \lc, \uc, \L, \U, \E, \Q, \N{name}, (?{code}), (??{code})
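Six hex digits are enough to name any Unicode code point (up to U+10FFFF), including supplementary characters that Java stores as surrogate pairs. The sketch below shows how such an escape maps to a Java String. The helper decodeV is hypothetical and only illustrates the value an escape denotes; it is not the Xerces parser.

```java
class VEscapeSketch {
    private static final String HEX = "0123456789abcdefABCDEF";

    // Hypothetical helper: decodes "\v" followed by exactly six hex digits
    // into the Java String for that code point.
    static String decodeV(String escape) {
        if (escape.length() != 8 || !escape.startsWith("\\v")) {
            throw new IllegalArgumentException("expected \\v and 6 hex digits: " + escape);
        }
        String hex = escape.substring(2);
        for (int i = 0; i < hex.length(); i++) {
            if (HEX.indexOf(hex.charAt(i)) < 0) {
                throw new IllegalArgumentException("not a hex digit in: " + escape);
            }
        }
        int codePoint = Integer.parseInt(hex, 16);
        // Six digits can reach 0xFFFFFF, but Unicode stops at 0x10FFFF.
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + escape);
        }
        // Supplementary code points become a surrogate pair (two chars).
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(decodeV("\\v000041"));          // A
        System.out.println(decodeV("\\v01F600").length()); // 2
    }
}
```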

The metacharacters are . * + ? { [ ( ) | \ ^ $
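Because \Q...\E is not supported, literal text containing metacharacters has to be escaped by hand before it is embedded in a pattern. The sketch below backslash-escapes exactly the characters listed above. The helper escapeMeta is hypothetical, and it assumes this list is complete for use outside character classes; it is cross-checked here with java.util.regex, which accepts the same escapes.

```java
import java.util.regex.Pattern;

class EscapeSketch {
    // The metacharacters listed above: . * + ? { [ ( ) | \ ^ $
    private static final String META = ".*+?{[()|\\^$";

    // Hypothetical helper: prefixes each metacharacter with a backslash so the
    // text is matched literally (assumes the list above is complete).
    static String escapeMeta(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() * 2);
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pattern = escapeMeta("1+1=2?");
        System.out.println(pattern);                            // 1\+1=2\?
        System.out.println(Pattern.matches(pattern, "1+1=2?")); // true
    }
}
```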