This section describes the syntax that should be used to construct regular expressions for nete:rule elements. A nete:xprcond element takes the following form:
<nete:xprcond>
  <nete:xpr>
    <nete:rule>regular_expression</nete:rule>
    <nete:result>result</nete:result>
  </nete:xpr>
  <nete:xpr-default>forward_destination</nete:xpr-default>
</nete:xprcond>
In the nete:xpr element, the nete:rule element must consist of a regular expression that uses the syntax described in the following table. This syntax is consistent with the regular expression syntax supported by Apache and described at http://www.apache.org.
| Characters | Results |
|---|---|
| Unicode character | Matches any identical Unicode character |
| \ | Used to quote a meta-character (like '*') |
| \\ | Matches a single '\' character |
| \0nnn | Matches a given octal character |
| \xhh | Matches a given 8-bit hexadecimal character |
| \uhhhh | Matches a given 16-bit hexadecimal character |
| \t | Matches an ASCII tab character |
| \n | Matches an ASCII newline character |
| \r | Matches an ASCII return character |
| \f | Matches an ASCII form feed character |
| [abc] | Simple character class |
| [a-zA-Z] | Character class with ranges |
| [^abc] | Negated character class |
| [:alnum:] | Alphanumeric characters |
| [:alpha:] | Alphabetic characters |
| [:blank:] | Space and tab characters |
| [:cntrl:] | Control characters |
| [:digit:] | Numeric characters |
| [:graph:] | Characters that are printable and are also visible (a space is printable but not visible, while an 'a' is both) |
| [:lower:] | Lower-case alphabetic characters |
| [:print:] | Printable characters (characters that are not control characters) |
| [:punct:] | Punctuation characters (characters that are not letters, digits, control characters, or space characters) |
| [:space:] | Space characters (such as space, tab, and form feed) |
| [:upper:] | Upper-case alphabetic characters |
| [:xdigit:] | Characters that are hexadecimal digits |
| [:javastart:] | Start of a Java identifier |
| [:javapart:] | Part of a Java identifier |
| . | Matches any character other than newline |
| \w | Matches a "word" character (alphanumeric plus "_") |
| \W | Matches a non-word character |
| \s | Matches a whitespace character |
| \S | Matches a non-whitespace character |
| \d | Matches a digit character |
| \D | Matches a non-digit character |
| ^ | Matches only at the beginning of a line |
| $ | Matches only at the end of a line |
| \b | Matches only at a word boundary |
| \B | Matches only at a non-word boundary |
| A* | Matches A 0 or more times (greedy) |
| A+ | Matches A 1 or more times (greedy) |
| A? | Matches A 1 or 0 times (greedy) |
| A{n} | Matches A exactly n times (greedy) |
| A{n,} | Matches A at least n times (greedy) |
| A{n,m} | Matches A at least n but not more than m times (greedy) |
| A*? | Matches A 0 or more times (reluctant) |
| A+? | Matches A 1 or more times (reluctant) |
| A?? | Matches A 0 or 1 times (reluctant) |
| AB | Matches A followed by B |
| A\|B | Matches either A or B |
| (A) | Used for subexpression grouping |
| \1 | Backreference to 1st parenthesized subexpression |
| \n | Backreference to nth parenthesized subexpression (here n is the subexpression number, not the newline escape) |
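As an illustration of how a few of the table entries combine, the following is a minimal sketch that uses Python's `re` module as a stand-in. This is an assumption made for demonstration only: the engine that evaluates nete:rule elements may differ in details (for example, Python's `re` does not support classes such as [:alpha:]). The sample strings, paths, and variable names are hypothetical.

```python
import re

# Hypothetical rule: a word, whitespace, then the same word again.
# (\w+) captures a word; \1 refers back to the captured text.
doubled_word = re.compile(r"\b(\w+)\s+\1\b")

match = doubled_word.search("this is is a test")
repeated = match.group(1) if match else None

# ^ and $ anchor the pattern to the start and end of the input;
# A|B selects between alternatives inside the group.
# The paths below are made up for this example.
path_rule = re.compile(r"^/(login|logout)\.jsp$")
is_login_path = path_rule.search("/login.jsp") is not None
is_other_path = path_rule.search("/app/login.jsp") is not None
```

In this sketch the first pattern finds the repeated word, and the anchored pattern accepts only a path that consists of nothing but one of the two page names.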
All closure operators (+, *, ?, {n,m}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. To make a closure reluctant (non-greedy), follow it with a '?'. A reluctant closure matches as few elements of the string as possible when finding matches. {n,m} closures do not currently support reluctant matching.
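The difference between greedy and reluctant closures can be sketched as follows. This example uses Python's `re` module purely as a stand-in, on the assumption that its `*` and `*?` operators behave like the ones described here; the sample text is hypothetical. Bounded closures are left out because the paragraph above says they have no reluctant form in this syntax.

```python
import re

# Hypothetical input containing two tagged segments.
text = "<b>one</b><b>two</b>"

# Greedy: .* consumes as much as possible while still letting
# the overall pattern match, so it runs to the last </b>.
greedy = re.search(r"<b>.*</b>", text).group(0)

# Reluctant: .*? consumes as little as possible, so the match
# stops at the first </b>.
reluctant = re.search(r"<b>.*?</b>", text).group(0)
```

Here `greedy` spans the whole string, while `reluctant` covers only the first tagged segment.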
Copyright © 2011 CA. All rights reserved.