Regular Expression Syntax

This section describes the syntax for constructing regular expressions in nete:rule elements. A nete:xprcond element takes the following form:

<nete:xprcond>
<nete:xpr>
<nete:rule>regular_expression</nete:rule>
<nete:result>result</nete:result>
</nete:xpr>
<nete:xpr-default>forward_destination</nete:xpr-default>
</nete:xprcond>
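As a rough mental model of the form above, the sketch below treats a nete:xprcond as an ordered list of (rule, result) pairs with a fallback destination. It is not the product's implementation. It rests on several assumptions not stated in this section: that rules are tried in order, that the first matching rule wins, and that a rule may match anywhere in the request URI. The function name `resolve`, the sample rules, and the example.com destinations are hypothetical, and Python's `re` module stands in for the engine that evaluates nete:rule.

```python
import re
from typing import List, Tuple

def resolve(uri: str, rules: List[Tuple[str, str]], default: str) -> str:
    """Return the result of the first rule that matches uri, else the default.

    Assumption: first match wins and a rule may match anywhere in the URI.
    """
    for pattern, result in rules:
        if re.search(pattern, uri):
            return result
    return default

# Hypothetical (rule, result) pairs, one per nete:xpr element.
rules = [
    (r"^/hr/", "http://hr.example.com"),
    (r"\.(gif|png)$", "http://images.example.com"),
]
# Hypothetical nete:xpr-default value.
default = "http://home.example.com"

print(resolve("/hr/index.html", rules, default))
print(resolve("/img/logo.png", rules, default))
print(resolve("/other/page.html", rules, default))
```

Keeping the default outside the rule list mirrors how nete:xpr-default sits beside the nete:xpr elements rather than inside one.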

In the nete:xpr element, the nete:rule element must contain a regular expression that uses the syntax described in the following table. This syntax is consistent with the regular expression syntax supported by Apache and described at http://www.apache.org.

Characters          Results

unicode character   Matches any identical Unicode character
\                   Used to quote a meta-character (like '*')
\\                  Matches a single '\' character
\0nnn               Matches a given octal character
\xhh                Matches a given 8-bit hexadecimal character
\uhhhh              Matches a given 16-bit hexadecimal character
\t                  Matches an ASCII tab character
\n                  Matches an ASCII newline character
\r                  Matches an ASCII return character
\f                  Matches an ASCII form feed character
[abc]               Simple character class
[a-zA-Z]            Character class with ranges
[^abc]              Negated character class
[:alnum:]           Alphanumeric characters
[:alpha:]           Alphabetic characters
[:blank:]           Space and tab characters
[:cntrl:]           Control characters
[:digit:]           Numeric characters
[:graph:]           Characters that are both printable and visible (a space is printable but not visible; an 'a' is both)
[:lower:]           Lower-case alphabetic characters
[:print:]           Printable characters (characters that are not control characters)
[:punct:]           Punctuation characters (characters that are not letters, digits, control characters, or space characters)
[:space:]           Space characters (such as space, tab, and form feed)
[:upper:]           Upper-case alphabetic characters
[:xdigit:]          Characters that are hexadecimal digits
[:javastart:]       Start of a Java identifier
[:javapart:]        Part of a Java identifier
.                   Matches any character other than newline
\w                  Matches a "word" character (alphanumeric plus "_")
\W                  Matches a non-word character
\s                  Matches a whitespace character
\S                  Matches a non-whitespace character
\d                  Matches a digit character
\D                  Matches a non-digit character
^                   Matches only at the beginning of a line
$                   Matches only at the end of a line
\b                  Matches only at a word boundary
\B                  Matches only at a non-word boundary
A*                  Matches A 0 or more times (greedy)
A+                  Matches A 1 or more times (greedy)
A?                  Matches A 0 or 1 times (greedy)
A{n}                Matches A exactly n times (greedy)
A{n,}               Matches A at least n times (greedy)
A{n,m}              Matches A at least n but not more than m times (greedy)
A*?                 Matches A 0 or more times (reluctant)
A+?                 Matches A 1 or more times (reluctant)
A??                 Matches A 0 or 1 times (reluctant)
AB                  Matches A followed by B
A|B                 Matches either A or B
(A)                 Used for subexpression grouping
\1                  Backreference to 1st parenthesized subexpression
\n                  Backreference to nth parenthesized subexpression (here n is a number, not the newline escape listed earlier)
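To make a few of the table entries concrete, the following sketch checks some sample patterns against sample strings. It uses Python's `re` module as a stand-in, which is an assumption: the engine behind nete:rule may differ, and Python's `re` does not support the [:alnum:]-style classes, so only constructs shared by both syntaxes appear here. The helper `matches` and all sample strings are hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# Each entry: (pattern, text, whether a match is expected).
samples = [
    (r"[abc]", "cab", True),             # simple character class
    (r"[^abc]", "abc", False),           # negated class: nothing outside a, b, c
    (r"^\d{3}$", "404", True),           # exactly three digits on the line
    (r"^\d{3}$", "4040", False),         # four digits is one too many
    (r"\bcat\b", "a cat sat", True),     # word boundaries around "cat"
    (r"\bcat\b", "concatenate", False),  # "cat" inside a word has no boundaries
    (r"(\w)\1", "hello", True),          # backreference: doubled character "ll"
    (r"(\w)\1", "world", False),         # no doubled character
    (r"gif|png", "logo.png", True),      # alternation
    (r"a\*b", "a*b", True),              # backslash quotes the meta-character '*'
]

for pattern, text, expected in samples:
    assert matches(pattern, text) == expected, (pattern, text)
```

Pairing each pattern with both a matching and a non-matching string is a cheap way to confirm that a rule is neither too strict nor too loose before placing it in a nete:rule element.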

All closure operators (+, *, ?, {n,m}) are greedy by default: they match as many elements of the string as possible without causing the overall match to fail. To make a closure reluctant (non-greedy), follow it with a '?'. A reluctant closure matches as few elements of the string as possible when finding matches. {n,m} closures do not currently support reluctancy.
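The difference between greedy and reluctant closures is easiest to see side by side. The sketch below again uses Python's `re` module as a stand-in for the engine that evaluates nete:rule, which is an assumption. The sample strings are hypothetical. It avoids reluctant {n,m} forms because the text above states they are not supported.

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy: .* takes as much as possible, so the match runs to the last </b>.
greedy = re.search(r"<b>.*</b>", text).group(0)

# Reluctant: .*? takes as little as possible, so the match stops at the first </b>.
reluctant = re.search(r"<b>.*?</b>", text).group(0)

print(greedy)     # <b>one</b><b>two</b>
print(reluctant)  # <b>one</b>

# The same contrast for + and ? closures.
plus_greedy = re.search(r"a+", "aaa").group(0)         # "aaa"
plus_reluctant = re.search(r"a+?", "aaa").group(0)     # "a"
optional_greedy = re.search(r"ab?", "abb").group(0)    # "ab"
optional_reluctant = re.search(r"ab??", "abb").group(0)  # "a"
```

Both forms accept the same set of strings when the pattern is anchored at both ends. The choice matters mainly when a parenthesized subexpression captures part of the match.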