Regular Expression Syntax

This section describes the syntax that should be used to construct regular expressions for nete:rule elements. A nete:xprcond element takes the following form:

<nete:xprcond>
    <nete:xpr>
        <nete:rule>regular_expression</nete:rule>
        <nete:result>result</nete:result>
    </nete:xpr>
    <nete:xpr-default>forward_destination</nete:xpr-default>
</nete:xprcond>
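The following is a minimal sketch of how such a structure could be evaluated, not the product's actual implementation. It assumes that the rule is tested against the request URI, that the first matching nete:xpr supplies the result, that a partial match is enough, and that nete:xpr-default applies when nothing matches. The host names, the rule patterns, and the function name pick_destination are hypothetical, and Python's re module stands in for the proxy's own regular expression engine.

```python
import re

# Hypothetical stand-in for one nete:xprcond element.
# Each (rule, result) pair mirrors one nete:xpr; DEFAULT mirrors nete:xpr-default.
RULES = [
    (r"^/realm[0-9]+/", "http://server1.example.com"),
    (r"\.jsp$", "http://server2.example.com"),
]
DEFAULT = "http://default.example.com"


def pick_destination(uri: str) -> str:
    """Return the result of the first rule that matches, else the default.

    Assumption: rules are tried in document order and a partial match
    (re.search) is sufficient. The real proxy may behave differently.
    """
    for rule, result in RULES:
        if re.search(rule, uri):
            return result
    return DEFAULT


if __name__ == "__main__":
    for uri in ("/realm12/index.html", "/app/login.jsp", "/images/logo.png"):
        print(uri, "->", pick_destination(uri))
```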

In the nete:xpr element, the nete:rule element must consist of a regular expression that uses the syntax described in the following table. This syntax is consistent with the regular expression syntax supported by Apache and described at http://www.apache.org.

Characters	Results
Unicode character	Matches any identical Unicode character
\	Used to quote a meta-character (like '*')
\\	Matches a single '\' character
\0nnn	Matches a given octal character
\xhh	Matches a given 8-bit hexadecimal character
\uhhhh	Matches a given 16-bit hexadecimal character
\t	Matches an ASCII tab character
\n	Matches an ASCII newline character
\r	Matches an ASCII return character
\f	Matches an ASCII form feed character
[abc]	Simple character class
[a-zA-Z]	Character class with ranges
[^abc]	Negated character class
[:alnum:]	Alphanumeric characters
[:alpha:]	Alphabetic characters
[:blank:]	Space and tab characters
[:cntrl:]	Control characters
[:digit:]	Numeric characters
[:graph:]	Characters that are printable and are also visible (A space is printable, but not visible, while an ‘a’ is both)
[:lower:]	Lower-case alphabetic characters
[:print:]	Printable characters (characters that are not control characters)
[:punct:]	Punctuation characters (characters that are not letters, digits, control characters, or space characters)
[:space:]	Space characters (such as space, tab, and formfeed)
[:upper:]	Upper-case alphabetic characters
[:xdigit:]	Characters that are hexadecimal digits
[:javastart:]	Start of a Java identifier
[:javapart:]	Part of a Java identifier
.	Matches any character other than newline
\w	Matches a "word" character (alphanumeric plus "_")
\W	Matches a non-word character
\s	Matches a whitespace character
\S	Matches a non-whitespace character
\d	Matches a digit character
\D	Matches a non-digit character
^	Matches only at the beginning of a line
$	Matches only at the end of a line
\b	Matches only at a word boundary
\B	Matches only at a non-word boundary
A*	Matches A 0 or more times (greedy)
A+	Matches A 1 or more times (greedy)
A?	Matches A 0 or 1 times (greedy)
A{n}	Matches A exactly n times (greedy)
A{n,}	Matches A at least n times (greedy)
A{n,m}	Matches A at least n but not more than m times (greedy)
A*?	Matches A 0 or more times (reluctant)
A+?	Matches A 1 or more times (reluctant)
A??	Matches A 0 or 1 times (reluctant)
AB	Matches A followed by B
A|B	Matches either A or B
(A)	Used for subexpression grouping
\1	Backreference to 1st parenthesized subexpression
\n	Backreference to nth parenthesized subexpression (here n stands for the group number, as in \2; this is distinct from the \n newline escape)
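A few of the table entries can be tried out in a short script. The sketch below uses Python's re module purely as a stand-in, on the assumption that it behaves the same way for these particular constructs (quoting, character classes with ranges, bounded closures, alternation, grouping, backreferences, and word boundaries). It does not cover the [:name:] classes, which Python does not support. The helper name found and the sample strings are hypothetical.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


# Quoting a meta-character: \. matches only a literal dot.
print(found(r"index\.html$", "/docs/index.html"))   # True
print(found(r"index\.html$", "/docs/index-html"))   # False

# Character class with ranges and a bounded closure A{n,m},
# anchored with ^ and $ so the whole string must fit.
print(found(r"^[a-zA-Z]{2,4}$", "Abc"))             # True
print(found(r"^[a-zA-Z]{2,4}$", "Abcde"))           # False

# Alternation inside a subexpression group.
print(found(r"^/(sales|support)/", "/support/tickets"))  # True
print(found(r"^/(sales|support)/", "/billing/tickets"))  # False

# Backreference: \1 must repeat whatever group 1 matched.
print(found(r"(ab)\1", "xxababxx"))                 # True
print(found(r"(ab)\1", "xxabxx"))                   # False

# Word boundary: "cat" as a whole word only.
print(found(r"\bcat\b", "a cat sat"))               # True
print(found(r"\bcat\b", "concatenate"))             # False
```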

All closure operators (+, *, ?, {m,n}) are greedy by default: they match as many elements of the string as possible without causing the overall match to fail. To make a closure reluctant (non-greedy), follow it with a '?'. A reluctant closure matches as few elements of the string as possible. {m,n} closures do not currently support reluctancy.
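The difference between greedy and reluctant closures is easiest to see on a string that contains more than one candidate match. The sketch below again uses Python's re module as a stand-in engine, and the sample text is hypothetical. Note that Python happens to accept a reluctant {m,n}? form, so that behavior should not be taken as evidence of what the proxy supports.

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy: .* consumes as much as possible, so the match runs to the last </b>.
greedy = re.search(r"<b>.*</b>", text).group(0)

# Reluctant: .*? consumes as little as possible, so the match stops at the first </b>.
reluctant = re.search(r"<b>.*?</b>", text).group(0)

print(greedy)     # <b>one</b><b>two</b>
print(reluctant)  # <b>one</b>

# The same contrast with + and +? on a run of identical characters.
greedy_plus = re.search(r"a+", "aaa").group(0)      # "aaa"
reluctant_plus = re.search(r"a+?", "aaa").group(0)  # "a"
print(greedy_plus, reluctant_plus)
```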