rex
Extracts values from a text field using named capture groups in a regular expression. JDK and RE2/J regular expression engines are supported.
Command properties
| Item | Description |
|---|---|
| Command type | Processing query |
| Required permission | None |
| License usage | N/A |
| Parallel execution | Supported |
| Distributed execution | Not supported |
Syntax
Options
field=FIELD (required): Name of the target field to which the regular expression is applied.
engine={jdk|re2j|jdk-re2j|jdk-re2j-lax}: Regular expression engine setting (default: jdk-re2j-lax).
- jdk: The built-in Java (JDK) regular expression engine.
- re2j: Google's RE2/J regular expression engine.
- jdk-re2j: Uses the Java engine (jdk) but automatically switches to RE2/J if execution time becomes excessive. The regular expression must be valid in both the Java engine and RE2/J.
- jdk-re2j-lax: Uses the Java engine (jdk) but automatically switches to RE2/J if execution time becomes excessive. The regular expression must be valid in the Java engine. The switch occurs only if the regular expression is also valid in RE2/J.
debug={t|f}: Debug setting. When set to t, outputs the engine type used for matching in the _engine field (default: f).
Target
"REGEX" (required): Regular expression. Specify the fields to extract using named capture groups in the format (?<field_name>pattern). The capture group name becomes the output field name.
Output fields
The output contains the input field specified by the field option plus one field for each named capture group in the regular expression. If debug=t is set, the _engine field is also added.
| Field | Type | Description |
|---|---|---|
| (extracted) | string | Value matched by the named capture group ((?<name>...)) |
| _engine | string | When debug=t, the name of the regular expression engine used (JDK or RE2/J) |
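The mapping from capture group names to output fields can be illustrated outside the product with the JDK's own java.util.regex package, which uses the same (?<name>...) syntax. The following is a minimal sketch, not the command's implementation: the class RexNamedGroups and its extract method are hypothetical names, and it assumes the first partial match is used and that groups which did not participate in the match produce no field.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the product: maps named groups to fields.
final class RexNamedGroups {
    // Naive scan for "(?<name>" in the pattern source. It ignores escaping,
    // so it is only suitable for simple patterns like the ones on this page.
    private static final Pattern GROUP_NAME =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    static Map<String, String> extract(String regex, String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(line);
        // Assumption: the first partial match is used (find, not matches).
        if (!m.find()) {
            return fields; // no match: nothing is extracted
        }
        Matcher names = GROUP_NAME.matcher(regex);
        while (names.find()) {
            String name = names.group(1);
            String value = m.group(name);
            // Assumption: a group that did not participate yields no field.
            if (value != null) {
                fields.put(name, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "10.0.0.1 GET /game/flash/intro.swf HTTP/1.1";
        System.out.println(
                extract("(GET|POST) /game/flash/(?<filename>[^ ]*)", line));
    }
}
```

For the sample line in main, the unnamed group (GET|POST) contributes nothing and the result holds a single entry, filename with the value intro.swf.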
Error codes
Parse errors
| Error code | Message | Description |
|---|---|---|
| 20900 | The value of the field option is missing. | The field option is not specified |
| 20901 | There is an error in the regular expression you entered. Check the format again. | The regular expression syntax is incorrect |
| 20902 | There is an error in the regular expression engine setting. Check the value again. | The engine option value is incorrect |
| 20903 | There is an error in the regular expression you entered (JDK). Check the format again: [message] | JDK regular expression compilation failed |
| 20904 | There is an error in the regular expression you entered (RE2/J). Check the format again: [message] | RE2/J regular expression compilation failed |
Runtime errors
| Error code | Message | Description | Post-action |
|---|---|---|---|
| 20905 | The regular expression engine execution step limit has been reached: count=[count] limit=[limit] | The regular expression execution step limit has been reached | Query cancelled |
Description
The rex command extracts values from a text field using named capture groups ((?<name>...)) in a regular expression. The capture group name becomes the output field name.
The default engine (jdk-re2j-lax) uses the JDK regular expression engine but automatically switches to the RE2/J engine when execution time is excessive for certain patterns. If switching to RE2/J is not possible and the step limit is reached, the query is cancelled.
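The engine rules described for the engine option can be summarized as a small decision table. The sketch below is only an illustration under stated assumptions: the class RexEnginePlan, its enums, and the two boolean inputs (whether the pattern compiles in each engine) are hypothetical, and rejecting an invalid pattern for the jdk and re2j settings is inferred from the parse error codes rather than stated explicitly.

```java
// Hypothetical decision table for the engine option; not product code.
final class RexEnginePlan {
    enum Mode { JDK, RE2J, JDK_RE2J, JDK_RE2J_LAX }

    enum Plan { REJECT, JDK_ONLY, RE2J_ONLY, JDK_WITH_RE2J_FALLBACK }

    static Plan plan(Mode mode, boolean validInJdk, boolean validInRe2j) {
        switch (mode) {
            case JDK:
                // Assumption: an invalid pattern is rejected at parse time.
                return validInJdk ? Plan.JDK_ONLY : Plan.REJECT;
            case RE2J:
                return validInRe2j ? Plan.RE2J_ONLY : Plan.REJECT;
            case JDK_RE2J:
                // Strict mode: the pattern must compile in both engines.
                return (validInJdk && validInRe2j)
                        ? Plan.JDK_WITH_RE2J_FALLBACK
                        : Plan.REJECT;
            case JDK_RE2J_LAX:
                // Lax mode: only the JDK engine must accept the pattern.
                if (!validInJdk) {
                    return Plan.REJECT;
                }
                // Fallback is available only if RE2/J also accepts it.
                return validInRe2j
                        ? Plan.JDK_WITH_RE2J_FALLBACK
                        : Plan.JDK_ONLY;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        // A pattern using a JDK-only feature under the default setting.
        System.out.println(plan(Mode.JDK_RE2J_LAX, true, false));
    }
}
```

The JDK_ONLY outcome under jdk-re2j-lax is the case where the step limit matters most: with no fallback engine available, reaching the limit cancels the query.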
The JDK regular expression engine counts the number of execution steps while processing an input string. The default step limit is 300,000,000 steps. You can change this value using the system property araqne.logdb.regex.jdk_regex_step_limit.
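How the product counts steps is not described here. One common way to bound the work of a java.util.regex match is to wrap the input in a CharSequence that counts character reads and aborts once a limit is exceeded. The sketch below shows that general technique as a stand-in: StepLimitSketch, CountingSequence, and StepLimitException are hypothetical names, and the counts it produces are not comparable to the step counts reported in error 20905.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of a step-limited match; not product code.
final class StepLimitSketch {
    static final class StepLimitException extends RuntimeException {
        StepLimitException(long count, long limit) {
            super("step limit reached: count=" + count + " limit=" + limit);
        }
    }

    // Counts every character the regex engine reads from the input.
    static final class CountingSequence implements CharSequence {
        private final CharSequence text;
        private final long limit;
        private long count;

        CountingSequence(CharSequence text, long limit) {
            this.text = text;
            this.limit = limit;
        }

        @Override
        public char charAt(int index) {
            if (++count > limit) {
                throw new StepLimitException(count, limit);
            }
            return text.charAt(index);
        }

        @Override
        public int length() {
            return text.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }

        @Override
        public String toString() {
            return text.toString();
        }
    }

    // Returns whether the pattern is found, or throws once the number of
    // character reads exceeds the given limit.
    static boolean findWithLimit(String regex, String input, long limit) {
        CharSequence counted = new CountingSequence(input, limit);
        return Pattern.compile(regex).matcher(counted).find();
    }

    public static void main(String[] args) {
        System.out.println(findWithLimit("\\d+", "id=42", 1_000_000L));
    }
}
```

In a fallback design such as jdk-re2j-lax, a caller would catch the exception and retry with the second engine when one is available, and cancel otherwise.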
Examples
- Extracting a filename from an HTTP request

      table duration=1h WEB_LOGS | rex field=line "(GET|POST) /game/flash/(?<filename>[^ ]*)"

  Extracts the filename from GET or POST requests in the line field into a filename field.

- Extracting a timestamp

      table duration=1h APP_LOGS | rex field=line "(?<timestamp>\d+-\d+-\d+ \d+:\d+:\d+)"

  Extracts a timestamp in yyyy-MM-dd HH:mm:ss format from the line field.

- Extracting a URL and query string

      table duration=1h WEB_LOGS | rex field=line "(GET|POST) (?<url>[^ ]*) (?<querystring>[^ ]*) "

  Extracts the URL and query string from the line field into url and querystring fields respectively.

- Checking the engine used in debug mode

      table duration=1h WEB_LOGS | rex field=line debug=t "(?<method>GET|POST) (?<path>/[^ ]*)"

  Outputs JDK or RE2/J in the _engine field, showing which regular expression engine was actually used.