textfile
Retrieves data from a text file.
Command properties
| Property | Description |
|---|---|
| Command type | Driver query |
| Required permission | Local file read access |
| License usage | Counted |
| Parallel execution | Not supported |
| Distributed execution | Not supported |
Syntax
textfile [offset=INT] [limit=INT] [brex=REGEX] [erex=REGEX] [df=STR] [dp=REGEX] [cs=STR] PATH
Options
- offset=INT: Number of records to skip.
- limit=INT: Maximum number of records to retrieve.
- brex=REGEX: Begin regular expression. Identifies the first line of a multi-line record. A matching line starts a new record, and the lines that follow are merged into it until the next matching line. If not specified, each line delimited by CR LF or LF is treated as a separate record.
- erex=REGEX: End regular expression. Identifies the last line of a multi-line record. When a line matching this regex is found, the record is finalized. If not specified, each line delimited by CR LF or LF is treated as a separate record.
- df=STR: Date format. Specifies the format for parsing the timestamp extracted by the dp option. If not specified, the _time field value is set to the time when the data was loaded.
- dp=REGEX: Date regular expression. The regular expression used to extract the _time field. If not specified, the _time field value is set to the time when the data was loaded.
- cs=STR: Character encoding of the text file (default: utf-8).
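As a rough illustration of how offset and limit could interact, the following Python sketch skips offset records and then returns at most limit records. It is not the textfile implementation: the helper name apply_offset_limit is hypothetical, and applying offset before limit is an assumption made for illustration.

```python
from itertools import islice

def apply_offset_limit(records, offset=0, limit=None):
    """Skip `offset` records, then yield at most `limit` records."""
    # Assumption: offset is applied first, then limit counts from there.
    stop = None if limit is None else offset + limit
    return islice(records, offset, stop)

records = [f"line {i}" for i in range(1, 11)]
page = list(apply_offset_limit(records, offset=2, limit=3))
print(page)  # ['line 3', 'line 4', 'line 5']
```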
Target
- PATH: Text file path. You can use a wildcard (*), and files with a .gz extension are automatically decompressed.
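The wildcard and .gz handling described above can be approximated in Python as follows. This is a minimal sketch, not the command's actual implementation: the function name read_lines is hypothetical, files are visited in sorted order as an assumption, and the full path is returned where the command reports a file name in _file.

```python
import glob
import gzip

def read_lines(pattern, encoding="utf-8"):
    """Yield (path, line) pairs for every file matching the wildcard pattern."""
    for path in sorted(glob.glob(pattern)):
        # Files ending in .gz are decompressed transparently while reading.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding=encoding) as f:
            for line in f:
                # Strip the CR LF or LF line terminator.
                yield path, line.rstrip("\r\n")
```

The encoding parameter plays the role of the cs option, so a call such as read_lines("/opt/logpresso/data/*.log", encoding="euc-kr") would read legacy files in that encoding.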
Output fields
| Field | Type | Description |
|---|---|---|
| line | string | Original text read from the text file |
| _file | string | Original file name |
| _time | timestamp | Timestamp extracted using the dp and df options, or the load time if they are not specified |
Error codes
Parsing errors
| Error code | Message | Description |
|---|---|---|
| 10702 | - | No file path was specified |
| 10700 | - | The file does not exist or cannot be read |
| 10701 | - | The parent directory does not exist or cannot be read |
| 10703 | - | Access to the file was denied |
Description
The textfile command reads a text file line by line and returns each line as a record with the content stored in the line field. Using the brex (begin regex) or erex (end regex) options, you can recognize a single record that spans multiple lines.
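The multi-line grouping behavior can be pictured with a short Python sketch. It is an illustration under stated assumptions rather than the command's real logic: the function name group_records is hypothetical, and joining merged lines with a newline is an assumption.

```python
import re

def group_records(lines, brex=None, erex=None):
    """Merge physical lines into records using begin/end regexes."""
    begin = re.compile(brex) if brex else None
    end = re.compile(erex) if erex else None
    if begin is None and end is None:
        # No multi-line option: every line is its own record.
        return list(lines)
    records, buf = [], []
    for line in lines:
        if begin and begin.search(line) and buf:
            # A new record starts here, so flush the previous one.
            records.append("\n".join(buf))
            buf = []
        buf.append(line)
        if end and end.search(line):
            # The end pattern closes the current record.
            records.append("\n".join(buf))
            buf = []
    if buf:
        # Flush whatever remains at end of input.
        records.append("\n".join(buf))
    return records

log = [
    "2024-01-01 ERROR boom",
    "  at Foo.bar(Foo.java:1)",
    "2024-01-02 INFO ok",
]
for record in group_records(log, brex=r"^\d{4}-\d{2}-\d{2}"):
    print(repr(record))
```

Here the stack trace line is folded into the first record because it does not match the begin pattern.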
When you use the dp and df options together, a timestamp is extracted from the text and assigned to the _time field.
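To make the interplay of dp and df concrete, here is a hedged Python sketch of the extraction step. Several details are assumptions for illustration: the helper name extract_time is hypothetical, the first capture group is taken as the timestamp text, the load time is used when the pattern does not match or the text cannot be parsed, and the strptime format string stands in for a df pattern such as yyyy-MM-dd HH:mm:ss.

```python
import re
from datetime import datetime

def extract_time(line, dp, fmt, loaded_at):
    """Return the timestamp found in `line`, or `loaded_at` as a fallback."""
    match = re.search(dp, line)
    if match is None:
        return loaded_at
    # Prefer the first capture group; fall back to the whole match.
    text = match.group(1) if match.re.groups else match.group(0)
    if text is None:
        return loaded_at
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        # The text matched the regex but not the date format.
        return loaded_at

loaded_at = datetime(2024, 1, 1, 0, 0, 0)  # stand-in for the load time
dp = r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
fmt = "%Y-%m-%d %H:%M:%S"  # assumed Python counterpart of yyyy-MM-dd HH:mm:ss
print(extract_time("2024-03-05 10:20:30 service started", dp, fmt, loaded_at))
print(extract_time("no timestamp on this line", dp, fmt, loaded_at))
```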
Examples
- Read a text file

      textfile /opt/logpresso/data/sample.log

  Retrieves each line of the text file as the line field.

- Retrieve with timestamp extraction

      textfile dp="^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})" df="yyyy-MM-dd HH:mm:ss" /var/log/syslog

  Extracts a timestamp from the beginning of each line and assigns it to the _time field.

- Recognize multi-line records

      textfile brex="^\d{4}-\d{2}-\d{2}" /var/log/application.log

  Treats lines starting with a date pattern as the beginning of a record, grouping multi-line log entries into a single record.

- Retrieve a file with EUC-KR encoding

      textfile cs=euc-kr /opt/logpresso/data/legacy.log

  Retrieves a text file encoded in EUC-KR.