textfile

Retrieves data from a text file.

Command properties

Property               Description
Command type           Driver query
Required permission    Local file read access
License usage          Counted
Parallel execution     Not supported
Distributed execution  Not supported

Syntax

textfile [offset=INT] [limit=INT] [brex=REGEX] [erex=REGEX] [df=STR] [dp=REGEX] [cs=STR] PATH

Options

offset=INT
Number of records to skip.
limit=INT
Maximum number of records to retrieve.
brex=REGEX
Begin regular expression. Marks the first line of a multi-line record: each line matching this expression starts a new record, and the lines that follow it, up to the next matching line, are merged into that record. If neither brex nor erex is specified, each line terminated by CR LF or LF is treated as a separate record.
erex=REGEX
End regular expression. Marks the last line of a multi-line record: lines are accumulated until a line matching this expression is found, and that line completes the record. If neither brex nor erex is specified, each line terminated by CR LF or LF is treated as a separate record.
df=STR
Date format used to parse the timestamp string extracted by the dp option (for example, yyyy-MM-dd HH:mm:ss). If not specified, the _time field is set to the time the data was loaded.
dp=REGEX
Date regular expression. Extracts from each record the timestamp string that is parsed with df and stored in the _time field. If not specified, the _time field is set to the time the data was loaded.
cs=STR
Character encoding of the text file (default: utf-8).
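The offset and limit options follow a skip-then-take pattern. The Python sketch below assumes that offset is applied first and that both options count records (after any brex/erex merging) rather than raw lines; the helper apply_offset_limit is hypothetical and not part of Logpresso.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def apply_offset_limit(records: Iterable[T], offset: int = 0,
                       limit: Optional[int] = None) -> Iterator[T]:
    """Hypothetical helper: skip `offset` records, then yield at most `limit`.

    Assumption: offset is applied before limit, and limit=None means no cap.
    """
    stop = None if limit is None else offset + limit
    return islice(records, offset, stop)


# offset=2 limit=3 over records r0..r9
print(list(apply_offset_limit([f"r{i}" for i in range(10)], offset=2, limit=3)))
# ['r2', 'r3', 'r4']
```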

Target

PATH
Path to the text file. Wildcards (*) are supported, and files with a .gz extension are decompressed automatically.
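To illustrate how a wildcard path, .gz decompression, and the cs encoding fit together, here is a minimal Python sketch. The helper read_lines is hypothetical; reading matched files in alphabetical order and reporting the full path as the file name are assumptions, not documented Logpresso behavior.

```python
import glob
import gzip
from typing import Iterator, Tuple


def read_lines(pattern: str, encoding: str = "utf-8") -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (file path, line) for every file matching pattern.

    Files ending in .gz are decompressed on the fly; all other files are read
    as plain text. Every file is decoded with the same encoding (like cs=).
    Assumption: matched files are processed in sorted (alphabetical) order.
    """
    for path in sorted(glob.glob(pattern)):
        if path.endswith(".gz"):
            f = gzip.open(path, "rt", encoding=encoding)
        else:
            f = open(path, "r", encoding=encoding)
        with f:
            for line in f:
                # Drop the CR LF or LF terminator; each line becomes one record.
                yield path, line.rstrip("\r\n")
```

For example, read_lines("/opt/logpresso/data/*.log", "euc-kr") would walk every matching file and decode it as EUC-KR, comparable to example 4 combined with a wildcard.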

Output fields

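Field  Type       Description
line   string     Original text read from the file: a single line, or several merged lines when brex or erex is used
_file  string     Original file name
_time  timestamp  Timestamp extracted using the dp and df options, or the load time if they are not specified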

Error codes

Parsing errors
Error code  Message  Description
10702       -        No file path was specified
10700       -        The file does not exist or cannot be read
10701       -        The parent directory does not exist or cannot be read
10703       -        Access to the file was denied

Description

The textfile command reads a text file line by line and returns each line as a record, storing its content in the line field. With the brex (begin regex) or erex (end regex) option, you can merge several consecutive lines into a single record.
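The following Python sketch shows one way brex and erex grouping can work. It is a hypothetical illustration (group_records is not a Logpresso function) and assumes that the expressions may match anywhere in a line (re.search, so anchor with ^ as in example 3), that merged lines are joined with a newline, and that any lines left over at the end of the file form a final record.

```python
import re
from typing import Iterable, Iterator, List, Optional


def group_records(lines: Iterable[str], brex: Optional[str] = None,
                  erex: Optional[str] = None) -> Iterator[str]:
    """Hypothetical sketch: merge lines into records using brex/erex."""
    if brex is None and erex is None:
        # Without brex/erex, every line is its own record.
        yield from lines
        return
    begin = re.compile(brex) if brex else None
    end = re.compile(erex) if erex else None
    buf: List[str] = []
    for line in lines:
        # A line matching brex closes the pending record and opens a new one.
        if begin is not None and begin.search(line) and buf:
            yield "\n".join(buf)  # assumption: merged lines joined with "\n"
            buf = []
        buf.append(line)
        # A line matching erex completes the current record.
        if end is not None and end.search(line):
            yield "\n".join(buf)
            buf = []
    if buf:
        # Assumption: leftover lines at end of input form a final record.
        yield "\n".join(buf)


log = ["2024-01-01 10:00:00 ERROR boom", "  at Foo.bar",
       "2024-01-01 10:00:01 INFO ok"]
print(list(group_records(log, brex=r"^\d{4}-\d{2}-\d{2}")))
# ['2024-01-01 10:00:00 ERROR boom\n  at Foo.bar', '2024-01-01 10:00:01 INFO ok']
```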

When you use the dp and df options together, a timestamp is extracted from the text and assigned to the _time field.
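A minimal Python sketch of the dp and df steps follows. It assumes that df uses Java SimpleDateFormat-style tokens, as in example 2, and translates only yyyy, MM, dd, HH, mm, and ss; that the first capture group in dp (or the whole match if there is none) holds the timestamp; and that a record dp does not match falls back to the load time. The names extract_time and to_strptime are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: only these Java-style tokens are translated to strptime codes.
_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}


def to_strptime(df: str) -> str:
    """Hypothetical: convert a subset of a Java-style df pattern to strptime."""
    escaped = df.replace("%", "%%")  # keep literal percent signs literal
    return re.sub(r"yyyy|MM|dd|HH|mm|ss", lambda m: _TOKENS[m.group(0)], escaped)


def extract_time(line: str, dp: Optional[str], df: Optional[str],
                 load_time: datetime) -> datetime:
    """Hypothetical: compute the _time value for one record."""
    if not dp or not df:
        return load_time  # documented: load time when dp/df are not specified
    m = re.search(dp, line)
    if m is None:
        return load_time  # assumption: same fallback when dp does not match
    # Assumption: group 1 holds the timestamp if dp has a capture group.
    text = m.group(1) if m.re.groups else m.group(0)
    return datetime.strptime(text, to_strptime(df))


DP = r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
DF = "yyyy-MM-dd HH:mm:ss"
print(extract_time("2024-03-05 14:07:09 sshd: ok", DP, DF, datetime.now()))
# 2024-03-05 14:07:09
```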

Examples

  1. Read a text file

    textfile /opt/logpresso/data/sample.log
    

    Retrieves each line of the text file as the line field.

  2. Retrieve with timestamp extraction

    textfile dp="^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})" df="yyyy-MM-dd HH:mm:ss" /var/log/syslog
    

    Extracts a timestamp from the beginning of each line and assigns it to the _time field.

  3. Recognize multi-line records

    textfile brex="^\d{4}-\d{2}-\d{2}" /var/log/application.log
    

    Treats lines starting with a date pattern as the beginning of a record, grouping multi-line log entries into a single record.

  4. Retrieve a file with EUC-KR encoding

    textfile cs=euc-kr /opt/logpresso/data/legacy.log
    

    Retrieves a text file encoded in EUC-KR.