kapezip

Reads per-module CSV outputs from a ZIP archive produced by KAPE (Kroll Artifact Parser and Extractor) modules without extracting the archive. KAPE is a Windows forensic triage tool from Kroll that collects and parses artifacts and packages each module's output as CSV inside a ZIP. The kapezip command takes such a ZIP as input so module outputs can be analyzed directly in queries.

Command properties

Item                    Description
Command type            Driver query
Required permission     Cluster administrator
License usage           Counted
Parallel execution      Not supported
Distributed execution   Runs on Data Node (mapper)

Syntax

kapezip [entrycs=STR] [cs=STR] [rest=BOOL] FILE_PATH [ENTRY_PATH]

Options

entrycs=STR
Character encoding of ZIP file entry names (default: utf-8). Specify this when the entry names inside the ZIP are stored in a non-standard encoding such as cp949.
cs=STR
Character encoding of CSV files (default: utf-8). Specify this when the CSV body is stored in a non-standard encoding.
rest=BOOL
When set to t, columns that exceed the maximum number of columns allowed by the CSV parser are collected into the _rest field and returned. When omitted or set to f, excess columns are discarded (default: f).

Target

FILE_PATH
Path to the ZIP file to query. You can use the * wildcard in the file name portion to specify multiple ZIP files in the same directory at once.
ENTRY_PATH
Path to the CSV entry inside the ZIP to query. You can use the * wildcard. When omitted, all .csv entries inside the ZIP are queried.

Output fields

Field             Type    Description
_file             string  Path of the ZIP entry that the record came from
(header columns)  string  Fields named after the columns in the CSV header (first line), holding the corresponding values
column<N>         string  Value of an excess column when a data row has more columns than the header. N is the zero-based column index, so numbering starts at the header column count
_rest             string  Header columns that exceed the CSV parser maximum (output only when rest=t is specified)

Error codes

Parse errors

N/A

Runtime errors
Error code  Message               Description                                                                     Post-processing
-           csvfile load failure  An error occurred while reading or parsing a CSV entry inside the ZIP           Stops query execution
-           -                     The ZIP file cannot be opened or is corrupted (propagated as RuntimeException)  Stops query execution

Description

The kapezip command opens the specified ZIP file and sequentially reads only the entries with the .csv extension. It treats the first line of each CSV entry as the header to assign field names automatically. When a single ZIP file contains multiple CSV entries, records from all entries are returned in order.
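
The reading loop described above can be approximated in Python. This is an illustrative sketch only, not the command's implementation: the function name read_kape_zip is hypothetical, the case-insensitive .csv check is an assumption, and values beyond the header width are simply dropped here.

```python
import csv
import io
import zipfile


def read_kape_zip(zip_bytes, cs="utf-8"):
    """Yield one dict per CSV data row, tagged with the source entry path."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():                  # entries in archive order
            # Assumption: the extension check ignores case.
            if not name.lower().endswith(".csv"):
                continue                            # non-CSV entries are skipped
            text = zf.read(name).decode(cs)
            rows = csv.reader(io.StringIO(text, newline=""))
            header = next(rows, None)               # first line supplies field names
            if header is None:
                continue                            # empty entry: nothing to emit
            for row in rows:
                record = {"_file": name}
                record.update(zip(header, row))     # extra values are dropped in this sketch
                yield record


# Build a small in-memory archive to exercise the sketch.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Modules/A.csv", "Name,Size\nfoo,1\nbar,2\n")
    zf.writestr("Modules/notes.txt", "ignored")
    zf.writestr("Modules/B.csv", "Path\nC:\\Windows\n")

records = list(read_kape_zip(buf.getvalue()))
for r in records:
    print(r)
```

Each printed record carries a _file key naming the entry it came from, followed by the header-derived fields; the notes.txt entry contributes nothing.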

When ENTRY_PATH is specified, only entries that match the pattern are queried. If the pattern contains a wildcard (*), pattern matching is used; otherwise, the entry name is compared for an exact match. In either case, entries without a .csv extension are skipped automatically.
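
The selection rules above can be sketched in Python as follows. The function select_entries is a hypothetical name, and fnmatch is swapped in for the command's own pattern matcher, so this is one reading of the rules rather than the actual logic. In particular, it assumes that * may match the / separator and that the .csv check ignores case.

```python
from fnmatch import fnmatchcase


def select_entries(names, entry_path=None):
    """Return the entry names that would be read for a given ENTRY_PATH."""
    selected = []
    for name in names:
        if not name.lower().endswith(".csv"):
            continue                      # non-CSV entries are always skipped
        if entry_path is None:
            selected.append(name)         # no ENTRY_PATH: every CSV entry
        elif "*" in entry_path:
            # Assumption: '*' behaves like a shell glob and may match '/'.
            # fnmatch also treats '?' and '[]' as special; the command may not.
            if fnmatchcase(name, entry_path):
                selected.append(name)
        elif name == entry_path:
            selected.append(name)         # no wildcard: exact comparison
    return selected


names = [
    "Modules/Registry/RECmd.csv",
    "Modules/Registry/readme.txt",
    "Modules/FileSystem/MFTECmd.csv",
]
print(select_entries(names, "Modules/Registry/*"))
# ['Modules/Registry/RECmd.csv']
```

Even though the pattern Modules/Registry/* also matches readme.txt, that entry is dropped because it lacks the .csv extension.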

When a CSV file begins with a UTF-8 BOM (Byte Order Mark, 0xEF 0xBB 0xBF), it is detected and skipped automatically. However, when the cs option specifies an encoding other than utf-8, the BOM is not checked.
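
A minimal Python sketch of this BOM rule, assuming the check is a plain comparison of the first three bytes. The helper decode_csv_body is hypothetical, and latin-1 is used only as a convenient Python codec for the demonstration.

```python
import codecs


def decode_csv_body(raw, cs="utf-8"):
    """Decode CSV bytes, dropping a leading UTF-8 BOM only when cs is utf-8."""
    if cs.lower() in ("utf-8", "utf8") and raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]   # skip 0xEF 0xBB 0xBF
    return raw.decode(cs)


with_bom = codecs.BOM_UTF8 + b"Name,Size\n"
print(repr(decode_csv_body(with_bom)))                  # 'Name,Size\n'
print(repr(decode_csv_body(with_bom, cs="latin-1")))    # BOM bytes stay in the text
```

The second call shows the practical consequence: with a non-UTF-8 cs value, the three BOM bytes are decoded as ordinary characters and end up at the start of the first header name.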

When the header exceeds the maximum number of columns allowed by the CSV parser, the default behavior discards the excess header columns. Specifying rest=t replaces the excess header columns with the _rest field. When a data row has more columns than the header, the excess columns are returned as fields in the form column<N>, where N is the zero-based column index, so numbering starts at the header column count. For example, with a 3-column header, the first excess value is returned as column3.
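
The column<N> naming can be illustrated with a short Python sketch. The helper row_to_record is hypothetical, and the numbering reflects one reading of the rule, namely that N continues the zero-based column position. The _rest handling is left out because the way excess header values are combined is not specified here.

```python
def row_to_record(header, row):
    """Map a CSV data row to a record, naming excess values column<N>."""
    record = dict(zip(header, row))
    for index in range(len(header), len(row)):
        # N is the zero-based position, so it starts at len(header).
        record[f"column{index}"] = row[index]
    return record


print(row_to_record(["Name", "Size"], ["foo", "1", "extra", "more"]))
# {'Name': 'foo', 'Size': '1', 'column2': 'extra', 'column3': 'more'}
```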

When FILE_PATH includes a wildcard, all matching ZIP files are processed in order. The wildcard works only in the file name portion and cannot be used in the directory portion.
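
The file-name-only wildcard rule could be sketched in Python as follows. The function expand_zip_paths is hypothetical. Sorting the matches by name and raising an error for a wildcard in the directory portion are both assumptions, since the processing order and the failure mode are not specified here.

```python
import fnmatch
import os


def expand_zip_paths(file_path):
    """Expand a FILE_PATH whose file name portion may contain '*'."""
    directory, pattern = os.path.split(file_path)
    if "*" in directory:
        # Assumption: a wildcard in the directory portion is rejected.
        raise ValueError("wildcard is allowed only in the file name portion")
    if "*" not in pattern:
        return [file_path]                # plain path: nothing to expand
    base = directory or "."
    names = sorted(                       # assumption: matches are processed in name order
        n for n in os.listdir(base)
        if fnmatch.fnmatchcase(n, pattern) and os.path.isfile(os.path.join(base, n))
    )
    return [os.path.join(directory, n) for n in names]
```

Under these assumptions, a call such as expand_zip_paths("/opt/logpresso/evidence/host-*.zip") returns the matching ZIP files from that single directory, while a pattern such as /opt/*/host.zip is rejected.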

This command requires cluster administrator permission, and accessible file paths are restricted by the ALLOWED_FILE_SCAN_PATHS setting. For details, see File access restrictions.

Examples

The file paths in the following examples are assumed to be included in the ALLOWED_FILE_SCAN_PATHS setting.

  1. Query all CSV entries inside a ZIP

    kapezip /opt/logpresso/evidence/kape-output.zip
    

    Returns records from every .csv entry contained in kape-output.zip in order. The _file field of each record holds the path of the source entry.

  2. Query a specific CSV entry only

    kapezip /opt/logpresso/evidence/kape-output.zip Modules/FileSystem/MFTECmd.csv
    

    Reads only the entry whose path inside the ZIP exactly matches Modules/FileSystem/MFTECmd.csv and returns its records.

  3. Query multiple entries with a wildcard

    kapezip /opt/logpresso/evidence/kape-output.zip Modules/Registry/*.csv
    

    Queries all CSV entries under the Modules/Registry/ directory.

  4. Query a ZIP package with Korean encoding

    kapezip entrycs=cp949 cs=cp949 /opt/logpresso/evidence/kape-kr.zip
    

    Correctly reads a ZIP package whose entry names and CSV bodies are both encoded in CP949.

  5. Process multiple ZIP files in batch with a wildcard

    kapezip /opt/logpresso/evidence/host-*.zip
    

    Queries CSV entries from every ZIP file beginning with host- in the same directory.

  6. Preserve excess columns and then filter by source

    kapezip rest=t /opt/logpresso/evidence/kape-output.zip
    | search _file == "Modules/FileSystem/MFTECmd.csv"
    

    Queries the entire ZIP, preserving header columns beyond the CSV parser maximum in the _rest field, and then keeps only the records whose _file field matches a specific entry.

Compatibility

The kapezip command has been available since before Sonar 4.0. Starting from version 4.0.2511.0, it requires cluster administrator permission, and accessible file paths are restricted.