kapezip
Reads per-module CSV outputs from a ZIP archive produced by KAPE (Kroll Artifact Parser and Extractor) modules without extracting the archive. KAPE is a Windows forensic triage tool from Kroll that collects and parses artifacts and packages each module's output as CSV inside a ZIP. The kapezip command takes such a ZIP as input so module outputs can be analyzed directly in queries.
Command properties
| Item | Description |
|---|---|
| Command type | Driver query |
| Required permission | Cluster administrator |
| License usage | Counted |
| Parallel execution | Not supported |
| Distributed execution | Runs on Data Node (mapper) |
Syntax

`kapezip [entrycs=STR] [cs=STR] [rest=BOOL] FILE_PATH [ENTRY_PATH]`
Options
- `entrycs=STR`: Character encoding of ZIP file entry names (default: `utf-8`). Specify this when the entry names inside the ZIP are stored in a non-standard encoding such as `cp949`.
- `cs=STR`: Character encoding of CSV files (default: `utf-8`). Specify this when the CSV body is stored in a non-standard encoding.
- `rest=BOOL`: When set to `t`, columns that exceed the maximum number of columns allowed by the CSV parser are collected into the `_rest` field and returned. When omitted or set to `f`, excess columns are discarded (default: `f`).
Target
- `FILE_PATH`: Path to the ZIP file to query. You can use the `*` wildcard in the file name portion to specify multiple ZIP files in the same directory at once.
- `ENTRY_PATH`: Path to the CSV entry inside the ZIP to query. You can use the `*` wildcard. When omitted, all `.csv` entries inside the ZIP are queried.
Output fields
| Field | Type | Description |
|---|---|---|
| _file | string | Path of the ZIP entry that the record came from |
| (header columns) | string | Field names defined in the header (first line) of the CSV and their corresponding values |
| column&lt;N&gt; | string | Field assigned to excess columns when a data row has more columns than the header (N is a zero-based index starting from the header column count) |
| _rest | string | Header columns that exceed the CSV parser maximum (output only when rest=t is specified) |
Error codes
Parse errors
N/A
Runtime errors
| Error code | Message | Description | Post-processing |
|---|---|---|---|
| - | csvfile load failure | An error occurred while reading or parsing a CSV entry inside the ZIP | Stops query execution |
| - | - | The ZIP file cannot be opened or is corrupted (propagated as RuntimeException) | Stops query execution |
Description
The kapezip command opens the specified ZIP file and sequentially reads only the entries with the .csv extension. It treats the first line of each CSV entry as the header to assign field names automatically. When a single ZIP file contains multiple CSV entries, records from all entries are returned in order.
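The reading behavior described above can be approximated with a short Python sketch. This is a hypothetical illustration, not the actual implementation: the function name is invented, and it omits the BOM and column-limit handling described in the following paragraphs.

```python
import csv
import io
import zipfile

def read_kape_zip(zip_path, encoding="utf-8"):
    """Approximate kapezip: read only .csv entries, treat the first
    line of each as the header, and tag every record with the path
    of its source entry in a _file field."""
    records = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.endswith(".csv"):
                continue  # non-CSV entries are skipped
            with zf.open(name) as fp:
                reader = csv.reader(io.TextIOWrapper(fp, encoding=encoding))
                header = next(reader, None)
                if header is None:
                    continue  # empty entry: no header, no records
                for row in reader:
                    rec = {"_file": name}
                    rec.update(zip(header, row))
                    records.append(rec)
    return records
```

Because records from all entries are appended in entry order, a multi-module KAPE package yields one combined stream, distinguishable by the `_file` field.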
When ENTRY_PATH is specified, only entries that match the pattern are queried. If the pattern contains a wildcard (*), pattern matching is used; otherwise, the entry name is compared for an exact match. In either case, entries without a .csv extension are skipped automatically.
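The matching rule above can be sketched as a small predicate (a hypothetical helper, assuming `*`-style glob semantics for the wildcard case):

```python
from fnmatch import fnmatch

def entry_matches(entry_name, entry_path=None):
    """Approximate kapezip's ENTRY_PATH matching: entries without a
    .csv extension are always skipped; a pattern containing '*' is
    wildcard-matched, otherwise the names must match exactly; when
    no pattern is given, every .csv entry matches."""
    if not entry_name.endswith(".csv"):
        return False
    if entry_path is None:
        return True
    if "*" in entry_path:
        return fnmatch(entry_name, entry_path)
    return entry_name == entry_path
```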
When a CSV file begins with a UTF-8 BOM (Byte Order Mark, 0xEF 0xBB 0xBF), it is detected and skipped automatically. However, when the cs option specifies an encoding other than utf-8, the BOM is not checked.
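The BOM rule can be stated compactly in a sketch (hypothetical helper name):

```python
def strip_utf8_bom(data, cs="utf-8"):
    """Skip a leading UTF-8 BOM, but only when the configured
    encoding is utf-8; for any other encoding the bytes pass
    through unchanged, mirroring the behavior described above."""
    if cs.lower() == "utf-8" and data[:3] == b"\xef\xbb\xbf":
        return data[3:]
    return data
```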
When the header exceeds the maximum number of columns allowed by the CSV parser, the default behavior discards the excess header columns. Specifying rest=t instead collects the excess header columns into the _rest field. When a data row has more columns than the header, the excess columns are returned as fields in the form column&lt;N&gt;, where N is a zero-based index starting from the header column count.
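The column&lt;N&gt; numbering can be illustrated with a sketch (hypothetical helper): with a two-column header, the first excess value in a row becomes column2, the next column3, and so on.

```python
def assign_fields(header, row):
    """Map a data row to fields; values beyond the header length
    become column<N>, where N starts at the header column count
    (zero-based, so a 2-column header yields column2, column3, ...)."""
    rec = dict(zip(header, row))
    for i in range(len(header), len(row)):
        rec["column%d" % i] = row[i]
    return rec
```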
When FILE_PATH includes a wildcard, all matching ZIP files are processed in order. The wildcard works only in the file name portion and cannot be used in the directory portion.
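The file-name-only wildcard rule can be sketched as follows. This is a hypothetical approximation: the function name is invented, and the document does not specify how an illegal directory wildcard is reported, so the ValueError here is an assumption.

```python
import glob
import os

def resolve_zip_paths(file_path):
    """Expand a '*' wildcard in the file name portion only;
    the directory portion must be a literal path (assumed to
    be rejected here with an error)."""
    dirname, basename = os.path.split(file_path)
    if "*" in dirname:
        raise ValueError("wildcard is not allowed in the directory portion")
    if "*" not in basename:
        return [file_path]
    return sorted(glob.glob(file_path))
```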
This command requires cluster administrator permission, and accessible file paths are restricted by the ALLOWED_FILE_SCAN_PATHS setting. For details, see File access restrictions.
Examples
The file paths in the following examples are assumed to be included in the ALLOWED_FILE_SCAN_PATHS setting.
- Query all CSV entries inside a ZIP

  `kapezip /opt/logpresso/evidence/kape-output.zip`

  Returns records from every `.csv` entry contained in `kape-output.zip` in order. The `_file` field of each record records the path of the source entry.

- Query a specific CSV entry only

  `kapezip /opt/logpresso/evidence/kape-output.zip Modules/FileSystem/MFTECmd.csv`

  Reads only the entry whose path inside the ZIP exactly matches `Modules/FileSystem/MFTECmd.csv` and returns its records.

- Query multiple entries with a wildcard

  `kapezip /opt/logpresso/evidence/kape-output.zip Modules/Registry/*.csv`

  Queries all CSV entries under the `Modules/Registry/` directory.

- Query a ZIP package with Korean encoding

  `kapezip entrycs=cp949 cs=cp949 /opt/logpresso/evidence/kape-kr.zip`

  Correctly reads a ZIP package whose entry names and CSV bodies are both encoded in CP949.

- Process multiple ZIP files in batch with a wildcard

  `kapezip /opt/logpresso/evidence/host-*.zip`

  Queries CSV entries from every ZIP file beginning with `host-` in the same directory.

- Preserve excess columns and then filter by source

  `kapezip rest=t /opt/logpresso/evidence/kape-output.zip | search _file == "Modules/FileSystem/MFTECmd.csv"`

  Queries the entire ZIP while preserving header columns that exceed the maximum count in the `_rest` field, then filters records originating from a specific entry based on the `_file` field.
Compatibility
The kapezip command has been available since before Sonar 4.0. Starting from version 4.0.2511.0, it requires cluster administrator permission, and accessible file paths are restricted.