kapezip

Reads per-module CSV outputs from a ZIP archive produced by KAPE (Kroll Artifact Parser and Extractor) modules without extracting the archive. KAPE is a Windows forensic triage tool from Kroll that collects and parses artifacts and packages each module's output as CSV inside a ZIP. The kapezip command takes such a ZIP as input so module outputs can be analyzed directly in queries.

Command properties

Item	Description
Command type	Driver query
Required permission	Cluster administrator
License usage	Counted
Parallel execution	Not supported
Distributed execution	Runs on Data Node (mapper)

Syntax

kapezip [entrycs=STR] [cs=STR] [rest=BOOL] FILE_PATH [ENTRY_PATH]

Options

entrycs=STR: Character encoding of ZIP file entry names (default: utf-8). Specify this when the entry names inside the ZIP are stored in an encoding other than UTF-8, such as cp949.
cs=STR: Character encoding of CSV files (default: utf-8). Specify this when the CSV body is stored in an encoding other than UTF-8.
rest=BOOL: When set to t, columns that exceed the maximum number of columns allowed by the CSV parser are collected into the _rest field and returned. When omitted or set to f, excess columns are discarded (default: f).

Target

FILE_PATH: Path to the ZIP file to query. You can use the * wildcard in the file name portion to specify multiple ZIP files in the same directory at once.
ENTRY_PATH: Path to the CSV entry inside the ZIP to query. You can use the * wildcard. When omitted, all .csv entries inside the ZIP are queried.

Output fields

Field	Type	Description
`_file`	string	Path of the ZIP entry that the record came from
(header columns)	string	Field names defined in the header (first line) of the CSV and their corresponding values
`column<N>`	string	Field assigned to each excess column when a data row has more columns than the header (`N` is the zero-based column position, so numbering starts at the header column count)
`_rest`	string	Header columns that exceed the CSV parser maximum (output only when `rest=t` is specified)

Error codes

Parse errors

N/A

Runtime errors

Error code	Message	Description	Post-processing
-	csvfile load failure	An error occurred while reading or parsing a CSV entry inside the ZIP	Stops query execution
-	-	The ZIP file cannot be opened or is corrupted (propagated as `RuntimeException`)	Stops query execution

Description

The kapezip command opens the specified ZIP file and sequentially reads only the entries with the .csv extension. It treats the first line of each CSV entry as the header to assign field names automatically. When a single ZIP file contains multiple CSV entries, records from all entries are returned in order.
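
As a rough illustration of this flow, the following Python sketch streams CSV entries out of a ZIP without writing anything to disk. It is not the kapezip implementation: the function name `read_csv_entries`, the case-sensitive `.csv` check, and the sample entry names are assumptions made for the example, and BOM handling and the excess-column rules are left out.

```python
import csv
import io
import zipfile


def read_csv_entries(zip_source, cs="utf-8"):
    """Yield one dict per data row of every .csv entry, in archive order."""
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            # Assumption: the extension check is case-sensitive.
            if info.is_dir() or not info.filename.endswith(".csv"):
                continue  # entries without a .csv extension are skipped
            # Read the entry straight from the archive; nothing is extracted.
            with archive.open(info) as raw:
                text = io.TextIOWrapper(raw, encoding=cs, newline="")
                rows = csv.reader(text)
                header = next(rows, None)  # first line supplies the field names
                if header is None:
                    continue  # empty entry, nothing to return
                for row in rows:
                    record = {"_file": info.filename}
                    record.update(zip(header, row))
                    yield record


# Build a small in-memory archive to try it out (entry names are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Modules/a.csv", "name,size\nfoo,1\n")
    archive.writestr("Modules/readme.txt", "ignored")
    archive.writestr("Modules/b.csv", "path\n/tmp/x\n")

records = list(read_csv_entries(buffer))
for record in records:
    print(record)
```

Each returned record carries the source entry in `_file`, followed by the fields named in that entry's header.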

When ENTRY_PATH is specified, only entries that match the pattern are queried. If the pattern contains a wildcard (*), pattern matching is used; otherwise, the entry name is compared for an exact match. In either case, entries without a .csv extension are skipped automatically.
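
The selection rule can be sketched in Python as follows. This is only an illustration of the rule as described, not the actual matcher: the helper name `entry_matches` is hypothetical, and the sketch assumes that `*` matches any run of characters (including `/`) and that the `.csv` check is case-sensitive, neither of which is stated on this page.

```python
import re


def entry_matches(entry_name, entry_path=None):
    """Return True if a ZIP entry would be selected under the rules above."""
    if not entry_name.endswith(".csv"):
        return False  # entries without a .csv extension are always skipped
    if entry_path is None:
        return True  # no ENTRY_PATH: every .csv entry is queried
    if "*" in entry_path:
        # Assumption: `*` matches any run of characters, including `/`.
        pattern = ".*".join(re.escape(part) for part in entry_path.split("*"))
        return re.fullmatch(pattern, entry_name, flags=re.DOTALL) is not None
    return entry_name == entry_path  # no wildcard: exact comparison


# Illustrative entry names only.
print(entry_matches("Modules/Registry/run.csv", "Modules/Registry/*.csv"))
print(entry_matches("Modules/Registry/notes.txt", "Modules/Registry/*"))
```

The first call prints `True`; the second prints `False` because the entry lacks a .csv extension even though the pattern matches its name.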

When a CSV file begins with a UTF-8 BOM (Byte Order Mark, 0xEF 0xBB 0xBF), it is detected and skipped automatically. However, when the cs option specifies an encoding other than utf-8, the BOM is not checked.
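
A minimal Python sketch of this rule is shown below. The helper name `decode_csv_bytes` is hypothetical, and how kapezip compares the `cs` value (for example, case sensitivity) is an assumption of the sketch.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def decode_csv_bytes(data, cs="utf-8"):
    """Decode CSV bytes, dropping a leading UTF-8 BOM only when cs is utf-8."""
    # Assumption: the encoding name is compared case-insensitively.
    if cs.lower() in ("utf-8", "utf8") and data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return data.decode(cs)


with_bom = UTF8_BOM + b"name,size\n"
print(repr(decode_csv_bytes(with_bom)))                # BOM removed
print(repr(decode_csv_bytes(with_bom, cs="latin-1")))  # BOM bytes kept as characters
```

In this sketch, decoding a BOM-prefixed file with another encoding leaves the three BOM bytes in front of the first header name, which is why the `cs` value should match the actual encoding of the CSV files.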

When the header exceeds the maximum number of columns allowed by the CSV parser, the default behavior discards the excess header columns. Specifying rest=t collects them into the _rest field instead. When a data row has more columns than the header, the excess columns are returned as fields in the form column<N>, where N is the zero-based column position, so numbering starts at the header column count (for a three-column header, the first excess column is column3).
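
The naming of excess data columns can be illustrated with a short Python sketch. It assumes that N is the zero-based position of the value within the row, which is one reading of the rule above; the helper name `row_to_fields` is hypothetical, and the parser's column maximum and the _rest field are not modeled because the limit is not given on this page.

```python
def row_to_fields(header, row):
    """Map one data row to field names, naming overflow columns column<N>."""
    fields = dict(zip(header, row))
    # Assumption: N is the zero-based position of the value in the row, so the
    # first overflow column is numbered with the header column count.
    for index in range(len(header), len(row)):
        fields[f"column{index}"] = row[index]
    return fields


print(row_to_fields(["name", "size"], ["foo", "1", "x", "y"]))
# {'name': 'foo', 'size': '1', 'column2': 'x', 'column3': 'y'}
```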

When FILE_PATH includes a wildcard, all matching ZIP files are processed in order. The wildcard works only in the file name portion and cannot be used in the directory portion.
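
The following Python sketch shows what expanding a wildcard only in the file name portion looks like. It is an illustration under stated assumptions, not the command's actual logic: the helper name `expand_zip_paths` is hypothetical, and the sorted processing order is assumed because this page does not say which order is used.

```python
import os
import re


def expand_zip_paths(file_path):
    """Expand a `*` in the file name portion; the directory is taken literally."""
    directory, name = os.path.split(file_path)
    if "*" not in name:
        return [file_path]  # no wildcard: a single ZIP file
    pattern = ".*".join(re.escape(part) for part in name.split("*"))
    # Assumption: matching files are processed in sorted name order.
    return [
        os.path.join(directory, candidate)
        for candidate in sorted(os.listdir(directory or "."))
        if re.fullmatch(pattern, candidate)
    ]
```

Because only the last path component is treated as a pattern, a `*` placed in a directory name is never expanded by this sketch.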

This command requires cluster administrator permission, and accessible file paths are restricted by the ALLOWED_FILE_SCAN_PATHS setting. For details, see File access restrictions.

Examples

The file paths in the following examples are assumed to be included in the ALLOWED_FILE_SCAN_PATHS setting.

Query all CSV entries inside a ZIP
```
kapezip /opt/logpresso/evidence/kape-output.zip
```
Returns records from every .csv entry contained in kape-output.zip in order. The _file field of each record holds the path of the source entry.
Query a specific CSV entry only
```
kapezip /opt/logpresso/evidence/kape-output.zip Modules/FileSystem/MFTECmd.csv
```
Reads only the entry whose path inside the ZIP exactly matches Modules/FileSystem/MFTECmd.csv and returns its records.
Query multiple entries with a wildcard
```
kapezip /opt/logpresso/evidence/kape-output.zip Modules/Registry/*.csv
```
Queries all CSV entries under the Modules/Registry/ directory.
Query a ZIP package with Korean encoding
```
kapezip entrycs=cp949 cs=cp949 /opt/logpresso/evidence/kape-kr.zip
```
Correctly reads a ZIP package whose entry names and CSV bodies are both encoded in CP949.
Process multiple ZIP files in batch with a wildcard
```
kapezip /opt/logpresso/evidence/host-*.zip
```
Queries CSV entries from every ZIP file beginning with host- in the same directory.
Preserve excess columns and then filter by source
```
kapezip rest=t /opt/logpresso/evidence/kape-output.zip
| search _file == "Modules/FileSystem/MFTECmd.csv"
```
Queries the entire ZIP while preserving header columns that exceed the maximum count in the _rest field, and then keeps only the records that originate from a specific entry, based on the _file field.

Compatibility

The kapezip command has been available since before Sonar 4.0. Starting from version 4.0.2511.0, it requires cluster administrator permission, and accessible file paths are restricted.