cube

Aggregates input records for all possible combinations of the group fields specified in the by clause. This produces multi-dimensional subtotals equivalent to SQL's GROUP BY CUBE.

Command properties

ItemDescription
Command typeProcessing query
Required permissionNone
License usageN/A
Parallel executionSupported
Distributed executionRuns on Control Node (reducer)

Syntax

cube [parallel=BOOL] [label=STR] AGG_FUNC [as ALIAS], ... by FIELD, ...

Options

parallel=BOOL
Whether to use parallel aggregation. Specify t or f.
label=STR
Label value to display in group fields for subtotal and grand total rows. When omitted, these fields are shown as null.

Target

AGG_FUNC [as ALIAS], ...
Aggregate functions with optional aliases. Separate multiple functions with commas (,). Use the as keyword to specify the output field name. Note that order-based aggregate functions such as percentile and median, as well as the dc (distinct count) function, are not supported.
by FIELD, ...
Group fields. Separate multiple fields with commas (,). The by clause is required for the cube command.

Output fields

Outputs the group fields followed by the aggregate function result fields. The fields specified in the by clause appear first, followed by the aggregate results. Output is sorted by group field values. In subtotal and grand total rows, group fields that are not part of the current aggregation grouping are displayed as the value specified in the label option or as null.

Error codes

Parse errors
Error codeMessageDescription
21700Enter an aggregate function.No aggregate function is specified.
21701Invalid query.The by clause syntax is incorrect.
21702{function} is an unsupported function.A non-existent aggregate function is used.
21703Cannot create {function} function. {msg}Failed to create the aggregate function.
21704The grouping set command requires a BY clause.No by clause is specified.
21705{function} is not supported in rollup/cube.An order-based aggregate function or the dc function is used.
Runtime errors

N/A

Description

The cube command groups input records by the fields specified in the by clause and applies aggregate functions to all possible field combinations. Unlike the rollup command, which only generates hierarchical subtotals, cube generates subtotals for all dimensional combinations.

For example, with by a, b, the command outputs aggregations at the following four levels:

  • Aggregation per a, b combination (detail)
  • Aggregation per a (subtotal)
  • Aggregation per b (subtotal)
  • Overall aggregation (grand total)

While rollup only outputs the subtotal by a and the grand total, cube additionally outputs the subtotal by b. With N group fields, cube generates 2^N combination aggregations.

In subtotal and grand total rows, group fields that are not part of the current aggregation grouping are filled with the string specified in the label option or null. Output is sorted by group field values.

In a distributed environment, the final aggregation runs on the Control Node.

The cube command accumulates aggregation results in an in-memory buffer while processing input records. The default buffer size is 100,000 records. When the buffer exceeds this threshold, intermediate results are flushed to disk. You can change this value using the system property araqne.logdb.aggregate_flush_size. In addition, with N group fields, cube generates 2^N combination aggregations, so memory usage and processing volume increase exponentially as the number of group fields grows.

Examples

  1. Aggregating all combinations of two fields

    json "[{'action':'login','status':'success','size':100},{'action':'login','status':'fail','size':50},{'action':'logout','status':'success','size':80},{'action':'logout','status':'success','size':120}]"
    | cube count, sum(size) as total_size by action, status
    

    Outputs aggregations for all combinations of action and status, subtotals by action, subtotals by status, and an overall grand total.

  2. Multi-dimensional aggregation with a label

    table duration=1h web_logs
    | eval date=string(date, "yyyy-MM-dd")
    | cube label="TOTAL" count by date, status
    

    Aggregates counts for all combinations of date and status code. In subtotal and grand total rows, group fields that are not part of the current grouping display TOTAL.

  3. Using multiple aggregate functions

    json "[{'dept':'sales','region':'east','amount':100},{'dept':'sales','region':'west','amount':200},{'dept':'dev','region':'east','amount':150},{'dept':'dev','region':'west','amount':300}]"
    | cube label="ALL" count, sum(amount) as total, avg(amount) as avg_amount by dept, region
    

    Aggregates count, sum, and average for all combinations of dept and region. Subtotal and grand total group fields display ALL.