join

Merges subquery results based on join keys.

Command properties

PropertyValue
Command typeTransforming
Required permissionNone
License usageN/A
Parallel executionSupported
Distributed executionRuns on Control Node (reducer)

Syntax

join [type={inner|left|leftonly|right|rightonly|cross|full}] FIELD, ... [ SUBQUERY ]

Options

type={inner|left|leftonly|right|rightonly|cross|full}
Join type. (Default: inner)
  • inner: Outputs only records where both sides have a matching join key.
  • left: Outputs all left (input) records and merges right (subquery) fields where the join key matches.
  • leftonly: Outputs only left records that have no matching key on the right.
  • right: Outputs all right (subquery) records and merges left (input) fields where the join key matches.
  • rightonly: Outputs only right records that have no matching key on the left.
  • cross: Outputs the Cartesian product of both sides. No join key is specified.
  • full: Outputs all records from both sides and merges fields where the join key matches.

Target

FIELD, ...
Join key fields. Separate multiple fields with commas (,). Not specified for cross joins.
[ SUBQUERY ]
A subquery enclosed in square brackets ([]). The subquery result is used as the right-side dataset.

Error codes

Parsing errors
Error codeMessageDescription
20500지원되지 않는 조인 방식입니다. inner, left, leftonly, right, rightonly, cross, full 중 하나를 사용하세요.An unsupported join type was specified
20501조인 키를 지정하세요.No join key field was specified
Runtime errors

N/A

Description

The join command merges input records (left) with subquery results (right) based on join keys. The subquery runs first. If the subquery result contains 100,000 records or fewer, a hash join is used; otherwise, a sort-merge join is used.

Adding a + or - prefix to a join key field forces a sort-merge join and specifies the sort order for that field.

The join command cannot be used in stream queries; use the streamjoin command instead.

In a distributed environment, the join is performed on Control Nodes.

The default hash join threshold is 100,000 records. You can change this value using the system property araqne.hashjointhreshold.

Examples

  1. Inner join

    json "[{'src_ip': '192.0.2.1', 'bytes': 1024}, {'src_ip': '192.0.2.2', 'bytes': 2048}]"
    | join src_ip [
        json "[{'src_ip': '192.0.2.1', 'hostname': 'web-01'}, {'src_ip': '192.0.2.3', 'hostname': 'db-01'}]"
      ]
    

    Performs an inner join on the src_ip field. Only the 192.0.2.1 record, which matches on both sides, is output.

  2. Left join

    json "[{'src_ip': '192.0.2.1', 'bytes': 1024}, {'src_ip': '192.0.2.2', 'bytes': 2048}]"
    | join type=left src_ip [
        json "[{'src_ip': '192.0.2.1', 'hostname': 'web-01'}]"
      ]
    

    Outputs all left records and merges matching right fields. The 192.0.2.2 record is also output but has no hostname field.

  3. Left-only join

    table duration=1h web_logs
    | join type=leftonly src_ip [
        table duration=1d allowlist
      ]
    

    Outputs only web log records whose IP address is not in the allowlist.

  4. Cross join

    json "[{'name': 'Alice'}, {'name': 'Bob'}]"
    | join type=cross [
        json "[{'role': 'admin'}, {'role': 'user'}]"
      ]
    

    Outputs all combinations of both sides' records. A total of 4 records are output.

  5. Use multiple join keys

    table duration=1h web_logs
    | join src_ip, dst_port [
        table duration=1d service_map
      ]
    

    Merges using both src_ip and dst_port as join keys.