sonar-health-check

Runs fault diagnostics across all nodes in the cluster. It checks six items, ranging from node aliveness to GC anomalies and heap usage, and returns diagnostic results for the nodes where anomalies are detected.

Command properties

Item                   Description
Command type           Driver query
Required permission    Administrator
License usage          Counted
Parallel execution     Not supported
Distributed execution  Not supported

Syntax

sonar-health-check [duration=INT{mon|w|d|h|m|s}]

Options

duration=INT{mon|w|d|h|m|s}
Time range of recent log data searched by the log-based checks (GC and Heap Usage). Default: 1h.
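The duration value is an integer followed by one unit suffix. The Python sketch below only illustrates that syntax and is not part of the product: `parse_duration` and `to_timedelta` are hypothetical names, and reading `mon` as months and `m` as minutes is an assumption based on the usual meaning of these suffixes. Months are not converted to a fixed length because month length varies.

```python
import re
from datetime import timedelta

# INT followed by exactly one unit suffix. "mon" is listed before "m" so the
# longer suffix is tried first.
_DURATION_RE = re.compile(r"(\d+)(mon|w|d|h|m|s)")

# Assumed meanings of the fixed-length units ("mon" is omitted on purpose).
_UNIT_SECONDS = {"w": 7 * 86400, "d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_duration(text: str = "1h"):
    """Return (amount, unit) for a string such as "6h" or "2mon"; default is 1h."""
    match = _DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    return int(match.group(1)), match.group(2)


def to_timedelta(amount: int, unit: str) -> timedelta:
    """Convert a fixed-length unit to a timedelta; months are rejected."""
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unit {unit!r} has no fixed length")
    return timedelta(seconds=amount * _UNIT_SECONDS[unit])
```

For example, `to_timedelta(*parse_duration("6h"))` yields a six-hour range, matching `duration=6h` in the usage examples.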

Output fields

Field      Type     Description
node_id    integer  Node ID
node       string   Node identifier (NID)
node_type  string   Node type (CONTROL, DATA, FORWARDER)
type       string   Diagnostic item type (Node Aliveness, Policy Sync, Forwarder Swap, Forwarder Delay, GC, Heap Usage)
status     string   Diagnostic result (BAD: anomaly detected, GOOD: normal, FAILURE: check failed)
details    map      Detailed information per diagnostic item (see below)

The details field contains different content depending on type.

type=Policy Sync

Key              Description
last_sync_id     Last synchronized queue ID
last_sync_time   Time of the last synchronization
last_queue_id    Most recent synchronization queue ID
last_queue_time  Creation time of the most recent queue item

type=Forwarder Swap

Key            Description
channel        Forwarding channel name
swap_usage     Swap usage in bytes
swap_size      Current swap usage size in bytes
swap_capacity  Maximum swap capacity in bytes
drop           Number of dropped records

type=Forwarder Delay

Key            Description
logger         Logger internal identifier
logger_name    Logger name
last_received  Last time the Data Node received data
last_sent      Last time the Forwarder Node sent data

type=GC

Key       Description
type      GC event type (OOM, ALLOC_STALL, FULL_GC, TO_SPACE_EXHAUSTED)
duration  GC duration in milliseconds (included only for ALLOC_STALL and FULL_GC)
time      Time the GC event occurred (included for ALLOC_STALL, FULL_GC, TO_SPACE_EXHAUSTED)
msg       OOM occurrence count message (included only for OOM)

type=Heap Usage

Key             Description
min_heap_usage  Minimum heap usage percentage during the query period (e.g., "75.23%")
msg             Message indicating that the threshold was exceeded
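Because the GC details keys vary by event type, it can help to see which keys appear for each type. The Python sketch below builds details maps following the inclusion rules in the tables above. It is a hypothetical illustration, not the command's code: `gc_details` and `heap_details` are invented names, the `msg` texts are placeholders since the real wording is not documented here, and the two-decimal percentage format is inferred from the "75.23%" example.

```python
def gc_details(event_type, time=None, duration_ms=None, oom_count=None):
    """Build a GC details map containing only the keys documented for event_type."""
    details = {"type": event_type}
    if event_type in ("ALLOC_STALL", "FULL_GC"):
        details["duration"] = duration_ms  # milliseconds
    if event_type in ("ALLOC_STALL", "FULL_GC", "TO_SPACE_EXHAUSTED"):
        details["time"] = time
    if event_type == "OOM":
        # Placeholder wording; the real message text is not documented here.
        details["msg"] = f"OOM occurred {oom_count} time(s)"
    return details


def heap_details(min_heap_usage_pct):
    """Build a Heap Usage details map with the percentage rendered like "75.23%"."""
    return {
        "min_heap_usage": f"{min_heap_usage_pct:.2f}%",
        # Placeholder wording; the real message text is not documented here.
        "msg": "minimum heap usage exceeded the 70% threshold",
    }
```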

Error codes

Parse errors
Error code  Message         Description
300106      No permission.  Raised when executed in a session without administrator privileges
Runtime errors

N/A

Description

sonar-health-check checks the following six items in order.

  1. Node Aliveness: Checks whether all nodes registered in the cluster are alive. Nodes that do not respond are reported immediately with BAD status and are excluded from the subsequent log-based checks (GC and Heap Usage).

  2. Policy Sync: Checks the policy synchronization status of control nodes. Returns BAD if synchronization has not completed within 10 seconds after the most recent sync queue item was created.

  3. Forwarder Swap: Applies only to forwarder nodes (FORWARDER). Returns BAD for each forwarding channel whose swap usage is 50% or more.

  4. Forwarder Delay: Returns BAD for any logger where the difference between the Data Node's last received time and the Forwarder Node's last sent time is 60 seconds or more.

  5. GC: Returns BAD for any node on which, within the specified period, an OOM occurred, an ALLOC_STALL lasted longer than 100 ms, a FULL_GC lasted longer than 1,000 ms, or a TO_SPACE_EXHAUSTED event occurred.

  6. Heap Usage: Returns BAD for any node whose minimum heap usage during the specified period is 70% or more.
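The thresholds in items 2 to 6 can be restated as simple predicates. The Python sketch below is illustrative only and is not how the command is implemented; all function names are hypothetical. It also rests on assumptions not stated above: Policy Sync counts as complete once last_sync_id has caught up to last_queue_id (queue IDs assumed to increase), the 10-second limit is treated as inclusive, swap usage is computed as swap_size / swap_capacity, and the Forwarder Delay difference is taken as an absolute value.

```python
from datetime import datetime, timedelta

# Thresholds as stated in the item descriptions.
POLICY_SYNC_GRACE = timedelta(seconds=10)
SWAP_USAGE_LIMIT = 0.5                      # 50% or more
FORWARDER_DELAY_LIMIT = timedelta(seconds=60)
ALLOC_STALL_LIMIT_MS = 100                  # must be exceeded
FULL_GC_LIMIT_MS = 1000                     # must be exceeded
HEAP_USAGE_LIMIT = 70.0                     # percent, 70% or more


def policy_sync_status(last_sync_id, last_queue_id, last_queue_time, now: datetime):
    # Assumption: synced once the last synchronized ID reaches the newest queue ID.
    synced = last_sync_id >= last_queue_id
    if not synced and now - last_queue_time >= POLICY_SYNC_GRACE:
        return "BAD"
    return "GOOD"


def swap_is_bad(swap_size, swap_capacity):
    # Assumption: usage ratio = current swap size / maximum swap capacity.
    return swap_capacity > 0 and swap_size / swap_capacity >= SWAP_USAGE_LIMIT


def delay_is_bad(last_received: datetime, last_sent: datetime):
    # Assumption: the "difference" is the absolute gap between the two timestamps.
    return abs(last_sent - last_received) >= FORWARDER_DELAY_LIMIT


def gc_event_is_bad(event_type, duration_ms=None):
    if event_type in ("OOM", "TO_SPACE_EXHAUSTED"):
        return True  # any occurrence in the period is an anomaly
    if event_type == "ALLOC_STALL":
        return duration_ms is not None and duration_ms > ALLOC_STALL_LIMIT_MS
    if event_type == "FULL_GC":
        return duration_ms is not None and duration_ms > FULL_GC_LIMIT_MS
    return False


def heap_is_bad(min_heap_usage_pct: float):
    return min_heap_usage_pct >= HEAP_USAGE_LIMIT
```

Note the boundary difference the descriptions imply: swap, delay, and heap limits trigger at the threshold itself ("or more"), while the ALLOC_STALL and FULL_GC limits must be exceeded.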

Of the six items, only Policy Sync outputs GOOD results. All other items produce output only when they detect an anomaly (BAD).

If an exception occurs while connecting to a specific node, the diagnostic result for that node is returned as FAILURE.
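Taken together, the reporting rules (unresponsive nodes reported as BAD and skipped by the log-based checks, GOOD rows only from Policy Sync, FAILURE when a node cannot be reached) can be sketched as follows. This is a hypothetical Python sketch, not the command's implementation: `assemble_rows`, the `is_alive` callback, and the per-check callables are invented for illustration, and the empty details map on FAILURE rows is an assumption because their contents are not documented here.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# Hypothetical check interface: returns (status, details) pairs for one node.
Check = Callable[[Row], List[Tuple[str, Dict[str, object]]]]

LOG_BASED = {"GC", "Heap Usage"}


def assemble_rows(nodes: List[Row], is_alive: Callable[[Row], bool],
                  checks: List[Tuple[str, Check]]) -> List[Row]:
    rows: List[Row] = []
    dead = set()
    # Item 1: unresponsive nodes are reported immediately as BAD.
    for n in nodes:
        if not is_alive(n):
            dead.add(n["node_id"])
            rows.append({**n, "type": "Node Aliveness", "status": "BAD", "details": {}})
    # Items 2-6, in the order given.
    for type_name, check in checks:
        for n in nodes:
            if type_name in LOG_BASED and n["node_id"] in dead:
                continue  # dead nodes are excluded from log-based checks
            try:
                results = check(n)
            except Exception:
                # A connection error becomes FAILURE instead of aborting the run.
                rows.append({**n, "type": type_name, "status": "FAILURE", "details": {}})
                continue
            for status, details in results:
                # Only Policy Sync emits GOOD rows; other checks emit anomalies only.
                if status == "BAD" or (status == "GOOD" and type_name == "Policy Sync"):
                    rows.append({**n, "type": type_name, "status": status, "details": details})
    return rows
```

Keeping each check behind a callable lets one unreachable node produce a single FAILURE row without hiding results from the rest of the cluster.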

Usage examples

  1. Check the status of all nodes in the cluster (log-based checks use the default range of the last hour)

    sonar-health-check
    
  2. Check all nodes, evaluating GC and heap usage over the last 6 hours of data

    sonar-health-check duration=6h
    
  3. Show only BAD results, sorted by node and diagnostic item

    sonar-health-check
    | search status == "BAD"
    | sort node, type
    
  4. Count BAD results by diagnostic item

    sonar-health-check
    | search status == "BAD"
    | stats count by type
    

Compatibility

The sonar-health-check command is available in version 4.0.2511.0 and later.

Related