sonar-health-check
Runs a fault diagnosis across all nodes in the cluster. Checks six status items — from node aliveness to GC anomalies and heap usage — and returns diagnostic results for nodes where anomalies are detected.
Command properties
| Item | Description |
|---|---|
| Command type | Driver query |
| Required permission | Administrator |
| License usage | Counted |
| Parallel execution | Not supported |
| Distributed execution | Not supported |
Syntax
sonar-health-check [duration=INT{mon|w|d|h|m|s}]
Options
- duration=INT{mon|w|d|h|m|s}: Time range of recent data to search for log-based diagnostics (GC, heap). Default is 1h.
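To illustrate how a duration value maps to a time window, here is a minimal Python sketch. The helper name parse_duration is hypothetical, and the 30-day month is an assumption, since this page does not state how the command interprets mon. It is not the command's actual parser.

```python
import re

# Seconds per unit. The 30-day month is an assumption made for this sketch;
# the reference does not say how the command interprets "mon".
_UNIT_SECONDS = {
    "mon": 30 * 86400,
    "w": 7 * 86400,
    "d": 86400,
    "h": 3600,
    "m": 60,
    "s": 1,
}

# An integer followed by exactly one of the documented unit suffixes.
_DURATION_RE = re.compile(r"(\d+)(mon|w|d|h|m|s)")


def parse_duration(text: str = "1h") -> int:
    """Convert a value such as '6h' to seconds (hypothetical helper)."""
    match = _DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]
```

Called without an argument, the helper falls back to the documented default of 1h.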
Output fields
| Field | Type | Description |
|---|---|---|
| node_id | integer | Node ID |
| node | string | Node identifier (NID) |
| node_type | string | Node type (CONTROL, DATA, FORWARDER) |
| type | string | Diagnostic item type (Node Aliveness, Policy Sync, Forwarder Swap, Forwarder Delay, GC, Heap Usage) |
| status | string | Diagnostic result (BAD: anomaly detected, GOOD: normal, FAILURE: check failed) |
| details | map | Detailed information per diagnostic item (see below) |
The details field contains different content depending on type.
type=Policy Sync
| Key | Description |
|---|---|
| last_sync_id | Last synchronized queue ID |
| last_sync_time | Time of the last synchronization |
| last_queue_id | Most recent synchronization queue ID |
| last_queue_time | Creation time of the most recent queue item |
type=Forwarder Swap
| Key | Description |
|---|---|
| channel | Forwarding channel name |
| swap_usage | Swap usage in bytes |
| swap_size | Current swap usage size in bytes |
| swap_capacity | Maximum swap capacity in bytes |
| drop | Number of dropped records |
type=Forwarder Delay
| Key | Description |
|---|---|
| logger | Logger internal identifier |
| logger_name | Logger name |
| last_received | Last time the Data Node received data |
| last_sent | Last time the Forwarder Node sent data |
type=GC
| Key | Description |
|---|---|
| type | GC event type (OOM, ALLOC_STALL, FULL_GC, TO_SPACE_EXHAUSTED) |
| duration | GC duration in milliseconds (included only for ALLOC_STALL and FULL_GC) |
| time | Time the GC event occurred (included for ALLOC_STALL, FULL_GC, TO_SPACE_EXHAUSTED) |
| msg | OOM occurrence count message (included only for OOM) |
type=Heap Usage
| Key | Description |
|---|---|
| min_heap_usage | Minimum heap usage percentage during the query period (e.g., "75.23%") |
| msg | Message indicating that the threshold was exceeded |
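Because min_heap_usage is reported as a percentage string, a client that wants to sort or alert on it needs a numeric value first. The following Python sketch shows one way to do that; the helper name parse_heap_percent is hypothetical and is not part of the command.

```python
def parse_heap_percent(min_heap_usage: str) -> float:
    """Convert a string such as '75.23%' to the float 75.23 (hypothetical helper)."""
    # Drop surrounding whitespace and the trailing percent sign before converting.
    return float(min_heap_usage.strip().rstrip("%"))
```

For example, rows could be ordered by parse_heap_percent(row["details"]["min_heap_usage"]), assuming each row has been loaded as a dictionary with the fields listed above.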
Error codes
Parse errors
| Error code | Message | Description |
|---|---|---|
| 300106 | No permission. | Raised when executed in a session without administrator privileges |
Runtime errors
N/A
Description
sonar-health-check checks the following six items in order.
- Node Aliveness: Checks whether all nodes registered in the cluster are alive. Nodes that do not respond are immediately returned with a BAD status and are excluded from subsequent log-based checks.
- Policy Sync: Checks the policy synchronization status of control nodes. Returns BAD if synchronization is not complete 10 seconds after the most recent sync queue item was created.
- Forwarder Swap: Returns BAD for any channel on a Forwarder Node where swap channel usage is 50% or more. Applies only to forwarder nodes (FORWARDER).
- Forwarder Delay: Returns BAD for any logger where the difference between the Data Node's last received time and the Forwarder Node's last sent time is 60 seconds or more.
- GC: Returns BAD for any node within the specified period where an OOM occurred, an ALLOC_STALL exceeded 100 ms, a FULL_GC exceeded 1,000 ms, or a TO_SPACE_EXHAUSTED event occurred.
- Heap Usage: Returns BAD for any node whose minimum heap usage during the specified period is 70% or more.
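The GC and Forwarder Delay thresholds listed above can be restated as a small Python sketch. This is an illustration of the documented rules, not the command's implementation. The function names are hypothetical, and measuring the delay as last_sent minus last_received is an assumption, since the text only speaks of the difference between the two times.

```python
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the item descriptions above.
ALLOC_STALL_LIMIT_MS = 100
FULL_GC_LIMIT_MS = 1000
FORWARDER_DELAY_LIMIT = timedelta(seconds=60)


def gc_event_is_bad(event_type: str, duration_ms: Optional[float] = None) -> bool:
    """Apply the documented GC rules to a single event (hypothetical helper)."""
    # OOM and TO_SPACE_EXHAUSTED count as anomalies regardless of duration.
    if event_type in ("OOM", "TO_SPACE_EXHAUSTED"):
        return True
    if duration_ms is None:
        return False
    # "Exceeded" is read as strictly greater than the limit.
    if event_type == "ALLOC_STALL":
        return duration_ms > ALLOC_STALL_LIMIT_MS
    if event_type == "FULL_GC":
        return duration_ms > FULL_GC_LIMIT_MS
    return False


def forwarder_delay_is_bad(last_received: datetime, last_sent: datetime) -> bool:
    """Apply the 60-second rule to one logger (hypothetical helper)."""
    # Assumption: the delay is last_sent minus last_received.
    return last_sent - last_received >= FORWARDER_DELAY_LIMIT
```

Note the different comparisons: the GC durations must exceed their limits, while the forwarder delay is flagged at exactly 60 seconds.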
Among the checked items, only the Policy Sync item outputs GOOD results. All other items output results only when an anomaly (BAD) is detected.
If an exception occurs while connecting to a specific node, the diagnostic result for that node is returned as FAILURE.
Usage examples
- Check the status of all nodes in the cluster (default: based on the last hour)
    sonar-health-check
- Check GC and heap usage based on the last 6 hours of data
    sonar-health-check duration=6h
- Filter and view only items with a BAD status
    sonar-health-check | search status == "BAD" | sort node, type
- Aggregate the number of anomalous nodes by diagnostic item
    sonar-health-check | search status == "BAD" | stats count by type
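When results are exported and processed outside the query engine, the aggregation in the last example can be mirrored in a few lines of Python. This is a sketch only: it assumes each result row has been loaded as a dictionary keyed by the output field names, and the function name count_bad_by_type is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def count_bad_by_type(rows: Iterable[Dict[str, object]]) -> Counter:
    """Count BAD rows per diagnostic item type (hypothetical helper)."""
    # Keep only rows whose status is BAD, then tally them by the type field.
    return Counter(row["type"] for row in rows if row.get("status") == "BAD")
```

Rows with a GOOD or FAILURE status are ignored, which matches the search status == "BAD" filter in the query above.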
Compatibility
The sonar-health-check command is available in version 4.0.2511.0 and later.
Related
- system-nodes — Query the cluster node list
- system-forwarder-channels — Query Forwarder Node channel status
- system-gc-logs — Query GC logs