corr()

Calculates the Pearson correlation coefficient between two numeric expressions over the records in a group.

Syntax

corr(EXPR_X, EXPR_Y)

Parameters

EXPR_X
An expression that returns the numeric value of the first variable (X).
EXPR_Y
An expression that returns the numeric value of the second variable (Y).

Description

The corr() function accumulates pairs of numeric values returned by EXPR_X and EXPR_Y as it processes records in a group, then calculates the Pearson correlation coefficient. If either value in a pair is null or non-numeric, that record is excluded from aggregation.
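The accumulation and exclusion rules above can be modeled with a short Python sketch. It is an illustration of the documented behavior, not Logpresso's implementation. Records are modeled as dicts, and the hypothetical helper corr_sketch takes field names in place of EXPR_X and EXPR_Y. The sketch also makes three assumptions: only int and float values count as numeric, so bool values and numeric-looking strings are treated as non-numeric; a constant variable is undefined; and it returns None in that undefined case.

```python
import math


def _as_number(value):
    # Assumption: only int and float count as numeric; bool is excluded even though it subclasses int.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)


def corr_sketch(records, x_key, y_key):
    """Hypothetical model of corr(EXPR_X, EXPR_Y) over one group of dict records."""
    pairs = []
    for record in records:
        x = _as_number(record.get(x_key))
        y = _as_number(record.get(y_key))
        if x is None or y is None:
            continue  # null or non-numeric value: the whole record is skipped
        pairs.append((x, y))

    if not pairs:
        return None  # no valid pairs: corr() returns null

    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Covariance and both variances share the same 1/n factor, which cancels in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # assumption: treated as undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)


records = [{"x": 1, "y": 2}, {"x": None, "y": 4}, {"x": 3, "y": 6}, {"x": "n/a", "y": 8}]
print(corr_sketch(records, "x", "y"))  # 1.0: only (1, 2) and (3, 6) are valid pairs
print(corr_sketch([{"x": None, "y": 1}], "x", "y"))  # None: no valid pairs
```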

The Pearson correlation coefficient is the covariance of the two variables (see cov()) divided by the product of their standard deviations. corr() returns it as a 64-bit floating-point number (double) between -1 and 1. A value close to 1 indicates a strong positive linear correlation, and a value close to -1 indicates a strong negative linear correlation. A value close to 0 indicates little or no linear correlation. If there are no valid pairs to aggregate, corr() returns null.
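The formula below is the definition above written out in standard textbook form, for n valid pairs (x_i, y_i) with means x̄ and ȳ. It may not reflect how Sonar computes the value internally. The 1/n (or 1/(n-1)) factor in the covariance and in each standard deviation cancels, so the result is the same whether population or sample statistics are used. The formula is undefined when either standard deviation is 0, for example when every valid value of one variable is equal.

```latex
r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
       = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```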

This function can only be used in aggregation commands such as stats and timechart.

Error codes

N/A

Usage examples

The Pearson correlation coefficient measures the linear relationship between two continuous numeric variables. Fields that are stored as numbers but represent categories, such as the HTTP status code in the status field, are not appropriate inputs. corr() accepts them because they are numeric, but the result depends on the arbitrary numbers assigned to each category rather than on any real linear relationship. In real operational scenarios, use pairs of continuous measurements as inputs, such as the request count per minute and the average response latency.

  1. Calculate the correlation coefficient between request count and average response latency

    json "[{'minute': 1, 'req_count': 50, 'avg_latency_ms': 30}, {'minute': 2, 'req_count': 120, 'avg_latency_ms': 75}, {'minute': 3, 'req_count': 200, 'avg_latency_ms': 140}, {'minute': 4, 'req_count': 350, 'avg_latency_ms': 280}, {'minute': 5, 'req_count': 500, 'avg_latency_ms': 450}]"
    | stats corr(req_count, avg_latency_ms)
    | # A strong positive correlation, approximately 0.996.
    
  2. Calculate the correlation coefficient between request count and average response latency per host

    json "[{'host': 'web-01', 'req_count': 100, 'avg_latency_ms': 50}, {'host': 'web-01', 'req_count': 220, 'avg_latency_ms': 130}, {'host': 'web-01', 'req_count': 380, 'avg_latency_ms': 260}, {'host': 'web-02', 'req_count': 80, 'avg_latency_ms': 40}, {'host': 'web-02', 'req_count': 180, 'avg_latency_ms': 100}, {'host': 'web-02', 'req_count': 320, 'avg_latency_ms': 220}]"
    | stats corr(req_count, avg_latency_ms) by host
    
  3. Null value handling

    json "[{'x': 1, 'y': 2}, {'x': null, 'y': 4}, {'x': 3, 'y': 6}]"
    | stats corr(x, y)
    | # Records where x is null are excluded from aggregation.
    

Compatibility

The corr() function has been available since before Logpresso Sonar 4.0.