estdc()

Uses the HyperLogLog algorithm to estimate the distinct count of values in a group.

estdcount is an alias for estdc.

Syntax

estdc(EXPR[, BITS])

Parameters

EXPR
An expression that returns the values whose distinct count is to be estimated.
BITS
(Optional) The number of HyperLogLog precision bits. Must be an integer between 4 and 24. The default is 16. Higher values increase accuracy but also increase memory usage.

Description

The estdc() function accumulates the values returned by EXPR into a HyperLogLog probabilistic data structure as it processes records in a group. Null values are not processed. When aggregation is complete, it returns the approximate distinct count as a 64-bit integer (long).

HyperLogLog estimates the distinct count of large datasets quickly using only a small amount of memory. At the default precision of 16 bits, the standard error is approximately 0.8%. When an exact distinct count is required, use the dc() function instead.
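To make the mechanism concrete, the following is a minimal Python sketch of a HyperLogLog estimator. It is an illustration only and not the implementation behind estdc(): the class name HyperLogLog, the use of SHA-1 as the hash function, and the one-byte-per-register layout are all assumptions chosen for brevity.

```python
import hashlib
import math


def _alpha(m):
    """Bias-correction constant from the original HyperLogLog paper."""
    if m == 16:
        return 0.673
    if m == 32:
        return 0.697
    if m == 64:
        return 0.709
    return 0.7213 / (1 + 1.079 / m)


class HyperLogLog:
    """Illustrative estimator; hypothetical class, not the estdc() internals."""

    def __init__(self, bits=16):
        if not 4 <= bits <= 24:
            raise ValueError("bits must be between 4 and 24")
        self.bits = bits
        self.m = 1 << bits                   # number of registers
        self.registers = bytearray(self.m)   # one byte per register

    def add(self, value):
        if value is None:
            return  # null values are skipped, as the description states
        # Assumption: SHA-1 stands in for the real hash function.
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash
        index = h >> (64 - self.bits)                # top bits pick a register
        rest = h & ((1 << (64 - self.bits)) - 1)     # remaining bits
        # Rank = position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.bits) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self):
        m = self.m
        raw = _alpha(m) * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return round(m * math.log(m / zeros))
        return round(raw)
```

Feeding this sketch 200,000 distinct integers at 14 bits should give an estimate close to 200,000, and adding the same values a second time leaves the estimate unchanged, because a register only ever keeps its maximum rank.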

For numeric types, the function rehashes values with the Google MurmurHash3 algorithm. This prevents the bit bias that the Java SDK's default hashCode() produces for numbers within a range.
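The effect of the bias can be sketched in Python. Java's Integer.hashCode() returns the integer value itself, so small numbers share the same high-order bits. The example below assumes a 32-bit hash whose top 10 bits select the register, and it uses only the 32-bit finalization step of MurmurHash3 as the mixing function. Both choices are illustrative assumptions; the actual register selection and hashing in estdc() may differ.

```python
def identity_hash(n):
    """Mimics Java's Integer.hashCode(), which returns the int value itself."""
    return n & 0xFFFFFFFF


def fmix32(h):
    """32-bit finalization mix from MurmurHash3 (not the full hash)."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def registers_touched(hash_fn, values, bits=10):
    """Count distinct register indexes when the top `bits` bits pick the register."""
    return len({hash_fn(v) >> (32 - bits) for v in values})


values = range(10_000)
identity_count = registers_touched(identity_hash, values)
mixed_count = registers_touched(fmix32, values)
print("registers touched without rehashing:", identity_count)
print("registers touched with rehashing:   ", mixed_count)
```

Without rehashing, every value below 2^22 lands in register 0, so the estimator sees almost no information. With mixing, the same values spread across most of the 1,024 registers.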

This function can only be used in aggregation commands such as stats and timechart.

Memory usage per group depends on the BITS value. With b precision bits, the register array occupies 2^b bytes, so each additional bit doubles the memory used. At the default value of 16, up to 64 KB is used per group; at the maximum of 24, up to 16 MB.
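The 2^b relationship can be worked through with a short Python calculation. The helper name register_array_bytes is hypothetical, and the range check simply mirrors the documented 4 to 24 limit on BITS.

```python
def register_array_bytes(bits=16):
    """Return the register array size in bytes (2^bits) for a given BITS value."""
    if not isinstance(bits, int) or not 4 <= bits <= 24:
        raise ValueError("BITS must be an integer between 4 and 24")
    return 1 << bits


# Print the per-group register array size for a few BITS values.
for bits in (4, 12, 16, 20, 24):
    size = register_array_bytes(bits)
    print(f"BITS={bits:2d}: {size:>10,} bytes ({size / 1024:,.2f} KB)")
```

Because the cost applies to every group, a query that groups by a high-cardinality field multiplies this figure by the number of groups.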

Error codes

Code     Description
91020    The number of arguments is wrong.
91050    The BITS parameter is not an integer in the range of 4–24.

Usage examples

To prepare the WEB_APACHE_SAMPLE table used in these examples, refer to Preparing sample data.

  1. Estimate the number of unique source IP addresses (default precision)

    table WEB_APACHE_SAMPLE | stats estdc(src_ip)
    
  2. Estimate the number of unique source IP addresses with higher precision

    table WEB_APACHE_SAMPLE | stats estdc(src_ip, 20)
    
  3. Null value handling

    json "[{'val': 10}, {'val': null}, {'val': 30}]"
    | stats estdc(val)
    | # Null values are excluded from aggregation.
    

Compatibility

The estdc() function has been available since before Logpresso Sonar 4.0.