estdc()
Uses the HyperLogLog algorithm to estimate the number of distinct values in a group.
estdcount is an alias for estdc.
Syntax
estdc(EXPR[, BITS])
Parameters
EXPR - An expression that returns the values whose distinct count is to be estimated.
BITS - (Optional) The number of HyperLogLog precision bits. Must be an integer between 4 and 24. The default is 16. Higher values increase accuracy but also increase memory usage.
Description
The estdc() function accumulates the values returned by EXPR into a HyperLogLog probabilistic data structure as it processes records in a group. Null values are not processed. When aggregation is complete, it returns the approximate distinct count as a 64-bit integer (long).
HyperLogLog estimates the distinct count of large datasets quickly using only a small amount of memory. At the default precision of 16 bits, the standard error is approximately 0.8%. When an exact distinct count is required, use the dc() function instead.
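The accumulate-then-estimate behavior described above can be illustrated with a minimal HyperLogLog sketch in Python. This is not the Logpresso implementation: the class name `HyperLogLogSketch`, the `hash64` helper, the use of BLAKE2b as the hash, and the choice of the top bits as the register index are all assumptions made for illustration.

```python
import hashlib
import math


def hash64(value) -> int:
    """Map a value to a 64-bit integer. BLAKE2b is an arbitrary choice for this sketch."""
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def alpha(m: int) -> float:
    """Bias-correction constant from the original HyperLogLog paper."""
    if m == 16:
        return 0.673
    if m == 32:
        return 0.697
    if m == 64:
        return 0.709
    return 0.7213 / (1 + 1.079 / m)


class HyperLogLogSketch:
    """Hypothetical stand-in for the per-group state kept by an estdc-like aggregation."""

    def __init__(self, bits: int = 16):
        if not 4 <= bits <= 24:
            raise ValueError("bits must be an integer between 4 and 24")
        self.bits = bits
        self.m = 1 << bits
        self.registers = bytearray(self.m)  # one byte per register

    def add(self, value) -> None:
        if value is None:  # null values are not processed
            return
        h = hash64(value)
        width = 64 - self.bits
        index = h >> width                # top bits select the register
        rest = h & ((1 << width) - 1)     # remaining bits determine the rank
        rank = width - rest.bit_length() + 1  # 1-based position of the first 1 bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self) -> int:
        m = self.m
        raw = alpha(m) * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting)
            return round(m * math.log(m / zeros))
        return round(raw)


sketch = HyperLogLogSketch()
for v in list(range(10_000)) + [None, 1, 2, 3]:
    sketch.add(v)
print(sketch.estimate())  # close to 10000, but not guaranteed to be exact
```

Duplicates and nulls leave the registers unchanged, which is why the structure stays the same size no matter how many records pass through the group.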
For numeric types, the function rehashes values using the Google MurmurHash3 algorithm. This prevents the bit bias that the default hashCode() in the Java SDK introduces when numbers fall within a narrow range.
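The effect of rehashing can be seen in a small Python sketch. Java's Integer.hashCode() returns the integer value itself, so small numbers occupy only the low bits of the hash. The example below assumes, purely for illustration, that the top 16 bits of a 32-bit hash select the HyperLogLog register; the document does not state which MurmurHash3 variant or bit layout the function uses, so `fmix32` (the 32-bit MurmurHash3 finalizer) and `register_indexes` are stand-ins.

```python
def fmix32(h: int) -> int:
    """32-bit finalizer (avalanche step) of MurmurHash3."""
    h &= 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def register_indexes(hashes, bits: int = 16) -> set:
    """Distinct register indexes when the top `bits` bits of a 32-bit hash pick the register."""
    return {h >> (32 - bits) for h in hashes}


values = range(1000)

# Identity hash, mimicking Integer.hashCode(): the top 16 bits are always zero,
# so every value lands in the same register.
identity = register_indexes(values)

# After mixing, the same values spread across many registers.
mixed = register_indexes(fmix32(v) for v in values)

print(len(identity), len(mixed))
```

With the identity hash, a thousand distinct values are indistinguishable from a single value as far as register selection is concerned, which is the kind of bias that rehashing avoids.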
This function can only be used in aggregation commands such as stats and timechart.
Memory usage per group depends on the BITS value. With b precision bits, the register array occupies 2^b bytes, so the default value of 16 uses up to 64 KB per group. Each additional bit doubles the memory usage.
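For capacity planning, the 2^b relationship can be tabulated with a few lines of Python. The helper names `register_bytes` and `total_bytes` are hypothetical, and the figures cover only the register array stated above, not any other per-group overhead.

```python
def register_bytes(bits: int) -> int:
    """Upper bound on the register array size for one group: 2^bits bytes."""
    if not 4 <= bits <= 24:
        raise ValueError("BITS must be an integer between 4 and 24")
    return 1 << bits


def total_bytes(bits: int, groups: int) -> int:
    """Upper bound on register memory when the aggregation produces `groups` groups."""
    return register_bytes(bits) * groups


for bits in (4, 12, 16, 20, 24):
    print(f"BITS={bits:2d}: {register_bytes(bits):>10,d} bytes per group")

# 1,000 groups at the default precision
print(f"{total_bytes(16, 1000):,d} bytes")
```

The per-group cost matters most when the aggregation is grouped by a high-cardinality key, because the upper bound scales linearly with the number of groups.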
Error codes
| Code | Description |
|---|---|
| 91020 | The number of arguments is wrong. |
| 91050 | The BITS parameter is not an integer in the range of 4–24. |
Usage examples
To prepare the WEB_APACHE_SAMPLE table used in these examples, refer to Preparing sample data.
- Estimate the number of unique source IP addresses (default precision)
    table WEB_APACHE_SAMPLE | stats estdc(src_ip)
- Estimate the number of unique source IP addresses with higher precision
    table WEB_APACHE_SAMPLE | stats estdc(src_ip, 20)
- Null value handling
    json "[{'val': 10}, {'val': null}, {'val': 30}]" | stats estdc(val) | # Null values are excluded from aggregation.
Compatibility
The estdc() function has been available since before Logpresso Sonar 4.0.