lof
Calculates the local density of each point based on the k nearest neighbors, compares relative density ratios with adjacent neighbors, and computes the LOF (Local Outlier Factor) index.
Command properties
| Item | Description |
|---|---|
| Command type | Processing query |
| Required permission | None |
| License usage | N/A |
| Parallel execution | Supported |
| Distributed execution | Runs on Control Node (reducer) |
Syntax
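To calculate LOF index on input data:

`lof [k=INT] [eps=DOUBLE] [optimize={t|f}] [cores=INT] FIELD, ... [by GRP_FIELD, ...]`

To calculate LOF index on input data after querying training data via a subquery:

`lof [k=INT] [eps=DOUBLE] [optimize={t|f}] [cores=INT] FIELD, ... by GRP_FIELD, ... [ SUBQUERY ]`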
Options
- `k=INT`: Number of neighbor nodes to use in the calculation. Must be a positive integer of 1 or more. (Default: `10`)
- `eps=DOUBLE`: Minimum distance adjustment factor between data points. Prevents divergence to infinity when dividing by the sum of distances between data points. Must be a value greater than 0 and less than or equal to 1. (Default: `0.00001`)
- `optimize={t|f}`: Whether to use the optimization algorithm. (Default: `t`)
  - `t`: Uses the optimization algorithm.
  - `f`: Does not use the optimization algorithm.
- `cores=INT`: Number of threads to use for LOF index calculation. Must be a value between 1 and 10,000. (Default: `1`)
Target
- `FIELD, ...`: List of fields to use for LOF index calculation. Multiple fields can be specified separated by commas (`,`). Field values must be numeric types such as integer, double, or timestamp.
- `[by GRP_FIELD, ...]`: List of group fields. Multiple fields can be specified separated by commas (`,`). When a `by` clause is specified, the LOF index is calculated independently for records with the same group field values.
- `[ SUBQUERY ]`: Subquery to retrieve training data. When a subquery is specified, an LOF model is first built from the subquery results, then the LOF index is calculated for input records. To use a subquery, a `by` clause must also be specified.
Input fields
| Field | Type | Required | Description |
|---|---|---|---|
| FIELD | integer, double, timestamp | Required | Field containing numeric values. Records with non-numeric values are ignored. |
| GRP_FIELD | any type | Optional | Group field. Records with the same value are used to independently calculate LOF index. |
Output fields
| Field | Type | Description |
|---|---|---|
| _lof | double | LOF index. Assigns null when the value is NaN. |
| _lof_error | string | In subquery mode, assigns an error message when an error occurs during LOF index calculation. |
Error codes
Parse errors
| Error code | Message | Description |
|---|---|---|
| 40801 | The k value of the lof command must be a positive number of 1 or more. | When the k value is less than 1 or not a number |
| 40802 | The group field of the lof command is missing. | When a by clause is specified but the group field is empty |
| 40803 | The target field of the lof command is missing. | When no target field is specified |
| 40804 | A machine learning license is required. | When a machine learning license is not available |
| 40805 | The eps value of the lof command must be a positive number greater than 0 and less than or equal to 1. | When the eps value is out of range or not a number |
| 90204 | Unmatched '['. | When the subquery brackets are not matched |
Runtime errors
N/A
Description
The lof command uses the LOF (Local Outlier Factor) algorithm to calculate an outlier index for each record. The LOF algorithm compares the local density of each data point with the local densities of its k nearest neighbors, and identifies points whose density is lower than that of their surroundings as outliers.
Each record is assigned a _lof field containing its LOF index, which can be interpreted as follows:
- LOF > 1: Located outside the cluster; the higher the value, the more likely it is an outlier.
- LOF ≈ 1: Located on the boundary of the cluster.
- LOF < 1: Located inside the cluster.
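The interpretation above follows from how the score is built. The following is a minimal Python sketch of the textbook LOF computation, not the command's actual implementation. The function name `lof_scores`, the brute-force neighbor search, taking exactly k neighbors when distances tie, and the way `eps` is added to the distance sum are all assumptions made for illustration.

```python
import math


def lof_scores(points, k=10, eps=1e-5):
    """Return one LOF score per point (illustrative brute-force version)."""
    n = len(points)
    if n <= k:
        raise ValueError("need more points than k")

    # For each point: its k nearest other points as (distance, index) pairs.
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append(dists[:k])

    # k-distance: distance from a point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    # Local reachability density: k divided by the summed reachability
    # distances. Adding eps (an assumed placement) keeps the division finite
    # when all neighbors coincide with the point.
    lrd = []
    for i in range(n):
        reach_sum = sum(max(k_dist[j], d) for d, j in neighbors[i])
        lrd.append(k / (reach_sum + eps))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)
    ]


# A 3x3 grid of points plus one isolated point far from the grid.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
scores = lof_scores(grid + [(10.0, 10.0)], k=3)
print(scores[-1])  # score of the isolated point, well above 1
```

A point whose own density is much lower than that of its neighbors yields a ratio far above 1, which is why high values indicate outliers.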
When used without a subquery, all input records are collected, an LOF model is built, and the LOF index is calculated for each record. When a by clause is specified, an independent LOF model is built for each group.
When a subquery is specified, an LOF model is first built from the subquery results, then the LOF index is calculated for input records. This approach is useful when determining whether new data is anomalous based on previously configured reference data.
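The two-phase flow of subquery mode can be sketched in Python as follows. This is an illustration under stated assumptions rather than the command's implementation: `score_against_reference` is a hypothetical name, neighbors are found by brute force, and `eps` is assumed to be added to the distance sum. The point it shows is that new records are scored only against the reference data and never against each other.

```python
import math


def score_against_reference(reference, queries, k=10, eps=1e-5):
    """Score each query point against a model built only from reference points."""
    if len(reference) <= k:
        raise ValueError("reference set needs more points than k")

    def knn(p, exclude=None):
        # k nearest reference points as (distance, index) pairs.
        dists = sorted(
            (math.dist(p, q), j)
            for j, q in enumerate(reference)
            if j != exclude
        )
        return dists[:k]

    # Phase 1: build the model (neighbors, k-distance, density) from reference.
    ref_nb = [knn(p, exclude=i) for i, p in enumerate(reference)]
    k_dist = [nb[-1][0] for nb in ref_nb]
    lrd = [k / (sum(max(k_dist[j], d) for d, j in nb) + eps) for nb in ref_nb]

    # Phase 2: score each query point using reference neighbors only.
    scores = []
    for q in queries:
        nb = knn(q)
        q_lrd = k / (sum(max(k_dist[j], d) for d, j in nb) + eps)
        scores.append(sum(lrd[j] for _, j in nb) / (k * q_lrd))
    return scores


# Reference: a 5x5 grid. Queries: one point inside the grid, one far outside.
reference = [(float(x), float(y)) for x in range(5) for y in range(5)]
print(score_against_reference(reference, [(2.5, 2.5), (20.0, 20.0)], k=3))
```

Because the model is fixed by the reference data, a burst of similar anomalies in the input cannot form its own dense cluster and mask itself.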
When calculating the LOF index by group using a by clause, each group must have more records than the number of neighbor nodes (k). If a group does not have more records than k, all of its points are treated as a single cluster and the resulting LOF index is not meaningful.
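One way to check this condition before running the command is to count records per group. The Python helper below is a hypothetical pre-check and not part of the product. It assumes the records are available as dictionaries and that group values are sortable.

```python
from collections import Counter


def undersized_groups(records, group_field, k=10):
    """Return the group values that do not have more than k records."""
    counts = Counter(r[group_field] for r in records)
    return sorted(g for g, n in counts.items() if n <= k)


rows = [{"species": "setosa"}] * 12 + [{"species": "virginica"}] * 5
print(undersized_groups(rows, "species", k=10))  # groups too small for k=10
```

Groups returned by the helper can be filtered out, or k can be lowered until every group has more than k records.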
In a distributed environment, the LOF index is calculated on the Control Node.
Examples
- Basic LOF index calculation

  `csvfile /opt/logpresso/iris.csv | eval sepal_length = double(sepal_length), sepal_width = double(sepal_width) | lof sepal_length, sepal_width | search _lof > 2`

  Calculates the LOF index based on the `sepal_length` and `sepal_width` fields, and filters only outliers (LOF > 2).

- Specify the number of neighbor nodes

  `csvfile /opt/logpresso/iris.csv | eval sepal_length = double(sepal_length), sepal_width = double(sepal_width) | lof k=20 sepal_length, sepal_width`

  Calculates the LOF index with the number of neighbor nodes set to 20.

- Calculate LOF index by group

  `csvfile /opt/logpresso/iris.csv | eval sepal_length = double(sepal_length), sepal_width = double(sepal_width) | lof sepal_length, sepal_width by species`

  Independently calculates the LOF index for each group with the same `species` field value.

- Calculate LOF index using a subquery

  `table duration=1h sensor_data | lof k=15 temperature, humidity by location [ table duration=7d sensor_data ] | search _lof > 1.5`

  Builds a reference model from sensor data over the past 7 days, then detects outliers in data from the last 1 hour.

- Calculate LOF index using multiple cores

  `csvfile /opt/logpresso/iris.csv | eval sepal_length = double(sepal_length), sepal_width = double(sepal_width) | lof k=10 cores=4 sepal_length, sepal_width`

  Calculates the LOF index using 4 threads.