Calculates the Local Outlier Factor (LOF) of each record. The command first computes the Local Reachability Density (LRD) of each point from its k-nearest neighbors, then takes the ratio of the neighbors' average LRD to the point's own LRD.
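The calculation described above can be sketched in plain Python. This is a minimal brute-force illustration of the textbook LOF definition, not the command's actual implementation. The function name `lof_scores` is hypothetical, Euclidean distance is assumed, and ties at the k-th neighbor are broken by input order for simplicity.

```python
import math

def lof_scores(points, k):
    """Return one LOF score per point (brute force, O(n^2) distances)."""
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    # Pairwise Euclidean distances (assumed metric).
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_distance = [], []
    for i in range(n):
        # k nearest neighbors of point i, excluding the point itself.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])
    lrd = []
    for i in range(n):
        # Reachability distance from i to j: max(k-distance of j, d(i, j)).
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        # LRD is the inverse of the mean reachability distance;
        # the small floor avoids division by zero for duplicate points.
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))
    # LOF: average LRD of the neighbors divided by the point's own LRD.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]
```

For example, with a 3x3 grid of points plus one distant point and `k=3`, the distant point gets a score well above 1 while the grid points stay close to 1.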
- Fields that contain numeric data such as integers, real numbers, and dates. Use a comma (,) as a separator.
Number of adjacent nodes to be used for calculation (default:
by GRP_FIELD_1, ...
Grouping fields in the aggregation with the by directive, separated by a comma (,). This option MUST follow after
If you want to calculate the score for each group by using the by clause, the number of records in each group must be greater than the number of adjacent nodes (the value specified by k=INT). If the number of records in a group is less than the number of adjacent nodes, the LOF in the _lof field is not calculated as intended.
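The group-size requirement can be checked before scoring. The following is a minimal Python sketch, not part of the command itself. The helper name `groups_too_small` and the dictionary-based record layout are assumptions made for illustration. It flags every group whose record count is not greater than k.

```python
from collections import Counter

def groups_too_small(records, group_field, k):
    """Return the group keys whose record count is not greater than k."""
    # Count how many records fall into each group.
    counts = Counter(r[group_field] for r in records)
    # A group needs strictly more than k records for LOF to be meaningful.
    return sorted(g for g, c in counts.items() if c <= k)
```

Groups returned by such a check could be filtered out or scored with a smaller k.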
The command writes the LOF score to the _lof field of each record. The value can be interpreted as follows:
- If the value is greater than 1 (LOF(k) > 1): It is located outside the cluster. The greater it is than 1, the more likely it is to be an anomaly.
- If the value is an approximation of 1 (LOF(k) ≈ 1): It is located at the boundary of the cluster.
- If the value is less than 1 (LOF(k) < 1): It is located inside the cluster.
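The three cases above can be expressed as a small helper. This is only a sketch: the function name `classify_lof` and the `tolerance` that decides what counts as "approximately 1" are assumptions, since the text does not define how close to 1 a score must be.

```python
def classify_lof(score, tolerance=0.1):
    """Map an LOF score to the three cases; tolerance is an assumed cutoff."""
    if score > 1 + tolerance:
        return "outside"   # LOF(k) > 1: outside the cluster, possible anomaly
    if score < 1 - tolerance:
        return "inside"    # LOF(k) < 1: inside the cluster
    return "boundary"      # LOF(k) ≈ 1: at the boundary of the cluster
```

A suitable tolerance depends on the data, so it is worth inspecting the distribution of _lof values before fixing a cutoff.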
Calculate the anomaly based on the field values of sepal_length and sepal_width (download: https://raw.githubusercontent.com/illinois-cse/data-fa14/gh-pages/data/iris.csv).
wget url="https://raw.githubusercontent.com/illinois-cse/data-fa14/gh-pages/data/iris.csv" | eval line = split(line, "\n") | explode line | split sep="," sepal_length,sepal_width,petal_length,petal_width,species | eval sepal_length = double(sepal_length), sepal_width = double(sepal_width) | lof sepal_length, sepal_width | search _lof > 2