anomalies

Calculates an anomaly score using Isolation Forest, a technique that builds an ensemble of decision trees from random samples of the data.

Syntax

Calculate the anomaly score using a stored training model.

anomalies [sample=INT] [size=INT] model=MODEL

Calculate the anomaly score using a model trained on the results of a subquery.

anomalies [sample=INT] [size=INT] FIELD, ... [ SUBQUERY ]

Required Parameters

FIELD, ...: Fields to use for Isolation Forest modeling. Separate multiple fields with commas (,).
model=MODEL: Name of the Isolation Forest model. You can create and train an Isolation Forest model by connecting to the Logpresso engine via the CLI.
[ SUBQUERY ]: Subquery that returns the data set for model training.

Optional Parameters

sample=INT: Number of samples to draw when training the Isolation Forest model (default: the square root of the number of samples).
size=INT: Number of trees within the Isolation Forest (default: 100).
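
To show how the two options interact, here is a minimal Python sketch of generic Isolation Forest training. It is not the Logpresso implementation: the function names train_forest and build_tree are hypothetical, and the depth limit of ceil(log2(sample)) is an assumption taken from the usual description of the algorithm. Each of the size trees is built from its own random draw of sample rows.

```python
import math
import random

def build_tree(rows, depth, max_depth, rng):
    """Recursively isolate rows with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}        # leaf: how many rows ended up here
    dim = rng.randrange(len(rows[0]))     # pick a random field
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return {"size": len(rows)}        # identical values cannot be split
    split = rng.uniform(lo, hi)           # pick a random split point
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def train_forest(rows, sample, size=100, seed=0):
    """Build `size` trees, each from `sample` rows drawn without replacement."""
    rng = random.Random(seed)
    sample = min(sample, len(rows))       # cannot draw more rows than exist
    max_depth = math.ceil(math.log2(max(sample, 2)))  # assumed depth limit
    return [
        build_tree(rng.sample(rows, sample), 0, max_depth, rng)
        for _ in range(size)
    ]
```

In this sketch a larger size averages over more trees and a larger sample lets each tree see more of the data, so both increase training cost.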

Description

The anomaly score, ranging from 0 to 1, is assigned to the _score field.

The higher the score, the more likely the record is an anomaly.
A score well below 0.5 indicates a normal observation.
If all scores are close to 0.5, the sample as a whole has no clearly distinct anomalies.
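
The role of 0.5 as the reference point can be seen in the score formula from the original Isolation Forest algorithm, sketched below in Python. This assumes the command uses that standard formula, which this page does not confirm; the function names are hypothetical. The score is 2 raised to the negative ratio of a record's mean path length across the trees to c(n), the average path length expected for a sample of n rows.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def average_path_length(n):
    """c(n): average path length of an unsuccessful binary search tree lookup over n rows."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA   # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, sample_size):
    """Standard Isolation Forest score: shorter paths give scores closer to 1."""
    return 2.0 ** (-mean_path_length / average_path_length(sample_size))
```

A record whose mean path length equals c(n) scores exactly 0.5. Records that are isolated after very few splits approach 1, and records that need many splits fall toward 0.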

Usage

Calculate the anomaly score using the anomal_stock model.

# Download: https://raw.githubusercontent.com/logpresso/dataset/main/stocks.csv
| table stocks
| anomalies model=anomal_stock
| eval anom = if(_score>0.7, stocks, null)

Calculate the anomaly score using a model trained on the data set returned by a subquery.

table stocks
| anomalies sample=256 stocks
    [ csvfile /test/sam_train.csv
      | eval _time=date(date, "yyyyMMdd"), stocks=int(stocks)
      | fields _time, stocks
    ]
| eval anom = if(_score>0.65, stocks, null)
| fields _time, anom, stocks