anomalies

Calculates an anomaly score for each input record using the Isolation Forest algorithm. The command builds an ensemble of randomly partitioning trees from samples of the training data, then assigns each record a score between 0 and 1 that indicates how much it differs from normal data.

Command properties

Item	Description
Command type	Processing query
Required permission	None
License usage	N/A
Parallel execution	Not supported
Distributed execution	Runs on Control Node (reducer)

Syntax

To predict anomalies using a pre-trained model:

anomalies [sample=INT] [size=INT] model=STR

To predict anomalies by querying training data with a subquery:

anomalies [sample=INT] [size=INT] [timeout=INT{s|m|h|d}] [core=INT] [scorer=INT] FIELD, ... [ SUBQUERY ]

Options

sample=INT: Number of samples to use for Isolation Forest training. Defaults to the square root of the number of training records.
size=INT: Number of trees in the Isolation Forest. Default: 100
model=STR: Name of a pre-trained Isolation Forest model. You can create and train models from the Sonar web console or the Logpresso shell.
timeout=INT{s|m|h|d}: Maximum time to wait for the subquery to complete. If the subquery does not finish within the specified time, it is cancelled. If omitted, the command waits indefinitely.
core=INT: Number of parallel threads to use for anomaly score calculation. If omitted, the command runs on a single thread.
scorer=INT: Assigns the top N fields that most influenced the anomaly score to the _scorer output field. If omitted, the _scorer field is not output.
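To make the defaults above concrete, the following Python sketch resolves a partially specified option set into effective values. The `ForestOptions`, `parse_timeout`, and `resolve_options` names are hypothetical. Rounding the square root down (with a minimum of 1) is also an assumption, because the documentation only says that the default sample count is the square root of the number of training records.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

# Seconds per unit suffix accepted by timeout=INT{s|m|h|d}
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


@dataclass
class ForestOptions:
    sample: int               # records sampled for training
    size: int                 # number of trees
    timeout_s: Optional[int]  # None means wait for the subquery indefinitely
    core: int                 # threads used for score calculation


def parse_timeout(value: Optional[str]) -> Optional[int]:
    """Convert a timeout such as '30s' or '2h' to seconds; None if omitted."""
    if value is None:
        return None
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"invalid timeout: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def resolve_options(n_training: int, sample: Optional[int] = None,
                    size: Optional[int] = None, timeout: Optional[str] = None,
                    core: Optional[int] = None) -> ForestOptions:
    # Assumption: the square-root default is rounded down, with a floor of 1.
    if sample is None:
        sample = max(1, math.isqrt(n_training))
    return ForestOptions(
        sample=sample,
        size=100 if size is None else size,   # documented default: 100 trees
        timeout_s=parse_timeout(timeout),
        core=1 if core is None else core,     # omitted: single thread
    )
```

For example, `resolve_options(10000, timeout="10m")` yields `ForestOptions(sample=100, size=100, timeout_s=600, core=1)`.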

Target

FIELD, ...: List of fields to use for Isolation Forest analysis. Separate multiple fields with commas (,).
[ SUBQUERY ]: Subquery that retrieves training data for the Isolation Forest. Enclose the subquery in square brackets ([ ]).

Output fields

Field	Type	Description
_score	double	Anomaly score. A value between 0 and 1; values closer to 1 indicate a higher likelihood of being an anomaly.
_scorer	array	When `scorer` is specified, contains the top N fields and their importance scores that most influenced the anomaly score.

Error codes

Parse errors

Error code	Message	Description
40804	Machine learning license is required.	Machine learning license is not present.
41100	Enter a machine learning model name.	The `model` option value is empty.
41101	Machine learning model not found.	No model with the specified name exists.
41102	Machine learning model handler not found.	No handler is found for the model type.
90204	`[` bracket is not matched.	The subquery square bracket is not closed.
90206	No subquery.	The `model` option is omitted and no subquery is specified, or the subquery is empty.

Runtime errors

N/A

Description

The anomalies command calculates anomaly scores for input records using the Isolation Forest algorithm. Isolation Forest builds multiple trees that partition the data with random splits. Because anomalies are few and differ from the rest of the data, they are isolated with fewer partitions than normal records. The anomaly score is therefore derived from the average number of partitions (the path length) needed to isolate a record across all trees: the shorter the average path, the higher the score.
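The partitioning idea can be sketched in plain Python. This sketch follows the standard Isolation Forest formulation (pick a random field, cut at a random value between the node's minimum and maximum, limit tree height to ceil(log2(sample)) and add an expected-depth correction at unresolved leaves). It illustrates the technique under those assumptions and is not the command's internal implementation; all function and class names are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Record = Sequence[float]


def c(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup over n records."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # H(n-1) approximated as ln(n-1) + Euler-Mascheroni constant
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int                      # records that reached this node
    feature: Optional[int] = None  # None marks an external (leaf) node
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(data: List[Record], height: int, limit: int, rng: random.Random) -> Node:
    if height >= limit or len(data) <= 1:
        return Node(size=len(data))
    # Only fields that still vary inside this node can split it.
    spans = [(f, min(r[f] for r in data), max(r[f] for r in data)) for f in range(len(data[0]))]
    candidates = [(f, lo, hi) for f, lo, hi in spans if lo < hi]
    if not candidates:
        return Node(size=len(data))
    f, lo, hi = rng.choice(candidates)
    split = rng.uniform(lo, hi)  # random cut between the node's min and max
    return Node(
        size=len(data),
        feature=f,
        split=split,
        left=build_tree([r for r in data if r[f] < split], height + 1, limit, rng),
        right=build_tree([r for r in data if r[f] >= split], height + 1, limit, rng),
    )


def path_length(node: Node, x: Record, depth: int = 0) -> float:
    if node.feature is None:
        # Leaves that still hold several records add their expected remaining depth.
        return depth + c(node.size)
    child = node.left if x[node.feature] < node.split else node.right
    return path_length(child, x, depth + 1)


def build_forest(data: List[Record], size: int = 100, sample: int = 256, seed: int = 0) -> List[Node]:
    rng = random.Random(seed)
    sample = min(sample, len(data))
    limit = math.ceil(math.log2(sample)) if sample > 1 else 0
    # Each tree is grown on its own random subsample of the training records.
    return [build_tree(rng.sample(data, sample), 0, limit, rng) for _ in range(size)]


def average_path_length(trees: List[Node], x: Record) -> float:
    return sum(path_length(t, x) for t in trees) / len(trees)


if __name__ == "__main__":
    gen = random.Random(42)
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(10.0, 10.0)]
    trees = build_forest(data, size=100, sample=64, seed=1)
    print("outlier:", average_path_length(trees, (10.0, 10.0)))
    print("center: ", average_path_length(trees, (0.0, 0.0)))
```

On 200 clustered points plus one distant point, the distant point reaches a leaf in noticeably fewer splits on average than a point at the center of the cluster, which is exactly the signal the anomaly score is built from.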

The anomaly score is assigned to the _score field as a value between 0 and 1:

Values closer to 1 indicate a higher likelihood of being an anomaly.
Values significantly below 0.5 are considered normal observations.
If all scores are close to 0.5, the data likely contains no anomalies.
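The 0 to 1 range and the 0.5 midpoint follow from how Isolation Forest normalizes path lengths. In the standard formulation by Liu et al., which this sketch assumes the command follows, the score is s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the record's average path length across all trees and c(n) is the average path length of an unsuccessful binary-search-tree lookup over n samples. A record whose average path equals c(n) scores exactly 0.5, shorter paths push the score toward 1, and longer paths push it toward 0. The function names are hypothetical.

```python
import math


def c(n: int) -> float:
    """Average path length of an unsuccessful BST lookup over n samples (n >= 2 for scoring)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # H(n-1) approximated as ln(n-1) + Euler-Mascheroni constant
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def anomaly_score(avg_path: float, n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); shorter average paths give higher scores."""
    return 2.0 ** (-avg_path / c(n))


if __name__ == "__main__":
    n = 256
    print(anomaly_score(0.0, n))    # 1.0: isolated immediately
    print(anomaly_score(c(n), n))   # 0.5: average path equals c(n)
    print(anomaly_score(2.0, n))    # roughly 0.87: isolated quickly, likely an anomaly
    print(anomaly_score(20.0, n))   # roughly 0.26: deep path, normal observation
```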

When you use the model option, anomalies are predicted based on a pre-trained model. Without the model option, the command queries training data via a subquery and builds the model at query time.

Supported field value types include numeric, timestamp, IP address, and string. String values are internally encoded and converted to numeric vectors. Fields with null values are excluded from training and scoring.
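The exact encoding is not documented here. As a hypothetical sketch, the function below turns one record into a numeric vector: numbers pass through, timestamps become epoch milliseconds, IP addresses become their integer value, and each string field is one-hot encoded against a vocabulary of values seen in training (one possible way to turn strings into numeric vectors). Skipping the whole record when an analysis field is null is one reading of the rule above, not confirmed behavior. The `encode_record` name and the `vocab` parameter are illustrative assumptions.

```python
import ipaddress
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Sequence

IP_TYPES = (ipaddress.IPv4Address, ipaddress.IPv6Address)


def encode_record(record: Dict[str, Any], fields: Sequence[str],
                  vocab: Dict[str, List[str]]) -> Optional[List[float]]:
    """Convert one record into a numeric vector (hypothetical encoding).

    vocab maps each string field to the values seen during training.
    Returns None when an analysis field is null so the caller can skip the record.
    """
    vector: List[float] = []
    for name in fields:
        value = record.get(name)
        if value is None:
            return None
        if isinstance(value, str):
            # One-hot: one dimension per known training value; unseen strings are all zeros.
            vector.extend(1.0 if value == known else 0.0 for known in vocab.get(name, []))
        elif isinstance(value, datetime):
            # Epoch milliseconds; naive datetimes are interpreted in local time by Python.
            vector.append(value.timestamp() * 1000.0)
        elif isinstance(value, IP_TYPES):
            vector.append(float(int(value)))
        elif isinstance(value, (int, float)):
            vector.append(float(value))
        else:
            raise TypeError(f"unsupported type for field {name!r}: {type(value).__name__}")
    return vector


if __name__ == "__main__":
    rec = {"src_ip": ipaddress.ip_address("10.0.0.1"), "proto": "udp", "bytes": 512}
    print(encode_record(rec, ["src_ip", "proto", "bytes"], {"proto": ["tcp", "udp"]}))
    # [167772161.0, 0.0, 1.0, 512.0]
```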

In a distributed environment, the final score calculation runs on the Control Node.

Examples

Predicting anomalies using a pre-trained model
```
table duration=30d stocks
| anomalies model=anomal_stock
| eval anom = if(_score > 0.7, stocks, null)
```
Calculates anomaly scores using the anomal_stock model and assigns the stocks value to the anom field for records with a score above 0.7.

Predicting anomalies using training data from a subquery

```
table duration=30d stocks
| anomalies sample=256 stocks [
    csvfile /opt/logpresso/dataset/stock_train.csv
    | eval _time = date(date, "yyyyMMdd"), stocks = int(stocks)
    | fields _time, stocks
  ]
| eval anom = if(_score > 0.65, stocks, null)
| fields _time, anom, stocks
```

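Builds an Isolation Forest model from training data that the subquery reads from a CSV file, using 256 samples, and then calculates anomaly scores for the stocks field over the last 30 days of the stocks table. For records with a score above 0.65, the stocks value is assigned to the anom field, and the output is limited to the _time, anom, and stocks fields.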

Configuring tree count and analyzing top contributing fields

```
json "[{'cpu': 80, 'mem': 60, 'disk': 30}, {'cpu': 95, 'mem': 90, 'disk': 85}, {'cpu': 40, 'mem': 50, 'disk': 20}]"
| anomalies size=200 scorer=3 cpu, mem, disk [
    json "[{'cpu': 45, 'mem': 55, 'disk': 25}, {'cpu': 50, 'mem': 48, 'disk': 22}, {'cpu': 42, 'mem': 52, 'disk': 28}, {'cpu': 48, 'mem': 51, 'disk': 24}, {'cpu': 44, 'mem': 49, 'disk': 21}]"
  ]
```

Builds an Isolation Forest of 200 trees from the five training records returned by the subquery, and uses scorer=3 to assign the three fields that most influenced each input record's anomaly score to the _scorer field.
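This page does not say how field importance is computed for _scorer. As a stand-in, the hypothetical sketch below ranks fields by absolute z-score against the training records from the example above. That ranking heuristic is a swapped-in technique, not the command's method; only the final keep-the-top-N step mirrors what scorer=3 describes. Returning (field, importance) pairs is likewise an assumption about the shape of the _scorer array.

```python
import statistics
from typing import Dict, List, Tuple

# Training records copied from the subquery in the example above.
TRAINING = [
    {"cpu": 45, "mem": 55, "disk": 25},
    {"cpu": 50, "mem": 48, "disk": 22},
    {"cpu": 42, "mem": 52, "disk": 28},
    {"cpu": 48, "mem": 51, "disk": 24},
    {"cpu": 44, "mem": 49, "disk": 21},
]


def field_importance(record: Dict[str, float], training: List[Dict[str, float]],
                     fields: List[str]) -> Dict[str, float]:
    """Hypothetical importance: absolute z-score of each field against the training data."""
    result = {}
    for name in fields:
        values = [r[name] for r in training]
        mean = statistics.mean(values)
        stdev = statistics.stdev(values)  # sample standard deviation
        result[name] = abs(record[name] - mean) / stdev if stdev > 0 else 0.0
    return result


def top_fields(importance: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Keep the n most important fields, largest first (the role scorer=N plays)."""
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    record = {"cpu": 95, "mem": 90, "disk": 85}
    importance = field_importance(record, TRAINING, ["cpu", "mem", "disk"])
    print([name for name, _ in top_fields(importance, 3)])  # ['disk', 'cpu', 'mem']
```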