anomalies

Calculates an anomaly score for each record using the Isolation Forest algorithm. The command samples a portion of the training data to build a set of decision trees, then computes a score between 0 and 1 that indicates how much each input record deviates from normal data.

Command properties

Item                   Description
Command type           Processing query
Required permission    None
License usage          N/A
Parallel execution     Not supported
Distributed execution  Runs on Control Node (reducer)

Syntax

To predict anomalies using a pre-trained model:

anomalies [sample=INT] [size=INT] model=STR

To predict anomalies by querying training data with a subquery:

anomalies [sample=INT] [size=INT] [timeout=INT{s|m|h|d}] [core=INT] [scorer=INT] FIELD, ... [ SUBQUERY ]

Options

sample=INT
Number of samples to use for Isolation Forest training. Defaults to the square root of the number of training records.
size=INT
Number of trees in the Isolation Forest. Default: 100
model=STR
Name of a pre-trained Isolation Forest model. You can create and train models from the Sonar web console or the Logpresso shell.
timeout=INT{s|m|h|d}
Maximum time to wait for the subquery to complete. If the subquery does not finish within the specified time, it is cancelled. If omitted, the command waits indefinitely.
core=INT
Number of parallel threads to use for anomaly score calculation. If omitted, the command runs on a single thread.
scorer=INT
Number of top contributing fields (N) to report. The N fields that most influenced the anomaly score are assigned to the _scorer output field. If omitted, the _scorer field is not output.

Target

FIELD, ...
List of fields to use for Isolation Forest analysis. Separate multiple fields with commas (,).
[ SUBQUERY ]
Subquery that retrieves training data for the Isolation Forest. Enclose the subquery in square brackets ([ ]).

Output fields

Field    Type    Description
_score   double  Anomaly score between 0 and 1. Values closer to 1 indicate a higher likelihood that the record is an anomaly.
_scorer  array   Top N fields that most influenced the anomaly score, with their importance scores. Output only when the scorer option is specified.

Error codes

Parse errors

Error code  Message                                    Description
40804       Machine learning license is required.      Machine learning license is not present.
41100       Enter a machine learning model name.       The model option value is empty.
41101       Machine learning model not found.          No model with the specified name exists.
41102       Machine learning model handler not found.  No handler is found for the model type.
90204       [ bracket is not matched.                  The subquery square bracket is not closed.
90206       No subquery.                               The model option is omitted and no subquery is specified, or the subquery is empty.

Runtime errors

N/A

Description

The anomalies command calculates anomaly scores for input records using the Isolation Forest algorithm. Isolation Forest builds multiple decision trees that randomly partition data to distinguish normal data from anomalies. Anomalies are isolated with fewer partitions than normal data, so the anomaly score is derived from the average number of partitions required.
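
The relationship between partition count and score can be sketched in Python. This is a minimal illustration of the score normalization commonly used for Isolation Forest, not a description of how the anomalies command is implemented internally; the function names average_path_length and anomaly_score are hypothetical and chosen only for this example.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def average_path_length(n: int) -> float:
    """Expected number of partitions c(n) needed to isolate a point among n samples.

    Uses the usual approximation H(i) ~ ln(i) + gamma for the harmonic number.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, sample_size: int) -> float:
    """Map the average partition count over all trees to a score in (0, 1].

    Fewer partitions (shorter paths) give a score closer to 1.
    """
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    return 2.0 ** (-mean_path_length / average_path_length(sample_size))


if __name__ == "__main__":
    n = 256  # assumed sample size, matching sample=256 in the examples
    for path in (3.0, average_path_length(n), 20.0):
        print(f"mean partitions={path:.2f} -> score={anomaly_score(path, n):.3f}")
```

Under this normalization, a record that needs exactly the expected number of partitions scores 0.5, a record isolated after very few partitions approaches 1, and a record that needs many partitions falls below 0.5.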

The anomaly score is assigned to the _score field as a value between 0 and 1:

  • Values closer to 1 indicate a higher likelihood of being an anomaly.
  • Values significantly below 0.5 are considered normal observations.
  • If all scores are close to 0.5, the data likely contains no anomalies.
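
The reading rules above can be applied to exported _score values with a short Python sketch. The cutoff of 0.7 is borrowed from the first example on this page, and the width of the band around 0.5 is an assumption made for illustration; neither is a value prescribed by the command. The function name interpret_scores is hypothetical.

```python
from typing import Dict, List, Union


def interpret_scores(
    scores: List[float],
    anomaly_threshold: float = 0.7,  # assumed cutoff, tune per dataset
    neutral_band: float = 0.05,      # assumed tolerance around 0.5
) -> Dict[str, Union[List[float], bool]]:
    """Summarize a batch of _score values.

    flagged: scores above the anomaly threshold.
    no_clear_anomalies: True when every score lies close to 0.5.
    """
    flagged = [s for s in scores if s > anomaly_threshold]
    all_neutral = bool(scores) and all(
        abs(s - 0.5) <= neutral_band for s in scores
    )
    return {"flagged": flagged, "no_clear_anomalies": all_neutral}


if __name__ == "__main__":
    print(interpret_scores([0.31, 0.35, 0.82]))
    print(interpret_scores([0.48, 0.52, 0.50]))
```

A suitable threshold depends on the data, so it is worth inspecting the score distribution before fixing a cutoff.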

When you use the model option, anomalies are predicted based on a pre-trained model. Without the model option, the command queries training data via a subquery and builds the model at query time.

Supported field value types include numeric, timestamp, IP address, and string. String values are internally encoded and converted to numeric vectors. Fields with null values are excluded from training and scoring.

In a distributed environment, the final score calculation runs on the Control Node.

Examples

  1. Predicting anomalies using a pre-trained model

    table duration=30d stocks
    | anomalies model=anomal_stock
    | eval anom = if(_score > 0.7, stocks, null)
    

    Calculates anomaly scores using the anomal_stock model and assigns the stocks value to the anom field for records with a score above 0.7.

  2. Predicting anomalies using training data from a subquery

    table duration=30d stocks
    | anomalies sample=256 stocks [
        csvfile /opt/logpresso/dataset/stock_train.csv
        | eval _time = date(date, "yyyyMMdd"), stocks = int(stocks)
        | fields _time, stocks
      ]
    | eval anom = if(_score > 0.65, stocks, null)
    | fields _time, anom, stocks
    

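    Builds an Isolation Forest model from training data that the subquery reads from a CSV file, using 256 samples (sample=256). The command then calculates anomaly scores from the stocks field, and the eval command assigns the stocks value to the anom field for records with a score above 0.65.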

  3. Configuring tree count and analyzing top contributing fields

    json "[{'cpu': 80, 'mem': 60, 'disk': 30}, {'cpu': 95, 'mem': 90, 'disk': 85}, {'cpu': 40, 'mem': 50, 'disk': 20}]"
    | anomalies size=200 scorer=3 cpu, mem, disk [
        json "[{'cpu': 45, 'mem': 55, 'disk': 25}, {'cpu': 50, 'mem': 48, 'disk': 22}, {'cpu': 42, 'mem': 52, 'disk': 28}, {'cpu': 48, 'mem': 51, 'disk': 24}, {'cpu': 44, 'mem': 49, 'disk': 21}]"
      ]
    

    Sets the number of trees to 200 and uses scorer=3 to assign the top 3 fields that most influenced the anomaly score to the _scorer field.