anomalies
Calculates an anomaly score for each record using the Isolation Forest algorithm. The command samples a portion of the training data to build an ensemble of random decision trees, then computes a score between 0 and 1 that indicates how much each input record deviates from normal data.
Command properties
| Item | Description |
|---|---|
| Command type | Processing query |
| Required permission | None |
| License usage | N/A |
| Parallel execution | Not supported |
| Distributed execution | Runs on Control Node (reducer) |
Syntax
To predict anomalies using a pre-trained model:
To predict anomalies by querying training data with a subquery:
Options
- sample=INT: Number of samples to use for Isolation Forest training. Defaults to the square root of the number of training records.
- size=INT: Number of trees in the Isolation Forest. Default: 100
- model=STR: Name of a pre-trained Isolation Forest model. You can create and train models from the Sonar web console or the Logpresso shell.
- timeout=INT{s|m|h|d}: Maximum time to wait for the subquery to complete. If the subquery does not finish within the specified time, it is cancelled. If omitted, the command waits indefinitely.
- core=INT: Number of parallel threads to use for anomaly score calculation. If omitted, the command runs on a single thread.
- scorer=INT: Assigns the top N fields that most influenced the anomaly score to the _scorer output field. If omitted, the _scorer field is not output.
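The INT{s|m|h|d} notation used by the timeout option can be illustrated with a short Python sketch. This is not the command's own parser: the function name parse_timeout is hypothetical, and the reading of s, m, h, and d as seconds, minutes, hours, and days is an assumption based on the usual meaning of these suffixes.

```python
import re

# Assumed meaning of each unit suffix, in seconds.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_timeout(value):
    """Convert a string such as '30s' or '5m' into a number of seconds.

    Hypothetical helper: it accepts an integer immediately followed by
    exactly one of the suffixes s, m, h, or d.
    """
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"invalid timeout: {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]


print(parse_timeout("5m"))
```

Under these assumptions, timeout=5m would cancel a subquery that has not finished after 300 seconds.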
Target
- FIELD, ...: List of fields to use for Isolation Forest analysis. Separate multiple fields with commas (,).
- [ SUBQUERY ]: Subquery that retrieves training data for the Isolation Forest. Enclose the subquery in square brackets ([ ]).
Output fields
| Field | Type | Description |
|---|---|---|
| _score | double | Anomaly score. A value between 0 and 1; values closer to 1 indicate a higher likelihood of being an anomaly. |
| _scorer | array | Output only when the scorer option is specified. Contains the top N fields that most influenced the anomaly score, along with their importance scores. |
Error codes
Parse errors
| Error code | Message | Description |
|---|---|---|
| 40804 | Machine learning license is required. | Machine learning license is not present. |
| 41100 | Enter a machine learning model name. | The model option value is empty. |
| 41101 | Machine learning model not found. | No model with the specified name exists. |
| 41102 | Machine learning model handler not found. | No handler is found for the model type. |
| 90204 | [ bracket is not matched. | The subquery square bracket is not closed. |
| 90206 | No subquery. | The model option is omitted and no subquery is specified, or the subquery is empty. |
Runtime errors
N/A
Description
The anomalies command calculates anomaly scores for input records using the Isolation Forest algorithm. Isolation Forest builds multiple decision trees that randomly partition data to distinguish normal data from anomalies. Anomalies are isolated with fewer partitions than normal data, so the anomaly score is derived from the average number of partitions required.
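The claim that anomalies are isolated with fewer partitions can be checked with a small Python sketch. It is a simplified one-dimensional illustration, not the command's implementation: the function names, the sample values, the tree count, and the seed are all made up for the example.

```python
import random


def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random splits applied until `point` can no longer be separated."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth  # only duplicates remain, nothing left to split
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that still contains the point.
    if point < split:
        side = [v for v in data if v < split]
    else:
        side = [v for v in data if v >= split]
    return isolation_depth(point, side, rng, depth + 1, max_depth)


def average_depth(point, data, trees=200, seed=7):
    """Average the isolation depth over many independent random partitionings."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(trees)) / trees


# Hypothetical data: values clustered around 50, plus one outlier at 95.
values = [48, 50, 51, 49, 52, 47, 50, 53, 46, 51, 49, 50, 95]
normal_depth = average_depth(50, values)
outlier_depth = average_depth(95, values)
print(outlier_depth < normal_depth)
```

The outlier at 95 is usually cut off by the very first split, while a value in the middle of the cluster needs at least two splits, so its average depth is larger.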
The anomaly score is assigned to the _score field as a value between 0 and 1:
- Values closer to 1 indicate a higher likelihood of being an anomaly.
- Values significantly below 0.5 are considered normal observations.
- If all scores are close to 0.5, the data likely contains no anomalies.
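The three interpretation rules above follow from the scoring formula of the original Isolation Forest formulation, s = 2^(-E[h(x)] / c(n)), where E[h(x)] is the average path length of a record and c(n) is the average path length expected for n training samples. The Python sketch below assumes the anomalies command uses this standard normalization, which this page does not state; the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): expected path length of an unsuccessful binary search tree lookup over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_depth, n):
    """Standard Isolation Forest score: 2 ** (-mean_depth / c(n))."""
    return 2.0 ** (-mean_depth / average_path_length(n))


n = 256  # assumed number of training samples
c = average_path_length(n)
print(anomaly_score(1.0, n))    # very short path: score close to 1
print(anomaly_score(c, n))      # path equal to c(n): score of 0.5
print(anomaly_score(n - 1, n))  # longest possible path: score close to 0
```

A record isolated after a single split scores well above 0.5, a record with an average-length path scores 0.5, and a record that is hard to isolate scores near 0. This is why a data set in which every record scores around 0.5 has no record that stands out.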
When you use the model option, anomalies are predicted based on a pre-trained model. Without the model option, the command queries training data via a subquery and builds the model at query time.
Supported field value types include numeric, timestamp, IP address, and string. String values are internally encoded and converted to numeric vectors. Fields with null values are excluded from training and scoring.
In a distributed environment, the final score calculation runs on the Control Node.
Examples
- Predicting anomalies using a pre-trained model

      table duration=30d stocks | anomalies model=anomal_stock | eval anom = if(_score > 0.7, stocks, null)

  Calculates anomaly scores using the anomal_stock model and assigns the stocks value to the anom field for records with a score above 0.7.

- Predicting anomalies using training data from a subquery

      table duration=30d stocks | anomalies sample=256 stocks [ csvfile /opt/logpresso/dataset/stock_train.csv | eval _time = date(date, "yyyyMMdd"), stocks = int(stocks) | fields _time, stocks ] | eval anom = if(_score > 0.65, stocks, null) | fields _time, anom, stocks

  Retrieves training data from a CSV file via a subquery to build an Isolation Forest model, then calculates anomaly scores for the stocks field with a sample limit of 256.

- Configuring tree count and analyzing top contributing fields

      json "[{'cpu': 80, 'mem': 60, 'disk': 30}, {'cpu': 95, 'mem': 90, 'disk': 85}, {'cpu': 40, 'mem': 50, 'disk': 20}]" | anomalies size=200 scorer=3 cpu, mem, disk [ json "[{'cpu': 45, 'mem': 55, 'disk': 25}, {'cpu': 50, 'mem': 48, 'disk': 22}, {'cpu': 42, 'mem': 52, 'disk': 28}, {'cpu': 48, 'mem': 51, 'disk': 24}, {'cpu': 44, 'mem': 49, 'disk': 21}]" ]

  Sets the number of trees to 200 and uses scorer=3 to assign the top 3 fields that most influenced the anomaly score to the _scorer field.