forecast
Analyzes the trend of time series data to predict future values. Uses STL decomposition (Seasonal and Trend decomposition using Loess) to separate time series data into trend, seasonality, and residual components, then produces forecast values and confidence intervals based on the trend component.
Command properties
| Item | Description |
|---|---|
| Command type | Transforming |
| Required permission | None |
| License usage | N/A |
| Parallel execution | Not supported |
| Distributed execution | Runs on Data Node (mapper) |
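Syntax
forecast [period=INT] [count=INT] [smoother={linear|loess|ets}] [normalize=BOOL] [accumulate=BOOL] [confidence=INT] [seed=INT] [time=STR] FIELD [by FIELD, ...]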
Options
period=INT - Length of one seasonal cycle, measured in number of records (for example, 7 for daily data with a weekly pattern). If not specified, the period is estimated automatically using spectral density estimation. If the period is less than 2, the data is treated as non-periodic and simple linear regression is applied.
count=INT - Number of records to forecast (default: 5)
smoother={linear|loess|ets} - Smoothing method for the trend component (default: ets)
  - linear: Linear regression
  - loess: LOESS (Locally Estimated Scatterplot Smoothing)
  - ets: Exponential smoothing
normalize=BOOL - Whether to apply a Box-Cox transformation before forecasting. When set to true, it corrects for trends that increase non-linearly, at the cost of additional computation. (default: false)
accumulate=BOOL - Whether to accumulate confidence intervals. When set to true, the confidence interval widens as the forecast point moves further into the future. (default: true)
confidence=INT - Confidence level of the interval, as a percentage (default: 95)
seed=INT - Random seed. Specifying a seed reproduces the same results for the same input.
time=STR - Name of the time field, used to calculate the time interval of the time series data (default: _time)
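When period is omitted, the command estimates it with spectral density estimation. The exact estimator it uses (windowing, smoothing, peak selection) is not documented here. The following is a minimal sketch that assumes a raw periodogram: it removes the mean, computes the power at each Fourier frequency k / n, and returns n / k for the strongest peak, measured in records. The names `periodogram` and `estimate_period` are hypothetical helpers, not part of the command.

```python
import cmath
import math


def periodogram(values):
    """Raw periodogram of the mean-removed series.

    Returns (k, power) pairs for the Fourier frequencies k / n, k = 1 .. n // 2.
    Uses a naive O(n^2) DFT to stay dependency-free.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(centered))
        spectrum.append((k, abs(coef) ** 2 / n))
    return spectrum


def estimate_period(values):
    """Hypothetical helper: period (in records) of the strongest spectral peak.

    Returns 1 (treated as non-periodic) when the series is too short
    or has no variation at all.
    """
    n = len(values)
    if n < 4:
        return 1
    k, power = max(periodogram(values), key=lambda kp: kp[1])
    if power < 1e-12:
        return 1
    return round(n / k)
```

For 70 daily values following a 7-day sine wave, the peak falls at k = 10 and the estimate is 70 / 10 = 7. This sketch does not detrend its input, so a strong trend can push the peak to k = 1.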
Target
FIELD - Field containing the time series data to forecast
[by FIELD, ...] - Group fields. When a by clause is specified, forecasting is performed independently for records with the same group field values. Separate multiple fields with commas (,).
Input fields
| Field | Type | Required | Description |
|---|---|---|---|
| FIELD | numeric | Required | Time series value to forecast. If null, the forecast for that partition is skipped. |
| time field | timestamp | Required | Time value of the time series data. Forecast times are calculated based on the interval between the last two records. |
| by fields | Any | Optional | Group criteria fields |
Output fields
| Field | Type | Description |
|---|---|---|
| _trend | double | Trend component value of the input record |
| _future | double | Forecast value of the predicted record |
| _upper | double | Upper bound of the confidence interval for the predicted record |
| _lower | double | Lower bound of the confidence interval for the predicted record |
Error codes
Parse errors
| Error code | Message | Description |
|---|---|---|
| 40502 | The group field of the forecast command is missing. | No group field is specified after the by clause |
| 40503 | The target field of the forecast command is missing. | The target field to forecast is not specified |
| 40804 | A machine learning license is required. | A machine learning license is required |
Runtime errors
N/A
Description
The forecast command collects all input records before performing time series forecasting. Input records are output as-is with the trend component value assigned to the _trend field, and forecasted records are appended after the input records with the _future, _upper, and _lower fields.
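The _upper and _lower fields depend on the confidence and accumulate options. This page does not document the formula, so the sketch below rests on a common textbook assumption. The half-width of the interval is z × σ, where z is the two-sided normal quantile for the confidence percentage and σ is the residual standard deviation. With accumulate=true, the h-th forecast step is widened by √h. The function name `interval_bounds` is hypothetical.

```python
from statistics import NormalDist, stdev


def interval_bounds(forecasts, residuals, confidence=95, accumulate=True):
    """Return (lower, upper) bound lists for each forecast point.

    Assumed model: half-width = z * sigma, where z is the two-sided normal
    quantile for `confidence` percent and sigma is the sample standard
    deviation of the residuals. With accumulate=True the h-th step
    (h = 1, 2, ...) is widened by sqrt(h); otherwise the width is constant.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 200)  # 95 -> 0.975 quantile
    sigma = stdev(residuals)
    lower, upper = [], []
    for h, value in enumerate(forecasts, start=1):
        half_width = z * sigma * (h ** 0.5 if accumulate else 1.0)
        lower.append(value - half_width)
        upper.append(value + half_width)
    return lower, upper
```

Under this model the first forecast point gets the same width either way, and only later points differ when accumulate is enabled. That matches the documented behavior that accumulated intervals widen further into the future.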
When the period value is 2 or more, STL decomposition is used to separate the time series data into trend, seasonality, and residual components. The trend component is then smoothed according to the smoother option, and seasonality is incorporated into the forecast. When the period is less than 2, the data is treated as non-periodic and simple linear regression is used.
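The two paths above (seasonal decomposition when period ≥ 2, plain linear regression otherwise) can be sketched as follows. This is a simplified stand-in, not STL. It swaps in classical decomposition: a centered moving average estimates the trend, and averaging the detrended values per phase estimates the seasonality. A least-squares line, similar to smoother=linear, then extrapolates the deseasonalized series. The helper names are hypothetical.

```python
def linear_fit(ys):
    """Least-squares line y = a + b*t over t = 0..len(ys)-1; returns (a, b)."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def centered_moving_average(ys, period):
    """Trend estimate: centered MA (a 2 x period MA when period is even).

    Returns {t: trend} only for positions where the full window fits.
    """
    h = period // 2
    trend = {}
    for t in range(h, len(ys) - h):
        window = ys[t - h:t + h + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Half weights on the two end points keep the window centered
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def seasonal_indices(ys, period):
    """Average detrended value per phase, centered so the indices sum to zero."""
    trend = centered_moving_average(ys, period)
    sums = [0.0] * period
    counts = [0] * period
    for t, tr in trend.items():
        sums[t % period] += ys[t] - tr
        counts[t % period] += 1
    idx = [s / c for s, c in zip(sums, counts)]
    mean = sum(idx) / period
    return [s - mean for s in idx]


def sketch_forecast(ys, period, count):
    """Forecast `count` future values of ys (simplified, not STL)."""
    n = len(ys)
    if period < 2:
        # Non-periodic path: simple linear regression on the raw values
        a, b = linear_fit(ys)
        return [a + b * t for t in range(n, n + count)]
    season = seasonal_indices(ys, period)
    deseasonalized = [y - season[t % period] for t, y in enumerate(ys)]
    a, b = linear_fit(deseasonalized)  # linear trend smoother (assumed)
    # Extrapolate the trend and add back the seasonal index for each phase
    return [a + b * t + season[t % period] for t in range(n, n + count)]
```

In this sketch, requiring more than twice the period's worth of records guarantees that every phase has at least one trend estimate to average. A series of 10 + 2t plus the repeating pattern [1, 0, -1] is forecast exactly as 35, 36, 37 for t = 12, 13, 14.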
Timestamps of forecast records are calculated from the interval between the last two records of the time field specified by the time option. Because only that last interval is used, it is recommended to prepare data with a consistent time interval using the timechart command before using this command.
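The timestamp rule can be illustrated with a minimal sketch, assuming timestamps are Python `datetime` values. The step is the difference between the last two timestamps, and the k-th forecast record is placed k steps after the last input record. The function name `future_times` is hypothetical.

```python
from datetime import datetime, timedelta


def future_times(times, count):
    """Extend a timestamp list by repeating the last observed interval."""
    if len(times) < 2:
        raise ValueError("at least two timestamps are needed")
    step: timedelta = times[-1] - times[-2]
    return [times[-1] + step * k for k in range(1, count + 1)]
```

With daily records ending on 2024-01-03, two forecast records land on 2024-01-04 and 2024-01-05. If the final gap happens to be 12 hours, every forecast step becomes 12 hours, which is why a uniform timechart span is recommended.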
Input data must have at least 5 records per partition, and the number of input records must be greater than twice the period value. If these conditions are not met, input records are output as-is.
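For integer record counts, the two conditions combine into one minimum partition size: max(5, 2 × period + 1). The hypothetical helpers below compute that threshold and check a partition against it.

```python
def min_records(period):
    """Smallest partition size meeting both documented conditions:
    at least 5 records and strictly more than 2 * period records."""
    return max(5, 2 * period + 1)


def can_forecast(n_records, period):
    """True if a partition of n_records is large enough to be forecast;
    otherwise its input records are passed through as-is."""
    return n_records >= min_records(period)
```

For example, period=7 requires at least 15 records and period=24 requires at least 49. Thirty days of hourly timechart buckets (roughly 720 records) clear the period=24 threshold comfortably.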
When a by clause is used, forecasting is performed independently for records with the same group field value. Temporary files may be used internally for sorting by group.
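Per-group forecasting can be sketched as partitioning the records by their group field values and then running the forecast on each partition independently. This in-memory version is an illustration only: it assumes records are dicts already ordered by time, and the names `forecast_by_group` and `forecast_fn` are hypothetical. The command itself may use temporary files for sorting instead.

```python
from collections import defaultdict


def forecast_by_group(records, key_fields, value_field, forecast_fn):
    """Partition records by the values of key_fields, then call forecast_fn
    on each partition's value list independently.

    Returns {group_key_tuple: forecast_fn(values)} in first-seen group order.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        groups[key].append(rec[value_field])
    return {key: forecast_fn(values) for key, values in groups.items()}
```

Because each partition is processed separately, a short partition can fall below the minimum record count and pass through unchanged while other groups are still forecast.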
Examples
- Output forecast values for time series data
  table duration=30d web_logs | timechart span=1d count | forecast count
  Aggregates web logs from the past 30 days by day, then outputs 5 forecast values for the count field.
- Specify period and forecast count
  table duration=90d web_logs | timechart span=1d sum(bytes) as traffic | forecast period=7 count=14 traffic
  Sets the period to 7 (weekly pattern) and outputs 14 forecast data points.
- Specify the time field and seed value
  table duration=30d web_logs | timechart span=1h sum(bytes) as traffic | forecast period=24 time=_time seed=1234 traffic
  Sets the period to 24 (daily pattern) and fixes the seed value to 1234 to produce reproducible results for the same input.
- Forecast by group
  table duration=30d web_logs | timechart span=1d count by method | forecast count=10 count by method
  Independently forecasts 10 values of the count field for each group of records with the same method field value.
- Apply Box-Cox transformation
  table duration=90d web_logs | timechart span=1d sum(bytes) as traffic | forecast period=7 normalize=true traffic
  Applies a Box-Cox transformation to correct for non-linearly increasing trends before forecasting.
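The Box-Cox transformation used by normalize=true maps a positive value y to (y^λ − 1) / λ when λ ≠ 0, and to ln y when λ = 0. Forecasts made on the transformed scale are then mapped back with the inverse. The sketch below assumes λ is passed explicitly. How the command chooses λ is not documented here, and `box_cox` and `inv_box_cox` are hypothetical names.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform; all inputs must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def inv_box_cox(values, lam):
    """Map transformed values (for example, forecasts) back to the original scale."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1) ** (1 / lam) for v in values]
```

With λ = 0, a series that doubles every step (1, 2, 4, 8, ...) becomes a straight line with slope ln 2. That is the kind of non-linearly increasing trend a linear or exponential-smoothing trend model handles better after transformation.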