forecast

Analyzes the trend of time series data to predict future values. Uses STL decomposition (Seasonal and Trend decomposition using Loess) to separate time series data into trend, seasonality, and residual components, then produces forecast values and confidence intervals based on the trend component.
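STL fits iterated LOESS smoothers and is too involved to show in a few lines. To illustrate the same additive split (value = trend + seasonal + residual), the Python sketch below uses classical moving-average decomposition instead. This is a deliberately simpler substitute for STL, not the forecast command's implementation, and the function name classical_decompose is hypothetical.

```python
import numpy as np

def classical_decompose(values, period):
    """Additive decomposition: values = trend + seasonal + residual.

    Simpler stand-in for STL (not the forecast command's code): the trend is a
    centered moving average, so the first and last half-period of the trend
    have no estimate and are left as NaN.
    """
    x = np.asarray(values, dtype=float)
    n = len(x)
    if period < 2 or n < 2 * period:
        raise ValueError("need period >= 2 and at least two full periods")
    # Centered moving average; an even period uses a 2 x period average.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    # Seasonal component: mean detrended value per phase, centered to sum to 0.
    detrended = x - trend
    phase_means = np.array([np.nanmean(detrended[p::period]) for p in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.resize(phase_means, n)  # repeat the phase pattern to length n
    residual = x - trend - seasonal
    return trend, seasonal, residual
```

Unlike STL, this moving-average version leaves the trend undefined at both ends of the series, which is one practical reason a method such as STL suits forecasting better: the most recent records matter most.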

Command properties

Item                    Description
Command type            Transforming
Required permission     None
License usage           N/A
Parallel execution      Not supported
Distributed execution   Runs on Data Node (mapper)

Syntax

forecast [period=INT] [count=INT] [smoother={linear|loess|ets}] [normalize=BOOL] [accumulate=BOOL] [confidence=INT] [seed=INT] [time=STR] FIELD [by FIELD, ...]

Options

period=INT
Length of one seasonal cycle, expressed as a number of records (for example, 7 for daily data with a weekly pattern). If not specified, the period is estimated automatically using spectral density estimation. If the period is less than 2, the data is treated as non-periodic and simple linear regression is applied.
count=INT
Number of records to forecast (default: 5)
smoother={linear|loess|ets}
Smoothing method for the trend component (default: ets)
  • linear: Linear regression
  • loess: LOESS (Locally Estimated Scatterplot Smoothing)
  • ets: Exponential smoothing
normalize=BOOL
Whether to apply the Box-Cox transformation. When set to true, the data is transformed to correct for trends that grow non-linearly. The additional computation may reduce performance. (default: false)
accumulate=BOOL
Whether to accumulate the confidence interval over the forecast horizon. When set to true, the interval widens the further a forecast point lies in the future. (default: true)
confidence=INT
Confidence level of the interval, in percent (default: 95)
seed=INT
Random seed value. Specify a seed value to reproduce the same results for the same input.
time=STR
Name of the time field. Used to calculate the time interval of the time series data. (default: _time)
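When period is omitted, the period is estimated with spectral density estimation. The exact estimator is not documented; the Python sketch below assumes the simplest version, a raw periodogram computed with NumPy's FFT, and returns the period of the strongest non-zero frequency. estimate_period is a hypothetical name, and returning 1 for a flat series is an assumption that maps it onto the non-periodic case.

```python
import numpy as np

def estimate_period(values):
    """Assumed estimator: raw FFT periodogram (the command's method may differ)."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()                      # drop the zero-frequency (mean) term
    power = np.abs(np.fft.rfft(x)) ** 2   # raw periodogram
    freqs = np.fft.rfftfreq(len(x))       # frequencies in cycles per record
    if len(power) < 2 or np.allclose(power[1:], 0.0):
        return 1                          # no dominant cycle: treat as non-periodic
    k = int(np.argmax(power[1:])) + 1     # strongest non-zero frequency
    return int(round(1.0 / freqs[k]))     # convert frequency to records per cycle
```

A strong trend concentrates power at low frequencies and can mask the seasonal peak, so a practical estimator would usually detrend first; this sketch does not.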

Target

FIELD
Field containing the time series data to forecast
[by FIELD, ...]
Group fields. When a by clause is specified, forecasting is performed independently for each group of records that share the same group field values. Separate multiple fields with commas (,).

Input fields

Field        Type        Required   Description
FIELD        numeric     Required   Time series value to forecast. If null, the forecast for that partition is skipped.
time field   timestamp   Required   Time value of the time series data. Forecast times are calculated based on the interval between the last two records.
by fields    any         Optional   Group criteria fields

Output fields

Field     Type     Description
_trend    double   Trend component value of the input record
_future   double   Forecast value of the predicted record
_upper    double   Upper bound of the confidence interval for the predicted record
_lower    double   Lower bound of the confidence interval for the predicted record
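How _upper and _lower are computed is not documented. To illustrate how the confidence and accumulate options could shape them, the Python sketch below assumes normally distributed forecast errors with a known residual standard deviation and, when accumulate is true, widens the half-width in proportion to the square root of the forecast step (a random-walk assumption). interval_bounds and residual_std are hypothetical names.

```python
from statistics import NormalDist

def interval_bounds(forecasts, residual_std, confidence=95, accumulate=True):
    """Return (lower, upper) lists, one entry per forecast step.

    Assumption: normal errors; sqrt(h) widening for step h when accumulate=True,
    a constant half-width otherwise. Not the command's documented formula.
    """
    if not 0 < confidence < 100:
        raise ValueError("confidence must be between 0 and 100")
    # Two-sided z-score, e.g. about 1.96 for confidence=95.
    z = NormalDist().inv_cdf(0.5 + confidence / 200)
    lower, upper = [], []
    for h, value in enumerate(forecasts, start=1):
        scale = h ** 0.5 if accumulate else 1.0
        half_width = z * residual_std * scale
        lower.append(value - half_width)
        upper.append(value + half_width)
    return lower, upper
```

Under these assumptions, with accumulate=true the interval at the fourth forecast step is twice as wide as at the first, while with accumulate=false every step has the same width.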

Error codes

Parse errors
Error code   Message                                                Description
40502        The group field of the forecast command is missing.   No group field is specified after the by clause
40503        The target field of the forecast command is missing.  The target field to forecast is not specified
40804        A machine learning license is required.               A machine learning license is required
Runtime errors

N/A

Description

The forecast command collects all input records before performing time series forecasting. Input records are output as-is with the trend component value assigned to the _trend field, and forecasted records are appended after the input records with the _future, _upper, and _lower fields.
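The record layout described above can be sketched in Python with records as dictionaries. build_output is a hypothetical helper; it assumes the trend and forecast values were already computed, and it gives forecast records only a time value plus the three forecast fields, which may not match every field the command actually emits.

```python
def build_output(records, trend, future_times, future, lower, upper, time="_time"):
    """Input rows pass through with _trend; forecast rows are appended after them."""
    if len(records) != len(trend):
        raise ValueError("one trend value per input record is required")
    # Copy each input record and attach its trend value (inputs are not mutated).
    out = [dict(r, _trend=t) for r, t in zip(records, trend)]
    # Append one forecast record per forecast step.
    for ts, f, lo, hi in zip(future_times, future, lower, upper):
        out.append({time: ts, "_future": f, "_lower": lo, "_upper": hi})
    return out
```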

When the period value is 2 or more, STL decomposition is used to separate the time series data into trend, seasonality, and residual components. The trend component is then smoothed according to the smoother option, and seasonality is incorporated into the forecast. When the period is less than 2, the data is treated as non-periodic and simple linear regression is used.
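The two branches can be sketched in Python. Assumptions: smoother=ets is modeled as Holt's linear exponential smoothing (one member of the ETS family; the command's exact model and parameters are not documented), the alpha and beta values are arbitrary, and the trend component has a value for every record. linear_forecast, holt_forecast, and periodic_forecast are hypothetical names.

```python
import numpy as np

def linear_forecast(values, count):
    """Non-periodic case (period < 2): least-squares line extended count steps."""
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)  # polyfit returns highest degree first
    return intercept + slope * np.arange(len(y), len(y) + count)

def holt_forecast(trend, count, alpha=0.5, beta=0.3):
    """Assumed 'ets' smoother: Holt's linear exponential smoothing of the trend."""
    if len(trend) < 2:
        raise ValueError("at least two trend values are required")
    level, slope = trend[0], trend[1] - trend[0]
    for y in trend[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + slope)
        slope = beta * (level - prev_level) + (1 - beta) * slope
    return np.array([level + slope * h for h in range(1, count + 1)])

def periodic_forecast(trend, seasonal, period, count):
    """Periodic case (period >= 2): smoothed trend plus the repeating seasonal term."""
    n = len(trend)
    base = holt_forecast(trend, count)
    # Forecast step h lands at index n + h, i.e. seasonal phase (n + h) % period.
    phases = [(n + h) % period for h in range(count)]
    return base + np.asarray(seasonal, dtype=float)[phases]
```

For example, a trend of 2 + 3t over 10 records with a seasonal term alternating +1 and -1 (period 2) yields forecasts of 33, 34, and 39 for the next three steps.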

Timestamps of forecast records are calculated from the interval between the last two records in the time field specified by the time option. Because that single interval is reused for every forecast record, prepare data with a consistent time interval, for example using the timechart command, before using this command.
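The timestamp rule can be sketched in Python with the standard datetime module; forecast_times is a hypothetical name.

```python
from datetime import datetime

def forecast_times(times, count):
    """Extend a time column by count steps using the last observed interval."""
    if len(times) < 2:
        raise ValueError("at least two timestamps are needed to infer the interval")
    step = times[-1] - times[-2]  # interval between the last two records
    return [times[-1] + step * (i + 1) for i in range(count)]

daily = [datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 3)]
print(forecast_times(daily, 2))  # 2024-01-04 and 2024-01-05
```

If the last gap is irregular, for example six hours in otherwise daily data, every forecast record is spaced six hours apart, which is why a consistent interval matters.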

Input data must have at least 5 records per partition, and the number of input records must be greater than twice the period value. If these conditions are not met, input records are output as-is.
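The eligibility rule can be written as a small Python check. can_forecast and forecast_or_passthrough are hypothetical names, and forecaster stands in for whatever model runs on an eligible partition.

```python
def can_forecast(values, period):
    """Rule from the text: at least 5 records and more than 2 * period records."""
    n = len(values)
    return n >= 5 and n > 2 * period

def forecast_or_passthrough(values, period, forecaster):
    """Return (input records, forecast values); no forecast if the rule fails."""
    if not can_forecast(values, period):
        return list(values), []  # input records are output as-is
    return list(values), forecaster(values)
```

For example, with period=7 a partition needs at least 15 records; 14 records (exactly two periods) is not enough.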

When a by clause is used, forecasting is performed independently for records with the same group field value. Temporary files may be used internally for sorting by group.
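Independent per-group forecasting can be sketched in Python. This version groups records in memory with a dictionary keyed by the by-field values, whereas the command may sort by group using temporary files; forecast_by_group and forecaster are hypothetical names.

```python
from collections import defaultdict

def forecast_by_group(records, by, forecaster):
    """Run forecaster independently on each group sharing the same by-field values."""
    groups = defaultdict(list)
    for r in records:
        key = tuple(r.get(f) for f in by)  # one key per combination of group values
        groups[key].append(r)
    return {key: forecaster(rows) for key, rows in groups.items()}
```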

Examples

  1. Output forecast values for time series data

    table duration=30d web_logs
    | timechart span=1d count
    | forecast count
    

    Aggregates the past 30 days of web logs into daily counts, then outputs 5 forecast values (the default count) for the count field.

  2. Specify period and forecast count

    table duration=90d web_logs
    | timechart span=1d sum(bytes) as traffic
    | forecast period=7 count=14 traffic
    

    Sets the period to 7 to model a weekly pattern in daily data and outputs 14 forecast records (two weeks).

  3. Specify the time field and seed value

    table duration=30d web_logs
    | timechart span=1h sum(bytes) as traffic
    | forecast period=24 time=_time seed=1234 traffic
    

    Sets the period to 24 to model a daily pattern in hourly data, explicitly specifies _time (the default) as the time field, and fixes the seed value to 1234 so that the same input produces the same results.

  4. Forecast by group

    table duration=30d web_logs
    | timechart span=1d count by method
    | forecast count=10 count by method
    

    Forecasts 10 values of the count field independently for each value of the method field. In the forecast command, count=10 is the count option and the count that follows it is the target field.

  5. Apply Box-Cox transformation

    table duration=90d web_logs
    | timechart span=1d sum(bytes) as traffic
    | forecast period=7 normalize=true traffic
    

    Applies Box-Cox transformation to correct for non-linearly increasing trends before forecasting.
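The document does not say how the command chooses the Box-Cox parameter λ, so the Python sketch below assumes a fixed lam supplied by the caller. Forecasting would run on the transformed values, and inv_box_cox maps forecasts back to the original scale. box_cox and inv_box_cox are hypothetical names.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform with a caller-supplied lambda (assumed fixed here)."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def inv_box_cox(values, lam):
    """Inverse transform, mapping forecasts back to the original scale."""
    if lam == 0:
        return [math.exp(y) for y in values]
    out = []
    for y in values:
        base = lam * y + 1
        if base <= 0:
            raise ValueError("value outside the invertible range for this lambda")
        out.append(base ** (1 / lam))
    return out
```

For a data-driven choice of λ, scipy.stats.boxcox estimates it by maximum likelihood when lmbda is not given.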