# DeepAR: Revolutionizing Time-Series Forecasting with AI


## DeepAR: An Overview of Amazon’s Autoregressive Deep Network

Historically, time-series models were limited to analyzing individual sequences. This meant that when faced with multiple time-series, one had to either create a separate model for each sequence or transform the data into a tabular format to apply gradient-boosted tree models, which remain effective today.
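To make the tabular workaround concrete, here is a minimal sketch of how several series might be flattened into lag-feature rows for a tree-based model. The function name `make_lag_table`, the row layout, and the sample data are illustrative assumptions, not part of any particular library.

```python
def make_lag_table(series_by_id, n_lags):
    """Turn several series into (lags, target) rows, one row per forecastable step."""
    rows = []
    for series_id, values in series_by_id.items():
        for t in range(n_lags, len(values)):
            rows.append({
                "series_id": series_id,          # lets one model tell the series apart
                "lags": values[t - n_lags:t],    # the n_lags values just before step t
                "target": values[t],             # the value the model should predict
            })
    return rows

# Hypothetical sales histories for two products
sales = {"item_a": [10, 12, 13, 15], "item_b": [200, 210, 190]}
table = make_lag_table(sales, n_lags=2)
for row in table:
    print(row)
```

Each row can then be fed to any tabular learner. The temporal structure survives only through the hand-built lag columns, which is the limitation DeepAR was designed to avoid.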

One of the first deep learning models to handle multiple time-series natively is **DeepAR** [2], an autoregressive recurrent network developed by **Amazon**. This article delves into the mechanics of DeepAR and its significance in the time-series forecasting landscape.

If you're interested in exploring other deep learning models inspired by DeepAR, check out this article: *The Best Deep Learning Models for Time Series Forecasting* (towardsdatascience.com).

## Understanding DeepAR

DeepAR represents a groundbreaking fusion of deep learning and traditional probabilistic forecasting methodologies. Here’s what makes **DeepAR** unique:

- **Support for Multiple Time-Series**: The model is trained on a multitude of time-series, allowing it to learn overarching patterns that enhance its forecasting precision.
- **Incorporation of Extra Covariates**: DeepAR accommodates additional features (covariates). For example, in temperature forecasting, one could include factors like humidity and air pressure.
- **Probabilistic Outputs**: Rather than providing a single prediction, the model utilizes **Monte Carlo samples** to generate prediction intervals.
- **Cold Forecasting Capabilities**: By leveraging knowledge from thousands of similar time-series, DeepAR can offer forecasts for series with minimal or no historical data.

## DeepAR's Use of LSTMs

DeepAR employs Long Short-Term Memory (LSTM) networks to produce probabilistic outputs. LSTMs are integral to various time-series forecasting models, including:

- Standard LSTMs
- Multi-stacked LSTMs
- LSTMs combined with CNNs
- LSTMs paired with Time2Vec
- LSTMs in encoder-decoder configurations
- LSTMs in encoder-decoder setups with attention mechanisms

While Transformers have gained prominence in NLP, they do not consistently outperform LSTMs in time-series tasks, primarily due to LSTMs' proficiency in managing local temporal data. For further insights on recurrent networks versus Transformers, refer to this article.

## Architecture of DeepAR

Unlike previous models, DeepAR employs LSTMs in a distinctive manner. Instead of directly calculating predictions, it uses the LSTM to parameterize a Gaussian likelihood function. In other words, the network outputs the parameters (mean and standard deviation) of a Gaussian distribution over the next value rather than the value itself.

Let’s walk through the training process. At time step t for time-series i:

- The LSTM receives the covariates x_i,t from the current time step and the target variable z_i,t-1 from the previous time step, along with the previous hidden state h_i,t-1.
- The LSTM cell then outputs its updated hidden state h_i,t.
- The parameters of the Gaussian likelihood (mean and standard deviation) are computed from h_i,t through additional dense layers rather than read off the LSTM directly.
- The network weights are adjusted so that this Gaussian assigns the highest possible likelihood to the observed target z_i,t.
- The current target value z_i,t and hidden state h_i,t are passed to the next time step, continuing the training cycle. This feedback of past target values is what makes DeepAR autoregressive.
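The training steps above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions: `toy_cell` is a single tanh unit standing in for the LSTM, the weights are arbitrary hand-picked numbers, and the hidden state is a scalar. Only the wiring (covariate, previous target, and previous hidden state in; Gaussian parameters out; log-likelihood of the observed target accumulated) mirrors the description.

```python
import math

def softplus(a):
    # Numerically stable log(1 + exp(a)); keeps the standard deviation positive.
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def toy_cell(x_t, z_prev, h_prev, w):
    # Hypothetical stand-in for the LSTM cell (one tanh unit, not a real LSTM).
    return math.tanh(w["x"] * x_t + w["z"] * z_prev + w["h"] * h_prev)

def gaussian_log_likelihood(z, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((z - mu) / sigma) ** 2

def training_log_likelihood(covariates, targets, w, z_init=0.0, h_init=0.0):
    """Unroll over one series, feeding observed targets back in (teacher forcing)."""
    h, z_prev, total = h_init, z_init, 0.0
    for x_t, z_t in zip(covariates, targets):
        h = toy_cell(x_t, z_prev, h, w)                     # new hidden state h_t
        mu = w["mu_w"] * h + w["mu_b"]                      # mean from h_t
        sigma = softplus(w["sigma_w"] * h + w["sigma_b"])   # std. dev. from h_t
        total += gaussian_log_likelihood(z_t, mu, sigma)    # score the observed target
        z_prev = z_t                                        # pass the true value forward
    return total

# Arbitrary illustrative weights and data
weights = {"x": 0.5, "z": 0.3, "h": 0.8, "mu_w": 1.0, "mu_b": 0.0,
           "sigma_w": 0.5, "sigma_b": 0.0}
covariates = [0.1, 0.4, 0.2, 0.7]
targets = [0.2, 0.5, 0.3, 0.6]
log_lik = training_log_likelihood(covariates, targets, weights)
print(f"summed log-likelihood: {log_lik:.4f}")
```

In a real implementation, an optimizer would adjust the weights to increase this summed log-likelihood across many series; here the weights are fixed so the data flow stays visible.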

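The inference process mirrors training, but no target variable is available for future time steps. Instead, the model draws a sample ž_i,t-1 from the distribution predicted at the previous step and feeds it back in to generate the new prediction ž_i,t. Repeating this procedure many times yields the Monte Carlo sample paths from which prediction intervals are computed.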
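As a rough sketch of this sampling loop, the snippet below rolls a toy model forward, feeding each sampled value back in as the next input, and then reads empirical quantiles off the resulting paths. As before, `toy_cell` is a hypothetical single-unit stand-in for the LSTM, and the weights, covariates, and quantile helper are illustrative assumptions.

```python
import math
import random

def softplus(a):
    # Numerically stable log(1 + exp(a)); keeps the standard deviation positive.
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def toy_cell(x_t, z_prev, h_prev, w):
    # Hypothetical stand-in for the LSTM cell (one tanh unit, not a real LSTM).
    return math.tanh(w["x"] * x_t + w["z"] * z_prev + w["h"] * h_prev)

def sample_paths(future_covariates, z_last, h_last, w, n_paths, rng):
    """Draw n_paths trajectories; each sampled value becomes the next step's input."""
    paths = []
    for _ in range(n_paths):
        h, z_prev, path = h_last, z_last, []
        for x_t in future_covariates:
            h = toy_cell(x_t, z_prev, h, w)
            mu = w["mu_w"] * h + w["mu_b"]
            sigma = softplus(w["sigma_w"] * h + w["sigma_b"])
            z_prev = rng.gauss(mu, sigma)   # the sample replaces the missing target
            path.append(z_prev)
        paths.append(path)
    return paths

def empirical_quantile(values, q):
    # Simple order-statistic quantile; adequate for a sketch.
    ordered = sorted(values)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

weights = {"x": 0.5, "z": 0.3, "h": 0.8, "mu_w": 1.0, "mu_b": 0.0,
           "sigma_w": 0.5, "sigma_b": 0.0}
rng = random.Random(0)  # fixed seed so the run is repeatable
paths = sample_paths([0.3, 0.5, 0.1], z_last=0.6, h_last=0.2,
                     w=weights, n_paths=200, rng=rng)

# 10% / 50% / 90% quantiles per forecast step form a prediction interval
intervals = []
for step in range(3):
    values = [p[step] for p in paths]
    intervals.append(tuple(empirical_quantile(values, q) for q in (0.1, 0.5, 0.9)))
print(intervals)
```

Because each path compounds its own sampled values, uncertainty from early steps carries into later ones.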

## Understanding Gaussian Likelihood

Before examining how DeepAR estimates these parameters, it's crucial to grasp the concept of the likelihood function. If you are already familiar with this, you can skip ahead.

The aim of maximum likelihood estimation is to identify the parameters of a distribution that best explain our sample data. Assuming our data follows a Gaussian distribution, that distribution is fully defined by its mean and standard deviation, so the Gaussian likelihood is a function of these two parameters.
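For reference, the standard textbook form of the Gaussian likelihood for n independent observations z_1, ..., z_n with mean μ and standard deviation σ is shown below. The index j and the symbol ℓ are notation chosen here for illustration.

$$
\ell(\mu, \sigma \mid z_1, \dots, z_n) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(z_j - \mu)^2}{2\sigma^2}\right)
$$

Taking the logarithm turns the product into a sum, which is easier to optimize:

$$
\log \ell(\mu, \sigma \mid z_1, \dots, z_n) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{j=1}^{n}(z_j - \mu)^2
$$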

For example, given two candidate Gaussian distributions, the one that assigns the higher likelihood to the observed data points is the better fit. Searching for the parameters that fit the data best is known as maximizing the Gaussian log-likelihood function.
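A small numerical sketch may help here. It scores a made-up sample under two hand-picked candidate Gaussians and then under the closed-form maximum likelihood fit. The data values and candidate parameters are invented for illustration.

```python
import math

def gaussian_log_likelihood(data, mu, sigma):
    """Sum of log N(z | mu, sigma^2) over all data points."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (z - mu) ** 2 / (2 * sigma ** 2)
        for z in data
    )

def fit_gaussian_mle(data):
    """Closed-form maximum likelihood estimates of the mean and standard deviation."""
    mu = sum(data) / len(data)
    variance = sum((z - mu) ** 2 for z in data) / len(data)  # MLE divides by n, not n - 1
    return mu, math.sqrt(variance)

data = [4.8, 5.1, 5.3, 4.9, 5.4]  # invented sample

ll_a = gaussian_log_likelihood(data, mu=5.0, sigma=0.5)  # candidate A, near the data
ll_b = gaussian_log_likelihood(data, mu=2.0, sigma=0.5)  # candidate B, far from the data
mu_hat, sigma_hat = fit_gaussian_mle(data)
ll_mle = gaussian_log_likelihood(data, mu_hat, sigma_hat)

print(f"candidate A: {ll_a:.3f}")
print(f"candidate B: {ll_b:.3f}")
print(f"MLE fit (mu={mu_hat:.3f}, sigma={sigma_hat:.3f}): {ll_mle:.3f}")
```

Candidate A scores higher than candidate B because its mean sits near the data, and no Gaussian scores higher than the maximum likelihood fit.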

## Parameter Estimation

Typically, the parameters are estimated with closed-form maximum likelihood estimators derived from the likelihood function. In DeepAR, however, the LSTM and two dense layers produce these parameters from the inputs at every time step.

The parameter estimation process involves the following steps:

- The LSTM computes its hidden state h_i,t.
- This hidden state passes through a dense layer to calculate the mean.
- The same hidden state passes through a second dense layer to compute the standard deviation. In the paper [2], a softplus activation keeps this value positive.
- With these parameters, the model defines a Gaussian distribution and evaluates the log-likelihood of the actual observation z_i,t under it. The negative log-likelihood serves as the training loss.
- This concludes training for time step t; the weights of the LSTM and both dense layers are updated during backpropagation.
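The two output layers and the loss can be sketched as follows. The hidden state, weights, and learning rate are made-up numbers, and the single hand-derived gradient step on the mean-layer bias only illustrates the direction backpropagation would push. A real implementation would update every weight with automatic differentiation.

```python
import math

def softplus(a):
    # Numerically stable log(1 + exp(a)); maps any real number to a positive one.
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def gaussian_nll(z, mu, sigma):
    # Negative log-likelihood of observation z under N(mu, sigma^2).
    return 0.5 * math.log(2 * math.pi) + math.log(sigma) + 0.5 * ((z - mu) / sigma) ** 2

def output_layers(h, p):
    """Two dense layers on the same hidden state: one for the mean, one for sigma."""
    mu = sum(w * h_j for w, h_j in zip(p["w_mu"], h)) + p["b_mu"]
    sigma = softplus(sum(w * h_j for w, h_j in zip(p["w_sigma"], h)) + p["b_sigma"])
    return mu, sigma

h = [0.2, -0.5, 0.1]                      # hypothetical hidden state from the LSTM
params = {"w_mu": [0.3, 0.1, -0.2], "b_mu": 0.0,
          "w_sigma": [0.0, 0.2, 0.1], "b_sigma": 0.5}
z_obs = 1.5                               # observed target at this time step

mu, sigma = output_layers(h, params)
loss_before = gaussian_nll(z_obs, mu, sigma)

# d(NLL)/d(b_mu) = (mu - z) / sigma^2, derived by hand for this one parameter
grad_b_mu = (mu - z_obs) / sigma ** 2
params["b_mu"] -= 0.1 * grad_b_mu         # one small gradient-descent step

mu_new, sigma_new = output_layers(h, params)
loss_after = gaussian_nll(z_obs, mu_new, sigma_new)
print(f"NLL before: {loss_before:.4f}, after one step: {loss_after:.4f}")
```

The step moves the predicted mean toward the observation, so the loss drops. Note that no sampling is needed during training: the observation is scored directly against the predicted distribution.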

During inference, the model generates predictions without a target variable, utilizing previously learned weights.

## Auto Scaling in DeepAR

Handling multiple heterogeneous time-series can be complex. For instance, in product sales forecasting, one product might have sales in the hundreds, while another could reach millions. This disparity may confuse the model.

To address this, DeepAR incorporates an **auto-scaling mechanism**, calculating an item-specific scaling factor to adjust the autoregressive inputs.

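At each time step, the autoregressive inputs are first divided by this factor, and the scale-dependent parameters predicted by the network are multiplied by the same factor to return to the original scale.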
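A minimal sketch of the idea follows. The scale factor uses the heuristic described in the DeepAR paper [2], one plus the average of the series over its conditioning range. The helper names and sales figures are invented, and rescaling both Gaussian parameters linearly is an assumption made here for the Gaussian case.

```python
def scale_factor(history):
    # Heuristic from the DeepAR paper: nu_i = 1 + mean of the conditioning-range values.
    return 1.0 + sum(history) / len(history)

def scale_inputs(history, nu):
    # Autoregressive inputs are divided by the item-specific factor.
    return [z / nu for z in history]

def rescale_gaussian(mu_scaled, sigma_scaled, nu):
    # Assumption for the Gaussian case: mean and standard deviation both scale linearly.
    return mu_scaled * nu, sigma_scaled * nu

small_item = [100.0, 200.0, 300.0]                  # sales in the hundreds
large_item = [1_000_000.0, 2_000_000.0, 3_000_000.0]  # sales in the millions

for name, history in (("small", small_item), ("large", large_item)):
    nu = scale_factor(history)
    scaled = scale_inputs(history, nu)
    print(name, "nu =", nu, "scaled inputs =", [round(v, 3) for v in scaled])

# A prediction made in scaled units is mapped back to the item's own scale
nu_small = scale_factor(small_item)
mu, sigma = rescale_gaussian(1.2, 0.1, nu_small)
```

After scaling, both items present inputs of roughly the same magnitude to the network, even though their raw values differ by four orders of magnitude.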

While the auto-scaling mechanism in DeepAR is effective, it's advisable to normalize the time-series beforehand to enhance model performance.

## DeepAR's Role in Time-Series Forecasting

This section evaluates how DeepAR stacks up against other models and its limitations.

### Comparison with Statistical Models

Reported results suggest that DeepAR can outperform traditional statistical methods like **ARIMA**, particularly when many related time-series are available for training. A notable practical advantage is that it does not require extra preprocessing, such as making the time-series stationary first.

Amazon has since introduced a follow-up model, **DeepVAR** [4], which extends the approach to multivariate forecasting and will be discussed in a future article.

### Comparison with Deep Learning Models

Since DeepAR's debut, numerous deep learning models for time-series forecasting have emerged. Not all can be directly compared to DeepAR due to differing methodologies. The **Temporal Fusion Transformer (TFT)** is among the closest alternatives.

Two key distinctions between DeepAR and TFT are:

- **Multiple Time-Series Handling**: DeepAR generates separate embeddings for each time-series, which aids in distinguishing them during LSTM processing. TFT also utilizes LSTMs but conditions the initial hidden state of the LSTM on these embeddings, maintaining temporal dynamics.
- **Forecasting Type**: TFT is classified as a **multi-horizon forecasting model**, producing predictions in one go rather than sequentially, as seen in autoregressive models. This capability allows TFT to generate forecasts even for future time steps without corresponding covariate values.

## Closing Thoughts

DeepAR is a significant advancement in deep learning for time-series forecasting, marking a pivotal moment for the field. It is widely used in practice, integrated into Amazon's **GluonTS** toolkit for time-series forecasting, and can be trained on Amazon SageMaker.

In the upcoming article, we will implement an end-to-end project using DeepAR. Stay tuned!


## References

[1] Created with Stable Diffusion, CreativeML Open RAIL-M license. Text prompt: “a nebula traveling through space, digital art, illustration”

[2] D. Salinas et al. *DeepAR: Probabilistic forecasting with autoregressive recurrent networks*, International Journal of Forecasting (2019).

[3] Y. Wu et al. *Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation* (2016).

[4] D. Salinas et al. *High-Dimensional Multivariate Forecasting with Low-Rank Gaussian Copula Processes*, Advances in Neural Information Processing Systems (2019).

[5] B. Lim et al. *Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting*, International Journal of Forecasting (2021).

[6] The GluonTS package by Amazon, https://ts.gluon.ai/stable/api/gluonts/gluonts.model.deepar.html