Learning Objectives
At the conclusion of this lesson, you will be able to:
- define the need for forecasting
- distinguish forecasting from predictive modeling
- create moving average, exponential smoothing, and trend models
- account for trend and seasonality
- evaluate models using MAE, MSE, and MAPE
- tune model hyperparameters
- construct ensemble models
Introduction
Forecasting is a specific way to build predictive models that rely on trends in time series data. They are often used in finance, economics, and various branches of science where predicting future values based on historical time series data is essential and feasible. Forecasting involves making predictions about future events based on historical data. The primary goal is to identify patterns and trends within the data that can be extrapolated to predict future outcomes. Accurate forecasting can guide decision-making processes in various fields, from stock market analysis to weather prediction and inventory management.
In this lesson, we will delve into the fundamental principles of forecasting, explore various forecasting methods, and demonstrate practical implementations using R. Many of the techniques explored in this lesson also apply to constructed predictive models for classification and regression using supervised machine learning.
Forecasting and predictive modeling are related concepts in data science and machine learning, but they have some key differences. Forecasting primarily focuses on estimating future values of a specific variable or set of variables based on historical data and trends. It is often used for time series data and is particularly common in fields like economics, finance, and weather prediction. Predictive modeling, on the other hand, is a broader concept that involves using statistical algorithms and machine learning techniques to identify patterns in data and make predictions about future outcomes or to classify new instances. It can be applied to a wide range of problems, not just time series data. Nevertheless, in both modeling efforts data is used to “train a model” on historical data to make a prediction or derive an estimate.
Forecasting typically deals with time series data, where observations are collected sequentially over time. The temporal aspect is crucial, and the order of data points matters. Predictive modeling can work with various types of data, including structured (tabular), unstructured (text, images), and time series data. For predictive modeling, the order of data points may or may not be relevant, depending on the specific problem, but it certainly matters for forecasting.
Forecasting often employs simple methods such as weighted averages, statistical methods like ARIMA (AutoRegressive Integrated Moving Average), exponential smoothing, and more advanced techniques like Prophet or LSTM (Long Short-Term Memory) neural networks for complex time series. On the other hand, predictive modeling encompasses a broader range of algorithms, including regression, classification, clustering, and deep learning techniques. It may use methods like nearest neighbors, Bayesian networks, decision trees, random forests, support vector machines, neural networks, and various ensemble methods.
Forecasting typically focuses on predicting future values at specific time points or over a defined period. The time horizon can be short-term, medium-term, or long-term, depending on the application. Predictive modeling doesn’t necessarily have a fixed time horizon, instead it estimates a numeric value or assigns some observation to a class.
Forecasting often emphasizes quantifying uncertainty in predictions, providing prediction intervals along with point estimates. Predictive modeling may or may not explicitly quantify uncertainty, depending on the algorithm used. Some models provide probability estimates or confidence scores, while others may only give point predictions.
Forecasting models often prioritize interpretability, as understanding the factors driving future trends is essential in many applications. Predictive modeling can range from highly interpretable (e.g., multiple regression, decision trees) to more black-box approaches (e.g., deep neural networks), depending on the complexity of the problem and the chosen algorithm.
In practice, there’s often an overlap between forecasting and predictive modeling, especially when dealing with time series data. Many modern approaches combine elements of both to create more powerful and flexible models.
Time Series Data
Central to forecasting is time series data, which is a sequence of data points collected or recorded at specific time intervals. Time series data can exhibit various components such as trend, seasonality, and noise. Understanding these components is critical for building effective forecasting models.
- Trend: The long-term movement in the data.
- Seasonality: Regular, repeating patterns or cycles in the data.
- Noise: Random variability in the data that cannot be explained by the model.
Seasonality refers to the presence of regular, predictable patterns or cycles that repeat over a specific period within a time series dataset. These patterns are often influenced by external factors such as weather, holidays, or economic cycles, and they can significantly affect the behavior of the data. Seasonality adjustment is an essential concept in time series forecasting because it helps identify and account for these periodic fluctuations, leading to more accurate predictions.
Trend in forecasting refers to the long-term movement or direction in time series data over a period. It represents the underlying pattern of growth or decline that persists over time, ignoring short-term fluctuations and noise. Identifying and modeling trends is paramount for accurate time series forecasting, as it helps in understanding the general direction in which the data is moving.
The components are often discovered through visualization. For example, the line chart below shows a time series of average daily water usage over the course of several months.
# load data from CSV
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
water.usage.ts <- read.csv(csv,
stringsAsFactors = F)
library(ggplot2)
# Convert 'period' to a factor to maintain the order in the plot
water.usage.ts$period <- as.factor(water.usage.ts$period)
# Line plot for daily usage over periods
ggplot(water.usage.ts, aes(x = period, y = daily, group = 1)) +
geom_line(color = "green") +
geom_point(color = "orange") +
labs(title = "Average Daily Water Usage Over Time", x = "Period", y = "Daily Usage (ltr)") +
theme_minimal()
The time series shows some peaks and valleys which might indicate seasonality, but there is no overall upward or downward trend. These characteristics will determine which forecasting model will be most appropriate.
Forecasting Models
A forecasting model uses a series/sequence of data values measuring some event over time (referred to as time series data) and using the sequence to “estimate” the next few values in the sequence. Forecasting solely focuses on the trend of the data values over time, while other models include additional data. For example, if a real estate agent has average monthly home sales prices for the past fifteen years, they can use a forecasting model to “predict” the average sales price for the next few months. But that “prediction” would not take into account any other factors, so it would not be useful to predict the likely selling price for a specific home having a specific lot size, living area, pool, age of home, etc. A predictive model constructed using a machine learning algorithm would take those other “features” into account; those types of models are often referred to as causal models.
Several mathematical methods exist for forecasting, ranging from simple extrapolation techniques to complex machine learning models. In this lesson, we will focus on the following common approaches:
- Simple Moving Averages (SMA)
- Weighted Moving Averages (WMA)
- Simple Exponential Smoothing (SES)
- Holt’s Linear Trend Model (HLTM)
- Linear Regression Trend Model (LRTM)
- AutoRegressive Integrated Moving Average (ARIMA)
In addition, we will explain how to tune and evaluate a model, account for trend and seasonality in the data, combine multiple forecasting methods into an ensemble, and provide a forecast interval in addition to a point estimate.
Sample Data
The examples in the ensuing sections use the time series data in the CSV file wateruage.csv. The first few rows are shown below:
# load data from CSV
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
water.usage.ts <- read.csv(csv,
stringsAsFactors = F)
# display first few rows
knitr::kable(head(water.usage.ts, 4),
format = "simple",
align = "c")
1 |
Aug |
2023 |
13 |
32 |
406 |
2 |
Sep |
2023 |
10 |
30 |
333 |
3 |
Oct |
2023 |
10 |
33 |
303 |
4 |
Nov |
2023 |
20 |
30 |
665 |
The data file contains average daily water use (in liters) over the course of several months.
Simple Moving Average
This is the unweighted mean of the previous \(n\) data points. A Simple Moving Average (SMA) is a basic approach to time series analysis which smoothes out short-term fluctuations. It is calculated by averaging a fixed number of past observations.
The SMA for a given time period is calculated by taking the arithmetic mean of the last \(n\) observations. It can be represented mathematically as:
\(\text{SMA}_t = \frac{1}{n} \sum_{i=0}^{n-1} y_{t-i}\)
where:
- \(SMA_t\) is the Simple Moving Average at time \(t\)
- \(n\) is the number of periods over which the average is calculated
- \(y_{t-i}\) are the values of the time series at time \(t, t-1, t-2, \ldots, t-(n-1)\)
For example, suppose we have a time series of daily water usage for the past 7 days: [110, 90, 100, 120, 140, 160, 180]. To calculate the 3-day SMA at the end of this series to get a forecast for time period 8 (the next time period):
\(t(8) = \frac{140 + 160 + 180}{3} = \frac{480}{3} = 160\)
Naturally, this means that the forecast can never be larger than the largest value in the forecast calculation, so any upward (or downward) trend that might exist would be smoothed out. Consequently, SMA is not an appropriate method for calculating a forecast when there is a trend as it simply smoothes out fluctuations.
By calculating the arithmetic mean of a fixed number of past observations, SMA provides a clear and easily interpretable measure of central tendency in the data. The mathematical simplicity and ease of implementation make SMA a popular choice for initial data analysis and trend identification.
Calculating SMA in R
Let’s calculate the SMA for the daily
column in the waterusage.csv
dataset using the SMA
function from the TTR
package to compute the moving average for the next time period.
Here’s the R code to perform these steps:
library(TTR)
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
# load data from CSV
water.usage.ts <- read.csv(csv,
stringsAsFactors = F)
# Convert 'daily' to a time series object
daily.ts <- ts(water.usage.ts$daily,
start = c(2023, 8),
frequency = 12)
# Calculate the 3-period Simple Moving Average
sma_3 <- SMA(daily.ts, n = 3)
The function SMA
calculates the 3-day moving average for all time periods (except the first two as it would be impossible to do so). It requires a “time series object” which is produced by the function ts
. In the above code we start the time series at month 8 in 2023 because that is the oldest month in the data set water.usage.ts. We use frequency = 12
because our data set from the CSV has 12 time periods.
The last value in the vector of forecasts is the forecast for the next time period. So the forecast for August 2024 would be in sma_3[12]
and is 494.3.
Weighted Moving Average
A Weighted Moving Average (WMA) is a type of moving average where each data point is assigned a different weight, with more recent observations typically given higher weights. This method gives more importance to recent data points, making it more responsive to changes in the data compared to the Simple Moving Average (SMA).
The WMA for a given time period \(n\) is calculated by assigning a weight to each data point and then taking the weighted average. The weights typically sum to 1 and are written as fractions, although they do not have. The benefit of writing them as fractions that sum to 1 is that the sum of the weights is always 1 making the calculation simpler. If the weights do not sum to 1, then it is important to divide by the sum of the weights.
\(WMA_t = \frac{\sum_{i=0}^{n-1} w_i y_{t-i}}{\sum_{i=0}^{n-1} w_i}\)
where:
- \(WMA_t\) is the Weighted Moving Average at time \(t\)
- \(y_{t-i}\) are the values of the time series at times \(t, t-1, t-2, \ldots, t-(n-1)\)
- \(w_i\) are the weights assigned to each observation
Suppose we have the following data points: [100, 120, 140, 160, 180], and we want to calculate the 3-period WMA with weights 0.1, 0.3, and 0.6 for the most recent three observations.
The calculation for \(WMA_5\) (the WMA at the fifth data point) would be:
\(WMA_5 = \frac{0.1 \times 140 + 0.3 \times 160 + 0.6 \times 180}{0.1 + 0.3 + 0.6}\)
\(WMA_5 = \frac{14 + 48 + 108}{1}\)
\(WMA_5 = 170.0\)
Calculating WMA in R
Let’s implement the Weighted Moving Average using the waterusage.csv
dataset. The forecast package in R does not have a dedicated function specifically for calculating a Weighted Moving Average (WMA) forecast, although the package provides various functions for other types of moving averages, exponential smoothing, and ARIMA models which we will explore later in this lesson.
To calculate a Weighted Moving Average, we can either use the WMA
function from the TTR package or write a custom function.
Custom Function for Calculating WMA
The calculate.wma
function takes three arguments: the data, the number of periods \(n\), and the weights. It initializes a vector wma
with NA
values to store the WMA results. A loop iterates over the data, calculating the WMA for each position where there are enough data points. The WMA is calculated by taking the weighted sum of the last \(n\) observations and dividing by the sum of the weights.
The weights for the 3-period WMA are defined as c(0.1, 0.3, 0.6)
, with the most recent observation given the highest weight. The results are stored in a new column WMA
in the water_usage
data frame.
# Load the data
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
water.usage <- read.csv(csv,
stringsAsFactors = F,
header = T)
# Function to calculate the Weighted Moving Average
calculate.wma <- function(data, n, weights) {
# Initialize WMA column with NA values
wma <- rep(NA, length(data))
for (i in n:length(data)) {
wma[i] <- sum(weights * data[(i-n+1):i]) / sum(weights)
}
return(wma)
}
# Define the weights for the WMA (3-period example)
w <- c(0.1, 0.3, 0.6)
# Calculate the WMA
WMA <- calculate.wma(water.usage$daily,
n = 3,
weights = w)
forecast.val <- tail(WMA,1)
The forecast is in the last element of the returned vector of forecasts (it actually calculates the forecast for all prior values) and is 496.9.
Using WMA
Function from TTR
The WMA
function from the TTR package calculates the Weighted Moving Average based on the defined weights. The calculated WMA is added as a new column wma
in the data frame. The forecast for the next period is made using the last value of the WMA. This is a simple approach where the next forecast value is assumed to be the same as the most recent WMA value.
library(TTR)
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
# Load the data
water.usage <- read.csv(csv,
stringsAsFactors = F)
# Define the weights for the WMA (example with 3-period weights)
weights <- c(0.1, 0.3, 0.6)
# Calculate the Weighted Moving Average
wma <- TTR::WMA(water.usage$daily, n = length(weights), wts = weights)
# Forecast using the last value of WMA
forecast.val <- tail(wma, 1)
The forecast is in the last element of the returned vector of forecasts (it actually calculates the forecast for all prior values) and is 496.9 – the same as with our custom function (as we would expect.)
The TTR package has a number of other forecasting methods, such as Exponential Weighted Mean, so check it out.
The Weighted Moving Average (WMA) is an effective method for smoothing time series data, giving more importance to recent observations. Naturally, given that it is an average, WMA does not account for either trend or seasonality, although a seasonality adjustment can be added as explained later in this lesson.
Lecture: Moving Average
In the lecture below, Dr. Martin Schedlbauer of Khoury Boston demonstrates the calculation of weighted moving average directly in R without the use of a package.
Simple Exponential Smoothing
Simple Exponential Smoothing (SES) is a time series forecasting method for univariate data that can be used to forecast data with no clear trend or seasonal pattern. SES uses weighted averages of past observations, with more recent observations given higher weights. The method applies exponentially decreasing weights to past data, making it more responsive to recent changes. This method is particularly useful for data with a clear trend and seasonal patterns.
The forecast at time \(t+1\), the next time period, is given by:
\(F_{t+1} = \alpha Y_t + (1 - \alpha) F_t\)
where:
- \(F_{t+1}\) is the forecast for the next period
- \(Y_t\) is the actual value at time \(t\)
- \(F_t\) is the forecast for time \(t\)
- \(\alpha\) (alpha) is the smoothing parameter, \(0 \leq \alpha \leq 1\)
The initial forecast \(F_1\) can be set to the first observation \(Y_1\) or the average of the first few observations.
From the formula above, we can see that it is recursive and requires starting at the first time period.
The SES formula can be applied recursively as:
\(F_{t+1} = \alpha Y_t + (1 - \alpha) (\alpha Y_{t-1} + (1 - \alpha) F_{t-1})\)
This recursive nature means that we apply exponentially decreasing weights to past observations.
Let’s consider a small dataset for illustration. Suppose we have the following data points: [10, 12, 14, 13, 15]. Assume we choose \(\alpha = 0.3\) (a common value). The initial forecast \(F_1\) is set to \(Y_1\), which is 10.
Initial Forecast:
\(F_1 = Y_1 = 10\)
First Forecast:
\(F_2 = \alpha Y_1 + (1 - \alpha) F_1\)
\(F_2 = 0.3 \times 10 + 0.7 \times 10\)
\(F_2 = 10\)
Second Forecast:
\(F_3 = \alpha Y_2 + (1 - \alpha) F_2\)
\(F_3 = 0.3 \times 12 + 0.7 \times 10\)
\(F_3 = 10.6\)
Third Forecast:
\(F_4 = \alpha Y_3 + (1 - \alpha) F_3\)
\(F_4 = 0.3 \times 14 + 0.7 \times 10.6\)
\(F_4 = 11.62\)
Fourth Forecast:
\(F_5 = \alpha Y_4 + (1 - \alpha) F_4\)
\(F_5 = 0.3 \times 13 + 0.7 \times 11.62\)
\(F_5 = 12.134\)
The “ideal” value for \(\alpha\) is found by iteratively trying various values and seeing which model, when applied to its prior observations (called “backfitting”), results in the best fit. The section on Model Evaluation shows this approach in more detail.
Calculating SES in R
Let’s calculate SES for the waterusage.csv
data to forecast daily water usage. We will use the HoltWinters
function from the forecast package. For this function to calculate Simple Exponential Smoothing requires setting the beta and gamma argument to FALSE. Notice that once again we need to convert the data to a “time series” object of type ts.
library(forecast)
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
water_usage <- read.csv(csv,
stringsAsFactors = F)
# convert 'daily' to a time series object
daily.ts <- ts(water_usage$daily,
start = c(2023, 8),
frequency = 12)
# create SES model using HoltWinters
# with beta and gamma set to FALSE
ses.model <- HoltWinters(daily.ts,
beta = FALSE,
gamma = FALSE)
# Forecast the next period (next month)
forecast.val <- forecast(ses.model, h = 1)
The SES forecasting model estimates the average daily water use for the next month to be 406 with a 95% prediction range from 172 to 640.
The details of the model can be inspecting by using the function summary(ses.model)
or by accessing its components:
forecast.val$mean[[1]]
is the forecast value
forecast.val$lower[[2]]
is the lower bound of the 95% prediction interval
forecast.val$upper[[2]]
is the upper bound of the 95% prediction interval
The forecast()
function also return the 80% prediction interval.
Lecture: Exponential Smoothing
In the lecture below, Dr. Martin Schedlbauer of Khoury Boston demonstrates the calculation of weighted moving average directly in R without the use of a package.
To summarize, Simple Exponential Smoothing (SES) is an effective method for forecasting time series data that lacks a clear trend or seasonal pattern. By applying exponentially decreasing weights to past observations, SES provides a smooth forecast that emphasizes recent changes in the data. The HoltWinters
function makes calculating an SES forecast quite simple.
Holt’s Linear Trend Model
Holt’s Linear Trend Model (HLTM), also known as Holt’s Exponential Smoothing, is an extension of Simple Exponential Smoothing that allows forecasting of data with a simple linear trend. It incorporates two equations: one for the level and one for the trend. This method is useful when the data exhibits a trend but no seasonality.
Level Equation:
\(L_t = \alpha Y_t + (1 - \alpha) (L_{t-1} + T_{t-1})\)
Trend Equation:
\(T_t = \beta (L_t - L_{t-1}) + (1 - \beta) T_{t-1}\)
The forecast equation for the next h time periods is:
\(F_{t+h} = L_t + h T_t\)
where:
- \(L_t\) is the estimated level at time \(t\)
- \(T_t\) is the estimated trend at time \(t\)
- \(Y_t\) is the actual value at time \(t\)
- \(\alpha\) is the smoothing parameter for the level, \(0 \leq \alpha \leq 1\)
- \(\beta\) is the smoothing parameter for the trend, \(0 \leq \beta \leq 1\)
- \(h\) is the number of periods ahead to be forecasted.
Let’s look at an example. Suppose we have the following data points for a series: [10, 12, 14, 13, 15] and we choose \(\alpha = 0.5\) and \(\beta = 0.5\). Let’s calculate the level and trend for these points.
Initial Level and Trend:
\(L_1 = Y_1 = 10\)
\(T_1 = Y_2 - Y_1 = 2\)
First Update:
\(L_2 = \alpha Y_2 + (1 - \alpha) (L_1 + T_1)\)
\(L_2 = 0.5 \times 12 + 0.5 \times (10 + 2)\)
\(L_2 = 12\)
\(T_2 = \beta (L_2 - L_1) + (1 - \beta) T_1\)
\(T_2 = 0.5 \times (12 - 10) + 0.5 \times 2\)
\(T_2 = 2\)
Second Update:
\(L_3 = \alpha Y_3 + (1 - \alpha) (L_2 + T_2)\)
\(L_3 = 0.5 \times 14 + 0.5 \times (12 + 2)\)
$L_3 = 14 $
\(T_3 = \beta (L_3 - L_2) + (1 - \beta) T_2\)
\(T_3 = 0.5 \times (14 - 12) + 0.5 \times 2\) $T_3 = 2$
The level and trend are updated at each step, and the forecast can be made using these updated values. Once again, the model is recursive and requires starting at the first observation. The first forecast can be set to the first observed value or we could start at observation three and make its first forecast the average of the first two observed values.
Calculating HLTM in R
Let’s implement Holt’s Linear Trend Model using the waterusage.csv
data to forecast daily water usage. We will choose \(\alpha = 0.3\) and \(\beta = 0.5\).
library(forecast)
# convert 'daily' to a time series object
daily.ts <- ts(water_usage$daily, start = c(2023, 8), frequency = 12)
# apply Holt's Linear Trend Model using HoltWinters
# with gamma set to FALSE
holt.model <- HoltWinters(daily.ts,
alpha = 0.3,
beta = 0.5,
gamma = FALSE)
# Forecast the next period (next month)
forecast.val <- forecast(holt.model, h = 1)
The Holt’s Linear Trend Model estimates the average daily water use for the next month to be 510 with a 95% prediction range from 183 to 838.
The details of the model can be inspecting by using the function summary(holt.model)
or by accessing its components:
forecast.val$mean[[1]]
is the forecast value
forecast.val$lower[[2]]
is the lower bound of the 95% prediction interval
forecast.val$upper[[2]]
is the upper bound of the 95% prediction interval
The forecast()
function also returns the residuals and the 80% prediction interval.
Holt’s Linear Trend Model extends simple exponential smoothing to capture linear trends in time series data. By incorporating both level and trend components, it provides a more accurate forecast for data with trends but no seasonality.
Linear Regression Trend Model
Trend in forecasting refers to the long-term movement or direction in a time series data over a period. It represents the underlying pattern of growth or decline that persists over time, ignoring short-term fluctuations and noise. Identifying and modeling trends is crucial for accurate time series forecasting, as it helps in understanding the general direction in which the data is moving.
Trends capture the general direction of the data over a long period. This movement can be upward, downward, or flat (no trend). There may be noise in the data and values may move up or down, but over the long-term, if they remain stable, then there is no trend. Trends are persistent patterns that last for a significant period, distinguishing them from short-term variations.
We generally recognize three types of trends, which can be represented mathematically in various forms depending on the nature of the trend:
Linear Trend: A constant rate of increase or decrease over time.
\(T_t = a + bt\)
where:
- \(T_t\) is the trend component at time \(t\)
- \(a\) is the intercept
- \(b\) is the slope (i.e., rate of change)
Exponential Trend: A rate of increase or decrease that accelerates over time.
\(T_t = ae^{bt}\)
where \(e\) is the base of the natural logarithm.
Polynomial Trend: More complex trends that can change direction over time, modeled as higher-order polynomials.
\(T_t = a + bt + ct^2 + dt^3 + \ldots\)
where \(a, b, c, d, \ldots\) are coefficients of the polynomial.
Trends can be identified through visual inspection of time series plots or by using statistical methods like moving averages and regression analysis. Plotting the data helps in visualizing the long-term direction, while regression techniques can quantify the trend mathematically. For example, if we have monthly sales data for a retail store over several years, and we observe that sales are generally increasing, then we can use a linear regression trend model to express the trend mathematically and provide useful forecasts.
Calculating LRTM in R
Let’s use the waterusage.csv
dataset to identify and model the trend in daily water usage using the lm()
function.
# Load necessary libraries
library(forecast)
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
# Load the data
water.usage <- read.csv(csv,
stringsAsFactors = F)
# fit a linear trend model using linear regression
trend.model <- lm(water.usage$daily ~ water.usage$period)
# Forecast future values using the trend model
forecast.val <- predict(trend.model,
newdata = data.frame(period = 13))
The Linear Regression Trend Model forecasting model estimates the average daily water use for the next month to be 468.
The details of the model can be inspecting by using the function summary(ses.model)
or by accessing its components:
trend.model$coefficients[[1]]
is the coefficient of the regression equation
trend.model$coefficients[[2]]
is the slope of the regression equation
The equation for the linear regression trend model is:
\(F(t) = 379.939 + 7.304t\)
Plugging in \(t=12\) results in 4566.6, the same value as we got from the predict()
function.
Lecture: Trend Line
In the lecture below, Dr. Martin Schedlbauer of Khoury Boston demonstrates the calculation of weighted moving average directly in R without the use of a package.
ARIMA Models
ARIMA (AutoRegressive Integrated Moving Average) is a widely used statistical method for time series forecasting. It is particularly effective for data that shows evidence of non-stationarity, where the statistical properties of the series (mean, variance, etc.) change over time. ARIMA models are powerful tools for forecasting time series data that can be made stationary through differencing.
ARIMA combines three components:
- AutoRegression (AR): A model that uses the dependency between an observation and a number of lagged observations (i.e., previous time points).
- Integrated (I): The use of differencing of observations (subtracting an observation from an observation at the previous time step) to make the time series stationary.
- Moving Average (MA): A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.
The ARIMA model is typically denoted as ARIMA(p, d, q), where:
- p: The number of lag observations in the model (lag order).
- d: The number of times that the raw observations are differenced (degree of differencing).
- q: The size of the moving average window (order of moving average).
Calculating ARIMA in R
In R, the forecast package provides functions for fitting ARIMA models. Let’s use the ARIMA model to forecast the average daily water usage for the next time period using the provided CSV data. First, we need to convert the daily
column into a “time series object”. Next, we use the auto.arima
function from the forecast package to fit an ARIMA model to the time series data. This function automatically selects the best ARIMA model based on the data. Finally, we can use the fitted ARIMA model to forecast the daily water usage for the next time period.
library(forecast)
# Convert 'daily' to a time series object
daily.ts <- ts(water.usage.ts$daily,
start = c(2023, 8),
frequency = 12)
# Fit an ARIMA model
arima.model <- auto.arima(daily.ts)
# Forecast the next period (next month)
forecast.val <- forecast(arima.model, h = 1)
The ARIMA forecasting model estimates the average daily water use for the next month to be 427 with a 95% prediction range from 203 to 651.
The details of the model can be inspecting by using the function summary(arima.model)
or by accessing its components:
forecast.val$mean[[1]]
is the forecast value
forecast.val$lower[[2]]
is the lower bound of the 95% prediction interval
forecast.val$upper[[2]]
is the upper bound of the 95% prediction interval
The forecast()
function also contains the residuals and the 80% prediction interval.
Overall, ARIMA is a commonly used method for time series forecasting because of its excellent forecasts and its capability to handle various types of temporal data. By fitting an ARIMA model to historical data, a decision maker can use informed predictions about future values.
Mathematical Foundation of ARIMA
The ARIMA (AutoRegressive Integrated Moving Average) model combines aspects of autoregressive models, differencing, and moving averages to handle non-stationary data. Here, we will delve into the mathematical formulation and components of ARIMA: AutoRegressive (AR) part, Integrated (I) part, and Moving Average (MA) part.
The autoregressive part of ARIMA models the relationship between an observation and a number of lagged observations (previous values). The order of the autoregressive model is denoted by \(p\).
The AR model of order \(p\), AR(p), can be written as:
\[ y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t \]
where:
- \(y_t\) is the value of the time series at time \(t\).
- \(\phi_1, \phi_2, \ldots, \phi_p\) are the parameters of the model.
- \(\epsilon_t\) is white noise (error term) at time \(t\).
The integrated part of ARIMA involves differencing the data to make it stationary. The order of differencing is denoted by \(d\). Differencing means subtracting the previous observation from the current observation.
If \(d = 1\), the differenced series \(y'_t\) is:
\[ y'_t = y_t - y_{t-1} \]
For higher orders of differencing, the differencing process is repeated.
The moving average part models the relationship between an observation and a residual error from a moving average model applied to lagged observations. The order of the moving average model is denoted by \(q\).
The MA model of order \(q\), MA(q), can be written as:
\[ y_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} \]
where:
- \(\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_{t-q}\) are the white noise error terms.
- \(\theta_1, \theta_2, \ldots, \theta_q\) are the parameters of the model.
The ARIMA model combines these three components. An ARIMA(p, d, q) model is represented as:
\[ y'_t = \phi_1 y'_{t-1} + \phi_2 y'_{t-2} + \cdots + \phi_p y'_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} \]
where \(y'_t\) is the differenced series if \(d\) is the differencing order.
Model Evaluation
Once a model is built, it is essential to evaluate its performance. Evaluating the performance of forecasting models is done through a process known as backfitting or backtesting where we use the model to predict each prior time period for which we have an actual observation and comparing the forecast value against the actual value. The difference between the two is the “error” or “deviation”.
Common evaluation metrics for forecasting models include:
- Mean Absolute Error (MAE): The average of the absolute errors.
- Mean Squared Error (MSE): The average of the squared errors.
- Root Mean Squared Error (RMSE): The square root of MSE.
- Mean Absolute Percentage Error (MAPE): The average of the absolute percentage errors.
These metrics are also be used to compare model performance. If model A has a substantially lower MSE than model B, it means that, on average, its predictions are closer to the actual value and are likely going to be more accurate and that model A is preferable for making forecasts.
Let’s look at one of those metrics in more detail: MAE. The other metrics are calculated very similarly, so looking at just MAE also shows us how to calculate the others.
MAE calculates the average of the absolute errors. It is also known as Mean Absolute Deviation (MAD). Let’s compare two forecasting models for the water usage time series we have looked at before: Linear Regression Trend Model and Weighted Moving Average. To calculate the MAE, we need to use each model to calculate a “forecast” for prior time periods where we have the actual observed values and then calculate the difference between the forecasts and actual values and finally calculate the average (mean) of the absolute errors.
So, MAE is defined as
\(\text{MAE} = \frac{1}{n}\sum_{t=1}^{n}(|Y_t - F_t|)\)
The code below demonstrates how to do the calculation of MAE explicitly. We will reuse the calculate.wma()
function developed in section Weighted Moving Average and the linear regression model created in the section Linear Regression Trend Model. Go back and refresh your memory for both of those models before proceeding. We assume that the data set has been loaded into the data frame water.usage
as before.
Note that some modeling functions such as lm
and auto.arima
will automatically calculate residuals which are the difference between actual and predicted values – although not the absolute value, but that is simple to calculate. Naturally, this makes calculating the metrics much simpler.
Calculating MAE for WMA Model
The code below calculates a “forecast” for each prior time period except the first three as it would be impossible to do so as they do not have three prior time periods. The function calculate.wma()
places the “forecast” value for time period \(t\) into next row. So, the forecast for time period 13 is in row 12 and the forecast for time period 4 is in row 3. We need to be mindful of that when we calculate the errors.
Before running the code, do the calculations by hand or in a spreadsheet program and convince yourself that the result produced by the code is correct.
# define the weights for the WMA (3-period example)
w <- c(0.1, 0.3, 0.6)
# calculate "forecast" for each prior time period
water.usage$WMA <- calculate.wma(water.usage$daily,
n = 3,
weights = w)
periods <- nrow(water.usage)
# calculate the absolute errors
for (t in 3:periods) {
water.usage$abs.err[t] <- abs(water.usage$daily[t] - water.usage$WMA[t-1])
}
# calculate mean of absolute errors (MAE)
MAE.WMA <- mean(water.usage$abs.err, na.rm = T)
The MAE for the WMA with the above weights is 128.8. By itself it is not that meaningful, although a large MAE is comparison to the values in the time series would indicate that the model’s forecasts are wildly off. More useful is comparing the MAE of thw WMA with another model. So, let’s do that by calculating the MAE for the Linear Regression Trend Model.
MAE for Linear Regression Trend Model:
Rather than calculating the errors using the slope and intercept from the lm()
function, we will rely on the residuals produced by lm
– the residuals are the actual errors. They can be found in trend.model$residuals
. Print them out to convince yourself that they are correct. For example, for time period 12, we can see:
F.12 <- trend.model$coefficients[[1]] + trend.model$coefficients[[2]]*12
Y.12 <- water.usage$daily[12]
residual.12.1 <- Y.12 - F.12
residual.12.2 <- trend.model$residuals[12]
cat("residuals are", residual.12.1, "vs", residual.12.2)
## residuals are 16.41026 vs 16.41026
Before running the code, do the calculations by hand or in a spreadsheet program and convince yourself that the result produced by the code is correct.
MAE.LRTM <- mean(abs(trend.model$residuals), na.rm = T)
The MAE for the LRTM is 83.2. This is smaller than the MAE for the WMA which means that the backfitted forecasts were closer to the observed values. We can surmise that future forecasts of the LRTM will likely be more accurate compared to the WMA model (with those weights; perhaps a WMA with different weights might be better).
Lecture: Model Evaluation
In the lecture below, Dr. Martin Schedlbauer of Khoury Boston demonstrates the calculation of weighted moving average directly in R without the use of a package. Note that the lecture uses the alternative term MAD instead of MAE, but they are defined the same; both terms are used in practice.
Model Tuning
Model tuning is the process of optimizing a model’s hyperparameters to improve its performance. Hyperparameters are parameters that are set before the learning process begins and are not learned from the data. In contrast, model parameters are learned from the data during training. For example, the weights for the Weighted Moving Average model are hyperparameters and so is the \(\alpha\) for Simple Exponential Smoothing.
Tuning involves selecting the best set of hyperparameters that minimize a predefined loss function or maximize the model’s accuracy. This process often involves techniques such as cross-validation, grid search, or random search.
Example: Tuning Alpha for SES
Let’s take a look at an example of tuning a model. In Simple Exponential Smoothing (SES), the hyperparameter \(\alpha\) (alpha) controls the smoothing factor. The value of \(\alpha\) ranges between 0 and 1 and wile we often use a value of 0.3 this may not be the one that has the best fit and produces the smallest MAE or MSE. Furthermore, a value closer to 1 puts more weight on recent observations, making the model more responsive to recent changes, while a value closer to 0 gives more weight to past observations, making the model less responsive to recent changes.
The goal of model tuning is to find the \(\alpha\) value that minimizes the forecasting error. First, we need to choose a loss function to evaluate the model’s performance. Common choices are Mean Absolute Error (MAE), Mean Squared Error (MSE), or Root Mean Squared Error (RMSE). For this example, we will use MSE. We need to then try various values of \(\alpha\) and perform a grid search over a range of \(\alpha\) values to find the one that minimizes MSE.
Here’s how you can perform grid search to tune the alpha parameter for Simple Exponential Smoothing in R using the waterusage.csv
dataset assuming the same data frame we loaded before. The HoltWinters
function fits an SES model using the an alpha value, and the forecast
function generates forecasts for the next 12 periods.
# Convert 'daily' to a time series object
daily.ts <- ts(water.usage$daily,
start = c(2023, 8),
frequency = 12)
# Define a range of alpha values to search over
alpha.values <- seq(0.01, 1, by = 0.01)
# Define a function to calculate the MSE for a given alpha
calculate.mse <- function(alpha, data) {
ses.model <- HoltWinters(data, alpha = alpha, beta = FALSE, gamma = FALSE)
fitted.values <- fitted(ses.model)[, 1]
# Extract the fitted values
mse <- mean((data - fitted.values)^2, na.rm = TRUE)
return(mse)
}
# Perform grid search over the alpha values
mse.values <- sapply(alpha.values, calculate.mse, data = daily.ts)
# Find the alpha value that minimizes the MSE
best.alpha <- alpha.values[which.min(mse.values)]
cat("Best alpha:", best.alpha, "\n")
## Best alpha: 0.01
Model tuning is a essential step in improving the performance of forecasting models by optimizing its hyperparameters. In this example, we performed grid search to tune the alpha parameter for Simple Exponential Smoothing (SES). By finding the alpha value that minimizes the Mean Squared Error (MSE), we improved the model’s accuracy, leading to more reliable forecasts. The process of tuning hyperparameters can be applied to all forecasting models and datasets to achieve optimal fit.
Ensemble Methods
An ensemble model combines predictions from multiple models to improve overall forecasting accuracy and robustness. The idea is that by aggregating the strengths of various models, the ensemble can outperform any single model. Ensemble methods can be used in regression, classification, and time series forecasting tasks.
Let’s create an ensemble model by combining the predictions from three different models:
- ARIMA (AutoRegressive Integrated Moving Average)
- SES (Simple Exponential Smoothing)
- Linear Trend Model
We will average the predictions from the three models to create the ensemble forecast, although if we find that one model has generally lower MAD or MSE we might choose a weighted average with higher weights for the better performing model.
library(TTR)
library(forecast)
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
# Load the data
water.usage <- read.csv(csv,
stringsAsFactors = F)
# Convert 'daily' to a time series object
daily.ts <- ts(water.usage$daily, start = c(2023, 8),
frequency = 12)
# Fit ARIMA model
arima.model <- auto.arima(daily.ts)
arima.forecast <- forecast(arima.model, h = 12)$mean
# Fit SES model
ses.model <- HoltWinters(daily.ts,
beta = FALSE,
gamma = FALSE)
ses.forecast <- forecast(ses.model, h = 12)$mean
# Fit Linear Trend model
t <- time(daily.ts)
linear.model <- lm(water.usage$daily ~ t)
future.time <- seq(from = (max(t) + 1/12),
to = (max(t) + 12/12),
by = 1/12)
linear.forecast <- predict(linear.model,
newdata = data.frame(time = future.time))
# Combine predictions (ensemble)
ensemble.forecast <- (arima.forecast + ses.forecast + linear.forecast) / 3
The forecast for the ensemble model is 433.7. Ensemble models combine predictions from multiple models to improve forecasting accuracy and robustness. By averaging the forecasts from ARIMA, SES, and Linear Trend models, we created an ensemble model that leverages the strengths of each individual model. This approach can lead to more accurate and reliable forecasts.
Seasonality Adjustment
Seasonality involves fluctuations that occur at regular intervals and are thus periodic. For example, retail sales might increase every December due to the holiday shopping season, or electricity consumption might peak every summer due to increased use of air conditioning. Furthermore, the length of the seasonal cycle is constant and known, i.e., it has a fixed frequency. Common seasonal frequencies include monthly (e.g., monthly sales data), quarterly (e.g., quarterly financial results or budgets), and annually (e.g., annual holiday sales periods). Because seasonality follows a regular pattern, it is predictable and can be identified and modeled to improve forecasting accuracy, i.e. forecasts can be adjusted to account for seasonal ups and downs.
Seasonality can be identified through visual inspection of time series plots or by using statistical methods such as autocorrelation plots. Visual inspection involves plotting the data and looking for regular patterns. Autocorrelation plots (ACF plots) show the correlation of the time series with its own past values and can highlight the presence of seasonality.
In time series forecasting models, seasonality can be accommodated additively or multiplicatively. An additive model adds a “seasonality adjustment” to each forecast that adjusts the forecast value up or down depending in which season the forecast value falls. A multiplicative model multiplies the forecast value by a “seasonality index” to make an adjustment of the forecast for a season.
Seasonality (or cyclicality) adjustment involves adjusting the forecast to account for period peaks and valleys, i.e. accounting for some seasons being above normal and some being below normal. This process helps in better understanding the underlying trend and making more accurate predictions.
There are two common ways to account for seasonality and to adjust forecasts for seasonality:
Multiplicative Adjustment: The forecast is multiplied by an adjustment factor (the “seasonality index”) depending for which season the forecast is. The adjustment factor is calculated from the data by finding the average difference between the mean and each season.
Additive Adjustment: This requires building a Linear Trend Line Model using both time and the season as factors in the regression. The forecast is adjusted by adding (or subtracting) a seasonality factor depending for which season the forecast is.
Let’s explore both of these techniques by considering the example below for a time series containing in-bound calls to a call center for four years by quarter. For example, in quarter 3 of year 2, the call center received 255 calls.
Q1 |
218 |
225 |
234 |
250 |
Q2 |
247 |
254 |
265 |
283 |
Q3 |
243 |
255 |
264 |
289 |
Q4 |
292 |
299 |
327 |
356 |
Identify Seasonality
The first step is not plot time series and use the visualization to determine whether their is cyclicality and what the season are.
library(ggplot2)
ggplot(callcenter.df,
aes(x = Period, y = Calls,
color = factor(Quarter), group = 1)) +
geom_line() +
geom_point() +
labs(title = "Quarterly Call Volume",
x = "Time Period",
y = "Number of Calls",
color = "Quarter") +
theme_minimal()
From the graph we can see that every Q4 (i.e., time period 4, 8, 12, and 16) is a peak where the number of calls spike, and Q1 of each year (i.e., time period 1, 5, 9, and 13). In addition, there is an overall upward trend, so a Linear Regression Trend Model would be an appropriate model to use; an alternative might be a Holt’s Linear Trend Model.
Multiplicative Adjustment
The process for a multiplicative seasonality adjusted forecast model requires a seasonality index (the multiplicative factor) and the process for calculating this factor is summarized below:
- For each one year period (the cycle), calculate the average across quarters (the seasons) within the cycle.
- For each time period, divide the actual seasonal demand by the average seasonal demand.
- Compute the average seasonality index for each season.
- Remove seasonality variance by calculating a forecast for the entire next year (cycle) and then divide that by the number of quarters (seasons) to get an average.
- Multiply the average forecast by the seasonality index for each season to get a forecast for the next year (cycle) per quarter (season).
Let’s start by calculating the seasonality index. The graphic below shows how to calculate the seasonality index for each season.
avg.calls <- aggregate(
x = callcenter.df[,c("Year","Calls")],
FUN = mean,
by = list(callcenter.df$Year)
)
avg.calls <- avg.calls[,2:3]
colnames(avg.calls) <- c("Year","AvgNumCalls")
for (t in 1:nrow(callcenter.df)) {
Y <- callcenter.df$Year[t]
SI <- avg.calls$AvgNumCalls[Y]
callcenter.df$SI[t] <- callcenter.df$Calls[t] / SI
}
avg.SI <- aggregate(
x = callcenter.df[,c("QuarterNum","SI")],
FUN = mean,
by = list(callcenter.df$Quarter)
)
Next, we’ll build a Linear Regression Trend Model; of course, we’ll need to make the data a time series rather than a 2D table. So, the time is Year 1/Quarter 1 followed by Year 1/Quarter 2, and so forth.
The data frame created looks like this (only the first and last three rows are shown):
dots.df <- data.frame(
Period = "...",
Calls = "...",
Quarter = "...",
Year = "..."
)
knitr::kable(
rbind(head(callcenter.df[,c(1,2,4,5)],3),
rbind(dots.df,
tail(callcenter.df[,c(1,2,4,5)],3))),
format = "simple",
align = "cccc",
row.names = FALSE
)
1 |
218 |
Q1 |
1 |
2 |
247 |
Q2 |
1 |
3 |
243 |
Q3 |
1 |
… |
… |
… |
… |
14 |
283 |
Q2 |
4 |
15 |
289 |
Q3 |
4 |
16 |
356 |
Q4 |
4 |
Now we can train the linear regression model.
trend.model.calls <- lm(callcenter.df$Calls ~ callcenter.df$Period)
# forecast quarterly calls for next four quarters
n <- nrow(callcenter.df) + 1 # time periods
fc <- c(4) # forecasts
b <- trend.model.calls$coefficients[[1]] # intercept
m <- trend.model.calls$coefficients[[2]] # slope
for (t in n:(n+3)) {
fc[t-n+1] <- m*t + b
}
Next, we need to average the forecasts and multiply by its seasonality index. This will be a forecast for time period 17 (the next time period) and that is a Q1, so the seasonality index for Q1 is 0.8627152:
mean.fc <- mean(fc)
adjusted.forecast <- mean.fc * avg.SI$SI[1]
The seasonality adjusted forecast for Quarter 1 of Year 5 is therefore 275 calls.
Additive Adjustment
An alternative approach to multiplicative seasonality adjustments is an additive model. This approach requires the use of a Linear Regression Trend Line Model, while a multiplicative model can be used with any forecasting method. In an additive model, we will add the season as a factor. Since it is a categorical factor it must be one-hot encoded.
One-hot encoding, also sometimes called “dummy encoding” as it introduced “dummy” or “placeholder” variables, transforms a categorical variable into a series of binary (0 or 1) columns. Each unique category value (level) becomes a new column, and a 1 or 0 is placed in the column depending on whether the categorical variable matches the column heading. If the categorical variable has n levels (i.e., n unique values), then we only need n-1 binary columns.
For the call center example, each quarter is a season, so “season” has four levels (Quarter 1 through Quarter 4), so we need to add three more columns to the data frame labeled “Q1”, “Q2”, and “Q3”. The quarterly seasons are then encoding as follows:
Q1 |
1 |
0 |
0 |
Q2 |
0 |
1 |
0 |
Q3 |
0 |
0 |
1 |
Q4 |
0 |
0 |
0 |
Note that Q4 is encoded as not Q1, not Q2,and not Q3.
Next, we will add the new dummy columns for the one-hot endoing to the callcenter.df
data frame. The packages dummies and caret provide functions for doing such as encoding, but we will not use those; instead, we will do the encoding ourselves using ifelse()
.
callcenter.df$Q1 <- ifelse(callcenter.df$Quarter == "Q1", 1, 0)
callcenter.df$Q2 <- ifelse(callcenter.df$Quarter == "Q2", 1, 0)
callcenter.df$Q3 <- ifelse(callcenter.df$Quarter == "Q3", 1, 0)
To build an additive model, we will train a linear regression trend model with the “dummy columns” for the season levels as additional independent variables.
trend.model.adjusted <- lm(
callcenter.df$Calls ~ callcenter.df$Period +
callcenter.df$Q1 +
callcenter.df$Q2 +
callcenter.df$Q3
)
summary(trend.model.adjusted)
##
## Call:
## lm(formula = callcenter.df$Calls ~ callcenter.df$Period + callcenter.df$Q1 +
## callcenter.df$Q2 + callcenter.df$Q3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.1125 -4.4125 -0.6125 2.8313 15.3375
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 281.5625 5.7530 48.942 3.18e-14 ***
## callcenter.df$Period 3.6938 0.4288 8.614 3.21e-06 ***
## callcenter.df$Q1 -75.6688 5.5745 -13.574 3.25e-08 ***
## callcenter.df$Q2 -48.8625 5.4914 -8.898 2.34e-06 ***
## callcenter.df$Q3 -52.0562 5.4409 -9.568 1.15e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.671 on 11 degrees of freedom
## Multiple R-squared: 0.9687, Adjusted R-squared: 0.9574
## F-statistic: 85.21 on 4 and 11 DF, p-value: 3.342e-08
Now, we can calculate a forecast for the next time period (time period 17, which is Quarter 1 of Year 5). Since that is a “Q1”, we will pass 1 for the variable Q1
in the model and 0 for the variables Q2
and Q3
. In other words, we will use 1 for the correct season or all 0 values if it is a “Q4”.
Notice how the coefficients are all negative as all quarters except Q4 are below the mean. Also, note that the coefficient for Q1 is the largest as it is the most below the mean.
b <- trend.model.adjusted$coefficients[[1]] # intercept
t <- trend.model.adjusted$coefficients[[2]] # time period coefficient
Q1 <- trend.model.adjusted$coefficients[[3]] # Q1 coefficient
Q2 <- trend.model.adjusted$coefficients[[4]] # Q1 coefficient
Q3 <- trend.model.adjusted$coefficients[[5]] # Q1 coefficient
F.17 <- (t * 17) + b + Q1*1 + Q2*0 + Q3*0
From this we get a seasonality adjusted forecast for Quarter 1 of Year 5 of 269 calls.
Forecast Interval
A forecast interval (or prediction interval) provides a range within which we expect future observations to fall with a certain probability. It quantifies the uncertainty around the point forecast, offering a more comprehensive view of potential future values. The width of the interval reflects the model’s uncertainty: wider intervals indicate greater uncertainty and consequently more risk in using the point forecast.
The Point Forecast is the single best estimate of the future value and what we have calculated thus far However, it is generally much better to report a range between a lower and upper value at a probability that the true value will fall within the forecast interval (e.g., 95%).
A 95% forecast interval implies that we are 95% confident that the true future value will lie within this interval. This is typically calculated as:
\(\text{Forecast Interval} = \text{Point Forecast} \pm Z_{\alpha/2} \times \text{Standard Error}\)
where:
- \(Z_{\alpha/2}\) is the critical value from the standard normal distribution for a 95% confidence level, approximately 1.96. Other values can be obtained from the standard distribution; the table below lists common values for common probabilities.
- Standard Error (SE) is the standard deviation of the forecast errors, reflecting the variability of the forecast.
# Calculate z-scores for different confidence levels
z_scores <- data.frame(
Confidence_Level = c("99%", "95%", "90%", "80%", "70%"),
Z_Score = c(
qnorm(0.995), # 99% confidence level
qnorm(0.975), # 95% confidence level
qnorm(0.95), # 90% confidence level
qnorm(0.90), # 80% confidence level
qnorm(0.85) # 70% confidence level
)
)
# Display the data frame using kable
knitr::kable(z_scores,
col.names = c("Confidence Level", "Z-Score"),
caption = "Z-Scores for Different Confidence Levels",
format = "simple",
digits = 3,
align = "cc",
label = NA)
Z-Scores for Different Confidence Levels
99% |
2.576 |
95% |
1.960 |
90% |
1.645 |
80% |
1.282 |
70% |
1.036 |
Forecasts produced by a Linear Regression Trend Model have a different way of calculating the forecast interval as it is a statistical method.The other methods are non-statistical and therefore do produce a value of the standard error and one has to be estimated.
LTRM Forecast Interval
For an LRTM the forecast interval is calculated as shown below. The example uses the water.usage
time series. The function summary(trend.model)
produces the residual standard error as one of its components: model.results$sigma
.
# Load necessary libraries
library(forecast)
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
# Load the data
water.usage <- read.csv(csv,
stringsAsFactors = F)
# fit a linear trend model using linear regression
trend.model <- lm(water.usage$daily ~ water.usage$period)
print(trend.model)
##
## Call:
## lm(formula = water.usage$daily ~ water.usage$period)
##
## Coefficients:
## (Intercept) water.usage$period
## 379.939 7.304
# Forecast future values using the trend model
b <- trend.model$coefficients[[1]]
m <- trend.model$coefficients[[2]]
# forecast for next time period
forecast.val <- m*13 + b
model.results <- summary(trend.model)
SE <- model.results$sigma
z.score.95 <- qnorm(0.975) # two-tailed, so 0.05/2
lower95 <- forecast.val - (z.score.95 * SE)
upper95 <- forecast.val + (z.score.95 * SE)
So, for the LRTM we have a point forecast of 474.9 with a 95% forecast (prediction) interval ranging from 246.4 to 703.4. The range is quite wide indicating substantial uncertainty in the forecast and using the point forecast is thus risky. If someone were to budget for next month’s water use and they are “risk averse”, they’d be best served by using the upper range as it is more pessimistic in this situation.
Other Forecast Intervals
For all other methods, because they are not statistical, we need to use a different method since we do not have an standard error from the regression. We do, however, have residuals (errors) and can use those to derive an estimate of the standard error.
\(\sigma_E = \sqrt{\frac{\sum_{t=1}^{n}(E_t-\bar{E})^2}{n-1}}\)
Let’s demonstrate by calculating the 95% forecast interval for a WMA model.
# define the weights for the WMA (3-period example)
w <- c(0.1, 0.3, 0.6)
# calculate "forecast" for each prior time period
water.usage$WMA <- calculate.wma(water.usage$daily,
n = 3,
weights = w)
forecast.val <- water.usage$WMA[12]
periods <- nrow(water.usage)
# calculate the absolute errors
for (t in 3:periods) {
water.usage$E[t] <- water.usage$daily[t-1] - water.usage$WMA[t]
}
# get the errors for all but first two forecasts as those are NA
WMA.E <- water.usage$E[3:periods]
mean.E <- mean(WMA.E)
sigma <- sqrt(sum((WMA.E - mean.E)^2)/(length(WMA.E)))
z.score.95 <- qnorm(0.975) # two-tailed, so 0.05/2
lower95 <- forecast.val - (z.score.95 * sigma)
upper95 <- forecast.val + (z.score.95 * sigma)
So, for the WMA we have a point forecast of 496.9 with a 95% forecast (prediction) interval ranging from 266.1 to 727.7.
Tracking Signal
Forecasting models must be continually evaluated to assure that they still provide accurate forecast estimates.
A tracking signal (TS) is a measure used in forecasting to detect bias in the forecast. It helps in identifying whether the forecast model consistently overestimates or underestimates the actual values. It is defined as the ratio of the Cumulative Forecast Error (CFE) and the Mean Absolute Error (MAE). It should be calculated after each forecast and then evaluated to see if the forecast model requires re-training or further tuning.
\(\text{TS} = \frac{\text{CFE}}{\text{MAE}}\)
where
\(\text{CFE} = \sum_{t=1}^{n}(Y_t - F_t)\)
and
\(\text{MAE} = \sum_{t=1}^{n}(|Y_t - F_t|)\)
Interpretation of the tracking signal generally consider whether it is:
- Positive: which means that the forecasts are generally under predicting, i.e., \(Y > F\)
- Negative: which means that the forecasts are generally over predicting, i.e., \(Y < F\)
- Near Zero: which means that the forecast model is unbiased
A re-training and/or re-tuning of the forecast model is generally advisable when \(\text{TS} > ±4 \times MAE\) as that means that the errors are starting to far exceed the mean.
Example TS Calculation
Let’s use a simple example to demonstrate the calculation process for the tracking signal.
1 |
220 |
212 |
8 |
8 |
8 |
2 |
245 |
249 |
-4 |
4 |
4 |
3 |
270 |
260 |
10 |
14 |
10 |
4 |
261 |
252 |
9 |
23 |
9 |
5 |
283 |
276 |
7 |
30 |
7 |
6 |
301 |
292 |
9 |
39 |
9 |
The errors are the difference between the actual and the forecast values. The CFE is 39 and the MAE is 7.8, so the tracking signal is
\(\text{TS} = \frac{\text{CFE}}{\text{MAE}}=\frac{\text{39}}{\text{7.83}}=4.98\)
The tracking signal is positive so this means that the model that produced these forecasts is biased towards under predicting.
Calculating TS in R
As an example, let’s calculate TS for the previously built WMA Model for the water.usage
data set from Section Calculating WMA in R. The MAE was previously calculated in Section Calculating MAE for WMA Model.
We need two metrics:
- MAE, and
- CFE.
Since we have MAE in the variable MAE.WMA
from before, we only need to calculate CFE which is the sum of the forecast errors.
# calculate the errors
periods <- nrow(water.usage)
for (t in 3:periods) {
water.usage$abs.err[t] <- water.usage$daily[t-1] - water.usage$WMA[t]
}
# calculate sum of errors (CFE)
CFE <- sum(water.usage$abs.err, na.rm = T)
The CFE for this model’s first forecast is -77.9 and its MAE is 128.8, which makes the tracking signal -0.6. Since the tracking signal is negative, it means that the model, on average, over predicts. However, the tracking signal is well below 4 and thus within the expected range.
Summary: Forecasting
Forecasting is a multifaceted discipline that blends statistical methods with domain expertise. By understanding the underlying components of time series data and selecting appropriate forecasting models, one can make accurate predictions that are invaluable in many fields. R provides a rich ecosystem of packages and functions to implement and evaluate various forecasting methods, making it an common platform for this purpose.
In this lesson, we explored several key methods for forecasting, including Weighted Moving Average (WMA), Exponential Smoothing, ARIMA, and Regression Trend Line. We showed how to incorporate trend and seasonality into forecasts and how to use various metrics to evaluate and compare models. In addition, we explained how to tune models and combine them into ensemble models. Finally, we demonstrated the use of a tracking signal to detect forecast bias. Throughout the lesson, we emphasized the importance of these techniques and metrics in improving the accuracy and reliability of forecasts.
See Also
---
title: "Basic Time Series Forecasting"
params:
  category: 3
  stacks: "0,0"
  number: 303
  time: 45
  level: beginner
  tags: R,statistics,time series,forecasting,MAD,MSE,functions
  description: "Introduces time-series forecasting using weighted
                moving averages, linear regression, and exponential
                smoothing. Shows how to build a forecasting
                model, tune models, evaluate the model, construct ensembles,
                provide forecast intervals, account for trend and
                seasonality."
date: "<small>`r Sys.Date()`</small>"
author: "<small>Martin Schedlbauer</small>"
email: "m.schedlbauer@neu.edu"
affilitation: "Northeastern University"
output: 
  bookdown::html_document2:
    toc: true
    toc_float: true
    collapsed: false
    number_sections: false
    code_download: true
    theme: paper
    highlight: tango
    code_folding: show
---

---
title: "<small>`r params$category`.`r params$number`</small><br/><span style='color: #2E4053; font-size: 0.9em'>`r rmarkdown::metadata$title`</span>"
---

```{r code=xfun::read_utf8(paste0(here::here(),'/R/_insert2DB.R')), include = FALSE}
```

------------------------------------------------------------------------

## Learning Objectives

At the conclusion of this lesson, you will be able to:

-   define the need for forecasting
-   distinguish forecasting from predictive modeling
-   create moving average, exponential smoothing, and trend models
-   account for trend and seasonality
-   evaluate models using MAE, MSE, and MAPE
-   tune model hyperparameters
-   construct ensemble models

------------------------------------------------------------------------

## Introduction

Forecasting is a specific way to build predictive models that rely on trends in time series data. They are often used in finance, economics, and various branches of science where predicting future values based on historical time series data is essential and feasible. Forecasting involves making predictions about future events based on historical data. The primary goal is to identify patterns and trends within the data that can be extrapolated to predict future outcomes. Accurate forecasting can guide decision-making processes in various fields, from stock market analysis to weather prediction and inventory management.

In this lesson, we will delve into the fundamental principles of forecasting, explore various forecasting methods, and demonstrate practical implementations using R. Many of the techniques explored in this lesson also apply to constructed predictive models for classification and regression using supervised machine learning.

Forecasting and predictive modeling are related concepts in data science and machine learning, but they have some key differences. Forecasting primarily focuses on estimating future values of a specific variable or set of variables based on historical data and trends. It is often used for time series data and is particularly common in fields like economics, finance, and weather prediction. Predictive modeling, on the other hand, is a broader concept that involves using statistical algorithms and machine learning techniques to identify patterns in data and make predictions about future outcomes or to classify new instances. It can be applied to a wide range of problems, not just time series data. Nevertheless, in both modeling efforts data is used to "train a model" on historical data to make a prediction or derive an estimate.

Forecasting typically deals with time series data, where observations are collected sequentially over time. The temporal aspect is crucial, and the order of data points matters. Predictive modeling can work with various types of data, including structured (tabular), unstructured (text, images), and time series data. For predictive modeling, the order of data points may or may not be relevant, depending on the specific problem, but it certainly matters for forecasting.

Forecasting often employs simple methods such as weighted averages, statistical methods like ARIMA (AutoRegressive Integrated Moving Average), exponential smoothing, and more advanced techniques like Prophet or LSTM (Long Short-Term Memory) neural networks for complex time series. On the other hand, predictive modeling encompasses a broader range of algorithms, including regression, classification, clustering, and deep learning techniques. It may use methods like nearest neighbors, Bayesian networks, decision trees, random forests, support vector machines, neural networks, and various ensemble methods.

Forecasting typically focuses on predicting future values at specific time points or over a defined period. The time horizon can be short-term, medium-term, or long-term, depending on the application. Predictive modeling doesn't necessarily have a fixed time horizon, instead it estimates a numeric value or assigns some observation to a class.

Forecasting often emphasizes quantifying uncertainty in predictions, providing prediction intervals along with point estimates. Predictive modeling may or may not explicitly quantify uncertainty, depending on the algorithm used. Some models provide probability estimates or confidence scores, while others may only give point predictions.

Forecasting models often prioritize interpretability, as understanding the factors driving future trends is essential in many applications. Predictive modeling can range from highly interpretable (*e.g.*, multiple regression, decision trees) to more black-box approaches (*e.g.*, deep neural networks), depending on the complexity of the problem and the chosen algorithm.

In practice, there's often an overlap between forecasting and predictive modeling, especially when dealing with time series data. Many modern approaches combine elements of both to create more powerful and flexible models.

## Time Series Data

Central to forecasting is time series data, which is a sequence of data points collected or recorded at specific time intervals. Time series data can exhibit various components such as trend, seasonality, and noise. Understanding these components is critical for building effective forecasting models.

1.  **Trend**: The long-term movement in the data.
2.  **Seasonality**: Regular, repeating patterns or cycles in the data.
3.  **Noise**: Random variability in the data that cannot be explained by the model.

**Seasonality** refers to the presence of regular, predictable patterns or cycles that repeat over a specific period within a time series dataset. These patterns are often influenced by external factors such as weather, holidays, or economic cycles, and they can significantly affect the behavior of the data. Seasonality adjustment is an essential concept in time series forecasting because it helps identify and account for these periodic fluctuations, leading to more accurate predictions.

**Trend** in forecasting refers to the long-term movement or direction in time series data over a period. It represents the underlying pattern of growth or decline that persists over time, ignoring short-term fluctuations and noise. Identifying and modeling trends is paramount for accurate time series forecasting, as it helps in understanding the general direction in which the data is moving.

The components are often discovered through visualization. For example, the line chart below shows a time series of average daily water usage over the course of several months.

```{r plotTimeSeries, eval = T, class.source = 'fold-hide'}
# load data from CSV
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
water.usage.ts <- read.csv(csv,
                           stringsAsFactors = F)

library(ggplot2)

# Convert 'period' to a factor to maintain the order in the plot
water.usage.ts$period <- as.factor(water.usage.ts$period)

# Line plot for daily usage over periods
ggplot(water.usage.ts, aes(x = period, y = daily, group = 1)) +
  geom_line(color = "green") +
  geom_point(color = "orange") +
  labs(title = "Average Daily Water Usage Over Time", x = "Period", y = "Daily Usage (ltr)") +
  theme_minimal()

```

The time series shows some peaks and valleys which might indicate seasonality, but there is no overall upward or downward trend. These characteristics will determine which forecasting model will be most appropriate.

## Forecasting Models

A forecasting model uses a series/sequence of data values measuring some event over time (referred to as *time series* data) and using the sequence to "estimate" the next few values in the sequence. Forecasting solely focuses on the trend of the data values over time, while other models include additional data. For example, if a real estate agent has average monthly home sales prices for the past fifteen years, they can use a forecasting model to "predict" the average sales price for the next few months. But that "prediction" would not take into account any other factors, so it would not be useful to predict the likely selling price for a specific home having a specific lot size, living area, pool, age of home, *etc.* A predictive model constructed using a machine learning algorithm would take those other "features" into account; those types of models are often referred to as *causal models*.

Several mathematical methods exist for forecasting, ranging from simple extrapolation techniques to complex machine learning models. In this lesson, we will focus on the following common approaches:

1.  Simple Moving Averages (SMA)
2.  Weighted Moving Averages (WMA)
3.  Simple Exponential Smoothing (SES)
4.  Holt's Linear Trend Model (HLTM)
5.  Linear Regression Trend Model (LRTM)
6.  AutoRegressive Integrated Moving Average (ARIMA)

In addition, we will explain how to tune and evaluate a model, account for trend and seasonality in the data, combine multiple forecasting methods into an ensemble, and provide a forecast interval in addition to a point estimate.

## Sample Data

The examples in the ensuing sections use the time series data in the CSV file [wateruage.csv](%22https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv%22). The first few rows are shown below:

```{r displayCSV, eval = T, class.source = 'fold-hide'}
# load data from CSV
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
water.usage.ts <- read.csv(csv,
                           stringsAsFactors = F)

# display first few rows
knitr::kable(head(water.usage.ts, 4),
             format = "simple",
             align = "c")
```

The data file contains average daily water use (in *liters*) over the course of several months.

## Simple Moving Average

This is the unweighted mean of the previous $n$ data points. A Simple Moving Average (SMA) is a basic approach to time series analysis which smoothes out short-term fluctuations. It is calculated by averaging a fixed number of past observations.

The SMA for a given time period is calculated by taking the arithmetic mean of the last $n$ observations. It can be represented mathematically as:

$\text{SMA}_t = \frac{1}{n} \sum_{i=0}^{n-1} y_{t-i}$

where:

-   $SMA_t$ is the Simple Moving Average at time $t$
-   $n$ is the number of periods over which the average is calculated
-   $y_{t-i}$ are the values of the time series at time $t, t-1, t-2, \ldots, t-(n-1)$

For example, suppose we have a time series of daily water usage for the past 7 days: [110, 90, 100, 120, 140, 160, 180]. To calculate the 3-day SMA at the end of this series to get a forecast for time period 8 (the next time period):

$t(8) = \frac{140 + 160 + 180}{3} = \frac{480}{3} = 160$

Naturally, this means that the forecast can never be larger than the largest value in the forecast calculation, so any upward (or downward) trend that might exist would be smoothed out. Consequently, SMA is not an appropriate method for calculating a forecast when there is a trend as it simply smoothes out fluctuations.

By calculating the arithmetic mean of a fixed number of past observations, SMA provides a clear and easily interpretable measure of central tendency in the data. The mathematical simplicity and ease of implementation make SMA a popular choice for initial data analysis and trend identification.

### Calculating SMA in R

Let's calculate the SMA for the `daily` column in the `waterusage.csv` dataset using the `SMA` function from the `TTR` package to compute the moving average for the next time period.

Here's the R code to perform these steps:

```{r SMA}
library(TTR)

csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"

# load data from CSV
water.usage.ts <- read.csv(csv,
                           stringsAsFactors = F)

# Convert 'daily' to a time series object
daily.ts <- ts(water.usage.ts$daily, 
               start = c(2023, 8), 
               frequency = 12)

# Calculate the 3-period Simple Moving Average
sma_3 <- SMA(daily.ts, n = 3)
```

The function `SMA` calculates the 3-day moving average for all time periods (except the first two as it would be impossible to do so). It requires a "time series object" which is produced by the function `ts`. In the above code we start the time series at month 8 in 2023 because that is the oldest month in the data set *water.usage.ts*. We use `frequency = 12` because our data set from the CSV has 12 time periods.

The last value in the vector of forecasts is the forecast for the next time period. So the forecast for August 2024 would be in `sma_3[12]` and is `r round(sma_3[12],1)`.

## Weighted Moving Average

A Weighted Moving Average (WMA) is a type of moving average where each data point is assigned a different weight, with more recent observations typically given higher weights. This method gives more importance to recent data points, making it more responsive to changes in the data compared to the Simple Moving Average (SMA).

The WMA for a given time period $n$ is calculated by assigning a weight to each data point and then taking the weighted average. The weights typically sum to 1 and are written as fractions, although they do not have. The benefit of writing them as fractions that sum to 1 is that the sum of the weights is always 1 making the calculation simpler. If the weights do not sum to 1, then it is important to divide by the sum of the weights.

$WMA_t = \frac{\sum_{i=0}^{n-1} w_i y_{t-i}}{\sum_{i=0}^{n-1} w_i}$

where:

-   $WMA_t$ is the Weighted Moving Average at time $t$
-   $y_{t-i}$ are the values of the time series at times $t, t-1, t-2, \ldots, t-(n-1)$
-   $w_i$ are the weights assigned to each observation

Suppose we have the following data points: [100, 120, 140, 160, 180], and we want to calculate the 3-period WMA with weights 0.1, 0.3, and 0.6 for the most recent three observations.

The calculation for $WMA_5$ (the WMA at the fifth data point) would be:

$WMA_5 = \frac{0.1 \times 140 + 0.3 \times 160 + 0.6 \times 180}{0.1 + 0.3 + 0.6}$

$WMA_5 = \frac{14 + 48 + 108}{1}$

$WMA_5 = 170.0$

### Calculating WMA in R

Let’s implement the Weighted Moving Average using the `waterusage.csv` dataset. The **forecast** package in R does not have a dedicated function specifically for calculating a Weighted Moving Average (WMA) forecast, although the package provides various functions for other types of moving averages, exponential smoothing, and ARIMA models which we will explore later in this lesson.

To calculate a Weighted Moving Average, we can either use the `WMA` function from the **TTR** package or write a custom function.

**Custom Function for Calculating WMA**

The `calculate.wma` function takes three arguments: the data, the number of periods $n$, and the weights. It initializes a vector `wma` with `NA` values to store the WMA results. A loop iterates over the data, calculating the WMA for each position where there are enough data points. The WMA is calculated by taking the weighted sum of the last $n$ observations and dividing by the sum of the weights.

The weights for the 3-period WMA are defined as `c(0.1, 0.3, 0.6)`, with the most recent observation given the highest weight. The results are stored in a new column `WMA` in the `water_usage` data frame.

```{r customWMA}
# Load the data
csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"
water.usage <- read.csv(csv,
                        stringsAsFactors = F,
                        header = T)

# Function to calculate the Weighted Moving Average
calculate.wma <- function(data, n, weights) {
  # Initialize WMA column with NA values
  wma <- rep(NA, length(data)) 
  for (i in n:length(data)) {
    wma[i] <- sum(weights * data[(i-n+1):i]) / sum(weights)
  }
  return(wma)
}

# Define the weights for the WMA (3-period example)
w <- c(0.1, 0.3, 0.6)

# Calculate the WMA
WMA <- calculate.wma(water.usage$daily, 
                     n = 3, 
                     weights = w)

forecast.val <- tail(WMA,1)
```

The forecast is in the last element of the returned vector of forecasts (it actually calculates the forecast for all prior values) and is `r forecast.val`.

**Using `WMA` Function from** **TTR**

The `WMA` function from the **TTR** package calculates the Weighted Moving Average based on the defined weights. The calculated WMA is added as a new column `wma` in the data frame. The forecast for the next period is made using the last value of the WMA. This is a simple approach where the next forecast value is assumed to be the same as the most recent WMA value.

```{r WMA_TTR}
library(TTR)

csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"

# Load the data
water.usage <- read.csv(csv,
                        stringsAsFactors = F)

# Define the weights for the WMA (example with 3-period weights)
weights <- c(0.1, 0.3, 0.6)

# Calculate the Weighted Moving Average
wma <- TTR::WMA(water.usage$daily, n = length(weights), wts = weights)

# Forecast using the last value of WMA
forecast.val <- tail(wma, 1)
```

The forecast is in the last element of the returned vector of forecasts (it actually calculates the forecast for all prior values) and is `r forecast.val` -- the same as with our custom function (as we would expect.)

The **TTR** package has a number of other forecasting methods, such as Exponential Weighted Mean, so check it out.

The Weighted Moving Average (WMA) is an effective method for smoothing time series data, giving more importance to recent observations. Naturally, given that it is an average, WMA does not account for either trend or seasonality, although a seasonality adjustment can be added as explained later in this lesson.

### Lecture: Moving Average

In the lecture below, Dr. Martin Schedlbauer of Khoury Boston demonstrates the calculation of weighted moving average directly in R without the use of a package.

<iframe src="https://player.vimeo.com/video/973340515?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" width="480" height="270" frameborder="1" allow="autoplay; fullscreen; picture-in-picture; clipboard-write" data-external="1">

</iframe>

## Simple Exponential Smoothing

Simple Exponential Smoothing (SES) is a time series forecasting method for univariate data that can be used to forecast data with no clear trend or seasonal pattern. SES uses weighted averages of past observations, with more recent observations given higher weights. The method applies exponentially decreasing weights to past data, making it more responsive to recent changes. This method is particularly useful for data with a clear trend and seasonal patterns.

The forecast at time $t+1$, the next time period, is given by:

$F_{t+1} = \alpha Y_t + (1 - \alpha) F_t$

where:

-   $F_{t+1}$ is the forecast for the next period
-   $Y_t$ is the actual value at time $t$
-   $F_t$ is the forecast for time $t$
-   $\alpha$ (alpha) is the smoothing parameter, $0 \leq \alpha \leq 1$

The initial forecast $F_1$ can be set to the first observation $Y_1$ or the average of the first few observations.

From the formula above, we can see that it is recursive and requires starting at the first time period.

The SES formula can be applied recursively as:

$F_{t+1} = \alpha Y_t + (1 - \alpha) (\alpha Y_{t-1} + (1 - \alpha) F_{t-1})$

This recursive nature means that we apply exponentially decreasing weights to past observations.

Let's consider a small dataset for illustration. Suppose we have the following data points: [10, 12, 14, 13, 15]. Assume we choose $\alpha = 0.3$ (a common value). The initial forecast $F_1$ is set to $Y_1$, which is 10.

**Initial Forecast**:

$F_1 = Y_1 = 10$

**First Forecast**:

$F_2 = \alpha Y_1 + (1 - \alpha) F_1$

$F_2 = 0.3 \times 10 + 0.7 \times 10$

$F_2 = 10$

**Second Forecast**:

$F_3 = \alpha Y_2 + (1 - \alpha) F_2$

$F_3 = 0.3 \times 12 + 0.7 \times 10$

$F_3 = 10.6$

**Third Forecast**:

$F_4 = \alpha Y_3 + (1 - \alpha) F_3$

$F_4 = 0.3 \times 14 + 0.7 \times 10.6$

$F_4 = 11.62$

**Fourth Forecast**:

$F_5 = \alpha Y_4 + (1 - \alpha) F_4$

$F_5 = 0.3 \times 13 + 0.7 \times 11.62$

$F_5 = 12.134$

The "ideal" value for $\alpha$ is found by iteratively trying various values and seeing which model, when applied to its prior observations (called "backfitting"), results in the best fit. The section on [Model Evaluation] shows this approach in more detail.

### Calculating SES in R

Let’s calculate SES for the `waterusage.csv` data to forecast daily water usage. We will use the `HoltWinters` function from the **forecast** package. For this function to calculate Simple Exponential Smoothing requires setting the *beta* and *gamma* argument to *FALSE*. Notice that once again we need to convert the data to a "time series" object of type *ts*.

```{r warning=F, message=F}
library(forecast)

csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"

water_usage <- read.csv(csv,
                        stringsAsFactors = F)

# convert 'daily' to a time series object
daily.ts <- ts(water_usage$daily, 
               start = c(2023, 8), 
               frequency = 12)

# create SES model using HoltWinters 
# with beta and gamma set to FALSE
ses.model <- HoltWinters(daily.ts, 
                         beta = FALSE, 
                         gamma = FALSE)

# Forecast the next period (next month)
forecast.val <- forecast(ses.model, h = 1)
```

The SES forecasting model estimates the average daily water use for the next month to be `r round(forecast.val$mean[[1]],0)` with a 95% prediction range from `r round(forecast.val$lower[[2]],0)` to `r round(forecast.val$upper[[2]],0)`.

The details of the model can be inspecting by using the function `summary(ses.model)` or by accessing its components:

-   `forecast.val$mean[[1]]` is the forecast value
-   `forecast.val$lower[[2]]` is the lower bound of the 95% prediction interval
-   `forecast.val$upper[[2]]` is the upper bound of the 95% prediction interval

The `forecast()` function also return the 80% prediction interval.

### Lecture: Exponential Smoothing

In the lecture below, Dr. Martin Schedlbauer of Khoury Boston demonstrates the calculation of weighted moving average directly in R without the use of a package.

<iframe src="https://player.vimeo.com/video/973415945?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" width="480" height="360" frameborder="1" allow="autoplay; fullscreen; picture-in-picture; clipboard-write" data-external="1">

</iframe>

To summarize, Simple Exponential Smoothing (SES) is an effective method for forecasting time series data that lacks a clear trend or seasonal pattern. By applying exponentially decreasing weights to past observations, SES provides a smooth forecast that emphasizes recent changes in the data. The `HoltWinters` function makes calculating an SES forecast quite simple.

## Holt’s Linear Trend Model

Holt's Linear Trend Model (HLTM), also known as Holt’s Exponential Smoothing, is an extension of Simple Exponential Smoothing that allows forecasting of data with a simple linear trend. It incorporates two equations: one for the level and one for the trend. This method is useful when the data exhibits a trend but no seasonality.

**Level Equation**:

$L_t = \alpha Y_t + (1 - \alpha) (L_{t-1} + T_{t-1})$

**Trend Equation**:

$T_t = \beta (L_t - L_{t-1}) + (1 - \beta) T_{t-1}$

The forecast equation for the next *h* time periods is:

$F_{t+h} = L_t + h T_t$

where:

-   $L_t$ is the estimated level at time $t$
-   $T_t$ is the estimated trend at time $t$
-   $Y_t$ is the actual value at time $t$
-   $\alpha$ is the smoothing parameter for the level, $0 \leq \alpha \leq 1$
-   $\beta$ is the smoothing parameter for the trend, $0 \leq \beta \leq 1$
-   $h$ is the number of periods ahead to be forecasted.

Let's look at an example. Suppose we have the following data points for a series: [10, 12, 14, 13, 15] and we choose $\alpha = 0.5$ and $\beta = 0.5$. Let's calculate the level and trend for these points.

**Initial Level and Trend**:

$L_1 = Y_1 = 10$

$T_1 = Y_2 - Y_1 = 2$

**First Update**:

$L_2 = \alpha Y_2 + (1 - \alpha) (L_1 + T_1)$

$L_2 = 0.5 \times 12 + 0.5 \times (10 + 2)$

$L_2 = 12$

$T_2 = \beta (L_2 - L_1) + (1 - \beta) T_1$

$T_2 = 0.5 \times (12 - 10) + 0.5 \times 2$

$T_2 = 2$

**Second Update**:

$L_3 = \alpha Y_3 + (1 - \alpha) (L_2 + T_2)$

$L_3 = 0.5 \times 14 + 0.5 \times (12 + 2)$

\$L_3 = 14 \$

$T_3 = \beta (L_3 - L_2) + (1 - \beta) T_2$

$T_3 = 0.5 \times (14 - 12) + 0.5 \times 2$ \$T_3 = 2\$

The level and trend are updated at each step, and the forecast can be made using these updated values. Once again, the model is recursive and requires starting at the first observation. The first forecast can be set to the first observed value or we could start at observation three and make its first forecast the average of the first two observed values.

### Calculating HLTM in R

Let’s implement Holt’s Linear Trend Model using the `waterusage.csv` data to forecast daily water usage. We will choose $\alpha = 0.3$ and $\beta = 0.5$.

```{r warning=F, message=F}
library(forecast)

# convert 'daily' to a time series object
daily.ts <- ts(water_usage$daily, start = c(2023, 8), frequency = 12)

# apply Holt's Linear Trend Model using HoltWinters 
# with gamma set to FALSE
holt.model <- HoltWinters(daily.ts, 
                          alpha = 0.3, 
                          beta = 0.5, 
                          gamma = FALSE)

# Forecast the next period (next month)
forecast.val <- forecast(holt.model, h = 1)
```

The Holt's Linear Trend Model estimates the average daily water use for the next month to be `r round(forecast.val$mean[[1]],0)` with a 95% prediction range from `r round(forecast.val$lower[[2]],0)` to `r round(forecast.val$upper[[2]],0)`.

The details of the model can be inspecting by using the function `summary(holt.model)` or by accessing its components:

-   `forecast.val$mean[[1]]` is the forecast value
-   `forecast.val$lower[[2]]` is the lower bound of the 95% prediction interval
-   `forecast.val$upper[[2]]` is the upper bound of the 95% prediction interval

The `forecast()` function also returns the residuals and the 80% prediction interval.

Holt’s Linear Trend Model extends simple exponential smoothing to capture linear trends in time series data. By incorporating both level and trend components, it provides a more accurate forecast for data with trends but no seasonality.

## Linear Regression Trend Model

**Trend** in forecasting refers to the long-term movement or direction in a time series data over a period. It represents the underlying pattern of growth or decline that persists over time, ignoring short-term fluctuations and noise. Identifying and modeling trends is crucial for accurate time series forecasting, as it helps in understanding the general direction in which the data is moving.

Trends capture the general direction of the data over a long period. This movement can be upward, downward, or flat (no trend). There may be noise in the data and values may move up or down, but over the long-term, if they remain stable, then there is no trend. Trends are persistent patterns that last for a significant period, distinguishing them from short-term variations.

We generally recognize three types of trends, which can be represented mathematically in various forms depending on the nature of the trend:

**Linear Trend**: A constant rate of increase or decrease over time.

$T_t = a + bt$

where:

-   $T_t$ is the trend component at time $t$
-   $a$ is the intercept
-   $b$ is the slope (*i.e.*, rate of change)

**Exponential Trend**: A rate of increase or decrease that accelerates over time.

$T_t = ae^{bt}$

where $e$ is the base of the natural logarithm.

**Polynomial Trend**: More complex trends that can change direction over time, modeled as higher-order polynomials.

$T_t = a + bt + ct^2 + dt^3 + \ldots$

where $a, b, c, d, \ldots$ are coefficients of the polynomial.

Trends can be identified through visual inspection of time series plots or by using statistical methods like moving averages and regression analysis. Plotting the data helps in visualizing the long-term direction, while regression techniques can quantify the trend mathematically. For example, if we have monthly sales data for a retail store over several years, and we observe that sales are generally increasing, then we can use a linear regression trend model to express the trend mathematically and provide useful forecasts.

### Calculating LRTM in R

Let's use the `waterusage.csv` dataset to identify and model the trend in daily water usage using the `lm()` function.

```{r warning=F, message=F}
# Load necessary libraries
library(forecast)

csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"

# Load the data
water.usage <- read.csv(csv,
                        stringsAsFactors = F)

# fit a linear trend model using linear regression
trend.model <- lm(water.usage$daily ~ water.usage$period)

# Forecast future values using the trend model
forecast.val <- predict(trend.model, 
                        newdata = data.frame(period = 13))
```

The Linear Regression Trend Model forecasting model estimates the average daily water use for the next month to be `r round(forecast.val[12],0)`.

The details of the model can be inspecting by using the function `summary(ses.model)` or by accessing its components:

-   `trend.model$coefficients[[1]]` is the coefficient of the regression equation
-   `trend.model$coefficients[[2]]` is the slope of the regression equation

The equation for the linear regression trend model is:

$F(t) = 379.939 + 7.304t$

Plugging in $t=12$ results in `r round(trend.model$coefficients[[1]]*(12) + trend.model$coefficients[[2]],1)`, the same value as we got from the `predict()` function.

### Lecture: Trend Line

In the lecture below, Dr. Martin Schedlbauer of Khoury Boston demonstrates the calculation of weighted moving average directly in R without the use of a package.

<iframe src="https://player.vimeo.com/video/973427615?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" width="480" height="360" frameborder="1" allow="autoplay; fullscreen; picture-in-picture; clipboard-write" title="Calculating Linear Regression Trend Model in R" data-external="1">

</iframe>

## ARIMA Models

ARIMA (AutoRegressive Integrated Moving Average) is a widely used statistical method for time series forecasting. It is particularly effective for data that shows evidence of non-stationarity, where the statistical properties of the series (mean, variance, etc.) change over time. ARIMA models are powerful tools for forecasting time series data that can be made stationary through differencing.

ARIMA combines three components:

1.  **AutoRegression (AR)**: A model that uses the dependency between an observation and a number of lagged observations (*i.e.*, previous time points).
2.  **Integrated (I)**: The use of differencing of observations (subtracting an observation from an observation at the previous time step) to make the time series stationary.
3.  **Moving Average (MA)**: A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.

The ARIMA model is typically denoted as ARIMA(p, d, q), where:

-   **p**: The number of lag observations in the model (lag order).
-   **d**: The number of times that the raw observations are differenced (degree of differencing).
-   **q**: The size of the moving average window (order of moving average).

### Calculating ARIMA in R

In R, the **forecast** package provides functions for fitting ARIMA models. Let's use the ARIMA model to forecast the average daily water usage for the next time period using the provided CSV data. First, we need to convert the `daily` column into a "time series object". Next, we use the `auto.arima` function from the **forecast** package to fit an ARIMA model to the time series data. This function automatically selects the best ARIMA model based on the data. Finally, we can use the fitted ARIMA model to forecast the daily water usage for the next time period.

```{r loadPackage_forecast, echo = F, warning=F, message=F}
package <- "forecast"

if (!require(package, character.only = TRUE)) {
  install.packages(package, dependencies = TRUE, quietly = TRUE)
}

library(package, character.only = TRUE)
```

```{r ARIMA Example, echo = T, eval = T}
library(forecast)

# Convert 'daily' to a time series object
daily.ts <- ts(water.usage.ts$daily, 
               start = c(2023, 8), 
               frequency = 12)

# Fit an ARIMA model
arima.model <- auto.arima(daily.ts)

# Forecast the next period (next month)
forecast.val <- forecast(arima.model, h = 1)
```

The ARIMA forecasting model estimates the average daily water use for the next month to be `r round(forecast.val$mean[[1]],0)` with a 95% prediction range from `r round(forecast.val$lower[[2]],0)` to `r round(forecast.val$upper[[2]],0)`.

The details of the model can be inspecting by using the function `summary(arima.model)` or by accessing its components:

-   `forecast.val$mean[[1]]` is the forecast value
-   `forecast.val$lower[[2]]` is the lower bound of the 95% prediction interval
-   `forecast.val$upper[[2]]` is the upper bound of the 95% prediction interval

The `forecast()` function also contains the residuals and the 80% prediction interval.

Overall, ARIMA is a commonly used method for time series forecasting because of its excellent forecasts and its capability to handle various types of temporal data. By fitting an ARIMA model to historical data, a decision maker can use informed predictions about future values.

### Mathematical Foundation of ARIMA

The ARIMA (AutoRegressive Integrated Moving Average) model combines aspects of autoregressive models, differencing, and moving averages to handle non-stationary data. Here, we will delve into the mathematical formulation and components of ARIMA: AutoRegressive (*AR*) part, Integrated (*I*) part, and Moving Average (*MA*) part.

The autoregressive part of ARIMA models the relationship between an observation and a number of lagged observations (previous values). The order of the autoregressive model is denoted by $p$.

The AR model of order $p$, AR(p), can be written as:

$$ y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t $$

where:

-   $y_t$ is the value of the time series at time $t$.
-   $\phi_1, \phi_2, \ldots, \phi_p$ are the parameters of the model.
-   $\epsilon_t$ is white noise (error term) at time $t$.

The integrated part of ARIMA involves differencing the data to make it stationary. The order of differencing is denoted by $d$. Differencing means subtracting the previous observation from the current observation.

If $d = 1$, the differenced series $y'_t$ is:

$$ y'_t = y_t - y_{t-1} $$

For higher orders of differencing, the differencing process is repeated.

The moving average part models the relationship between an observation and a residual error from a moving average model applied to lagged observations. The order of the moving average model is denoted by $q$.

The MA model of order $q$, MA(q), can be written as:

$$ y_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} $$

where:

-   $\epsilon_t, \epsilon_{t-1}, \ldots, \epsilon_{t-q}$ are the white noise error terms.
-   $\theta_1, \theta_2, \ldots, \theta_q$ are the parameters of the model.

The ARIMA model combines these three components. An *ARIMA(p, d, q)* model is represented as:

$$ y'_t = \phi_1 y'_{t-1} + \phi_2 y'_{t-2} + \cdots + \phi_p y'_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} $$

where $y'_t$ is the differenced series if $d$ is the differencing order.

## Model Evaluation

Once a model is built, it is essential to evaluate its performance. Evaluating the performance of forecasting models is done through a process known as *backfitting* or *backtesting* where we use the model to predict each prior time period for which we have an actual observation and comparing the forecast value against the actual value. The difference between the two is the "error" or "deviation".

Common evaluation metrics for forecasting models include:

-   **Mean Absolute Error (MAE)**: The average of the absolute errors.
-   **Mean Squared Error (MSE)**: The average of the squared errors.
-   **Root Mean Squared Error (RMSE)**: The square root of MSE.
-   **Mean Absolute Percentage Error (MAPE)**: The average of the absolute percentage errors.

These metrics are also be used to compare model performance. If model A has a substantially lower MSE than model B, it means that, on average, its predictions are closer to the actual value and are likely going to be more accurate and that model A is preferable for making forecasts.

Let's look at one of those metrics in more detail: *MAE*. The other metrics are calculated very similarly, so looking at just *MAE* also shows us how to calculate the others.

MAE calculates the average of the absolute errors. It is also known as *Mean Absolute Deviation (MAD)*. Let's compare two forecasting models for the water usage time series we have looked at before: Linear Regression Trend Model and Weighted Moving Average. To calculate the *MAE*, we need to use each model to calculate a "forecast" for prior time periods where we have the actual observed values and then calculate the difference between the forecasts and actual values and finally calculate the average (mean) of the absolute errors.

So, *MAE* is defined as

$\text{MAE} = \frac{1}{n}\sum_{t=1}^{n}(|Y_t - F_t|)$

The code below demonstrates how to do the calculation of MAE explicitly. We will reuse the `calculate.wma()` function developed in section [Weighted Moving Average] and the linear regression model created in the section [Linear Regression Trend Model]. Go back and refresh your memory for both of those models before proceeding. We assume that the data set has been loaded into the data frame `water.usage` as before.

Note that some modeling functions such as `lm` and `auto.arima` will automatically calculate residuals which are the difference between actual and predicted values -- although not the absolute value, but that is simple to calculate. Naturally, this makes calculating the metrics much simpler.

### Calculating MAE for WMA Model

The code below calculates a "forecast" for each prior time period except the first three as it would be impossible to do so as they do not have three prior time periods. The function `calculate.wma()` places the "forecast" value for time period $t$ into next row. So, the forecast for time period 13 is in row 12 and the forecast for time period 4 is in row 3. We need to be mindful of that when we calculate the errors.

> Before running the code, do the calculations by hand or in a spreadsheet program and convince yourself that the result produced by the code is correct.

```{r}
# define the weights for the WMA (3-period example)
w <- c(0.1, 0.3, 0.6)

# calculate "forecast" for each prior time period
water.usage$WMA <- calculate.wma(water.usage$daily, 
                                 n = 3, 
                                 weights = w)

periods <- nrow(water.usage)

# calculate the absolute errors
for (t in 3:periods) {
  water.usage$abs.err[t] <- abs(water.usage$daily[t] - water.usage$WMA[t-1])
}

# calculate mean of absolute errors (MAE)
MAE.WMA <- mean(water.usage$abs.err, na.rm = T)
```

The MAE for the WMA with the above weights is `r round(MAE.WMA,2)`. By itself it is not that meaningful, although a large MAE is comparison to the values in the time series would indicate that the model's forecasts are wildly off. More useful is comparing the MAE of thw WMA with another model. So, let's do that by calculating the MAE for the Linear Regression Trend Model.

**MAE for Linear Regression Trend Model:**

Rather than calculating the errors using the slope and intercept from the `lm()` function, we will rely on the residuals produced by `lm` -- the residuals are the actual errors. They can be found in `trend.model$residuals`. Print them out to convince yourself that they are correct. For example, for time period 12, we can see:

```{r}
F.12 <- trend.model$coefficients[[1]] + trend.model$coefficients[[2]]*12
Y.12 <- water.usage$daily[12]

residual.12.1 <- Y.12 - F.12
residual.12.2 <- trend.model$residuals[12]

cat("residuals are", residual.12.1, "vs", residual.12.2)
```

> Before running the code, do the calculations by hand or in a spreadsheet program and convince yourself that the result produced by the code is correct.

```{r}
MAE.LRTM <- mean(abs(trend.model$residuals), na.rm = T)
```

The MAE for the LRTM is `r round(MAE.LRTM,2)`. This is smaller than the MAE for the WMA which means that the backfitted forecasts were closer to the observed values. We can surmise that future forecasts of the LRTM will likely be more accurate compared to the WMA model (with those weights; perhaps a WMA with different weights might be better).

### Lecture: Model Evaluation

In the lecture below, Dr. Martin Schedlbauer of Khoury Boston demonstrates the calculation of weighted moving average directly in R without the use of a package. Note that the lecture uses the alternative term *MAD* instead of *MAE*, but they are defined the same; both terms are used in practice.

<iframe src="https://player.vimeo.com/video/973441437?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" width="480" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture; clipboard-write" title="Evaluating Models in R with MAD" data-external="1">

</iframe>

## Model Tuning

**Model tuning** is the process of optimizing a model's hyperparameters to improve its performance. Hyperparameters are parameters that are set before the learning process begins and are not learned from the data. In contrast, model parameters are learned from the data during training. For example, the weights for the Weighted Moving Average model are hyperparameters and so is the $\alpha$ for Simple Exponential Smoothing.

Tuning involves selecting the best set of hyperparameters that minimize a predefined loss function or maximize the model's accuracy. This process often involves techniques such as cross-validation, grid search, or random search.

### Example: Tuning Alpha for SES

Let's take a look at an example of tuning a model. In Simple Exponential Smoothing (SES), the hyperparameter $\alpha$ (alpha) controls the smoothing factor. The value of $\alpha$ ranges between 0 and 1 and wile we often use a value of 0.3 this may not be the one that has the best fit and produces the smallest MAE or MSE. Furthermore, a value closer to 1 puts more weight on recent observations, making the model more responsive to recent changes, while a value closer to 0 gives more weight to past observations, making the model less responsive to recent changes.

The goal of model tuning is to find the $\alpha$ value that minimizes the forecasting error. First, we need to choose a loss function to evaluate the model's performance. Common choices are Mean Absolute Error (MAE), Mean Squared Error (MSE), or Root Mean Squared Error (RMSE). For this example, we will use MSE. We need to then try various values of $\alpha$ and perform a grid search over a range of $\alpha$ values to find the one that minimizes MSE.

Here’s how you can perform grid search to tune the alpha parameter for Simple Exponential Smoothing in R using the `waterusage.csv` dataset assuming the same data frame we loaded before. The `HoltWinters` function fits an SES model using the an alpha value, and the `forecast` function generates forecasts for the next 12 periods.

```{r}
# Convert 'daily' to a time series object
daily.ts <- ts(water.usage$daily, 
               start = c(2023, 8), 
               frequency = 12)

# Define a range of alpha values to search over
alpha.values <- seq(0.01, 1, by = 0.01)

# Define a function to calculate the MSE for a given alpha
calculate.mse <- function(alpha, data) {
  ses.model <- HoltWinters(data, alpha = alpha, beta = FALSE, gamma = FALSE)
  fitted.values <- fitted(ses.model)[, 1]  
  # Extract the fitted values
  mse <- mean((data - fitted.values)^2, na.rm = TRUE)
  return(mse)
}

# Perform grid search over the alpha values
mse.values <- sapply(alpha.values, calculate.mse, data = daily.ts)

# Find the alpha value that minimizes the MSE
best.alpha <- alpha.values[which.min(mse.values)]
cat("Best alpha:", best.alpha, "\n")
```

Model tuning is a essential step in improving the performance of forecasting models by optimizing its hyperparameters. In this example, we performed grid search to tune the alpha parameter for Simple Exponential Smoothing (SES). By finding the alpha value that minimizes the Mean Squared Error (MSE), we improved the model's accuracy, leading to more reliable forecasts. The process of tuning hyperparameters can be applied to all forecasting models and datasets to achieve optimal fit.

## Ensemble Methods

An ensemble model combines predictions from multiple models to improve overall forecasting accuracy and robustness. The idea is that by aggregating the strengths of various models, the ensemble can outperform any single model. Ensemble methods can be used in regression, classification, and time series forecasting tasks.

Let's create an ensemble model by combining the predictions from three different models:

1.  **ARIMA (AutoRegressive Integrated Moving Average)**
2.  **SES (Simple Exponential Smoothing)**
3.  **Linear Trend Model**

We will average the predictions from the three models to create the ensemble forecast, although if we find that one model has generally lower *MAD* or *MSE* we might choose a weighted average with higher weights for the better performing model.

```{r}
library(TTR)
library(forecast)

csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"

# Load the data
water.usage <- read.csv(csv,
                        stringsAsFactors = F)

# Convert 'daily' to a time series object
daily.ts <- ts(water.usage$daily, start = c(2023, 8), 
               frequency = 12)

# Fit ARIMA model
arima.model <- auto.arima(daily.ts)
arima.forecast <- forecast(arima.model, h = 12)$mean

# Fit SES model
ses.model <- HoltWinters(daily.ts, 
                         beta = FALSE, 
                         gamma = FALSE)
ses.forecast <- forecast(ses.model, h = 12)$mean

# Fit Linear Trend model
t <- time(daily.ts)
linear.model <- lm(water.usage$daily ~ t)
future.time <- seq(from = (max(t) + 1/12), 
                   to = (max(t) + 12/12), 
                   by = 1/12)
linear.forecast <- predict(linear.model, 
                           newdata = data.frame(time = future.time))

# Combine predictions (ensemble)
ensemble.forecast <- (arima.forecast + ses.forecast + linear.forecast) / 3
```

The forecast for the ensemble model is `r round(ensemble.forecast[12],1)`. Ensemble models combine predictions from multiple models to improve forecasting accuracy and robustness. By averaging the forecasts from ARIMA, SES, and Linear Trend models, we created an ensemble model that leverages the strengths of each individual model. This approach can lead to more accurate and reliable forecasts.

## Seasonality Adjustment

Seasonality involves fluctuations that occur at regular intervals and are thus periodic. For example, retail sales might increase every December due to the holiday shopping season, or electricity consumption might peak every summer due to increased use of air conditioning. Furthermore, the length of the seasonal cycle is constant and known, *i.e.*, it has a fixed frequency. Common seasonal frequencies include monthly (*e.g.*, monthly sales data), quarterly (*e.g.*, quarterly financial results or budgets), and annually (*e.g.*, annual holiday sales periods). Because seasonality follows a regular pattern, it is predictable and can be identified and modeled to improve forecasting accuracy, *i.e.* forecasts can be adjusted to account for seasonal ups and downs.

Seasonality can be identified through visual inspection of time series plots or by using statistical methods such as autocorrelation plots. Visual inspection involves plotting the data and looking for regular patterns. Autocorrelation plots (ACF plots) show the correlation of the time series with its own past values and can highlight the presence of seasonality.

In time series forecasting models, seasonality can be accommodated *additively* or *multiplicatively*. An additive model adds a "seasonality adjustment" to each forecast that adjusts the forecast value up or down depending in which season the forecast value falls. A multiplicative model multiplies the forecast value by a "seasonality index" to make an adjustment of the forecast for a season.

Seasonality (or cyclicality) adjustment involves adjusting the forecast to account for period peaks and valleys, *i.e.* accounting for some seasons being above normal and some being below normal. This process helps in better understanding the underlying trend and making more accurate predictions.

There are two common ways to account for seasonality and to adjust forecasts for seasonality:

1.  **Multiplicative Adjustment**: The forecast is multiplied by an adjustment factor (the "seasonality index") depending for which season the forecast is. The adjustment factor is calculated from the data by finding the average difference between the mean and each season.

2.  **Additive Adjustment**: This requires building a Linear Trend Line Model using both time and the season as factors in the regression. The forecast is adjusted by adding (or subtracting) a seasonality factor depending for which season the forecast is.

```{r echo=F}
Period <- 1:16
Calls <- c(218,247,243,292,225,254,255,299,234,265,264,327,250,283,289,356)
QuarterNum <- rep(1:4, times = 4)
Quarter <- NA
Year <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)

callcenter.df <- data.frame(
  Period,
  Calls,
  QuarterNum,
  Quarter,
  Year
)

callcenter.df$Quarter <- paste0("Q",callcenter.df$QuarterNum)
colnames(callcenter.df) <- c("Period","Calls","QuarterNum","Quarter","Year")
```

Let's explore both of these techniques by considering the example below for a time series containing in-bound calls to a call center for four years by quarter. For example, in quarter 3 of year 2, the call center received `r callcenter.df$Calls[which(callcenter.df$Quarter == "Q3" & callcenter.df$Year == "2")]` calls.

```{r echo=F}
library(reshape2)

reshaped.df <- dcast(callcenter.df, Quarter ~ Year, value.var = "Calls")

knitr::kable (
  reshaped.df,
  format = "simple",
  align = "cccc",
  col.names = c("Quarter","Y1","Y2","Y3","Y4")
)
```

### Identify Seasonality

The first step is not plot time series and use the visualization to determine whether their is cyclicality and what the season are.

```{r class.source = 'fold-hide'}
library(ggplot2)

ggplot(callcenter.df, 
       aes(x = Period, y = Calls, 
           color = factor(Quarter), group = 1)) +
  geom_line() +
  geom_point() +
  labs(title = "Quarterly Call Volume",
       x = "Time Period",
       y = "Number of Calls",
       color = "Quarter") +
  theme_minimal()
```

From the graph we can see that every Q4 (*i.e.*, time period 4, 8, 12, and 16) is a peak where the number of calls spike, and Q1 of each year (*i.e.*, time period 1, 5, 9, and 13). In addition, there is an overall upward trend, so a Linear Regression Trend Model would be an appropriate model to use; an alternative might be a Holt's Linear Trend Model.

### Multiplicative Adjustment

The process for a multiplicative seasonality adjusted forecast model requires a seasonality index (the multiplicative factor) and the process for calculating this factor is summarized below:

1.  For each one year period (the cycle), calculate the average across quarters (the seasons) within the cycle.
2.  For each time period, divide the actual seasonal demand by the average seasonal demand.
3.  Compute the average seasonality index for each season.
4.  Remove seasonality variance by calculating a forecast for the entire next year (cycle) and then divide that by the number of quarters (seasons) to get an average.
5.  Multiply the average forecast by the seasonality index for each season to get a forecast for the next year (cycle) per quarter (season).

Let's start by calculating the seasonality index. The graphic below shows how to calculate the seasonality index for each season.

```{r class.source = 'fold-hide'}
avg.calls <- aggregate(
  x = callcenter.df[,c("Year","Calls")], 
  FUN = mean, 
  by = list(callcenter.df$Year)
)

avg.calls <- avg.calls[,2:3]

colnames(avg.calls) <- c("Year","AvgNumCalls")

for (t in 1:nrow(callcenter.df)) {
  Y <- callcenter.df$Year[t]
  SI <- avg.calls$AvgNumCalls[Y]
  callcenter.df$SI[t] <- callcenter.df$Calls[t] / SI
}

avg.SI <- aggregate(
  x = callcenter.df[,c("QuarterNum","SI")], 
  FUN = mean, 
  by = list(callcenter.df$Quarter)
)
```

![](images/seasonality-index-calculation.jpeg){width="75%"}

Next, we'll build a Linear Regression Trend Model; of course, we'll need to make the data a time series rather than a 2D table. So, the time is Year 1/Quarter 1 followed by Year 1/Quarter 2, and so forth.

The data frame created looks like this (only the first and last three rows are shown):

```{r class.source = 'fold-hide'}
dots.df <- data.frame(
  Period = "...",
  Calls = "...",
  Quarter = "...",
  Year = "..."
)

knitr::kable(
  rbind(head(callcenter.df[,c(1,2,4,5)],3),
        rbind(dots.df,
              tail(callcenter.df[,c(1,2,4,5)],3))),
  format = "simple",
  align = "cccc",
  row.names = FALSE
)
```

Now we can train the linear regression model.

```{r}
trend.model.calls <- lm(callcenter.df$Calls ~ callcenter.df$Period)

# forecast quarterly calls for next four quarters
n <- nrow(callcenter.df) + 1              # time periods
fc <- c(4)                                # forecasts
b <- trend.model.calls$coefficients[[1]]  # intercept
m <- trend.model.calls$coefficients[[2]]  # slope

for (t in n:(n+3)) {
  fc[t-n+1] <- m*t + b
}
```

Next, we need to average the forecasts and multiply by its seasonality index. This will be a forecast for time period 17 (the next time period) and that is a Q1, so the seasonality index for Q1 is `r avg.SI$SI[1]`:

```{r}
mean.fc <- mean(fc)
adjusted.forecast <- mean.fc * avg.SI$SI[1]
```

The seasonality adjusted forecast for Quarter 1 of Year 5 is therefore `r round(adjusted.forecast,0)` calls.

### Additive Adjustment

An alternative approach to multiplicative seasonality adjustments is an additive model. This approach requires the use of a Linear Regression Trend Line Model, while a multiplicative model can be used with any forecasting method. In an additive model, we will add the season as a factor. Since it is a categorical factor it must be one-hot encoded.

One-hot encoding, also sometimes called "dummy encoding" as it introduced "dummy" or "placeholder" variables, transforms a categorical variable into a series of binary (0 or 1) columns. Each unique category value (level) becomes a new column, and a 1 or 0 is placed in the column depending on whether the categorical variable matches the column heading. If the categorical variable has *n* levels (*i.e.*, *n* unique values), then we only need *n-1* binary columns.

For the call center example, each quarter is a season, so "season" has four levels (Quarter 1 through Quarter 4), so we need to add three more columns to the data frame labeled "Q1", "Q2", and "Q3". The quarterly seasons are then encoding as follows:

```{r echo=F}
dummies <- data.frame (
  Quarter <- c("Q1","Q2","Q3","Q4"),
  Q1 <- c(1,0,0,0),
  Q2 <- c(0,1,0,0),
  Q3 <- c(0,0,1,0)
)

knitr::kable(
  x = dummies,
  align = "cccc",
  format = "simple",
  col.names = c("Quarter","Q1","Q2","Q3")
)
```

Note that Q4 is encoded as not Q1, not Q2,and not Q3.

Next, we will add the new dummy columns for the one-hot endoing to the `callcenter.df` data frame. The packages **dummies** and **caret** provide functions for doing such as encoding, but we will not use those; instead, we will do the encoding ourselves using `ifelse()`.

```{r}
callcenter.df$Q1 <- ifelse(callcenter.df$Quarter == "Q1", 1, 0)
callcenter.df$Q2 <- ifelse(callcenter.df$Quarter == "Q2", 1, 0)
callcenter.df$Q3 <- ifelse(callcenter.df$Quarter == "Q3", 1, 0)
```

To build an additive model, we will train a linear regression trend model with the "dummy columns" for the season levels as additional independent variables.

```{r}
trend.model.adjusted <- lm(
  callcenter.df$Calls ~ callcenter.df$Period + 
    callcenter.df$Q1 +
    callcenter.df$Q2 +
    callcenter.df$Q3
)

summary(trend.model.adjusted)
```

Now, we can calculate a forecast for the next time period (time period 17, which is Quarter 1 of Year 5). Since that is a "Q1", we will pass *1* for the variable `Q1` in the model and *0* for the variables `Q2` and `Q3`. In other words, we will use *1* for the correct season or all *0* values if it is a "Q4".

Notice how the coefficients are all negative as all quarters except Q4 are below the mean. Also, note that the coefficient for Q1 is the largest as it is the most below the mean.

```{r}
b <- trend.model.adjusted$coefficients[[1]]   # intercept
t <- trend.model.adjusted$coefficients[[2]]   # time period coefficient
Q1 <- trend.model.adjusted$coefficients[[3]]  # Q1 coefficient
Q2 <- trend.model.adjusted$coefficients[[4]]  # Q1 coefficient
Q3 <- trend.model.adjusted$coefficients[[5]]  # Q1 coefficient

F.17 <- (t * 17) + b + Q1*1 + Q2*0 + Q3*0
```

From this we get a seasonality adjusted forecast for Quarter 1 of Year 5 of `r round(F.17,0)` calls.

## Forecast Interval

A **forecast interval** (or prediction interval) provides a range within which we expect future observations to fall with a certain probability. It quantifies the uncertainty around the point forecast, offering a more comprehensive view of potential future values. The width of the interval reflects the model's uncertainty: wider intervals indicate greater uncertainty and consequently more risk in using the point forecast.

The **Point Forecast** is the single best estimate of the future value and what we have calculated thus far However, it is generally much better to report a range between a lower and upper value at a probability that the true value will fall within the forecast interval (*e.g.*, 95%).

A 95% forecast interval implies that we are 95% confident that the true future value will lie within this interval. This is typically calculated as:

$\text{Forecast Interval} = \text{Point Forecast} \pm Z_{\alpha/2} \times \text{Standard Error}$

where:

-   $Z_{\alpha/2}$ is the critical value from the standard normal distribution for a 95% confidence level, approximately 1.96. Other values can be obtained from the standard distribution; the table below lists common values for common probabilities.
-   **Standard Error** (SE) is the standard deviation of the forecast errors, reflecting the variability of the forecast.

```{r eval=T, class.source = 'fold-hide'}
# Calculate z-scores for different confidence levels
z_scores <- data.frame(
  Confidence_Level = c("99%", "95%", "90%", "80%", "70%"),
  Z_Score = c(
    qnorm(0.995),  # 99% confidence level
    qnorm(0.975),  # 95% confidence level
    qnorm(0.95),   # 90% confidence level
    qnorm(0.90),   # 80% confidence level
    qnorm(0.85)    # 70% confidence level
  )
)

# Display the data frame using kable
knitr::kable(z_scores, 
             col.names = c("Confidence Level", "Z-Score"), 
             caption = "Z-Scores for Different Confidence Levels",
             format = "simple",
             digits = 3,
             align = "cc",
             label = NA)
```

Forecasts produced by a Linear Regression Trend Model have a different way of calculating the forecast interval as it is a statistical method.The other methods are non-statistical and therefore do produce a value of the standard error and one has to be estimated.

**LTRM Forecast Interval**

For an LRTM the forecast interval is calculated as shown below. The example uses the `water.usage` time series. The function `summary(trend.model)` produces the residual standard error as one of its components: `model.results$sigma`.

```{r}
# Load necessary libraries
library(forecast)

csv <- "https://s3.us-east-2.amazonaws.com/artificium.us/datasets/waterusage.csv"

# Load the data
water.usage <- read.csv(csv,
                        stringsAsFactors = F)

# fit a linear trend model using linear regression
trend.model <- lm(water.usage$daily ~ water.usage$period)

print(trend.model)

# Forecast future values using the trend model
b <- trend.model$coefficients[[1]]
m <- trend.model$coefficients[[2]]

# forecast for next time period
forecast.val <- m*13 + b

model.results <- summary(trend.model)
SE <- model.results$sigma

z.score.95 <- qnorm(0.975) # two-tailed, so 0.05/2

lower95 <- forecast.val - (z.score.95 * SE)
upper95 <- forecast.val + (z.score.95 * SE)
```

So, for the LRTM we have a point forecast of `r round(forecast.val,1)` with a 95% forecast (prediction) interval ranging from `r round(lower95,1)` to `r round(upper95,1)`. The range is quite wide indicating substantial uncertainty in the forecast and using the point forecast is thus risky. If someone were to budget for next month's water use and they are "risk averse", they'd be best served by using the upper range as it is more pessimistic in this situation.

**Other Forecast Intervals**

For all other methods, because they are not statistical, we need to use a different method since we do not have an standard error from the regression. We do, however, have residuals (errors) and can use those to derive an estimate of the standard error.

$\sigma_E = \sqrt{\frac{\sum_{t=1}^{n}(E_t-\bar{E})^2}{n-1}}$

Let's demonstrate by calculating the 95% forecast interval for a WMA model.

```{r}
# define the weights for the WMA (3-period example)
w <- c(0.1, 0.3, 0.6)

# calculate "forecast" for each prior time period
water.usage$WMA <- calculate.wma(water.usage$daily, 
                                 n = 3, 
                                 weights = w)

forecast.val <- water.usage$WMA[12]

periods <- nrow(water.usage)

# calculate the absolute errors
for (t in 3:periods) {
  water.usage$E[t] <- water.usage$daily[t-1] - water.usage$WMA[t]
}

# get the errors for all but first two forecasts as those are NA
WMA.E <- water.usage$E[3:periods]

mean.E <- mean(WMA.E)

sigma <- sqrt(sum((WMA.E - mean.E)^2)/(length(WMA.E)))

z.score.95 <- qnorm(0.975) # two-tailed, so 0.05/2

lower95 <- forecast.val - (z.score.95 * sigma)
upper95 <- forecast.val + (z.score.95 * sigma)
```

So, for the WMA we have a point forecast of `r round(forecast.val,1)` with a 95% forecast (prediction) interval ranging from `r round(lower95,1)` to `r round(upper95,1)`.

## Tracking Signal

Forecasting models must be continually evaluated to assure that they still provide accurate forecast estimates.

A *tracking signal (TS)* is a measure used in forecasting to detect bias in the forecast. It helps in identifying whether the forecast model consistently overestimates or underestimates the actual values. It is defined as the ratio of the Cumulative Forecast Error (*CFE*) and the Mean Absolute Error (*MAE*). It should be calculated after each forecast and then evaluated to see if the forecast model requires re-training or further tuning.

$\text{TS} = \frac{\text{CFE}}{\text{MAE}}$

where

$\text{CFE} = \sum_{t=1}^{n}(Y_t - F_t)$

and

$\text{MAE} = \sum_{t=1}^{n}(|Y_t - F_t|)$

Interpretation of the tracking signal generally consider whether it is:

-   **Positive**: which means that the forecasts are generally under predicting, *i.e.*, $Y > F$
-   **Negative**: which means that the forecasts are generally over predicting, *i.e.*, $Y < F$
-   **Near Zero**: which means that the forecast model is unbiased

A re-training and/or re-tuning of the forecast model is generally advisable when $\text{TS} > ±4 \times MAE$ as that means that the errors are starting to far exceed the mean.

### Example TS Calculation

```{r echo=F}
actual <- c(220, 245, 270, 261, 283, 301)
forecast <- c(212, 249, 260, 252, 276, 292)
error <- actual - forecast

periods <- length(actual)

# Create the data frame
df.TS <- data.frame (
  period = 1:periods,
  actual,
  forecast,
  error
)

df.TS$CFE <- cumsum(df.TS$error)

df.TS$ABS <- abs(df.TS$actual - df.TS$forecast)

MAE <- mean(df.TS$ABS)
CFE <- tail(df.TS$CFE,1)
```

Let's use a simple example to demonstrate the calculation process for the tracking signal.

```{r echo=F}
knitr::kable(x = df.TS,
             align = "c",
             format = "simple",
             col.names = c("t", "Y(t)",
                           "F(t)", "E(t)",
                           "CFE",
                           "|E(t)|"))
```

The errors are the difference between the actual and the forecast values. The CFE is `r round(CFE,0)` and the MAE is `r round(MAE,1)`, so the tracking signal is

$\text{TS} = \frac{\text{CFE}}{\text{MAE}}=\frac{\text{39}}{\text{7.83}}=4.98$

The tracking signal is positive so this means that the model that produced these forecasts is biased towards under predicting.

### Calculating TS in R

As an example, let's calculate TS for the previously built WMA Model for the `water.usage` data set from Section [Calculating WMA in R]. The *MAE* was previously calculated in Section [Calculating MAE for WMA Model].

We need two metrics:

1.  *MAE*, and
2.  *CFE*.

Since we have *MAE* in the variable `MAE.WMA` from before, we only need to calculate *CFE* which is the sum of the forecast errors.

```{r}
# calculate the errors
periods <- nrow(water.usage)
for (t in 3:periods) {
  water.usage$abs.err[t] <- water.usage$daily[t-1] - water.usage$WMA[t]
}

# calculate sum of errors (CFE)
CFE <- sum(water.usage$abs.err, na.rm = T)
```

The CFE for this model's first forecast is `r round(CFE,1)` and its MAE is `r round(MAE.WMA,1)`, which makes the tracking signal `r round(CFE/MAE.WMA,1)`. Since the tracking signal is negative, it means that the model, on average, over predicts. However, the tracking signal is well below *4* and thus within the expected range.

## Summary: Forecasting {#concl}

Forecasting is a multifaceted discipline that blends statistical methods with domain expertise. By understanding the underlying components of time series data and selecting appropriate forecasting models, one can make accurate predictions that are invaluable in many fields. R provides a rich ecosystem of packages and functions to implement and evaluate various forecasting methods, making it an common platform for this purpose.

In this lesson, we explored several key methods for forecasting, including Weighted Moving Average (WMA), Exponential Smoothing, ARIMA, and Regression Trend Line. We showed how to incorporate trend and seasonality into forecasts and how to use various metrics to evaluate and compare models. In addition, we explained how to tune models and combine them into ensemble models. Finally, we demonstrated the use of a tracking signal to detect forecast bias. Throughout the lesson, we emphasized the importance of these techniques and metrics in improving the accuracy and reliability of forecasts.

------------------------------------------------------------------------

## Files & Resources

```{r zipFiles, echo=FALSE}
zipName = sprintf("LessonFiles-%s-%s.zip", 
                 params$category,
                 params$number)

textALink = paste0("All Files for Lesson ", 
               params$category,".",params$number)

# downloadFilesLink() is included from _insert2DB.R
knitr::raw_html(downloadFilesLink(".", zipName, textALink))
```

------------------------------------------------------------------------

## See Also

## Slide Deck

Slide Deck ([HTML](slide-deck/sd-3-303.html) \| [Quarto Presentation](slide-deck/sd-3-303.qmd))

## References

[Fundamentals of Time Series Data and Analysis](https://www.aptech.com/blog/introduction-to-the-fundamentals-of-time-series-data-and-analysis/)

## Errata

[Let us know](https://form.jotform.com/212187072784157){target="_blank"}.
