ARIMA Time Series Modeling

https://www.analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling/

Step 0: Create a ts Object

You only need a (single) time series, a frequency, and a start date. The examples at the bottom of the ?ts documentation should be very helpful. I’m guessing you’d write something like ts(your_timeseries_data, frequency = 365, start = c(1980, 153)), for instance, if your data started on the 153rd day of 1980.
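A minimal sketch of this, where your_timeseries_data is a hypothetical placeholder vector; the second line mirrors the monthly AirPassengers data used in the rest of this post:

# hypothetical daily series starting on the 153rd day of 1980
your_timeseries_data <- rnorm(730)  # placeholder values, two years of daily data
daily_ts <- ts(your_timeseries_data, frequency = 365, start = c(1980, 153))

# AirPassengers (used below) is monthly, so frequency = 12, starting January 1949
monthly_ts <- ts(as.numeric(AirPassengers), frequency = 12, start = c(1949, 1))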

Step 1: Visualize the Time Series

plot(AirPassengers)
# this will plot the time series

abline(reg = lm(AirPassengers ~ time(AirPassengers)))
# this will fit a trend line

cycle(AirPassengers)
# this will print the cycle across years

plot(aggregate(AirPassengers, FUN = mean))
# this will aggregate the cycles and display a year-on-year trend

boxplot(AirPassengers ~ cycle(AirPassengers))
# a box plot across months gives a sense of the seasonal effect

Step 2: Stationarize the Series

We need to address two issues before we test for stationarity. One, we need to remove the unequal variance; we do this by taking the log of the series. Two, we need to address the trend component; we do this by differencing the series. Now, let’s test the resultant series with the (Augmented) Dickey-Fuller test from the tseries package.

library(tseries)
adf.test(diff(log(AirPassengers)), alternative = "stationary", k = 0)

# output:
# data:  diff(log(AirPassengers))
# Dickey-Fuller = -9.6003, Lag order = 0, p-value = 0.01
# alternative hypothesis: stationary

The p-value of 0.01 lets us reject the null hypothesis of non-stationarity, so the series is stationary enough for time series modelling.

There are three commonly used techniques to make a time series stationary if it is not already:

1. Detrending: Here, we simply remove the trend component from the time series. For instance, suppose the equation of my time series is:

x(t) = (mean + trend * t) + error

We simply remove the part in the parentheses and build a model for the rest (see the sketch after this list for both detrending and differencing).

2. Differencing: This is the most commonly used technique to remove non-stationarity. Here we model the differences between consecutive terms rather than the terms themselves. For instance,

x(t) – x(t-1) = ARMA (p ,  q)

This differencing is called the Integration part in AR(I)MA. Now we have three parameters:

p : order of the AR (autoregressive) term

d : order of differencing (the I, integration, part)

q : order of the MA (moving average) term

3. Seasonality: Seasonality can easily be incorporated into the ARIMA model directly. More on this is discussed in the application part below.
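A minimal sketch of options 1 and 2 on the AirPassengers data, using only base R (lm/resid for detrending, diff for differencing); the variable names are illustrative:

# Detrending: fit a linear trend to the log series and keep the residuals
trend_fit <- lm(log(AirPassengers) ~ time(AirPassengers))
detrended <- ts(resid(trend_fit), start = start(AirPassengers), frequency = 12)

# Differencing: model the changes between consecutive observations
differenced <- diff(log(AirPassengers), differences = 1)

par(mfrow = c(2, 1))
plot(detrended, main = "Detrended log series")
plot(differenced, main = "First difference of log series")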

Step 3: Find Optimal Parameters

The parameters p, d, q can be found using ACF and PACF plots.

The ACF plot is a bar chart of the correlation coefficients between a time series and lags of itself.
The PACF plot is a plot of the partial correlation coefficients between the series and lags of itself.

In other words, the ACF is the plain autocorrelation coefficient and does not control for the other lags. The PACF, the partial autocorrelation coefficient, is the autocorrelation computed after controlling for the intermediate lags; because it strips out the influence of those other lags, the two values will generally differ.

To find p and q you need to look at ACF and PACF plots. The interpretation of ACF and PACF plots to find p and q are as follows:

AR(p) model: the ACF plot tails off* but the PACF plot cuts off** after p lags.
MA(q) model: the PACF plot tails off but the ACF plot cuts off after q lags.
ARMA(p,q) model: both the ACF and PACF plots tail off; try different combinations of p and q, starting with small values.
ARIMA(p,d,q) model: an ARMA(p,q) model fit after differencing the series d times to make it stationary.

Use AIC and BIC to find the most appropriate model. Lower values of AIC and BIC are desirable.

*Tails off means the plot decays slowly, i.e. it still has significant spikes at higher lags.
**Cuts off means the bar is significant at lag p (or q) and not significant at any higher-order lag. The simulated AR(1) and MA(1) examples below illustrate both patterns.
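To see the two patterns concretely, here is a small sketch simulating an AR(1) and an MA(1) process with arima.sim from base R's stats package (the coefficient 0.7 and the series length are arbitrary choices):

set.seed(1)
ar1 <- arima.sim(model = list(ar = 0.7), n = 200)  # PACF cuts off after lag 1, ACF tails off
ma1 <- arima.sim(model = list(ma = 0.7), n = 200)  # ACF cuts off after lag 1, PACF tails off

par(mfrow = c(2, 2))
acf(ar1);  pacf(ar1)
acf(ma1);  pacf(ma1)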

First, judge from the ACF and PACF plots whether the series is stationary. If it is not, difference it; if it is still non-stationary after first differencing, take a second difference, and so on. This determines d.

Once the differenced series is stationary, we need to determine p and q. There are many order-selection methods, because p and q are hard to pin down at a glance. The main ones are: (1) visual inspection: if the ACF cuts off at lag q (i.e. drops to insignificance from lag q+1 onward), the series is MA(q); likewise, if the PACF cuts off at lag p, it is AR(p); otherwise it is ARMA(p, q), and the two plots are read together for further judgment; (2) parameter tests: statistically test whether the extra parameters of a higher-order model are approximately zero, and check the correlation structure of the model residuals; (3) information criteria: use a criterion tied to the model order, such as AIC or BIC, which balances goodness of fit against the number of parameters. In practice these methods are combined to pick the most suitable p, d, q.

For the same plot, one person may see a cut-off at lag 5 and another at lag 7; it is a judgment call. The basic principle stays the same: the lag at which the plot cuts off gives the order. Since the cut-off point is often ambiguous, you can compare the candidate fits in the model-diagnostics stage to make the choice more convincing.

Here is a link that might help you understand the concept further http://people.duke.edu/~rnau/arimrule.htm

acf(diff(log(AirPassengers)))
pacf(diff(log(AirPassengers)))

In addition, if both the ACF and PACF decrease gradually, it indicates that the series still needs to be made stationary and that we should introduce (or increase) the value of d. The next step is to find the right parameters to use in the ARIMA model.

We already know that the d component is 1, as we need one difference to make the series stationary. (We differenced the series once and saw that the trend was removed. Had the trend still been there, we would have differenced the series again. This series did not need to be differenced more than once; hence d = 1.)

Clearly, the ACF plot cuts off after the first lag. Since it is the ACF, not the PACF, that shows the cut-off, p should be 0 while q should be 1 or 2. After a few iterations, we find that (p, d, q) = (0, 1, 1) is the combination with the lowest AIC and BIC.
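A minimal sketch of those iterations, looping over a few (p, q) candidates with d fixed at 1 and comparing AIC/BIC (AIC() and BIC() are base R generics that work on arima fits; the 0:2 search range is an assumption):

for (p in 0:2) for (q in 0:2) {
  candidate <- arima(log(AirPassengers), order = c(p, 1, q))
  cat(sprintf("ARIMA(%d,1,%d): AIC = %.2f  BIC = %.2f\n",
              p, q, AIC(candidate), BIC(candidate)))
}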

 

Step 4: Build ARIMA Model

 

With the parameters in hand, we can now build the ARIMA model. The values found in the previous section are only approximate estimates, so we should explore more (p, d, q) combinations; the one with the lowest AIC and BIC should be our choice. We can also try some models with a seasonal component, in case we notice any seasonality in the ACF/PACF plots.

Let’s fit an ARIMA model and predict the next 10 years. We will also fit a seasonal component in the ARIMA formulation. Then we will visualize the prediction along with the training data. You can use the following code to do this:

fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
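Since the order-selection notes above suggest checking the correlation structure of the residuals, a minimal diagnostic sketch using base R (tsdiag and Box.test from the stats package) might look like this:

tsdiag(fit)  # standardized residuals, residual ACF, and Ljung-Box p-values
Box.test(residuals(fit), lag = 12, type = "Ljung-Box")
# a large p-value suggests no significant autocorrelation is left in the residuals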

Step 5: Make Predictions

Once we have the final ARIMA model, we are ready to make predictions for future time points. We can also visualize the forecast against the observed data to cross-check that the model works well.

pred <- predict(fit, n.ahead = 10 * 12)  # forecast 120 monthly points (10 years)
ts.plot(AirPassengers, exp(pred$pred), log = "y", lty = c(1, 3))  # exp() undoes the log transform
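predict() also returns standard errors in pred$se, so a rough sketch of approximate 95% bands on the original scale (assuming roughly normal forecast errors on the log scale; upper and lower are illustrative names) is:

upper <- exp(pred$pred + 2 * pred$se)  # back-transform the upper bound
lower <- exp(pred$pred - 2 * pred$se)  # back-transform the lower bound
ts.plot(AirPassengers, exp(pred$pred), upper, lower,
        log = "y", lty = c(1, 3, 2, 2))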

 
