Time Series Modeling and Forecasting using Apple Stock Price¶

I got the monthly stock price of Apple Inc. from 2005-03-01 to 2014-06-02. data source. I have done the following procedures to fit the appropriate model and do forecasting. The modeling was done on the Opening price.

* Stationarity and  data transformation 
* Initial models nomination
* Model comparison and final model selection
* Assumption verification 
* Forecasting

libs <- c("ggplot2","lubridate","astsa","tseries","forecast")
lapply(libs,require, character.only= TRUE)
library(repr)
options(repr.plot.width=7,repr.plot.height=4)

AppleMonthlStack <- read.csv("C:/Users/selamina/...data.csv")
AppleOpenPrice=cbind.data.frame(Date=mdy(AppleMonthlStack[,1]),OpenPrice=AppleMonthlStack[,2])

AppleOpenPriceFinal=AppleOpenPrice[c(183:294),] # 112 Total Observation begining 2005-03-01

tailForecast=tail(AppleOpenPriceFinal,10)# last 10 obs for forecast
ggplot(AppleOpenPriceFinal, aes(Date, OpenPrice))+ geom_line()+ 
      labs(y="stock Price (USD)",x="monthly Stock: 2005-03-01 to 2014-06-02",title="From 2005-03-01 to 2014-06-02")+
      theme(axis.text.x=element_text(angle = 45, size = 14,face="bold"))+
      geom_line(data=tailForecast,aes(Date,OpenPrice),color='red',size=1.5) # just red once will be kept for forecasting
AppleOpenPrice=ts(head(AppleOpenPriceFinal[,2],-10)) # keeping the bottom 10 for Forecasting

The last part, red colored in the time series plot, is kept for forecast comparison.

Stationarity Verification¶

adf.test(AppleOpenPrice, alternative="stationary")

	Augmented Dickey-Fuller Test

data:  AppleOpenPrice
Dickey-Fuller = -2.1717, Lag order = 4, p-value = 0.5055
alternative hypothesis: stationary

The adf.test results a p-value of 0.5055, which indicates as the original data is not Stationary at 5% level of significance.

par(mfrow=c(1,2))
acf(AppleOpenPrice,100);pacf(AppleOpenPrice,100)

The slow decline in the ACF plot indicates the existence of correlation among sequence of observations and also the immediate cut off in the PACF plot confirms that. Thus, the origional data is not Stationary.

Transformation¶

Clearly the general trend of Apple Stock price in the time series plot indicates an upward parabola. Increasing at an exponential rate. This shows that we could take log transformation with or with out lag 1 differencing to stablize and make the time series stationary.

AppleOpenPricelog=log(AppleOpenPrice)
plot(AppleOpenPricelog,main="Log of the Open price")
AppleOpenPriceD1 = diff(AppleOpenPrice, 1)
plot(AppleOpenPriceD1,main="Lag 1 differenced Open Price")
AppleOpenPricelogD1 = diff(AppleOpenPricelog, 1)
plot(AppleOpenPricelogD1,main="Lag 1 difference of log(Open Price) ")

As we can clearly see, the log transformed data is not stationary in mean. The lag one difference is not stationary in variance. That is why I decided to combine both, lag 1 differencing for the $log(open-Price)$. Stationarity test after lag-1 difference of the $log(open-Price)$ is done below.

adf.test(AppleOpenPricelogD1, alternative="stationary")
par(mfrow=c(1,2))
acf(AppleOpenPricelogD1,100);pacf(AppleOpenPricelogD1,100)

Warning message:
In adf.test(AppleOpenPricelogD1, alternative = "stationary"): p-value smaller than printed p-value

	Augmented Dickey-Fuller Test

data:  AppleOpenPricelogD1
Dickey-Fuller = -4.252, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

The adf.test and both ACF and PACF confirms that the transformed data is stationary both in mean and variance.

Model Selection¶

par(mfrow=c(1,2))
acf(AppleOpenPricelogD1,100);pacf(AppleOpenPricelogD1,100)

From the ACF and PACF plot, its clear that the transformed data is White Noise.

options(repr.plot.width=7,repr.plot.height=6)
Model1p0d1q0=Arima(AppleOpenPricelog, order=c(0, 1, 0))
tsdiag(Model1p0d1q0)
Model1p0d1q0

Series: AppleOpenPricelog 
ARIMA(0,1,0)                    

sigma^2 estimated as 0.012:  log likelihood=80.05
AIC=-158.09   AICc=-158.05   BIC=-155.48

Final model equation¶

$$Log(x_t)=Log(x_{t-1})+\epsilon_t $$

The logarithm of Apple open price for selected month is on average equals with the log of the previous month open price. The variability is only a random factor of average zero, with in the study time period. Logically make sense, stock price hardly change at monthly level. Of course the model could be way different if we had taken weekly or daily price value.

Residual Assesment¶

options(repr.plot.width=7,repr.plot.height=3)
fittedActual=cbind.data.frame(fit=as.numeric(exp(fitted(Model1p0d1q0))),Date=head(AppleOpenPriceFinal[,1],-10),
                              OpenPrice=as.numeric(exp(AppleOpenPricelog))) 

ggplot(fittedActual, aes(Date, OpenPrice)) + geom_line() +
labs(y="stock Price (USD)",title="monthly Stock: 2005-03-01 to 2014-06-02",x="Black line=Actual Values; Red line= Fitted values")+
theme(axis.text.x=element_text(angle = 45, size = 14,face="bold"))+geom_line(data=fittedActual,aes(Date,fit),color='red',size=1)

res=as.numeric(Model1p0d1q0$residuals)
plot(Model1p0d1q0$residuals, main="Residuals Plot")
qqnorm(res,pch=16) 
qqline(res,col="red",lwd=4,lty=2)

message("Compute the Box-Pierce or Ljung-Box test statistic for examining the null hypothesis of independence 
         in a given time series.")
Box.test(Model1p0d1q0$residuals, lag=1)

Compute the Box-Pierce or Ljung-Box test statistic for examining the null hypothesis of independence 
         in a given time series.

	Box-Pierce test

data:  Model1p0d1q0$residuals
X-squared = 1.8105, df = 1, p-value = 0.1785

According to the residual plots, all the assumptions are significant or acceptable.

Forecast¶

forecast=forecast.Arima(Model1p0d1q0, h=10, level=c(99.75))
forecast=exp(data.frame(forecast))
forecastOriginal=cbind(tailForecast,forecast)
ggplot(forecastOriginal, aes(Date, OpenPrice)) + geom_line() + 
labs(y="stock Price (USD)",title="Apple Stock price forecast for 10 consecutive months",x="Black line=Actual Values;Green 
                                  line=Forecasted Values; Red line= 95% confidence intervale")+
theme(axis.text.x=element_text(angle = 45, size = 14,face="bold"))+
geom_line(data=forecastOriginal,aes(Date,Lo.99.75),color='red',size=1)+
geom_line(data=forecastOriginal,aes(Date,Hi.99.75),color='red',size=1)+
geom_line(data=forecastOriginal,aes(Date,Point.Forecast),color='green',size=1)
forecastOriginal

The equal point estimation for the forecast is due to the fact that our selected model is random walk. When we see the point forecast estimate, all are under estimating the true value slightly. However, all the forcast and the true values are under the 95 percent confidence range.

	Date	OpenPrice	Point.Forecast	Lo.99.75	Hi.99.75
285	2013-09-03	493.1	455.75	327.2664	634.6757
286	2013-10-01	478.45	455.75	285.3166	727.9913
287	2013-11-01	524.02	455.75	256.8113	808.7964
288	2013-12-02	558	455.75	235.0045	883.8469
289	2014-01-02	555.68	455.75	217.332	955.7175
290	2014-02-03	502.61	455.75	202.5015	1025.711
291	2014-03-03	523.42	455.75	189.7584	1094.592
292	2014-04-01	537.76	455.75	178.619	1162.855
293	2014-05-01	592	455.75	168.7528	1230.842
294	2014-06-02	633.96	455.75	159.9232	1298.799