Time Series Modeling and Forecasting using Apple Stock Price

I got the monthly stock price of Apple Inc. from 2005-03-01 to 2014-06-02. data source. I have done the following procedures to fit the appropriate model and do forecasting. The modeling was done on the Opening price.

* Stationarity and  data transformation 
* Initial models nomination
* Model comparison and final model selection
* Assumption verification 
* Forecasting
In [12]:
libs <- c("ggplot2","lubridate","astsa","tseries","forecast")
lapply(libs,require, character.only= TRUE)
library(repr)
options(repr.plot.width=7,repr.plot.height=4)
In [13]:
AppleMonthlStack <- read.csv("C:/Users/selamina/...data.csv")
AppleOpenPrice=cbind.data.frame(Date=mdy(AppleMonthlStack[,1]),OpenPrice=AppleMonthlStack[,2])
In [14]:
AppleOpenPriceFinal=AppleOpenPrice[c(183:294),] # 112 Total Observation begining 2005-03-01
In [15]:
tailForecast=tail(AppleOpenPriceFinal,10)# last 10 obs for forecast
ggplot(AppleOpenPriceFinal, aes(Date, OpenPrice))+ geom_line()+ 
      labs(y="stock Price (USD)",x="monthly Stock: 2005-03-01 to 2014-06-02",title="From 2005-03-01 to 2014-06-02")+
      theme(axis.text.x=element_text(angle = 45, size = 14,face="bold"))+
      geom_line(data=tailForecast,aes(Date,OpenPrice),color='red',size=1.5) # just red once will be kept for forecasting
AppleOpenPrice=ts(head(AppleOpenPriceFinal[,2],-10)) # keeping the bottom 10 for Forecasting

The last part, red colored in the time series plot, is kept for forecast comparison.

Stationarity Verification

In [16]:
adf.test(AppleOpenPrice, alternative="stationary")
	Augmented Dickey-Fuller Test

data:  AppleOpenPrice
Dickey-Fuller = -2.1717, Lag order = 4, p-value = 0.5055
alternative hypothesis: stationary

The adf.test results a p-value of 0.5055, which indicates as the original data is not Stationary at 5% level of significance.

In [17]:
par(mfrow=c(1,2))
acf(AppleOpenPrice,100);pacf(AppleOpenPrice,100)

The slow decline in the ACF plot indicates the existence of correlation among sequence of observations and also the immediate cut off in the PACF plot confirms that. Thus, the origional data is not Stationary.

Transformation

Clearly the general trend of Apple Stock price in the time series plot indicates an upward parabola. Increasing at an exponential rate. This shows that we could take log transformation with or with out lag 1 differencing to stablize and make the time series stationary.

In [18]:
AppleOpenPricelog=log(AppleOpenPrice)
plot(AppleOpenPricelog,main="Log of the Open price")
AppleOpenPriceD1 = diff(AppleOpenPrice, 1)
plot(AppleOpenPriceD1,main="Lag 1 differenced Open Price")
AppleOpenPricelogD1 = diff(AppleOpenPricelog, 1)
plot(AppleOpenPricelogD1,main="Lag 1 difference of log(Open Price) ")

As we can clearly see, the log transformed data is not stationary in mean. The lag one difference is not stationary in variance. That is why I decided to combine both, lag 1 differencing for the $log(open-Price)$. Stationarity test after lag-1 difference of the $log(open-Price)$ is done below.

In [24]:
adf.test(AppleOpenPricelogD1, alternative="stationary")
par(mfrow=c(1,2))
acf(AppleOpenPricelogD1,100);pacf(AppleOpenPricelogD1,100)
Warning message:
In adf.test(AppleOpenPricelogD1, alternative = "stationary"): p-value smaller than printed p-value
	Augmented Dickey-Fuller Test

data:  AppleOpenPricelogD1
Dickey-Fuller = -4.252, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

The adf.test and both ACF and PACF confirms that the transformed data is stationary both in mean and variance.

Model Selection

In [25]:
par(mfrow=c(1,2))
acf(AppleOpenPricelogD1,100);pacf(AppleOpenPricelogD1,100)

From the ACF and PACF plot, its clear that the transformed data is White Noise.

In [8]:
options(repr.plot.width=7,repr.plot.height=6)
Model1p0d1q0=Arima(AppleOpenPricelog, order=c(0, 1, 0))
tsdiag(Model1p0d1q0)
Model1p0d1q0
Series: AppleOpenPricelog 
ARIMA(0,1,0)                    

sigma^2 estimated as 0.012:  log likelihood=80.05
AIC=-158.09   AICc=-158.05   BIC=-155.48

As we can see in the time series diagram output, our model is a valid or acceptable for the following reasons.

1. The standardized residuals plot support a random residuals with centre or mean Zero.
2. The ACF of the residual confirms as the residuals are White Noise.
3. All the p-values of the Ljung-box statistics are way above the boundary which confirms the independence of the residuals.

Final model equation

$$Log(x_t)=Log(x_{t-1})+\epsilon_t $$

The logarithm of Apple open price for selected month is on average equals with the log of the previous month open price. The variability is only a random factor of average zero, with in the study time period. Logically make sense, stock price hardly change at monthly level. Of course the model could be way different if we had taken weekly or daily price value.

Residual Assesment

In [9]:
options(repr.plot.width=7,repr.plot.height=3)
fittedActual=cbind.data.frame(fit=as.numeric(exp(fitted(Model1p0d1q0))),Date=head(AppleOpenPriceFinal[,1],-10),
                              OpenPrice=as.numeric(exp(AppleOpenPricelog))) 

ggplot(fittedActual, aes(Date, OpenPrice)) + geom_line() +
labs(y="stock Price (USD)",title="monthly Stock: 2005-03-01 to 2014-06-02",x="Black line=Actual Values; Red line= Fitted values")+
theme(axis.text.x=element_text(angle = 45, size = 14,face="bold"))+geom_line(data=fittedActual,aes(Date,fit),color='red',size=1)

res=as.numeric(Model1p0d1q0$residuals)
plot(Model1p0d1q0$residuals, main="Residuals Plot")
qqnorm(res,pch=16) 
qqline(res,col="red",lwd=4,lty=2)

message("Compute the Box-Pierce or Ljung-Box test statistic for examining the null hypothesis of independence 
         in a given time series.")
Box.test(Model1p0d1q0$residuals, lag=1)
Compute the Box-Pierce or Ljung-Box test statistic for examining the null hypothesis of independence 
         in a given time series.
	Box-Pierce test

data:  Model1p0d1q0$residuals
X-squared = 1.8105, df = 1, p-value = 0.1785

According to the residual plots, all the assumptions are significant or acceptable.

Forecast

In [18]:
forecast=forecast.Arima(Model1p0d1q0, h=10, level=c(99.75))
forecast=exp(data.frame(forecast))
forecastOriginal=cbind(tailForecast,forecast)
ggplot(forecastOriginal, aes(Date, OpenPrice)) + geom_line() + 
labs(y="stock Price (USD)",title="Apple Stock price forecast for 10 consecutive months",x="Black line=Actual Values;Green 
                                  line=Forecasted Values; Red line= 95% confidence intervale")+
theme(axis.text.x=element_text(angle = 45, size = 14,face="bold"))+
geom_line(data=forecastOriginal,aes(Date,Lo.99.75),color='red',size=1)+
geom_line(data=forecastOriginal,aes(Date,Hi.99.75),color='red',size=1)+
geom_line(data=forecastOriginal,aes(Date,Point.Forecast),color='green',size=1)
forecastOriginal
DateOpenPricePoint.ForecastLo.99.75Hi.99.75
2852013-09-03493.1455.75327.2664634.6757
2862013-10-01478.45455.75285.3166727.9913
2872013-11-01524.02455.75256.8113808.7964
2882013-12-02558455.75235.0045883.8469
2892014-01-02555.68455.75217.332955.7175
2902014-02-03502.61455.75202.50151025.711
2912014-03-03523.42455.75189.75841094.592
2922014-04-01537.76455.75178.6191162.855
2932014-05-01592455.75168.75281230.842
2942014-06-02633.96455.75159.92321298.799

The equal point estimation for the forecast is due to the fact that our selected model is random walk. When we see the point forecast estimate, all are under estimating the true value slightly. However, all the forcast and the true values are under the 95 percent confidence range.