Bowling Green State University
(419) 378 - 9131
ebaltay@bgsu.edu
I got the monthly stock price of Apple Inc. from 2005-03-01 to 2014-06-02. data source. I have done the following procedures to fit the appropriate model and do forecasting. The modeling was done on the Opening price.
* Stationarity and data transformation
* Initial models nomination
* Model comparison and final model selection
* Assumption verification
* Forecasting
libs <- c("ggplot2","lubridate","astsa","tseries","forecast")
lapply(libs,require, character.only= TRUE)
library(repr)
options(repr.plot.width=7,repr.plot.height=4)
AppleMonthlStack <- read.csv("C:/Users/selamina/...data.csv")
AppleOpenPrice=cbind.data.frame(Date=mdy(AppleMonthlStack[,1]),OpenPrice=AppleMonthlStack[,2])
AppleOpenPriceFinal=AppleOpenPrice[c(183:294),] # 112 Total Observation begining 2005-03-01
tailForecast=tail(AppleOpenPriceFinal,10)# last 10 obs for forecast
ggplot(AppleOpenPriceFinal, aes(Date, OpenPrice))+ geom_line()+
labs(y="stock Price (USD)",x="monthly Stock: 2005-03-01 to 2014-06-02",title="From 2005-03-01 to 2014-06-02")+
theme(axis.text.x=element_text(angle = 45, size = 14,face="bold"))+
geom_line(data=tailForecast,aes(Date,OpenPrice),color='red',size=1.5) # just red once will be kept for forecasting
AppleOpenPrice=ts(head(AppleOpenPriceFinal[,2],-10)) # keeping the bottom 10 for Forecasting
The last part, red colored in the time series plot, is kept for forecast comparison.
adf.test(AppleOpenPrice, alternative="stationary")
The adf.test results a p-value of 0.5055, which indicates as the original data is not Stationary at 5% level of significance.
par(mfrow=c(1,2))
acf(AppleOpenPrice,100);pacf(AppleOpenPrice,100)
The slow decline in the ACF plot indicates the existence of correlation among sequence of observations and also the immediate cut off in the PACF plot confirms that. Thus, the origional data is not Stationary.
Clearly the general trend of Apple Stock price in the time series plot indicates an upward parabola. Increasing at an exponential rate. This shows that we could take log transformation with or with out lag 1 differencing to stablize and make the time series stationary.
AppleOpenPricelog=log(AppleOpenPrice)
plot(AppleOpenPricelog,main="Log of the Open price")
AppleOpenPriceD1 = diff(AppleOpenPrice, 1)
plot(AppleOpenPriceD1,main="Lag 1 differenced Open Price")
AppleOpenPricelogD1 = diff(AppleOpenPricelog, 1)
plot(AppleOpenPricelogD1,main="Lag 1 difference of log(Open Price) ")
As we can clearly see, the log transformed data is not stationary in mean. The lag one difference is not stationary in variance. That is why I decided to combine both, lag 1 differencing for the $log(open-Price)$. Stationarity test after lag-1 difference of the $log(open-Price)$ is done below.
adf.test(AppleOpenPricelogD1, alternative="stationary")
par(mfrow=c(1,2))
acf(AppleOpenPricelogD1,100);pacf(AppleOpenPricelogD1,100)
The adf.test and both ACF and PACF confirms that the transformed data is stationary both in mean and variance.
par(mfrow=c(1,2))
acf(AppleOpenPricelogD1,100);pacf(AppleOpenPricelogD1,100)
From the ACF and PACF plot, its clear that the transformed data is White Noise.
options(repr.plot.width=7,repr.plot.height=6)
Model1p0d1q0=Arima(AppleOpenPricelog, order=c(0, 1, 0))
tsdiag(Model1p0d1q0)
Model1p0d1q0
As we can see in the time series diagram output, our model is a valid or acceptable for the following reasons.
1. The standardized residuals plot support a random residuals with centre or mean Zero.The logarithm of Apple open price for selected month is on average equals with the log of the previous month open price. The variability is only a random factor of average zero, with in the study time period. Logically make sense, stock price hardly change at monthly level. Of course the model could be way different if we had taken weekly or daily price value.
options(repr.plot.width=7,repr.plot.height=3)
fittedActual=cbind.data.frame(fit=as.numeric(exp(fitted(Model1p0d1q0))),Date=head(AppleOpenPriceFinal[,1],-10),
OpenPrice=as.numeric(exp(AppleOpenPricelog)))
ggplot(fittedActual, aes(Date, OpenPrice)) + geom_line() +
labs(y="stock Price (USD)",title="monthly Stock: 2005-03-01 to 2014-06-02",x="Black line=Actual Values; Red line= Fitted values")+
theme(axis.text.x=element_text(angle = 45, size = 14,face="bold"))+geom_line(data=fittedActual,aes(Date,fit),color='red',size=1)
res=as.numeric(Model1p0d1q0$residuals)
plot(Model1p0d1q0$residuals, main="Residuals Plot")
qqnorm(res,pch=16)
qqline(res,col="red",lwd=4,lty=2)
message("Compute the Box-Pierce or Ljung-Box test statistic for examining the null hypothesis of independence
in a given time series.")
Box.test(Model1p0d1q0$residuals, lag=1)
According to the residual plots, all the assumptions are significant or acceptable.
forecast=forecast.Arima(Model1p0d1q0, h=10, level=c(99.75))
forecast=exp(data.frame(forecast))
forecastOriginal=cbind(tailForecast,forecast)
ggplot(forecastOriginal, aes(Date, OpenPrice)) + geom_line() +
labs(y="stock Price (USD)",title="Apple Stock price forecast for 10 consecutive months",x="Black line=Actual Values;Green
line=Forecasted Values; Red line= 95% confidence intervale")+
theme(axis.text.x=element_text(angle = 45, size = 14,face="bold"))+
geom_line(data=forecastOriginal,aes(Date,Lo.99.75),color='red',size=1)+
geom_line(data=forecastOriginal,aes(Date,Hi.99.75),color='red',size=1)+
geom_line(data=forecastOriginal,aes(Date,Point.Forecast),color='green',size=1)
forecastOriginal
The equal point estimation for the forecast is due to the fact that our selected model is random walk. When we see the point forecast estimate, all are under estimating the true value slightly. However, all the forcast and the true values are under the 95 percent confidence range.