Algorithms I developed for forward, backward and stepwise variable selection in multiple linear regression, using p-values only.


I was working on a multiple linear regression project in R with a large number of continuous and categorical variables when I found myself wasting time on manual stepwise variable selection: searching for a significant variable to add to the model, then looking for a variable to drop, and so on at each step. It was very tiring. I could have used the existing step() function from the stats package or regsubsets() from the leaps package. However, step() selects variables by AIC, and regsubsets() likewise compares models using summary criteria rather than p-values. AIC is an accepted criterion for model comparison and variable selection, but it is hard to interpret and even harder to explain to a non-technical person. In addition, we pick the model with the smaller AIC, but how much smaller is small enough? That is why I decided to write these algorithms, which evaluate only the p-value to add or drop a variable in forward, backward and stepwise variable selection.
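To make the idea concrete, here is a minimal sketch of one manual selection step using only base R. It is not the code from this repository: it assumes the built-in mtcars data set and an arbitrary choice of candidate variables, and it relies on add1() and drop1() from the stats package, which report an F-test p-value for each term. For a categorical variable the p-value covers the whole term rather than each level separately.

```r
# One manual step of p-value-based selection on the built-in mtcars data.
# The starting model and the candidate terms are arbitrary, for illustration.
fit <- lm(mpg ~ wt, data = mtcars)

# Forward step: F-test p-value for adding each candidate term to the model.
candidates <- add1(fit, scope = ~ wt + hp + qsec + factor(cyl), test = "F")
print(candidates)

# Backward step: F-test p-value for dropping each term already in the model.
full <- lm(mpg ~ wt + hp + qsec + factor(cyl), data = mtcars)
removals <- drop1(full, test = "F")
print(removals)
```

In a forward step you would add the candidate with the smallest p-value if it is below your entry threshold; in a backward step you would drop the term with the largest p-value if it is above your removal threshold.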


Advantage of using p-values to do variable selection


It is a great benefit to use the algorithms I prepared, which perform variable selection using p-values. Each variable selection method (backward, forward and stepwise) is currently written separately. It would be best to have a single package for all three; that does not exist yet, but I am working on it.


Instructions for using the algorithms


Algorithms

Forward Variable Selection algorithm using forwardSelection(data,"responceVariable",alphaToEnter)
Backward Elimination algorithm using backwardElimination(data,"responceVariable",alphaToRemove)
Stepwise Variable Selection algorithm using StepwiseAlgorithm(data,"responceVariable",alphaToEnter,alphaToRemove)
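
As an illustration of the general technique, backward elimination by p-value can be sketched as below. This is not the backwardElimination() implementation from this repository: the function name backward_by_pvalue and the argument alpha_to_remove are hypothetical, and the sketch assumes that every column other than the response is a candidate predictor with a syntactically valid name. It uses drop1() with an F test, so a categorical variable is kept or dropped as a whole term.

```r
# Hypothetical helper for illustration; NOT the repository's backwardElimination().
# Repeatedly drops the least significant term until every remaining term
# has an F-test p-value at or below alpha_to_remove.
backward_by_pvalue <- function(data, response, alpha_to_remove = 0.05) {
  # Assumption: all other columns are candidate predictors.
  predictors <- setdiff(names(data), response)
  fit <- lm(reformulate(predictors, response = response), data = data)

  repeat {
    # Stop if only the intercept is left.
    if (length(attr(terms(fit), "term.labels")) == 0) break

    # One p-value per term currently in the model.
    tests <- drop1(fit, test = "F")
    tests <- tests[rownames(tests) != "<none>", , drop = FALSE]
    p_values <- tests[["Pr(>F)"]]

    # Stop once the least significant term passes the threshold.
    worst <- which.max(p_values)
    if (p_values[worst] <= alpha_to_remove) break

    # Otherwise remove that term and refit.
    term <- rownames(tests)[worst]
    fit <- update(fit, as.formula(paste(". ~ . -", term)))
  }
  fit
}

# Usage on the built-in mtcars data, with mpg as the response.
selected <- backward_by_pvalue(mtcars, "mpg", alpha_to_remove = 0.05)
print(formula(selected))
```

Forward selection follows the same pattern in reverse: start from an intercept-only model, use add1() to get a p-value for each candidate, and add the most significant one while its p-value stays below the entry threshold.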


More to come...