Considerations when Working with MLR
Multiple linear regression (MLR) is a very powerful statistical tool. However, to use it effectively you need statistical experience. This topic is intended to remind you of issues to be aware of.
Are the Prerequisites for MLR Fulfilled?
The assumptions of an MLR model are as follows:
If there is another kind of relationship between the two variables, using MLR will produce incorrect results. For instance if Y = b
Nonstochastic variables = fixed variables.
Stochastic variables = variables drawn at random from a probability function.
If this is the case, the variables are said to be collinear. The effect is called multicollinearity and occurs when two supposedly independent variables convey the same information, that is, they are not independent.
The following model for a person’s salary (Y) is proposed:
where X1 is the average hours worked a day and
This is obviously nonsense. It is not possible to carry out an MLR analysis, as it is not possible to change one independent variable without changing the other. This is called perfect multicollinearity. In practice, the effect is usually not perfect and is therefore more difficult to detect.
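As a minimal sketch of how such a dependency might be spotted, the Python snippet below computes the Pearson correlation between pairs of independent variables. The function name pearson and all sample data are hypothetical; the second variable is simply assumed to be a fixed multiple of the first, so that the collinearity is perfect.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: x2 is an exact multiple of x1 (assumed for illustration).
x1 = [6.0, 7.5, 8.0, 8.5, 9.0]
x2 = [5 * v for v in x1]
# Hypothetical variable with no exact dependence on x1, for comparison.
x3 = [3.0, 1.0, 4.0, 1.0, 5.0]

r_12 = pearson(x1, x2)  # perfectly collinear pair
r_13 = pearson(x1, x3)  # only weakly related pair
print(round(r_12, 6), round(r_13, 6))
```

A correlation close to +1 or -1 suggests that two variables convey the same information. A pairwise check of this kind cannot reveal a dependency that involves three or more variables at once.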
This is known as autocorrelation or serial correlation. It often occurs in time series, where it means that errors in the estimate for one period lead to errors in later periods. One measure for autocorrelation is the Durbin-Watson test or, if the dependent variable is lagged, the Durbin-h test.
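The Durbin-Watson statistic is the sum of squared differences between successive errors divided by the sum of squared errors. The sketch below computes it in Python for two hypothetical error series; the function name and the data are assumptions made for illustration only.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals that drift slowly (positive autocorrelation).
drifting = [1.0, 0.9, 0.8, 0.6, 0.3, -0.1, -0.4, -0.7, -0.9, -1.0]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_drift = durbin_watson(drifting)
d_alt = durbin_watson(alternating)
print(round(d_drift, 3), round(d_alt, 3))
```

The statistic lies between 0 and 4. Values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.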
If the error variance is not constant, one speaks of heteroscedasticity. This means that the size of the error depends on the independent variable itself. The coefficient estimates are then no longer the most efficient ones and their estimated standard errors are biased, so that the best fit is not found. To check for heteroscedasticity, plot the errors against the fitted values or against each independent variable. No trend in the spread of the errors should be evident.
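As a rough numerical complement to the plot, the Python sketch below sorts the errors by an independent variable and compares the mean squared error in the upper half with that in the lower half. This is a crude diagnostic under stated assumptions, not a formal statistical test; the function name variance_ratio and the sample data are hypothetical.

```python
def variance_ratio(xs, residuals):
    """Mean squared residual in the upper half of x divided by
    the mean squared residual in the lower half of x."""
    pairs = sorted(zip(xs, residuals))  # order observations by x
    half = len(pairs) // 2
    low = [e ** 2 for _, e in pairs[:half]]
    high = [e ** 2 for _, e in pairs[-half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))

xs = [1, 2, 3, 4, 5, 6, 7, 8]
# Hypothetical residuals whose size grows with x.
growing = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
# Hypothetical residuals of roughly constant size.
steady = [0.4, -0.5, 0.5, -0.4, 0.5, -0.4, 0.4, -0.5]

ratio_growing = variance_ratio(xs, growing)
ratio_steady = variance_ratio(xs, steady)
print(round(ratio_growing, 2), round(ratio_steady, 2))
```

A ratio close to 1 is consistent with a constant error variance, whereas a ratio far from 1 suggests that the size of the error changes with the independent variable.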
If these conditions are not fulfilled, or are only partially fulfilled, MLR may not be the correct forecasting method.
Modifying Historical Data
It is possible that a linear relationship between an independent variable and the dependent variable exists only within a particular range of values. For instance, in the graph below Y is constant until approximately X=4.
In this case, it would be advisable to change the X values for forecasting purposes so that the minimum value is 4. You could use a macro to do this.
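The adjustment itself amounts to raising every X value below the threshold to the threshold. The Python sketch below only illustrates this transformation; it is not macro syntax, and the function name clamp_minimum and the sample values are hypothetical.

```python
def clamp_minimum(values, minimum):
    """Raise every value below `minimum` to `minimum`; leave the rest unchanged."""
    return [max(v, minimum) for v in values]

# Hypothetical historical X values.
x_history = [1, 2, 3, 4, 5, 6, 7]
x_adjusted = clamp_minimum(x_history, 4)
print(x_adjusted)  # [4, 4, 4, 4, 5, 6, 7]
```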