Considerations when Working with MLR

Multiple linear regression (MLR) is a very powerful statistical tool. However, to use it effectively you need statistical experience. This topic reminds you of the issues to be aware of.

Are the Prerequisites for MLR Fulfilled?

The assumptions of an MLR model are as follows:

• There is a linear relationship between each independent variable ($X_i$) and the dependent variable Y.

If the two variables have another kind of relationship, for instance $Y = b_i X_i^n$ or $Y = b_i \sin X_i$, MLR will produce erroneous results. If you know the form of such a non-linear function, it should be possible to transform it into a linear form, for instance by creating an auxiliary variable $Z = X_i^n$ (this could be a key figure or time series), which results in $Y = b_i Z$, a linear relationship.
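A minimal sketch of the auxiliary-variable idea, assuming the exponent n is known in advance (here n = 2) and using hypothetical, noise-free data generated with b = 3. The helpers `fit_through_origin` and `sum_sq_residuals` are illustrative names, not part of any forecasting tool; they estimate b by least squares for a model without an intercept and measure how well each model fits.

```python
def fit_through_origin(z, y):
    """Least-squares estimate of b in the model Y = b * Z (no intercept)."""
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)


def sum_sq_residuals(z, y, b):
    """Sum of squared residuals of the model Y = b * Z."""
    return sum((yi - b * zi) ** 2 for zi, yi in zip(z, y))


n = 2                                  # assumed known exponent of the non-linear form
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 * xi ** n for xi in x]        # hypothetical data following Y = 3 * X^2

z = [xi ** n for xi in x]              # auxiliary variable Z = X^n
b_z = fit_through_origin(z, y)         # linear model Y = b * Z
b_x = fit_through_origin(x, y)         # misspecified linear model Y = b * X

print(f"Y = b*Z: b = {b_z:.3f}, SSR = {sum_sq_residuals(z, y, b_z):.3f}")
print(f"Y = b*X: b = {b_x:.3f}, SSR = {sum_sq_residuals(x, y, b_x):.3f}")
```

With the auxiliary variable the fit recovers b = 3 exactly for this noise-free data, while fitting X directly leaves a large sum of squared residuals because the functional form is wrong.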

• The Xs are non-stochastic.

Non-stochastic variables are fixed variables.
Stochastic variables are variables drawn at random from a probability distribution.

• No exact linear relationship exists between two or more of the explanatory variables.

If this is the case, the variables are said to be collinear. The effect is called multicollinearity: two supposedly independent variables convey the same information, that is, they are not independent.

The following model for a person’s salary (Y) is proposed:

$Y = b_0 + b_1 X_1 + b_2 X_2 + e_i$

where $X_1$ is the average number of hours worked per day and $X_2$ the average number of hours worked per week.

This model is obviously nonsense: $X_2$ is simply a multiple of $X_1$, so it is not possible to change one independent variable without changing the other, and an MLR analysis cannot be carried out. This is called perfect multicollinearity. In practice, multicollinearity is usually not perfect, which makes it more difficult to detect.
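A minimal sketch of why perfect multicollinearity blocks the analysis: least squares solves the normal equations (X'X)b = X'y, and when $X_2$ is an exact multiple of $X_1$ the matrix X'X is singular, so b has no unique solution. The data are hypothetical and assume the weekly figure is exactly 7 times the daily average; `xtx` and `det3` are illustrative helpers.

```python
def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def xtx(rows):
    """Compute X'X for a design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


hours_per_day = [6, 7, 8, 9, 10]                 # X1 (hypothetical)
hours_per_week = [7 * h for h in hours_per_day]  # X2, an exact linear function of X1

# Design matrix: a column of ones for b0, then X1 and X2.
design = [[1, x1, x2] for x1, x2 in zip(hours_per_day, hours_per_week)]
print(det3(xtx(design)))  # prints 0: X'X cannot be inverted
```

With real data the determinant is usually not exactly zero but can be close to it; diagnostics such as the correlation between explanatory variables or variance inflation factors are commonly used to detect this near-multicollinearity.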

• Errors corresponding to different observations are independent and therefore uncorrelated.

If this assumption is violated, the errors are said to be autocorrelated (serially correlated). This often occurs in time series, where an error in the estimate for one period leads to errors in future periods. One measure of autocorrelation is the Durbin-Watson test or, if the dependent variable is lagged, Durbin's h test.
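The Durbin-Watson statistic can be computed directly from the residuals of the fitted model as $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. The sketch below assumes the residuals are already available in time order. `durbin_h` uses the standard formula $h = (1 - d/2)\sqrt{n / (1 - n \cdot \widehat{\mathrm{Var}}(b_{lag}))}$, where $b_{lag}$ is the coefficient on the lagged dependent variable; both function names are illustrative. Critical values for d are not included and would have to be taken from published tables.

```python
import math


def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2).

    d is near 2 when there is no first-order autocorrelation, tends toward 0
    for positive autocorrelation and toward 4 for negative autocorrelation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def durbin_h(d, n, var_lag_coef):
    """Durbin's h statistic for models with a lagged dependent variable.

    var_lag_coef is the estimated variance of the coefficient on Y(t-1).
    Under the null hypothesis of no autocorrelation, h is approximately
    standard normal. The statistic is undefined when n * var_lag_coef >= 1.
    """
    if n * var_lag_coef >= 1:
        raise ValueError("Durbin's h is undefined when n * var(b_lag) >= 1")
    rho = 1 - d / 2  # approximate first-order autocorrelation of the residuals
    return rho * math.sqrt(n / (1 - n * var_lag_coef))


print(durbin_watson([1, 1, 1, 1]))    # 0.0 (strong positive autocorrelation)
print(durbin_watson([1, -1, 1, -1]))  # 3.0 (negative autocorrelation)
```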

• The error variable is normally distributed, with 0 expected value and constant variance for all observations.

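If the error variance is not constant, this is called heteroscedasticity: the size of the error depends on the independent variable itself. The least-squares estimates are then still unbiased, but they are no longer the best (most efficient) fit, and the usual standard errors and significance tests are unreliable. To check for heteroscedasticity, plot the errors (residuals) against the fitted values or against each independent variable. No trend should be evident; for example, the spread of the errors should not widen as the values increase.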
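The plot described above is a visual check. As a rough numeric stand-in, the sketch below sorts the residuals by one independent variable, splits them into a lower and an upper half, and compares the sums of squared residuals. It is only loosely in the spirit of the Goldfeld-Quandt test and is not that formal test (no separate regressions, no F distribution); `spread_ratio` is a hypothetical helper and the residuals are assumed to come from an already fitted model.

```python
def _sum_sq(values):
    """Sum of squares of a list of numbers."""
    return sum(v * v for v in values)


def spread_ratio(x, residuals):
    """Ratio of squared-residual sums, upper half / lower half, ordered by x.

    A ratio close to 1 is consistent with constant error variance; a ratio
    far from 1 suggests that the error size depends on x. With an odd number
    of observations the middle one is ignored.
    """
    pairs = sorted(zip(x, residuals))
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[len(pairs) - half:]]
    lower_ss = _sum_sq(lower)
    if lower_ss == 0:
        raise ValueError("lower half has zero residual spread")
    return _sum_sq(upper) / lower_ss


x = list(range(1, 9))
fanning = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]  # spread grows with x
steady = [1, -1, 1, -1, 1, -1, 1, -1]                   # constant spread
print(spread_ratio(x, fanning))  # about 100: suggests heteroscedasticity
print(spread_ratio(x, steady))   # 1.0
```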

If these conditions are not fulfilled, or are only partially fulfilled, MLR may not be the right forecasting method.

Modifying Historical Data

The relationship between an independent variable and the dependent variable may be linear only within a particular range of values. For instance, in the graph below, Y is constant until approximately X = 4.

In this case, it is advisable to change the X values for forecasting purposes so that the minimum value is 4, that is, every value below 4 is set to 4. You could use a macro to do this.
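The adjustment amounts to clamping the X values from below. The macro mentioned above belongs to the forecasting system and its syntax is not shown here; the Python sketch below only illustrates the logic, using a hypothetical helper `clamp_below` and hypothetical historical values.

```python
def clamp_below(values, minimum):
    """Replace every value below `minimum` with `minimum`; larger values are unchanged."""
    return [max(v, minimum) for v in values]


x_history = [1.5, 2.0, 3.8, 4.0, 5.2, 6.1]  # hypothetical historical X values
x_adjusted = clamp_below(x_history, 4.0)
print(x_adjusted)  # [4.0, 4.0, 4.0, 4.0, 5.2, 6.1]
```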