%% Introduction
% Brock University - Economics 4F10
% Practical Worksheet #2
% The following worksheet demonstrates how to perform diagnostic tests for
% a linear regression model of the form y = X*beta + u, u~N(0,sigma2), in
% MATLAB, where for simplicity% X = nX3, and b = 3x1 (i.e. three regressors,
% including the constant), and how to calculate the coefficient of
% determination (the R2).
% Papers cited in comments:
% White, H. (1980). A heteroskedasticity-consistent covariance matrix
% estimator and a direct test for heteroskedasticity. Econometrica: Journal
% of the Econometric Society, 48(4), 817–838.
% Engle, R. "Autoregressive Conditional Heteroscedasticity with Estimates
% of the Variance of United Kingdom Inflation. Econometrica. Vol. 96, 1988,
% pp. 893-920.
% Durbin, J., and Watson, G., "Testing for Serial Correlation in
% Least-Squares Regression," Biometrika, vol. 38, 1951, pp. 159–171.
%% Load data and set parameters
clear;
load SampleRegressionData; % load data (same as in Practical Worksheet #1)
n = size(X,1); % set number of observations equal to the # of rows in X
k = size(X,2) + 1; % set number of coefficients to # of columns in X plus constant
%% Estimate coefficients vector using OLS and its covariance matrix
% Steps are the same as in Practical Worksheet #1
xmat = [ones(n,1) X];
betahat = inv(xmat'*xmat)*(xmat'*y);
yhat = xmat*betahat;
uhat = y - yhat;
sigma2hat = (uhat'*uhat)/(n-k);
covbetahat = sigma2hat*inv(xmat'*xmat);
%% Estimate the coefficient of determination (the R-squared)
% Recall that R^2 is defined as the ratio of explained sum of squares (ESS)
% over total sum of squares (TSS).
TSS = sum((y-mean(y)).^2); % find TSS
ESS = sum((yhat-mean(y)).^2); % find ESS
R2 = ESS/TSS; % find coefficient of determination
disp(R2); % display the R^2
%% Do diagnostic test #1: White's test for conditional heteroskedasticity
% Recall that the test of White (1980) for conditional heteroskedasticity
% involves regressing the squared residuals on all regressors included in
% X, their squares and cross-products, using an auxiliary regression.
% First, define the new data matrix containing regressors for auxiliary
% model, and the new dependent variable:
xmat_whites_test = [ones(n,1) X X.^2 X(:,1).*X(:,2)];
ymat_whites_test = uhat.^2;
% Next, estimate the auxiliary model using OLS:
betahat_whites_test = inv(xmat_whites_test'*xmat_whites_test)*...
(xmat_whites_test'*ymat_whites_test);
% Recall that the test statistic for the test of White (1980) is n*R^2,
% where R^2 is the coefficient of determination from the auxiliary
% regression, and n is the sample size as before.
R2_auxiliary_regression = sum((xmat_whites_test*betahat_whites_test - mean(ymat_whites_test)).^2)/...
sum((ymat_whites_test - mean(ymat_whites_test)).^2); % R^2 from auxiliary regression
whites_test_statistic = n*R2_auxiliary_regression; % White's test statistic
% Recall that under H0 of homoskedasticity, this statistic asymptotically
% follows a Chi-Squared distribution with (p-1) degrees of freedom, where
% p is the number of columnts in the auxiliary regression data matrix.
% To carry out the test, look up corresponding cricical value, and reject
% if whites_test_statistic > critical value.
% If you have MATLAB's Statistics Toolbox installed, you can obtain the
% critical value using the chi2inv() function (see help for more info.):
level_of_significance = 0.01;
whites_critical_value = chi2inv(1-level_of_significance,...
size(xmat_whites_test,2) - 1);
disp('Whites test statistic:');
disp(whites_test_statistic);
disp('Whites critical value:');
disp(whites_critical_value);
% Note: if you have Econometrics Toolbox, you can also test for residual
% heteroskedasticity using the ARCH test of Engle (1988). See MATLAB help
% for archtest() function.
%% Do diagnostic test #2: the Durbin-Watson d-test for serial correlaiton
% Recall that the d-test of Durbin and Watson (1951) can be used to test
% for the presense of serial correltion among regression residuals. If
% present, this can lead to bias and inefficiency of OLS estimates.
% Recall that the DW-statistic was simply the ratio of the sum of squared
% differences between successive residuals, and the residual sum of
% squares, the SSR.
dw_statistic = sum((uhat(2:end)-uhat(1:end-1)).^2)/sum(uhat.^2);
disp('Durbin-Watson Statistic:');
disp(dw_statistic);
% Recall that the limiting distribution of DW statistic is a bit
% complicated, and involves rejection, non-rejection and indecision
% regions. Here, the statist falls in nonrejection region, and hence we do
% not reject the null of no auto-correlation.
% Note: if you have Statistics Toolbox, you can also run the DW test using
% the dwtest() function. See help for more info.