Estimation of Censored Regression Model

Economics 4F10, Brock University. The first part of this code shows the effect of censoring on OLS estimates. The second part shows how to obtain consistent estimates of the coefficients of a censored linear regression model (a Tobit model) by maximum likelihood (MLE), assuming Gaussian errors.

Clear memory and generate regression data

clear;

n = 1000; % sample size
sigma2 = 0.2; % variance of u
true_betas = [1;2]; % true model coefficients
X = randn(n,1); % regressor
y = [ones(n,1) X]*true_betas + randn(n,1)*sqrt(sigma2); % generate y

scatter(X,y) % generate scatter plot of uncensored data

Estimate betas using OLS and uncensored data

xmat = [ones(n,1) X]; % data matrix for OLS (constant and regressor)
beta_hat = xmat\y % OLS estimates via a least-squares solve (avoids explicitly inverting X'X)

% Note that our estimates are very close to the true parameter values.
beta_hat =

    1.0063
    1.9942

Censor data and re-estimate the model using OLS

y_censored = max(y,0); % censor at zero: replace all negative values of y with zero
beta_hat_censored = xmat\y_censored % re-estimate the model by OLS using the censored data

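% OLS is now clearly biased: setting negative values of y to zero lifts the
% data at low values of X, so the intercept is overstated and the slope is
% attenuated toward zero.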
beta_hat_censored =

    1.4360
    1.3833
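To see where this bias comes from, note that with censoring at zero the conditional mean of the observed y is no longer linear in x: E[y|x] = Φ(xβ/σ)·xβ + σ·φ(xβ/σ), where Φ and φ are the standard normal CDF and PDF. The sketch below assumes the same data-generating process as above (y* = 1 + 2x + u, u ~ N(0, 0.2), x ~ N(0,1)) and uses a hypothetical large sample size n_big. It regresses this conditional mean on [1 x] to approximate the probability limit of OLS on censored data, and compares the slope with the attenuation known to hold for normally distributed regressors (true slope times the share of uncensored observations).

```matlab
% Sketch: probability limit of OLS on data censored at zero.
% Assumes the same DGP as above; n_big is an illustrative choice.
rng(1);                                    % fixed seed so the sketch is reproducible
n_big = 1e6;                               % large sample to approximate the probability limit
b     = [1; 2];                            % true coefficients [beta0; beta1]
s     = sqrt(0.2);                         % true standard deviation of u
Phi   = @(z) 0.5*erfc(-z/sqrt(2));         % standard normal CDF (base MATLAB)
phi   = @(z) exp(-z.^2/2)/sqrt(2*pi);      % standard normal PDF (base MATLAB)

x  = randn(n_big,1);
xb = b(1) + b(2)*x;                        % latent conditional mean x'beta
z  = xb/s;
m  = Phi(z).*xb + s*phi(z);                % E[y | x] when y = max(y*, 0)

b_plim = [ones(n_big,1) x] \ m             % OLS of E[y|x] on [1 x]: approximates plim of censored OLS

% Attenuation predicted for a normal regressor: y* ~ N(1, 2^2 + 0.2), so
% the slope is beta1 times P(y* > 0).
p_uncens   = Phi(b(1)/sqrt(b(2)^2 + s^2));
slope_pred = b(2)*p_uncens
```

With these parameter values P(y* > 0) = Φ(1/√4.2) ≈ 0.687, so the predicted slope is about 1.37, in line with the censored OLS slope obtained above.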

Estimate the censored model using MLE
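For reference, the model estimated here and the log-likelihood coded below can be written as follows, where β0, β1 and σ correspond to params(1), params(2) and params(3), Φ and φ are the standard normal CDF and PDF, and the function handed to fminunc is the negative of ℓ:

$$
y_i^* = \beta_0 + \beta_1 x_i + u_i,\qquad u_i \sim N(0,\sigma^2),\qquad y_i = \max(0,\,y_i^*)
$$

$$
\ell(\beta_0,\beta_1,\sigma) = \sum_{i:\,y_i=0} \log \Phi\!\left(\frac{-(\beta_0+\beta_1 x_i)}{\sigma}\right) + \sum_{i:\,y_i>0} \log\!\left[\frac{1}{\sigma}\,\phi\!\left(\frac{y_i-\beta_0-\beta_1 x_i}{\sigma}\right)\right]
$$

Censored observations contribute the probability of falling at or below zero; uncensored observations contribute the usual normal density.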

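% Define the NEGATIVE log-likelihood, with params = [beta0; beta1; sigma].
% Censored observations (y = 0) contribute P(y* <= 0) = normcdf(0, mu, sigma);
% uncensored observations contribute the normal density normpdf(y, mu, sigma).
cens = (y_censored <= 0); % censoring indicator, computed from the observed (censored) data
loglik = @(params) -( ...
    sum(log(normcdf(0, params(1) + params(2)*X(cens), params(3)))) + ...
    sum(log(normpdf(y_censored(~cens), params(1) + params(2)*X(~cens), params(3)))) );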

% Run MLE

fminunc(loglik,[1.5;2.5;0.5]) % minimize from starting values for [beta0; beta1; sigma]

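% The estimates of beta0 and beta1 are again very close to their true values
% (1 and 2) despite censoring; the third element is the estimate of sigma,
% whose true value is sqrt(0.2) ≈ 0.447.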
Warning: Gradient must be provided for trust-region algorithm; using
quasi-newton algorithm instead. 

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the optimality tolerance.

ans =

    1.0294
    1.9648
    0.4614
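One practical weakness of the code above is that fminunc is unconstrained, so a trial value of params(3) ≤ 0 is not a valid standard deviation. A common workaround, sketched here under the assumption that the Optimization Toolbox is available, is to optimize over log(σ) and transform back afterwards. The sketch also selects the quasi-Newton algorithm explicitly through optimoptions, which avoids the trust-region warning shown above. It regenerates its own data with a fixed seed, and the names negll, theta and params_hat are illustrative, not part of the original code.

```matlab
% Sketch: censored-regression (Tobit) MLE with sigma kept positive.
% Assumptions: same DGP as above; Optimization Toolbox available for fminunc.
rng(2);                                     % fixed seed for reproducibility
n  = 1000;
X  = randn(n,1);
y  = 1 + 2*X + randn(n,1)*sqrt(0.2);        % latent outcome y*
yc = max(y, 0);                             % observed outcome, censored at zero
cens = (yc <= 0);                           % censoring indicator from observed data only

Phi = @(z) 0.5*erfc(-z/sqrt(2));            % standard normal CDF (base MATLAB)

% Negative log-likelihood in theta = [beta0; beta1; log(sigma)]:
%   censored:   log Phi(-(beta0 + beta1*x)/sigma)
%   uncensored: -0.5*log(2*pi) - log(sigma) - 0.5*((y - beta0 - beta1*x)/sigma)^2
negll = @(t) -( sum(log(Phi(-(t(1) + t(2)*X(cens))/exp(t(3))))) + ...
    sum(-0.5*log(2*pi) - t(3) - 0.5*((yc(~cens) - t(1) - t(2)*X(~cens))/exp(t(3))).^2) );

opts  = optimoptions('fminunc', 'Algorithm', 'quasi-newton', 'Display', 'off');
theta = fminunc(negll, [1.5; 2.5; log(0.5)], opts);
params_hat = [theta(1:2); exp(theta(3))]    % back-transform: [beta0; beta1; sigma]
```

Because exp(·) is always positive, every point fminunc evaluates corresponds to a valid σ, and the maximizer is unchanged by the reparameterization.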