Tobit, the Heckman Correction and Censoring
Here I will discuss three topics in econometrics:
1. Why OLS is biased when Y cannot exceed 10.
2. Why the Heckman correction corrects for that.
3. Why OLS is biased when Y cannot exceed X.
This complements my earlier post on topic 1.
1. Why OLS is biased when Y cannot exceed 10.
Suppose Y = Min (beta*X + U, 10)
For example, a child’s test score (Y) cannot exceed 10, no matter what his age, X.
Suppose, too, that we have no observations with X (a) near 0 (where there would also be a natural bound), or (b) big enough that we think beta*X > 10. Because of (b), the problem is not that of using a linear technique to estimate a nonlinear relationship.

Figure 1 shows what happens. The closed dots (except those on the X-axis) are the observations, which I’ve drawn, for each value X_i, as one with zero disturbance, one with a disturbance of positive Z_i, and one with negative Z_i. The positive disturbances would lead to values of Y greater than 10, where the open dots are, so they come out as 10 each time. As a result, the disturbances have mean less than zero, not mean zero, and the OLS line through the middle of the observed dots leads to an UNDERESTIMATE of beta (note that we can sign the bias).
Note that this is NOT a problem of trying to approximate a two-spline by a single straight line. The data doesn’t take us into the region where the true expectation of Y given X equals 10.
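Here is a minimal simulation sketch of the Figure 1 story, not a definitive check. Everything numeric in it is my own assumption for illustration: beta = 1, U normal with standard deviation 2, X uniform on [3, 9] (so beta*X stays below 10), and an OLS fit with a free intercept. The helper names are hypothetical.

```python
import random
from statistics import mean


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with an intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def simulate(beta=1.0, sigma=2.0, cap=10.0, n=20000, seed=0):
    """Return (slope on uncapped Y, slope on Y capped at `cap`)."""
    rng = random.Random(seed)
    # Assumed design: beta*X never exceeds the cap, only the disturbance does.
    xs = [rng.uniform(3.0, 9.0) for _ in range(n)]
    latent = [beta * x + rng.gauss(0.0, sigma) for x in xs]
    observed = [min(y, cap) for y in latent]  # Y = Min(beta*X + U, 10)
    return ols_slope(xs, latent), ols_slope(xs, observed)


slope_latent, slope_censored = simulate()
print(f"OLS slope without the cap: {slope_latent:.3f}")
print(f"OLS slope with the cap:    {slope_censored:.3f}")
```

Under these assumptions the capped slope should come out below the uncapped one, because the cap bites harder at large X.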
There is an additional complexity, though. The OLS slope is below the true slope. Thus, if an observation had zero disturbance, the OLS equation would underpredict the Y value. If, however, what we are interested in is not estimating beta but predicting Y conditional on a given value of X, I think the OLS equation might do just fine (I’m not sure, but it’s plausible). The reason is that the observed disturbance (perhaps we should call it V instead of U) is not zero on average. It is negative, with the negative amount growing with X and depending on the variance of U. Thus, E(Y|X) does not equal beta*X, but something less than beta*X.
(I’m not sure OLS is unbiased even for prediction, though. The expected value of Y is changing with X, but is it changing linearly? Not, I think, if the distribution of U is not uniform. As X gets bigger, at some point 10-beta*X hits the “bulge” in the normal distribution, and the constraint starts to bite more.)
When X goes up, there are two effects on Y. First, there is the increase from beta*dX. Second, there is the decrease from V having a more negative expected value.
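The two effects can be written out explicitly if we are willing to assume that U is normal with mean zero and standard deviation sigma, independent of X, and that beta > 0. These are assumptions I am adding for the sketch. The symbols c(X), sigma, Phi (the standard normal CDF) and phi (its density) are introduced here for illustration.

```latex
\begin{aligned}
c(X) &= 10 - \beta X, \qquad V = Y - \beta X = \min\{U,\; c(X)\}, \\
E[V \mid X] &= c(X)\,\bigl[1 - \Phi\bigl(c(X)/\sigma\bigr)\bigr]
               - \sigma\,\phi\bigl(c(X)/\sigma\bigr) \;<\; 0, \\
\frac{d\,E[V \mid X]}{dX} &= -\beta\,\bigl[1 - \Phi\bigl(c(X)/\sigma\bigr)\bigr], \\
\frac{d\,E[Y \mid X]}{dX} &= \beta + \frac{d\,E[V \mid X]}{dX}
               = \beta\,\Phi\bigl(c(X)/\sigma\bigr).
\end{aligned}
```

So, under these assumptions, the slope of the expected value of Y is beta scaled by the probability that the bound does not bind. That probability falls as X rises, which is why the expected value of Y is not linear in X.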
You might say, “OK, the OLS beta is not the theory beta, but it is even better, because it is better for prediction than the theoretical beta. Who cares whether the change in Y is from dX or from dEV?”
One answer is that the source of the change in Y matters more when there are multiple exogenous variables. Suppose we had both X and W on the right-hand-side. Then, the effect of an increase in X would depend on beta and on dEV, as before. But dEV depends on the size of W, because if W is big, the predicted value of Y is near the bound of 10 and U gets constrained more. As a result, the overall effect of a change in X, from beta*dX and from dEV, depends on W.
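To put rough numbers on that dependence, here is a small sketch. It assumes Y = Min(beta*X + gamma*W + U, 10) with U normal (mean zero, standard deviation sigma); the coefficient gamma, the parameter values, and the function names are all hypothetical. Under those assumptions the slope of the expected value of Y in X works out to beta times the probability that the bound does not bind.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its cdf and pdf


def expected_y(x, w, beta=1.0, gamma=1.0, sigma=2.0, cap=10.0):
    """E[Min(beta*x + gamma*w + U, cap)] for U ~ N(0, sigma^2) (assumed)."""
    index = beta * x + gamma * w
    gap = cap - index  # how much room U has before the bound binds
    z = gap / sigma
    # index + E[Min(U, gap)], using the mean of a normal cut off from above
    return index + gap * (1.0 - STD_NORMAL.cdf(z)) - sigma * STD_NORMAL.pdf(z)


def marginal_effect_of_x(x, w, beta=1.0, gamma=1.0, sigma=2.0, cap=10.0):
    """d E[Y | x, w] / dx = beta * P(bound does not bind)."""
    z = (cap - beta * x - gamma * w) / sigma
    return beta * STD_NORMAL.cdf(z)


# Same X, three different values of W: the effect of X shrinks as W grows.
for w in (1.0, 2.0, 4.0):
    print(f"W = {w}: dE[Y]/dX at X = 5 is {marginal_effect_of_x(5.0, w):.3f}")
```

A single OLS coefficient on X cannot capture this, since the effect of X varies with W even though beta itself does not.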
A second answer is that even if OLS did give you an unbiased, least-squares, prediction, it would not be an efficient prediction. We know there are going to be lots of Y=10 values. Thus, it would be better to have an estimator that gave you the probability of a Y=10 rather than just a single estimated value.
A third answer is that even if OLS gives a good parameter estimate, it will give bad standard errors and significance statistics. That’s because the disturbance term has an asymmetric distribution and its mean isn’t zero.
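As a rough sketch of the kind of estimator the second answer asks for, here is a tobit-style maximum likelihood fit done by crude grid search. It assumes U is normal and the model has no intercept, as in Y = Min(beta*X + U, 10). The simulated data, the grid ranges, and the function names are my own illustration, and a real application would use a proper optimizer rather than a grid.

```python
import math
import random

CAP = 10.0


def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def log_likelihood(beta, sigma, xs, ys):
    """Censored-normal log-likelihood for Y = Min(beta*X + U, CAP)."""
    total = 0.0
    for x, y in zip(xs, ys):
        if y >= CAP:
            # At the bound we only know that beta*x + U >= CAP.
            total += math.log(upper_tail((CAP - beta * x) / sigma))
        else:
            # Below the bound we see U = y - beta*x exactly.
            z = (y - beta * x) / sigma
            total += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return total


def fit_by_grid(xs, ys):
    """Pick the (beta, sigma) pair on a coarse grid with the highest likelihood."""
    betas = [0.80 + 0.01 * i for i in range(41)]
    sigmas = [1.5 + 0.1 * j for j in range(11)]
    return max(((b, s) for b in betas for s in sigmas),
               key=lambda p: log_likelihood(p[0], p[1], xs, ys))


# Simulated data with assumed true values beta = 1 and sigma = 2.
rng = random.Random(1)
xs = [rng.uniform(3.0, 9.0) for _ in range(2000)]
ys = [min(1.0 * x + rng.gauss(0.0, 2.0), CAP) for x in xs]

beta_hat, sigma_hat = fit_by_grid(xs, ys)
prob_cap_at_8 = upper_tail((CAP - beta_hat * 8.0) / sigma_hat)
print(f"beta_hat = {beta_hat:.2f}, sigma_hat = {sigma_hat:.1f}")
print(f"Estimated P(Y = 10 | X = 8) = {prob_cap_at_8:.3f}")
```

Besides an estimate of beta, this gives the probability of Y=10 at any X, which a single OLS prediction does not.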
2. The Heckman Correction
The Heckman correction was developed for situations of truncation, where some observations vanished completely from the dataset, in a nonrandom way. It can be adapted to the present problem, however. That’s what Jeff Campbell suggested in response to my earlier post.
The idea is this. The OLS bias arises because the disturbances have non-zero means, and in fact have a nonzero mean that varies with X. So, let’s correct for that. We can estimate the disturbance mean conditional on X, and then instead of minimizing the sum of squared deviations of Y from betahat*X, we will minimize the sum of squared deviations of (Y - disturbance mean|X) from betahat*X. In effect, we’re trying to replace the Y=10 values with Y=betaX+U values. The first stage of the Heckman two-step estimator estimates E(U|X), and the second stage runs OLS with Y adjusted by that E(U|X).
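The "adjust Y by the disturbance mean, then run OLS" idea can be sketched in a few lines. To be clear, this is not Heckman's actual two-step procedure (which uses a probit first stage and an inverse Mills ratio regressor). It is a simplified iteration under assumptions I am adding: U is normal, its standard deviation is known, and the regression goes through the origin as in Y = Min(beta*X + U, 10). All names and parameter values are hypothetical.

```python
import math
import random

CAP, SIGMA = 10.0, 2.0  # SIGMA is treated as known: an assumption of this sketch


def mean_observed_disturbance(x, beta):
    """E[Min(U, CAP - beta*x)] for U ~ N(0, SIGMA^2); always negative."""
    gap = CAP - beta * x
    z = gap / SIGMA
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(U > gap)
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return gap * upper_tail - SIGMA * density


def slope_through_origin(xs, ys):
    """OLS slope with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def corrected_slope(xs, ys, iterations=15):
    """Repeatedly subtract the estimated disturbance mean from Y and refit."""
    beta = slope_through_origin(xs, ys)  # start from the naive OLS slope
    for _ in range(iterations):
        adjusted = [y - mean_observed_disturbance(x, beta) for x, y in zip(xs, ys)]
        beta = slope_through_origin(xs, adjusted)
    return beta


# Simulated data with an assumed true beta of 1.
rng = random.Random(2)
xs = [rng.uniform(3.0, 9.0) for _ in range(20000)]
ys = [min(1.0 * x + rng.gauss(0.0, SIGMA), CAP) for x in xs]

naive = slope_through_origin(xs, ys)
corrected = corrected_slope(xs, ys)
print(f"naive OLS slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```

Because the disturbance mean is negative, the adjustment always pushes the slope up from the naive estimate. How close it lands to the true beta depends on the normality and known-sigma assumptions holding.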
We should think about using the Heckman correction whenever the problem is a nonzero mean of the disturbance term.
Some old class powerpoints on the Heckman correction are here.
3. Why OLS is biased when Y cannot exceed X.
My problem in the earlier post was not tobit, but something a bit more complicated, where the bound of Y is not constant, at 10, but varies with X, as when it is X itself. (For example, maybe Y is consumption and X is income.) Figure 2 shows the situation.

Again, the problem is that the big positive errors get cut down because of the limit of Y=X. The V disturbance will have a negative expected value. The OLS line will have too small a slope. In my earlier post, Jeff Campbell showed how to use a Heckman correction.
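A quick simulation sketch of this case, with Y = Min(beta*X + U, X). The numbers are my own assumptions: beta = 0.8 (think of it as a consumption share), U normal with standard deviation 2, X uniform on [3, 9], and a regression through the origin, since the model as written has no intercept.

```python
import random


def slope_through_origin(xs, ys):
    """OLS slope with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def simulate(beta=0.8, sigma=2.0, n=20000, seed=3):
    """Return (slope on unbounded Y, slope on Y bounded above by X)."""
    rng = random.Random(seed)
    xs = [rng.uniform(3.0, 9.0) for _ in range(n)]
    latent = [beta * x + rng.gauss(0.0, sigma) for x in xs]
    observed = [min(y, x) for x, y in zip(xs, latent)]  # Y cannot exceed X
    return slope_through_origin(xs, latent), slope_through_origin(xs, observed)


slope_latent, slope_bounded = simulate()
print(f"slope without the bound: {slope_latent:.3f}")
print(f"slope with Y <= X:       {slope_bounded:.3f}")
```

In this setup the bounded slope falls below beta, as described. One caveat worth checking in any application: with a free intercept, the direction of the slope bias depends on whether the gap between the bound and beta*X shrinks or widens as X grows, and here, with beta < 1, it widens.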