Archive for the 'Math' Category

Weighted Least Squares and Why More Data is Better

Friday, September 28th, 2007

In doing statistics, when should we weight different observations differently?

Suppose I have 10 independent observations of $x$ and I want to estimate the population mean, $\mu$. Why should I use the unweighted sample mean rather than giving the first observation a weight of 0.91 and each of the rest a weight of 0.01?

Either way, I get an unbiased estimate, but the unweighted mean gives me lower variance of the estimator. If I use just observation 1 (a weight of 100% on it) then my estimator has the variance of the disturbance. If I use two observations, then a big positive disturbance on observation 1 might be cancelled out by a big negative on observation 2. Indeed, the worst case is that observation 2 also has a big positive disturbance, in which case I am no worse off by having it. I do not want to overweight any one observation, because I want mistakes to cancel out as evenly as possible.
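To put numbers on this, here is a minimal sketch. It assumes the 10 observations are independent with a common variance sigma^2, so the variance of a weighted mean is sigma^2 times the sum of the squared weights. The function name `variance_factor` is just an illustrative label.

```python
def variance_factor(weights):
    """Var(sum of w_i * x_i) divided by sigma^2, for independent x_i
    that share a common variance sigma^2 (an assumption of this sketch)."""
    return sum(w * w for w in weights)

equal = [0.1] * 10                # the unweighted sample mean
lopsided = [0.91] + [0.01] * 9    # 0.91 on observation 1, 0.01 on each of the rest

# Both sets of weights sum to 1, so both estimators are unbiased.
print(sum(equal), sum(lopsided))

# The equal weights give roughly 0.1 * sigma^2; the lopsided ones roughly 0.829 * sigma^2.
print(variance_factor(equal), variance_factor(lopsided))
```

Under these assumptions the lopsided weighting has about eight times the variance of the plain sample mean, and is not much better than using observation 1 alone.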

All this holds whatever the distribution of the disturbance term, as long as its variance is finite. It doesn’t rely on the Central Limit Theorem, which says that as $n$ increases the distribution of the estimator approaches the normal distribution (if I don’t use too much weighting, at least!).

If I knew that observation 1 had a smaller disturbance on average, then I *would* want to weight it more heavily. That’s heteroskedasticity.
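A small sketch of that case, under assumptions that are mine and not part of the original post: observation 1 has disturbance variance 0.25, the other nine have variance 1, and each observation is weighted in proportion to the inverse of its variance (the usual weighted least squares choice). The helper names are hypothetical.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1/variance, scaled to sum to 1."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]

def estimator_variance(weights, variances):
    """Variance of a weighted mean of independent observations."""
    return sum(w * w * v for w, v in zip(weights, variances))

# Assumed variances: observation 1 is four times as precise as the others.
variances = [0.25] + [1.0] * 9

w_equal = [0.1] * 10
w_opt = inverse_variance_weights(variances)

var_equal = estimator_variance(w_equal, variances)
var_opt = estimator_variance(w_opt, variances)
print(w_opt[0], var_equal, var_opt)
```

With these made-up variances, observation 1 gets a weight of 4/13 rather than 1/10, and the estimator's variance is lower than with equal weights.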

Bayesian vs. Frequentist Statistical Theory

Tuesday, September 25th, 2007

The Frequentist view of probability is that a coin with a 50% probability of heads will turn up heads 50% of the time in a long run of repeated tosses.

(more…)

Asymptotics

Monday, September 24th, 2007

Page 96 of David Cox’s 2006 Principles of Statistical Inference has a very nice one-sentence summary of asymptotic theory:

[A]pproximations are derived on the basis that the amount of information is large, errors of estimation are small, nonlinear relations are locally linear and a central limit effect operates to induce approximate normality of log likelihood derivatives.

The Odds Ratio in Biased-Sample Case-Control Studies

Thursday, September 13th, 2007

I learned something this morning. When you’re trying to estimate the impact of a two-valued X on some two-valued Y with Y=1 being a rare event, you can get a consistent estimate of the odds ratio, which for a rare event is close to the relative risk, Pr(Y=1|X=1)/Pr(Y=1|X=0), even if your sample is biased because you deliberately oversampled Y=1 observations. This is not restricted to logit estimation either. I learned this reading The analysis of case-control studies by NE Breslow and NE Day, but I have a writeup at http://www.rasmusen.org/x/2007/oddsratio.pdf that is much clearer.
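Here is a minimal numerical sketch of why this works. All the numbers are assumptions chosen for illustration: Pr(X=1) = 0.3, Pr(Y=1|X=1) = 0.02, Pr(Y=1|X=0) = 0.01, and a sampling scheme that keeps every Y=1 observation but only 1% of the Y=0 observations. It works with expected cell shares rather than random draws.

```python
def odds_ratio(cells):
    """Odds ratio from a dict of cell shares keyed by (x, y)."""
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])

def relative_risk(cells):
    """Pr(Y=1|X=1) / Pr(Y=1|X=0) computed naively from the cell shares."""
    risk1 = cells[(1, 1)] / (cells[(1, 1)] + cells[(1, 0)])
    risk0 = cells[(0, 1)] / (cells[(0, 1)] + cells[(0, 0)])
    return risk1 / risk0

# Assumed population: keys are (x, y), values are joint probabilities.
p_x1, risk_x1, risk_x0 = 0.3, 0.02, 0.01
population = {
    (1, 1): p_x1 * risk_x1,
    (1, 0): p_x1 * (1 - risk_x1),
    (0, 1): (1 - p_x1) * risk_x0,
    (0, 0): (1 - p_x1) * (1 - risk_x0),
}

# Assumed sampling: keep all Y=1 cases, keep 1% of Y=0 controls.
keep = {1: 1.0, 0: 0.01}
sample = {(x, y): share * keep[y] for (x, y), share in population.items()}

print(odds_ratio(population), odds_ratio(sample))        # the two odds ratios agree
print(relative_risk(population), relative_risk(sample))  # the naive relative risks do not
```

The sampling rates multiply numerator and denominator of the odds ratio equally, so they cancel. The naive relative risk computed from the biased sample is badly off, while the odds ratio stays close to the true relative risk of 2 because Y=1 is rare.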

Statistics Jokes

Tuesday, August 7th, 2007

Here are the best (in some cases reworded) from GARY C. RAMSEYER’S FIRST INTERNET GALLERY OF STATISTICS JOKES, which is not so selective: (more…)

Writing Propositions

Monday, June 4th, 2007

Which is the best way to write this proposition?

(1) “If x = 2z, then y = 3z; and if x = 2v, then y = 3g.”

(2) “If x = 2z then y = 3z, and if x = 2v then y = 3g.”

(3) “If x = 2z, then y = 3z and if x = 2v, then y = 3g.”

(4) “If x = 2z, then y = 3z, and if x = 2v, then y = 3g.”

Asymptotics

Tuesday, April 24th, 2007


I’m thinking a lot about sampling these days. The idea of classical hypothesis testing is that we choose a null hypothesis and a testing scheme, and ask how often we would accidentally reject the null hypothesis if we took repeated samples and if the null were actually true. The null might be that the mean of the population is 0. The testing scheme might be that we take a sample of N=100 observations and we reject the null if the mean of that sample is less than -1 or greater than +1. The significance level of the test, the probability that we falsely reject the null, might be 14%. That means that if we took 10,000 100-observation samples, and the true mean is M=0, we would expect about 1,400 sample means to be outside of the [-1,+1] interval.
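The repeated-sampling thought experiment can be run directly. This is only a sketch under an assumption the text does not make: that the population is normal, with a standard deviation chosen so that the [-1,+1] rule has a significance level of exactly 14% (about 6.78). The seed is arbitrary.

```python
import random
from statistics import NormalDist, fmean

N = 100          # observations per sample
REPS = 10_000    # number of repeated samples

# Assumption: normal population with true mean 0. The sample mean then has
# standard deviation sigma / 10, and we pick sigma so that
# Pr(|sample mean| > 1) = 0.14, i.e. 1 / (sigma / 10) is the 93rd percentile of N(0,1).
sigma = 10 / NormalDist().inv_cdf(0.93)

rng = random.Random(2007)
rejections = 0
for _ in range(REPS):
    sample_mean = fmean(rng.gauss(0.0, sigma) for _ in range(N))
    if abs(sample_mean) > 1:   # the testing scheme: reject outside [-1, +1]
        rejections += 1

print(rejections)   # should land in the neighborhood of 1,400
```

The count will not be exactly 1,400, since it is itself random, but it should be close.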

Any null hypothesis takes some set of background facts as given, as assumptions that aren’t tested. In the case above, these assumptions might include that (a) the population is distributed according to some shape f(x) around the mean M, and we know the entire shape, just not the value of M, (b) the observations are chosen independently (which also means they are chosen with replacement, I think), and (c) the shape f(x) does not change during the course of our sampling. If we wanted, we could instead take M=0 as an assumption and test statement (b) instead, or use some other combination of assumptions and null.

Well, I haven’t even gotten to asymptotics yet, but I’m out of time. I’ll have to continue this another day. Asymptotics concern the properties of a testing scheme when the sample size N is large enough that the Central Limit Theorem can be called into play and estimator errors are close to being normal even if the underlying distribution shapes are not.

Statistical Tests Online

Saturday, April 7th, 2007

I found a good free site, “Statistical Tests.” I was trying to find one so I could recommend it to an undergraduate in G492. The two relevant to me were for a binomial test comparing a sample to a population proportion and a chi-squared test comparing two sample proportions.

Two Math Jokes: Tangent and Deviation

Tuesday, April 3rd, 2007

I got these jokes from The Volokh Conspiracy and then improved them. (more…)

Three Calculus Jokes

Monday, April 2nd, 2007

I got these jokes from The Volokh Conspiracy and then improved them. (more…)

Granger Causality

Tuesday, March 20th, 2007

I learned today that the idea of “Granger Causality” is not to learn whether variable A causes variable B, but to learn whether if variable A increases in period t we should forecast that B will increase in period (t+1). For example, we might run regressions on Health and Wealth across states:

Health(t) = alpha + beta*Wealth(t-1) + theta*Health(t-1)

and

Wealth(t) = gamma + delta*Health(t-1) + mu*Wealth(t-1).


If beta is significant, we’d conclude that Wealth Granger-causes Health. In fact, it may be that it is IQ that causes both Wealth and Health and as an omitted variable is responsible for beta being significant. That’s OK for Granger causality. If we see a state’s wealth increase, we should expect its health to increase. It’s OK to have a biased regression because of omitted variables.

We have to be careful not to think of this as real causality, though. If we kept all other variables the same, including IQ, but we increased the Wealth in a State, the Health would not rise. The reason that usually if we see an increase in Wealth we can predict an increase in Health is that an increase in Wealth is a sign of an increase in IQ, and it is the IQ that will increase Health.

Thus, Granger causality is useful only for positive prediction, not for normative policymaking.
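A simulation can illustrate the distinction. The sketch below is built on assumptions of my own: a single time series rather than a panel of states, IQ following an AR(1) process, Wealth(t) equal to IQ(t) plus noise, and Health(t) equal to IQ(t-1) plus noise, so Wealth has no effect on Health at all. The `ols` helper is a bare-bones normal-equations solver written for this example, not a library routine.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

rng = random.Random(0)
T = 5000

# Assumed data-generating process: IQ drives both series; Wealth never enters Health.
iq = [0.0]
for _ in range(T):
    iq.append(0.9 * iq[-1] + rng.gauss(0, 1))
wealth = [q + rng.gauss(0, 1) for q in iq]                 # Wealth(t) = IQ(t) + noise
health = [0.0] + [iq[t - 1] + rng.gauss(0, 1)              # Health(t) = IQ(t-1) + noise
                  for t in range(1, T + 1)]                # health[0] is an unused placeholder

ts = range(2, T + 1)
y = [health[t] for t in ts]

# Granger regression: Health(t) on a constant, Wealth(t-1), Health(t-1).
beta_short = ols([[1.0, wealth[t - 1], health[t - 1]] for t in ts], y)[1]
# Same regression with the omitted variable IQ(t-1) added.
beta_with_iq = ols([[1.0, wealth[t - 1], health[t - 1], iq[t - 1]] for t in ts], y)[1]

print(beta_short, beta_with_iq)
```

In this setup beta is clearly positive in the first regression, so lagged Wealth is useful for forecasting Health, and it drops to roughly zero once IQ is held fixed, which is the sense in which the relationship is not causal.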

“Avoiding Invalid Instruments and Coping with Weak Instruments”

Wednesday, January 31st, 2007

Here are some notes on “Avoiding Invalid Instruments and Coping with Weak Instruments” by Michael P. Murray, Journal of Economic Perspectives, 20(4), Fall 2006, pp. 111-132.

(more…)

Lagged Variables in Time Series Cross Section

Saturday, January 6th, 2007

Suppose lagged Y is in a time series cross section regression like this:

$y_{it} = \alpha_i + \gamma y_{i,t-1} + e_{it}$

Is OLS consistent?

Yes, I think, but this is a nice setting for thinking about what “consistency” means. If we replicated the same 10 years and 50 industries 1,000 times, with new disturbances each time, the Within Groups estimator would get better and better, I think. What is more natural is to think of going to 1,000 years and 1,000 industries, and that gets better too.

But going to 10 years and 1,000 industries does not make the bias get any smaller. This, I think, is what Nickell (1981, Econometrica) says.

And in a small sample, estimators are often biased, which we forget. When we only have 10 observations for something (the years here), the bias can be pretty serious.

Though, actually, maybe there’s not a bias, just inconsistency. This seems to be a Gauss-Markov Theorem BLUE situation. Maybe ex ante there is no way to know which direction the mistaken estimation will go, positive or negative.
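One way to check these guesses is to simulate. The sketch below makes several assumptions of its own: gamma = 0.5, standard normal fixed effects and disturbances, a 50-period burn-in so each series starts near its stationary distribution, and a "long" panel of 200 years and 50 industries standing in for the 1,000-by-1,000 case to keep the run short. The function name `within_estimate` is hypothetical.

```python
import random

def within_estimate(n_units, n_periods, gamma, rng, burn_in=50):
    """Within Groups estimate of gamma in y_it = alpha_i + gamma*y_{i,t-1} + e_it,
    pooling unit-demeaned cross products over all simulated units."""
    num = den = 0.0
    for _ in range(n_units):
        alpha = rng.gauss(0, 1)
        y = alpha / (1 - gamma)            # start at the unit's long-run mean
        path = []
        for t in range(burn_in + n_periods + 1):
            y = alpha + gamma * y + rng.gauss(0, 1)
            if t >= burn_in:
                path.append(y)
        lagged, current = path[:-1], path[1:]
        lag_mean = sum(lagged) / n_periods
        cur_mean = sum(current) / n_periods
        num += sum((l - lag_mean) * (c - cur_mean) for l, c in zip(lagged, current))
        den += sum((l - lag_mean) ** 2 for l in lagged)
    return num / den

rng = random.Random(1981)
true_gamma = 0.5
short_wide = within_estimate(1000, 10, true_gamma, rng)   # 10 years, 1,000 industries
long_panel = within_estimate(50, 200, true_gamma, rng)    # 200 years, 50 industries

print(short_wide, long_panel)
```

Under these assumptions the 10-year estimate stays well below the true 0.5 even with 1,000 industries, while the long panel comes out close to 0.5. That suggests the error has a definite direction (downward when gamma is positive) and shrinks with the number of years rather than the number of industries.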

Propensity Score Models

Sunday, December 24th, 2006

I just saw a reference to “propensity score models” for “the effect of treatment on the treated” which it seems are used by many economists. Sascha Becker at Munich has the best explanation I could find in a quick search. I could not understand it by reading it, but by thinking I think this is the situation: (more…)

INSTRUMENTAL VARIABLES WITHOUT ENDOGENEITY OR INFINITE SAMPLES

Monday, December 18th, 2006

I just realized that instrumental variables could be useful in a context without endogeneity or large samples: to add datapoints.

Suppose

y = beta X + U, t= 1,…T

(more…)

Why Least Squares, Not Least Absolute Deviations?

Thursday, November 2nd, 2006

In explaining regression lines to novices, I always use the approach of saying that in trying to relate two variables, we can plot them on a graph, and then a line through the middle of the dots shows the relation between them. We want to make “through the middle” precise, though, so what we do is have a computer find the line that minimizes the sum of the squared vertical distances from each point to the line. (more…)
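For a concrete picture of what the computer does, here is a minimal sketch using the standard closed-form solution for a single explanatory variable. The five data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Intercept and slope of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_distances(xs, ys, intercept, slope):
    """Sum of squared vertical distances from each point to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
best = sum_squared_distances(xs, ys, intercept, slope)

# Nudging the line in any direction makes the criterion worse.
nudged = sum_squared_distances(xs, ys, intercept + 0.1, slope - 0.05)
print(intercept, slope, best, nudged)
```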

Wald Tests of Group Significance and of Exogeneity

Wednesday, October 11th, 2006

Somebody asked me about Wald tests, and I thought I’d write up my answer as a weblog entry, so someone can correct me if I’m wrong. Here’s some STATA output: (more…)

Is Division Evil?

Tuesday, October 10th, 2006

I went to a class session lecture by Leo Strauss’s co-author Joseph Cropsey back around 1989. It was on Plato’s Protagoras. Cropsey talked as if it were a one-sided conversation, and a lot of what he talked about was Irrational numbers and the Incommensurable. An irrational number is one that can’t be written as a fraction— the square root of 2, or pi, rather than 1/4 or 1/3. Thus, an irrational number is not a ratio, not rational, and is sort of in a different world. I’m not sure if what he said made any sense, but it did make an impression.

Today in the car I was telling Elizabeth and Amelia how two times ten dollars equalled twenty dollars, a thought they liked. I went on to ask what 20 divided by 2 was. Amelia said she didn’t know what “divided by” meant, and that she hadn’t done division yet. I explained. She said that Division was Evil. If Amelia was divided by two, she’d be two dead halves. If we divided two sisters by two, there is just one on each side. It seems they don’t like each other, or are forcibly separated. So Amelia doesn’t like division.

That’s a nice story. Is there some profundity at the bottom of it?

Caring About Probabilities of Probabilities

Tuesday, September 26th, 2006

Here are some notes on expected utility theory that I wrote up after a stimulating lunch. Suppose Jack might earn $100, $200, $300, or $400, with equal probabilities, an expected wealth of $250. His information partition is (100,200,300,400)– he can’t rule out anything.

Jack is very risk averse, so his utilities from known wealths are

U(100) = 0 - e
U(200) = 100 - e
U(300) = 120 - e
U(400) = 128 - e, (more…)

Is there a Closed Unbounded Set?

Thursday, September 7th, 2006

I just pinned down whether the set of all real numbers from -infinity to +infinity is a closed set or an open set. It is both, a “clopen set”. At the infinite end of an interval the choice of bracket is just notation, so long as “infinity” is not an object in the set: we normally write (0, infinity) for the positive numbers, and (0, infinity] would mean something different only if we wanted to include “infinity” itself. Mathworld tells me that a closed set is one that contains its limit points (which are *not* necessarily its endpoints). Infinity is not a real number, so it is not a limit point, and a closed set does not have to contain it. An open set is one such that a tiny ball around any point is still entirely within the set, and the set of all numbers from -infinity to +infinity also satisfies that definition. More generally (needing a more general definition too: that a small movement won’t take you out of the set into some “meta-set”), the empty set and the whole space are clopen, as Wikipedia tells us.

I was interested in this because I wondered if there could be a closed unbounded set. Yes: an example is the whole real line, (-infinity, +infinity). But it is also an open set. An example that is closed and unbounded but not open is [0, infinity).

