Difference between revisions of "Statistics"

From Rasmapedia

Latest revision as of 15:28, 26 December 2021

Confidence Levels

95% statistical confidence levels make sense for scholarly work. Has any scholar ever argued they make sense for real-world decisions? Often, 10% is fine--really, 1%, with a one-sided test. But I bet the FDA uses them for drug approval. Has anyone written on this?

Another application is to child molesting. 95% is probably OK for criminal juries--"no reasonable doubt". We use 51% for the civil suit. We properly use 5% for whether the guy should help in the church nursery.

"Managerial Conservatism and Rational Information Acquisition," Journal of Economics and Management Strategy (Spring 1992), 1(1): 175-202. Conservative managerial behavior can be rational and profit-maximizing. If the valuation of innovations contains white noise and the status quo would be preferred to random innovation, then any innovation that does not appear to be substantially better than the status quo should be rejected. The more successful the firm, the higher the threshold for accepting innovation should be, and the greater the conservative bias. Other things equal, more successful firms will spend less on research, adopt fewer innovations, and be less likely to advance the industry's best practice.


Randomization Tests vs. t-Tests



Data Visualization

  • You only have to have this experience of your plotting requests being ignored a few times to realize that you can’t expect to be able to say once, Hey how about plotting all the data first?, in a normal tone, and expect everyone to put aside the more complicated stuff they want to do and make plots. You have to repeat it, again and again, probably loudly, until you seem like some sort of graphing extremist.

--https://statmodeling.stat.columbia.edu/2021/01/27/tukeyian-uphill-battles/

  • Josh Wills

@josh_wills

People who don't know any statistics just look at the data.

People who know some statistics run hypothesis tests, compute confidence intervals, etc.

People who know lots of statistics just look at the data.


Humor


Instrumental Variables

[email protected]

is our finest poet.


Supply and demand:
without a good instrument,
not identified.

No: poems should rhyme if they aren't in kanji.

Supply goes up.
Demand goes down.
Without an instrument.
They can't be found.

See http://rasmusen.org/published/blp-rasmusen.pdf


Sample Size

  • It's curious that with both few observations (n = 30) and many (n = 3 million), statistical significance is too lax a standard. Has anyone thought about those problems in conjunction?
(Later: What do I mean here? Perhaps that with n = 30, normality of the distribution can't be relied upon?)
  • I'm still puzzled by what the Big Data Paradox is supposed to be. Suppose we have a sample of 10,000 and a sample of 100,000 out of 10 million, both biased samples. Neither sample has much sampling error, so their confidence intervals are very small, and you'd be equally misled in both if you know statistics. On the other hand, if you don't know statistics, you'd think the 100,000 sample was a lot better. Is that (a) the Big Data Paradox?
     Or is it that if you have highly biased samples of 10 and 10,000, (b) you know your results are unreliable in the sample of 10, but you don't know it in the sample of 10,000, if you calculate the confidence intervals?
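
A small simulation may help pin down the two readings. This is only a sketch under made-up assumptions, not anything from the literature on the paradox: the true share is 50%, a "no" unit is half as likely as a "yes" unit to end up in the sample, and the population of 10 million is treated as effectively infinite. The names TRUE_SHARE, ACCEPT_NO, biased_sample, ci95, and run are all hypothetical.

```python
import math
import random

TRUE_SHARE = 0.50   # assumed true population share of "yes"
ACCEPT_NO = 0.50    # assumed: a "no" unit is half as likely to be sampled


def biased_sample(n, rng):
    """Return n 0/1 responses drawn with the selection bias assumed above."""
    sample = []
    while len(sample) < n:
        y = 1 if rng.random() < TRUE_SHARE else 0
        # "yes" units always enter the sample; "no" units only sometimes.
        if y == 1 or rng.random() < ACCEPT_NO:
            sample.append(y)
    return sample


def ci95(sample):
    """Normal-approximation 95% confidence interval for a proportion."""
    n = len(sample)
    p = sum(sample) / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def run(seed=0):
    """Map each sample size to (lower, upper, whether the CI covers the truth)."""
    rng = random.Random(seed)
    results = {}
    for n in (10, 10_000, 100_000):
        lower, upper = ci95(biased_sample(n, rng))
        results[n] = (lower, upper, lower <= TRUE_SHARE <= upper)
    return results


if __name__ == "__main__":
    for n, (lower, upper, covers) in run().items():
        print(f"n={n:>7}: 95% CI = ({lower:.3f}, {upper:.3f}), covers truth: {covers}")
```

Under these assumptions the biased samples center near 2/3 rather than 1/2. At n = 10 the interval is usually wide enough to signal that little has been learned, while at 10,000 and 100,000 it is narrow and excludes the truth. Going from 10,000 to 100,000 only tightens the interval around the same wrong number.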

Teaching Statistics

The Amanda Middlebury stats simulations

"Statistical fallacies as they arise in political science (from Bob Jervis)", a set of paragraph-long questions in the style of Fischer Black's Finance Questions.