Last updated: July 6, 1998 / February 5, 2007.

Charles Murray has provided the data he used in the analysis in his book, The Bell Curve. The actual data files are in the low megabyte range, and are available in two formats. Please note that the data includes weights for each observation, because the survey from which it comes sampled different groups at different rates. Also, some of the data is in the form of "z-scores", which means that each value is measured in standard deviations away from the mean, so that the rescaled variable has a mean of zero.
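To make the two ideas above concrete, here is a minimal Python sketch of a weighted mean and of weighted z-scores. The function names and the three sample values are made up for illustration, and this is not necessarily how the z-scores in Murray's files were computed; it only shows what "standard deviations away from the mean" looks like once observation weights are taken into account.

```python
import math

def weighted_mean(values, weights):
    """Mean of values, with each observation counted in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def weighted_z_scores(values, weights):
    """Rescale values so their weighted mean is 0 and weighted standard deviation is 1."""
    mean = weighted_mean(values, weights)
    variance = weighted_mean([(v - mean) ** 2 for v in values], weights)
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]

# Hypothetical test scores and sampling weights (not taken from the data files).
scores = [90.0, 100.0, 110.0]
weights = [1.0, 2.0, 1.0]

z = weighted_z_scores(scores, weights)
print(weighted_mean(scores, weights))  # weighted mean of the raw scores
print(z)                               # the same scores expressed as z-scores
```

If the weights are ignored, groups that were sampled more heavily count for too much in the mean and standard deviation, which is why the weights are supplied with each observation.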

If you encounter problems reading the data, please let me know. I probably can't help, since I haven't been using this data in the past few years, but you never know. If you solve your problem, let me know about that too, so I can post the solution.

Data Files From Charles Murray, in His Format
Files are saved as Macintosh text files with labels, with a tab marking the end of each field and a carriage return (CR) marking the end of each line.
• NATION.TXT, which has 12,686 cases and 50 variables. This file includes variables scored for all NLSY subjects, one line per subject. Size: 3.112MB.
• CHILD1.TXT, which has 8,513 cases and 26 variables. Variables scored for all NLSY children, representing one case per child for whom data were available through SY90. Size: 1.312MB.
• CHILD2.TXT, which has 17,040 cases and 40 variables. Each case represents one child for one test year. A given child may therefore be represented in up to three cases. TY=test year. Percentiles on the developmental and behavioral indicators all represent within-gender percentiles. Size: 3.040MB.
• WOMEN.TXT, which has 6,283 cases and 28 variables. Variables scored for all women in the NLSY (one case per subject). Size: 1.032MB.
• The Documentation is available in a number of forms. The original is a 51K file, 1TBC_Documentation5.rtf. You can get the same thing in Word in 33K at 3TBC_Documentation.doc, or in ASCII in 25K at 2TBC_Documentation.ascii. Finally, you can get a 15K version describing just the NATION variables at 1TBC_Nation.Documentation5.rtf.
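Files that end each line with a bare CR can trip up programs that expect LF or CR+LF. The Python sketch below shows one way such a file might be read. It assumes that "labels" means the first line holds the variable names and that the text is in the Mac Roman encoding; neither assumption is confirmed above. The function name and the sample variables (ID, AGE, SCORE) are hypothetical and are not taken from NATION.TXT.

```python
import csv
import io

def read_mac_tab_file(raw_bytes, encoding="mac_roman"):
    """Parse CR-terminated, tab-delimited bytes into a list of dicts.

    Assumes the first line contains the variable labels.
    """
    text = raw_bytes.decode(encoding)
    # newline="" keeps the line endings untranslated, so the csv module
    # itself recognizes the bare CR as the end of each record.
    stream = io.StringIO(text, newline="")
    return list(csv.DictReader(stream, delimiter="\t"))

# Hypothetical two-case sample in the same layout: tabs between fields, CR after each line.
sample = b"ID\tAGE\tSCORE\r1\t29\t-0.35\r2\t31\t1.20\r"

rows = read_mac_tab_file(sample)
print(len(rows))
print(rows[0])
```

For a real file, the bytes would come from something like open("NATION.TXT", "rb").read(). All values come back as strings and would still need converting to numbers.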

Data Files As I Modified Them, with Commas Separating the Entries
I used the Excel spreadsheet program to change the format into one I could use more easily, and made a few other small changes. The output files are CSV files, with each entry separated by a comma.
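The same kind of conversion can be done without a spreadsheet. The Python sketch below is not the procedure used to produce the files here (that was done in Excel, with a few other small changes that this sketch does not reproduce). It only illustrates the tab-to-comma step, including the quoting that is needed when an entry itself contains a comma. The function name and sample values are hypothetical.

```python
import csv
import io

def tab_text_to_csv(tab_text):
    """Convert tab-delimited text (CR, LF, or CR+LF line endings) to comma-separated text."""
    source = io.StringIO(tab_text, newline="")
    target = io.StringIO(newline="")
    writer = csv.writer(target, lineterminator="\n")
    for row in csv.reader(source, delimiter="\t"):
        # csv.writer quotes any entry that contains a comma,
        # so the number of columns stays the same.
        writer.writerow(row)
    return target.getvalue()

# Hypothetical sample with a comma inside one entry.
sample = "ID\tNAME\r1\tSmith, J.\r"
print(tab_text_to_csv(sample))
```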

Some Regression Files Using STATA
I like the STATA program very much, and include here some input and output files that use the data above.
• bell2a.do, an input file using nation.csv. The output from this is the log file, bell2a.log, which has regression results, and a STATA data file, nation1.dta, which has a subset of the nation.txt variables in the condensed STATA format.
• bell2.do, a 3K input file using nation.csv. The output from this is the 13K file, bell2.log. This do-file wasn't working in May 2002; my present version of STATA, STATA 7.0, says there is not enough room for all the observations.
• jan6c.do, a 2K input file using nation.csv. The output from this is the 2K file, jan6c.log.