10 0, discrete freq xtitle("Flunks-- Lawyers") ytitle("First line of the title" "Continues to a second line") saving(flunks-l,replace) ;
*This one has y-axis grid lines, and a special y-title with two lines of text;
graph export flunks-l.ps, replace;

Stata writes the y-axis title vertically rather than horizontally. I go to Adobe Illustrator to change that. Use help twoway_options to find out how to use the options.
---------------------------------------------------------------------
USING WEIGHTS IN A REGRESSION

* If I want to give every observation a weight of 1, except that observations with judge equal to 1 get a weight of .04;
generate wj = 1;
replace wj = .04 if judge==1 ;
dprobit judge utokyo ukyoto Flunks413 Post93 Post93UT Post93UK Post93 Fl413 [pweight=wj]
----------------------------------------------------------------------
MERGING IN A NEW VARIABLE INTO A STATA DATASET

Mark tells me that this command does it, putting the new variables in wherever the
   joinby prefecture using "file name.dta"
----------------------------------------------------------------------
FORMATTING REGRESSION OUTPUT: OUTREG AND EXCEL

Stata's standard regression output has too many decimal places and some useless statistics. To convert one or more regressions to a table of the kind customary in economics, use one of two methods: the Outreg program, or cutting and pasting to Excel.

The best way to do it is to use outreg, with output in a file like temp.txt (with long enough lines and few carriage returns). In the temp.txt file change all the left parentheses to yyy. Then load it into Excel simply by opening temp.txt and choosing comma delimiters. Then copy into Word or LaTeX. Then change all the yyy back to left parentheses.
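The yyy substitution can also be done inside Stata instead of in a text editor. This is only a minimal sketch, assuming a Stata release that includes the filefilter command and the default line delimiter (#delimit cr). The file names temp_demo.txt and temp_demo_yyy.txt, and the two sample lines written to the demo file, are made up for illustration and are not real outreg output.

```stata
* Sketch only: file names and sample contents are hypothetical.
* Write a tiny stand-in for an outreg table so the example runs on its own.
tempname fh
file open `fh' using temp_demo.txt, write text replace
file write `fh' "xvar,1.25" _n
file write `fh' ",(3.10)" _n
file close `fh'

* Replace every left parenthesis with the placeholder yyy,
* so that Excel does not read (3.10) as a negative number.
filefilter temp_demo.txt temp_demo_yyy.txt, from("(") to("yyy") replace

* Show the converted file in the Results window.
type temp_demo_yyy.txt
```

Once the table has gone through Excel and into Word or LaTeX, the same command with from("yyy") and to("(") would reverse the substitution on a plain-text copy.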
FIRST, OUTREG:

The add-on program outreg.ado is available from:
   http://econpapers.repec.org/software/bocbocode/s375201.htm
Documentation is at:
   http://www.kellogg.northwestern.edu/researchcomputing/docs/outreg.pdf

I think you can download and install it in one step by issuing this command in Stata:
   ssc install outreg
In my PC's Stata 7.0, though, that doesn't work. Instead, first type
   net from http://fmwww.bc.edu/repec/bocode/o/
and then type
   net install outreg

Here is how to use the program. After each regression command such as "regress yvar xvar x2 x3 x4" type, for the first regression,
   outreg using table1.txt, replace bdec(2)
and for every succeeding similar regression,
   outreg using table1.txt, append bdec(2)

What this does is to add the latest regression results as a new column in a table in the file table1.txt. The top row will have the Y variable (which can differ). Coefficients and t-stats will be to 2 decimal places. The tabs won't work properly, though, so the table's spacing won't be right.

To get proper spacing, you can also outreg into a comma-separated file, by adding an option like this:
   outreg using table1.txt, replace bdec(2) comma
Then cut and paste to MS Word. Select all the text except the notes at the bottom and click on Table, Convert Text to Table. It will format it nicely. Then if you cut and paste to a plain text file, the good spacing remains.

If you use dprobit, dtobit, etc., you will get only MARGINAL EFFECTS reported. That is usually best. Then outreg will automatically report those too. Otherwise, try:

   margin[(u|c|p)] specifies that the marginal effects rather than the coefficient estimates are reported. It can be used after truncreg,marginal from STB 52 or dtobit from STB 56. One of the parameters u, c, or p is required after dtobit, corresponding to the unconditional, conditional, and probability marginal effects, respectively. It is not necessary to specify margin after dprobit, dlogit2, dprobit2, and dmlogit2.
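The outreg steps above can be put together end to end. This is a minimal sketch, assuming the release of outreg described in these notes is installed and accepts the replace, append, bdec() and comma options exactly as written above. Stata's bundled auto dataset and the regressors chosen are stand-ins for illustration only.

```stata
* Sketch only: run with the default line delimiter (#delimit cr).
* The auto dataset ships with Stata and stands in for real data.
sysuse auto, clear

* First regression: start a new table file.
regress price mpg weight
outreg using table1.txt, replace bdec(2) comma

* Second regression: add it as a new column of the same table.
regress price mpg weight foreign
outreg using table1.txt, append bdec(2) comma
```

Option names may differ in other releases of outreg, so it is worth checking help outreg before relying on this.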
SECOND, EXCEL

The other way to format a regression table is to cut-and-paste from the STATA log on the screen (not from the *.log file) using EDIT and then SAVE TABLE (NOT the regular Save; SAVE TABLE puts in tabs between columns). Paste it into a plain text file, or perhaps into MS-WORD directly (and convert text to table then, I guess). Save the plain text file as temp.txt, then open it in Excel, and it will be a nice spreadsheet with columns and rows that can be cut and pasted.

Note that in Excel you can also add columns of "(" and ")". Then save as a *.csv file, with comma delimiters. Edit that as plain text, to get the parentheses in the same columns as their contents, and load it into Excel again.

USING BOTH

Start with temp.txt. Then change all the commas to tabs (maybe this happens anyway if you don't use the COMMA option in OUTREG). Then make a spreadsheet file and pick TEXT as the default for Number Format. Then copy and paste the temp.txt file to the spreadsheet file. It will save with parentheses around the t-stats, instead of making them negative.

BETTER: in the temp.txt file change all the left parentheses to yyy. Then load it into Excel simply by opening temp.txt and choosing comma delimiters. Then copy into Word or LaTeX. Then change all the yyy back to left parentheses.
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
X-WINDOWS

If I am trying to use the Indiana U. computer remotely to use Stata 9.1, load up an x-terminal and then type:
   ssh erasmuse@libra.uits.iu.edu
or
   ssh erasmuse@steel.ucs.indiana.edu
and issue the command:
   xstata
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
/* This is a multi-line comment.
No semicolons are needed. */
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
#delimit ;
* This says that the semicolon denotes the end of a command line.
All lines must end with semicolons after this;
log using marginal-effects.log, replace;
set more 1;
*This should stop the pauses;
*To keep going regardless of errors, use DO MYFILE, nostop;
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
FIXED-EFFECTS REGRESSIONS

This creates dummies out of verbal or numeric categories.
   xi: tobit yvar xvar1 xvar2 i.categoryvariable, ll(10000) ul(30000);
OR
   xi: regress yvar xvar1 xvar2 i.categoryvariable, robust;
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
*PLOTTING TWO VARIABLES AGAINST EACH OTHER;
graph murder black;
graph murder black, symbol([state]);
graph murder black, symbol([state]) psize(150);
*this makes the symbol size 50 percent larger than the default;
graph score cashleft, symbol([rank]) xscale(0, 40000) yscale(30, 110);

When I tried this in Stata 9.2 in Dec. 2007 it didn't work anymore, though. For a simple scatterplot I had to use:
   twoway scatter exshpt exshare;
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
*DESCRIBING THE DISTRIBUTION OF A SINGLE VARIABLE;
histogram flunks if judge==0, discrete saving(flunks-l,replace) ;
*This is for the discrete variable flunks, if the judge variable equals 0. It saves the graph as flunks-l.gph. To convert the graph to TIFF, just right-click on it in STATA and save as TIFF. ;
graph winrate, box saving (winrate-box,replace) ;
* for a box and whiskers plot;
graph winrate, histogram bin(20) saving (winrate-histogram,replace) ;
kdensity winrate, saving (winrate-kdensity,replace) ;
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
*MISSING VALUES;
* For missing values in entries, use a period (.). Other things cause trouble. Note that Stata treats a missing value as larger than any number for variable generation purposes. Watch out for things like replace var1 =4 if var2>6, because if var2 is missing, that command will make var1=4;
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
INSHEET;
This is to get spreadsheet data into STATA:
* Save the spreadsheet as a tab file.
Then say INSHEET USING MYNAME.TXT;

OUTSHEET: This is to get data out of STATA in spreadsheet form.
   outsheet using myname.csv, comma replace
That uses commas to separate the variables.
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
CONVERTING STATA 9 DATASET TO STATA 7.

Stata has set things up so later versions are incompatible with earlier ones. Stata 9 on Steel will read my co-authors' files, but Stata 7 on my PC will not. Here's how to solve that:

Upload the file smith.dta to Stata 9.
   use smith.dta
   outsheet using smith.txt
Download the smith.txt to Stata 7. On the PC:
   set mem 50m
   *for a big dataset
   insheet using smith.txt
   save smith1.dta
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
* IF STATEMENTS AND STRING VARIABLES;
replace appointed =1 if state =="NJ" ;
replace appointed =1 if state =="AK" ;

LOGICAL OPERATORS: "not equal to" is ~=
Use & instead of "and" in logical statements.
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
* SUMMARY STATISTICS;
Do not use the SUMMARIZE or SUMMARIZE, DETAIL command. Instead, use tabstat, like this:
   tabstat budget felclosed felconv, stats(min p25 median mean p75 max) f(%7.2f) columns(statistics) ;
INSPECT is a good command too. It gives you little histograms for each variable.
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
* HOW TO HAVE ROBUST STANDARD ERRORS IN TOBIT IN STATA;
The documentation at http://www.stata.com/support/faqs/stat/tobit.html is cryptic and seems to be missing some lines in the middle, so I am writing up these instructions. I still don't understand what is going on, but this at least makes the command work.

Suppose you have regressed winrate using tobit on prosrate budpros term crimerate pop. You used tobit because winrate is censored at 0 and 100-- you have the observations at the extremes, but the values cannot be less than 0 or greater than 100. You used this command:
   tobit winrate prosrate budpros term pop, ll(0) ul(100) ;

Another way to get exactly the same results is to use INTREG, the interval regression command.
This is designed for when some of your data is intervals, as when for example you know that somebody's income is between 1 and 3 million, but not exactly where in between. Here, we use it differently.

First, define two new variables winrate1 and winrate2. Each observation is in an interval [winrate1, winrate2]. Winrate1 will be winrate, or will be missing (representing negative infinity) if winrate takes its lower bound of 0. Winrate2 will be winrate, or will be missing (representing infinity) if winrate takes its upper bound of 100. Here are the Stata commands to generate those variables:
   gen winrate1=winrate;
   replace winrate1 =. if winrate <= 0;
   gen winrate2=winrate;
   replace winrate2 =. if winrate >= 100;

Suppose our data was like this: (12, 0, 50, 100, 23)
Then winrate1 would be (12, ., 50, 100, 23)
Then winrate2 would be (12, 0, 50, ., 23)
The intervals in the regression would be ([12,12], [-infinity, 0], [50,50], [100, infinity], [23,23]).

Then use the INTREG command. Its first two variables, winrate1 and winrate2, are the bounds that make up the intervals for each observation of the dependent variable, and the rest are independent variables.
   intreg winrate1 winrate2 prosrate budpros term pop;
The regression above should give exactly the same results as the tobit command specified earlier. It doesn't quite, for me, but I put that down to different maximization algorithms for the two commands.

Now, though, we can have robust standard errors as an option to correct for heteroskedasticity:
   intreg winrate1 winrate2 prosrate budpros term pop, robust;
This last command has the exact same coefficient estimates, but different, and consistent, standard errors.

What if you only left-censor, so winrate can't be less than 0, but it can be greater than 100? Then generate the bounds like this:
   gen winrate1=winrate;
   replace winrate1 =. if winrate <= 0;
   gen winrate2=winrate;

What if you only right-censor, so winrate can be less than 0, but it can't be greater than 100?
Then generate the bounds like this:
   gen winrate1=winrate;
   gen winrate2=winrate;
   replace winrate2 =. if winrate >= 100;
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~--;
*MARGINAL EFFECTS IN TOBIT;
See if dtobit, dlogit2, dprobit work. They report marginal effects instead of coefficients. Use the command dprobit, at(xxx) if you want it calculated at xxx, but xxx has to be a matrix, and I don't know how to get a median matrix. Also, dtobit has to be installed specially from the web. Otherwise, use the mfx command below.

mfx is the marginal effects command. Use a dot if you are only left-censored or right-censored. predict(e(0, 100)) will give the marginal effect on the expected value of the dependent variable conditional on being uncensored, E(y|a