Political Science 551
I, 2006-07
Professor Herbert Kritzer
TA: Peter Holm


NOTE: This syllabus is subject to minor changes both before and during the semester.


 

Introduction to Statistical Inference for Political Research

There is one required text for the course, available at the University Bookstore:

In addition, a set of readings will be placed on electronic reserve; I may be able to make available a packet for individuals to copy. The online syllabus should provide direct links to these readings (let me know if any of the links fail to work); readings shown in red on the online syllabus have not yet had links established. Due to copyright restrictions some of the readings may be in password-protected PDF files; the password to access these files will be distributed in class.

The requirements for this course consist of the following:

A variety of small data sets will be available for students to draw upon in doing the research exercise; also, students may do this exercise in conjunction with a paper for another course. In any case, the topic of the exercise must be approved by the instructor in advance.

Two lectures a week plus a "lab" section have been scheduled. The lab session is intended as an informal time for clarification, specific help with problems, instruction in using Stata (the statistical package we will be using), and the like. It is extremely important that students keep up with the material; once you fall behind it is very difficult to catch up.

Suggestions and feedback regarding the course are welcome at any time.

PLEASE NOTE THE FOLLOWING:

Formulas discussed in class will be posted as PDF files prior to each lecture. They will be formatted for note taking. I strongly suggest printing these off prior to the lecture and bringing them to class so that you can annotate them with your own notes. Links will be added to the online syllabus with this information.

Extra review sessions will be scheduled just before the two midterms and before the final examination.

During the first week of classes, the discussion sections will meet at the Social Science Micro Computing Lab (SocSciML) in 3218a Social Science; the purpose of these sessions is to introduce students to the lab facility and its procedures, and to Stata (the statistical package we will be using). Peter will also schedule one or two sessions in the small lab in North Hall which is available for use by Political Science graduate students. I have also scheduled an optional algebra review session the first week (Wednesday, September 6, 7:15-9:00 pm, 4208 H.C. White); this session will cover basic algebra concepts and the algebra of summation. (This session also serves as a "make up" for a class on Tuesday, September 26 when I have to be out of town for a speaking engagement.)

Problems denoted by letters can be found at the end of the syllabus.

E-mail questions are welcome; e-mail addresses are at the end of the syllabus.

Students are welcome to come see me during my office hours (Tuesday, 11-12; Thursday, 1:15-2:15) or to make an appointment at a mutually convenient time; my office is located in North Hall 201D.



ASSIGNMENTS
Week Topic & Reading Assignment
PART I: DATA DESCRIPTION
1 Sept. 5
Description of One Variable:
Plots, percentages
central tendency (properties of the mean)
  • Reading:
    • M&M, §1.1, §1.2 (to p. 48)
  • Problems
    • 1.18, 1.20, 1.26

NOTE: There will be an optional algebra review session on Wednesday evening, September 6, 7:15-9:00 PM.

 
  
2 Sept. 12
Description of One or Two Variables:
Dispersion and shape [notes with central tendency lecture]
Crosstabulation, difference of means
 
 
3 Sept. 19

Description of Two Variables:
   Linear regression and correlation

Introduction to Probability I
   Sampling
   Basic Probability Rules
   

 

 
 
PART II: PROBABILITY THEORY
4 Sept. 26
Introduction to Probability I:
Conditional probability [notes with previous lecture]
Bayes Theorem
One Random Variable
  • Reading:
    • M&M, §4.3, §4.4
  • Problems:
    • 4.14, 4.22, 4.44, 4.46, 4.52

NOTE: NO CLASS TUESDAY

 
  
5 Oct. 3
Introduction to Probability II:
Two or More Random Variables
  • Reading
    • M&M §4.5
  • Problems:
    • 4.74, 4.86, 4.88, 4.90, 4.96, 4.116

 

 
 
6 Oct. 10
Introduction to Probability (continued)
MIDTERM
Review Session Wednesday evening, Time TBA
Midterm, Thursday in class

(no discussion sections on Thursday)

 

 

7 Oct. 17
Univariate Probability Distributions:
Binomial and normal distributions
  • Reading:
    • M&M, §1.3, pp. 284-285, §5.1
  • Problems:
    • B, C, D, E, 1.92, 1.98, 1.100, 4.42
 
 
PART III: STATISTICS
8 Oct. 24
Estimation:
Point estimation; sampling distributions
  • Reading:
    • M&M, §3.4, §5.2
  • Problems:
    • 3.62, 3.64, 3.68, 5.20, 5.26, 5.32, 5.36, 5.50, G, H
   
9 Oct. 31
Using sampling distributions:
Interval Estimation & Hypothesis Testing; Power
 
 
10 Nov. 7
Inference for Means
  
11 Nov. 14
Inference for Categorical Data
Proportions and contingency tables
 
12 Nov. 21
MIDTERM WEEK
Review Session Monday evening, Time TBA
Exam, Tuesday during class
   
13 Nov. 28
Regression Inference I:
The regression model
  
14 Dec. 5
Regression Inference II:
Interval estimates and tests of significance
  
15 Dec. 12
Inference for Means II: Oneway Analysis of Variance

Introduction to Multiple Regression (if time permits)

     
* * RESEARCH EXERCISE DUE * *
FRIDAY, DEC. 22, 4 P.M.

* * * FINAL EXAMINATION * * *
WEDNESDAY, DEC. 20, 12:25 P.M.

 




E-mail Addresses:
     KRITZER@POLISCI.WISC.EDU
     PHOLM@WISC.EDU



PROBLEMS IDENTIFIED BY LETTERS

A. (1) Make a histogram of the following numbers:
  36, 43, 82, 84, 81, 84, 45, 60, 64, 71, 81, 78, 79, 43, 79
(2) Make a histogram for numbers that are ten times the numbers in (1). Compare the display to that in part (1).
(3) Make a histogram for numbers that are five times the numbers in (1). Compare the display to those in parts (1) and (2).
(4) Make a stem-and-leaf (by hand) of the numbers in A(1) above.
  
B. The baseball World Series pits the winner of the American League against the winner of the National League. These two teams play until one team wins a total of four games.
(1) Pretend that the two teams are perfectly matched and there is a 50-50 chance of each team winning any given game. Further, suppose the games are independent; that is, the chance of winning a particular game does not depend on whether other games have been won or lost. Use Stata to simulate 20 World Series; a simple Stata "program" to generate these data is available. Paste it into the Stata Do-file Editor (start the editor by clicking on Window and then on Do-file Editor) and execute it by highlighting the commands and clicking the Run button.

Before running the program, open the textbook to Table B, close your eyes, and point to a number in the table; then look for the line in the program that reads "set seed 53" and replace the 53 with the number you are pointing at (at least three digits).

Now, if you look closely at the commands you will see a line that includes "gen X=uniform()>(1-.5)"; this tells Stata to generate a random variable from a Bernoulli distribution with a probability of .5. Recall that the Bernoulli distribution has only two values, 0 and 1; pretend a 1 means the National League won and 0 means the American League won. For each series, determine which team won. How many series lasted only four games? How many lasted five games? Six games? Seven games?
(2) Repeat (1) but now pretend the National League had a probability of .6 of winning each game; you will need to change the line in the Stata program that generates the Bernoulli variables so that the probability is now .6 rather than .5. How do these results compare with those in (1)?
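
For anyone who wants to check the logic of the simulation outside Stata, the following is a minimal Python sketch, not the course's Stata program (which is not reproduced here). It assumes each series is played game by game until one league has four wins; the function names simulate_series and tally_lengths are hypothetical, and Python's random number generator will not reproduce Stata's draws even with the same seed.

```python
import random

def simulate_series(p_national, rng):
    """Play games until one league has four wins; return (winner, games played)."""
    nl_wins = 0
    al_wins = 0
    while nl_wins < 4 and al_wins < 4:
        # Same comparison as the Stata line: a uniform draw > (1 - p) counts as a 1,
        # and a 1 means the National League won the game.
        if rng.random() > 1 - p_national:
            nl_wins += 1
        else:
            al_wins += 1
    winner = "NL" if nl_wins == 4 else "AL"
    return winner, nl_wins + al_wins

def tally_lengths(n_series, p_national, seed):
    """Count how many simulated series lasted 4, 5, 6, and 7 games."""
    rng = random.Random(seed)
    counts = {4: 0, 5: 0, 6: 0, 7: 0}
    for _ in range(n_series):
        _, games = simulate_series(p_national, rng)
        counts[games] += 1
    return counts

# 20 series with evenly matched teams; the seed plays the role of "set seed".
print(tally_lengths(20, 0.5, seed=53))
```

Changing the second argument to 0.6 corresponds to the modification asked for in part (2).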
  
C.

A total of 16 mice are sent down a maze, one by one. From previous experience it is believed that the probability a mouse turns right is .38. Suppose their turning pattern follows a binomial distribution. Using the display command with the Binomial function in Stata, answer each of the following:

In Stata, Binomial(n,k,p) returns the probability of k or more successes in n trials when the probability of a success on a single trial is p (i.e., a cumulative probability for the upper tail, which this problem set refers to as the CDF); the format of a command to display the probability of 5 or more heads on 10 tosses of an honest coin would be:

display Binomial(10,5,.5)

Remember that Stata is case sensitive, so the "B" in "Binomial" must be capitalized. To get a PDF (i.e., exactly 5 heads rather than 5 or more heads), calculate the CDF for 5 and for 6 and then take the difference.

(1) What is the probability that exactly seven of the 16 mice turn right?
(2) That eight or fewer turn right?
(3) That more than eight turn right?
(4) That eight or more turn right?
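
The "k or more" convention and the differencing trick described above can be cross-checked outside Stata. The following is a minimal Python sketch under the assumption that Binomial(n,k,p) behaves as described in this problem; binomial_tail and binomial_exact are hypothetical helper names, and the coin example from the text is used rather than the mouse data.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), the quantity Binomial(n,k,p) is described as returning."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def binomial_exact(n, k, p):
    """P(X == k), obtained as P(X >= k) minus P(X >= k + 1)."""
    return binomial_tail(n, k, p) - binomial_tail(n, k + 1, p)

# 5 or more heads, then exactly 5 heads, on 10 tosses of an honest coin
print(binomial_tail(10, 5, 0.5))
print(binomial_exact(10, 5, 0.5))
```

"k or fewer" follows from the same function as 1 minus the probability of k + 1 or more.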
  
D. Acceptance Sampling. When a company buys a large lot of materials, they usually don't check every single item to see that they are all satisfactory. Instead, some companies pick a sample of items, then check these and if they do not find many defective items, they go ahead and accept the whole lot. In this problem we'll look at the kind of risk they run when they do this. Suppose the inspection plan consists of looking at 40 items chosen at random from a large shipment, then accepting the entire shipment if there are 0 or 1 defective items, and rejecting the shipment if there are 2 or more defective items.
(1)

If in the entire shipment 25% of the items are defective, what is the probability the shipment will be accepted? [HINT: Use Stata's Binomial function.]

(2)

Compute the probability of acceptance if the shipment has 8% defective, 5% defective, 3%, 2%, 1%, .5%, and .1% defective. Then sketch a plot of the probability of acceptance versus the percent defective.

Use Stata's Binomial function. Put the seven values, expressed as proportions (e.g., .08 for 8%), in the first column of the data sheet (to access the data sheet, click on Window and then Data Editor). Then use the generate command and the Binomial function to get the probabilities in the second column (you have to exit the Data Editor to get back to the command line):

generate var2=Binomial(40,2,var1)

This will give you the probability of rejecting the shipment. You should be able to figure out how to obtain the probability of accepting the shipment using another generate command to create another variable var3.

You can then display the results by typing list var3, and you can use Stata's graphing capabilities to prepare the plot.
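
The relationship between the rejection and acceptance probabilities can also be sketched outside Stata. The Python below is an illustration only: prob_accept is a hypothetical helper, and the two defect rates in the loop are made up for the example rather than taken from the assignment.

```python
from math import comb

def prob_accept(n, max_defects, p):
    """P(at most max_defects defective items among n sampled), for defect rate p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(max_defects + 1))

# Inspection plan from the problem: sample 40 items, accept on 0 or 1 defectives.
# The defect rates below are hypothetical, not the ones assigned.
for p in (0.20, 0.10):
    print(p, round(prob_accept(40, 1, p), 4))
```

The acceptance probability is the complement of the "2 or more defectives" probability, so it falls as the defect rate rises.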

  
E. Suppose X has a binomial distribution with p = .8 and n = 25. Use the Binomial function in Stata to calculate each of the probabilities below exactly. Also compute the normal approximation to these probabilities. Remember the continuity correction: go .5 below the lower value and .5 above the upper value when computing the normal approximation. Compare the binomial results with the normal approximations.
(1) Prob (X = 21)
(2) Prob (X less than or equal to 21)
(3) Prob (X more than or equal to 24)
(4) Prob (21 less than or equal to X less than or equal to 24)
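
As an illustration of the continuity correction, here is a minimal Python sketch that compares an exact binomial probability with its normal approximation. The helper names are hypothetical, and the values n = 20 and p = .5 are chosen for the example so that the assigned case is left to work out in Stata.

```python
from math import comb, erf, sqrt

def binom_range(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(lo, hi + 1))

def normal_cdf(x, mean, sd):
    """Cumulative normal probability up to x."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

def normal_approx_range(n, p, lo, hi):
    """Normal approximation to P(lo <= X <= hi), going .5 below lo and .5 above hi."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return normal_cdf(hi + 0.5, mean, sd) - normal_cdf(lo - 0.5, mean, sd)

n, p = 20, 0.5  # hypothetical values, not those in the problem
print(binom_range(n, p, 8, 12), normal_approx_range(n, p, 8, 12))
```

A single value such as Prob (X = 21) is handled by setting lo and hi to the same number.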
  
F. If you have a binomial distribution with a large value of n and a small value of p, its probabilities can be closely approximated by a Poisson distribution with a mean equal to np.
(1)

Use n = 30 and p = .01 and compute the corresponding binomial and Poisson probabilities (pdfs).

To get the binomial pdf, put the numbers 0 to 30 in the first column of a Stata data sheet (which I will call var1), and then use the generate command with the Binomial function to get the cdf (which I will call var2):

gen var2=Binomial(30,var1,.01)

Now use the following two commands to get the pdf:

generate var3=var2-var2[_n+1]
replace var3=var2[31] if _n==_N

[NOTE: If you make a mistake with the generate command, and want to replace a variable with a new specification, you must either use "replace" instead of "generate" or first "drop" the variable in question (e.g., "drop var3") ]

Getting the Poisson cdf and pdf is a bit more cumbersome because there is no "Poisson" function equivalent to the "Binomial" function. If L is equal to lambda (np) and your count is in the variable var1, then you can get the Poisson PDF with the following command:

generate var4=(exp(-L)*L^var1)/exp(lnfactorial(var1))

where you replace L with the value of np.

Do they seem to be pretty close?

(2)

Compare the binomial and Poisson cdfs. How well do they agree?

The Binomial function will provide you with the cdf (in variable var2 if you followed the procedure above).

To get the cdf of the Poisson, assuming the pdf is in var4, use the following command:

generate var5=1-sum(var4[_n-1])

(3) Repeat (1) but use n = 30 and p = .5. How good is the approximation now?
(4) Compare the binomial and Poisson cdfs. How well do they agree?
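
The Poisson formula used in this problem, exp(-lambda) times lambda^k divided by k!, can be checked against the binomial outside Stata. The Python below is a sketch with hypothetical helper names; n = 100 and p = .02 are illustrative values, not the ones assigned.

```python
from math import comb, exp, factorial

def binomial_pmf(n, k, p):
    """P(X == k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X == k) for X ~ Poisson(lam): exp(-lam) * lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 100, 0.02   # hypothetical values with large n and small p
lam = n * p        # the Poisson mean is set equal to np
for k in range(6):
    print(k, round(binomial_pmf(n, k, p), 4), round(poisson_pmf(k, lam), 4))
```

Rerunning the loop with a larger p shows how quickly the two columns drift apart.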
  
G. I have provided a Stata program that will use Stata's random number generator to create a dataset consisting of 100 rows and 10 variables named case1 to case10 such that the variables are randomly drawn from a normal distribution with a mean of 50 and a standard deviation of 6. Download this file and paste it into Stata's do-file editor (the editor can be opened from the Window menu in Stata or by Ctrl-8). Look closely at the program and find the line that sets "obs 100" (setting the number of observations to 100) and the line that contains 50+6*invnorm(uniform()), which tells Stata to generate random numbers with a mean of 50 and a standard deviation of 6. The last line in the program asks Stata to generate a histogram of the 100 means it obtains.
(1) Run the program (you can do this from the Tools menu in the Do-file editor), and print the resulting histogram. Describe the histogram.
(2)

Repeat (1) twice, each time changing the value in the SET SEED statement to a different random number. Describe the differences among the three histograms. How do you explain the differences?

(3) Repeat (1) using a sample size of 2 (i.e., use only the first two columns of random values by changing the "egen" statement as described in the Stata program). Compare your results.
(4) Repeat for a sample size of 5 (use only the first five columns of random values). Compare your results.
(5) Increase the number of samples from 100 to 1,000 by modifying the "obs" statement in the do file, and repeat parts (1), (3), and (4). Compare your results.
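
The idea behind this problem, that each row's mean is one draw from the sampling distribution of the mean, can be sketched outside Stata as well. The Python below is an assumption-laden illustration, not the course program: sample_means is a hypothetical function, and Python's generator will not match Stata's draws for any seed.

```python
import random
from statistics import mean, stdev

def sample_means(n_samples, sample_size, seed, mu=50.0, sigma=6.0):
    """Return n_samples means, each computed from sample_size normal draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        draws = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        means.append(mean(draws))
    return means

# 100 samples of each size; the spread of the means shrinks as the sample size grows.
for size in (2, 5, 10):
    means = sample_means(100, size, seed=12345)
    print(size, round(mean(means), 2), round(stdev(means), 2))
```

The standard deviation of the means should fall roughly in proportion to 6 divided by the square root of the sample size.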
  
H.

Problem G used a normal distribution, and you should have found that the shape of the distribution of the means is close to normal. The central limit theorem states that as the sample size increases, the sampling distribution of the mean approaches normality for any population distribution with a finite variance. Modify the Stata program from Problem G to draw numbers from a uniform distribution with a minimum of 35 and a maximum of 65; to do this, replace the expression that now appears to the right of the = sign,
    50+6*invnorm(uniform())
with
   35+(65-35)*uniform()

(1) Repeat parts (1), (3), (4) and (5) from Problem G. Compare the results.
(2) Without doing any additional calculations, what difference would it have made if you had used RSUM (which computes the sum rather than the mean) rather than RMEAN?
  
J. The New York Times recently carried an article on homicide rates under the title "Curse of the South: Behind American Homicide." The article showed that the homicide rate in the South was 9.0 per 100,000 population compared to 5.4 for the Northeast, 6.4 for the Midwest, and 7.7 for the West. The data from the article are in the Stata file murder.dta, which can be downloaded by clicking on the name or can be found in the course home directory; using those data, perform a two-sample t-test comparing the South to the rest of the country. Discuss your results.
  
K. Use Stata to do a simple analysis of variance of the murder rate data comparing the four regions identified in the Times article. Discuss your results.
  
L. Segal and his colleagues have published an update of ideology measures for Supreme Court justices that include considerably more justices. Replicate the regression analysis reported in the Segal and Cover article in the readings using the data in the file justices.dta, which can be downloaded by clicking on the link, or accessed from the course home directory. How do your results compare to what was reported in the article?
   
M.

Three candidates ran for state legislature. In their reports on campaign contributions, they reported the following figures:

Candidate     n     mean      s
A            88   $121.7   32.7
B            91    $84.4   18.4
C            54   $111.9   28.5
All         233   $104.9   31.6

Complete a oneway analysis of variance testing the hypothesis that the mean contribution varied by candidate. Present the results using an ANOVA table, and state your conclusion about the hypothesis you tested.
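
When only group sizes, means, and standard deviations are available, the sums of squares can be rebuilt from those summaries. The Python below is a minimal sketch of that arithmetic; anova_from_summary is a hypothetical function, and the three groups in the example are made-up numbers, not the candidates' figures.

```python
def anova_from_summary(groups):
    """Oneway ANOVA entries from a list of (n, mean, sd) tuples, one per group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups SS: weighted squared distance of each group mean from the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups SS: each group's variance times its degrees of freedom
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }

# Hypothetical groups: (n, mean, sd)
table = anova_from_summary([(10, 5.0, 2.0), (10, 7.0, 2.0), (10, 9.0, 2.0)])
for name, value in table.items():
    print(name, value)
```

The F ratio is then compared with the F distribution on the between and within degrees of freedom.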





Bert Kritzer, 608-263-2277, Kritzer@PoliSci.Wisc.Edu

Last modified, October 18, 2006