Political Science 551
TA: Peter Holm
NOTE: This syllabus is subject to minor changes both before and during the semester.
There is one required text for the course, available at the University Bookstore:
In addition, a set of readings will be placed on electronic reserve; I may be able to make a packet available for individuals to copy. The online syllabus should provide direct links to these readings (let me know if any of the links fail to work); readings shown in red on the online syllabus do not yet have links. Due to copyright restrictions, some of the readings may be in password-protected PDF files; the password to access these files will be distributed in class.
The requirements for this course consist of the following:
A variety of small data sets will be available for students to draw upon in doing the research exercise; also, students may do this exercise in conjunction with a paper for another course. In any case, the topic of the exercise must be approved by the instructor in advance.
Two lectures a week plus a "lab" section have been scheduled. The lab session is intended as an informal time for clarification, specific help with problems, instruction in using Stata (the statistical package we will be using), and the like. It is extremely important that students keep up with the material; once you fall behind it is very difficult to catch up.
Suggestions and feedback regarding the course are welcome at any time.
PLEASE NOTE THE FOLLOWING:
Formulas discussed in class will be posted as PDF files prior to each lecture. They will be formatted for note taking. I strongly suggest printing these off prior to the lecture and bringing them to class so that you can annotate them with your own notes. Links will be added to the online syllabus with this information.
Extra review sessions will be scheduled just before the two midterms and before the final examination.
During the first week of classes, the discussion sections will meet at the Social Science Micro Computing Lab (SocSciML) in 3218a Social Science; the purpose of these sessions is to introduce students to the lab facility and its procedures, and to Stata. Peter will also schedule one or two sessions in the small lab in North Hall, which is available for use by Political Science graduate students. I have also scheduled an optional algebra review session the first week (Wednesday, September 6, 7:15-9:00 pm, 4208 H.C. White); this session will cover basic algebra concepts and the algebra of summation. (This session also serves as a "make up" for the class on Tuesday, September 26, when I have to be out of town for a speaking engagement.)
Problems denoted by letters can be found at the end of the syllabus.
E-mail questions are welcome; e-mail addresses are at the end of the syllabus.
I welcome students to come see me during my office hours (Tuesday, 11-12; Thursday, 1:15-2:15) or to make an appointment at a mutually convenient time; my office is in North Hall 201D.
Week: Topic & Reading Assignment
PART I: DATA DESCRIPTION
Week 1, Sept. 5
NOTE: There will be an optional algebra review session on Wednesday evening, September 6, 7:15-9:00 PM.
Week 2, Sept. 12
Week 3, Sept. 19
Description of Two Variables:
PART II: PROBABILITY THEORY
Week 4, Sept. 26
NOTE: NO CLASS TUESDAY
Week 5, Oct. 3
Week 6, Oct. 10
(no discussion sections on Thursday)
Week 7, Oct. 17
PART III: STATISTICS
Week 8, Oct. 24
Week 9, Oct. 31
Week 10, Nov. 7
Week 11, Nov. 14
Week 12, Nov. 21
Week 14, Dec. 5
Week 15, Dec. 12
Introduction to Multiple Regression (if time permits)
* * RESEARCH EXERCISE DUE * *
FRIDAY, DEC. 22, 4 P.M.
* * * FINAL EXAMINATION * * *
PROBLEMS IDENTIFIED BY LETTERS
A. (1) Make a histogram of the following numbers:
36, 43, 82, 84, 81, 84, 45, 60, 64, 71, 81, 78, 79, 43, 79
(2) Make a histogram for numbers that are ten times the numbers in (1). Compare the display to that in part (1).
(3) Make a histogram for numbers that are five times the numbers in (1). Compare the display to those in parts (1) and (2).
(4) Make a stem-and-leaf display (by hand) of the numbers in A(1) above.
B. The baseball World Series pits the winner of the American League against the winner of the National League. These two teams play until one team wins a total of four games.
(1) Pretend that the two teams are perfectly matched and there is a 50-50 chance of each team winning any given game. Further, suppose the games are independent; that is, the chance of winning a particular game does not depend on whether other games have been won or lost. Use Stata to simulate 20 World Series; a simple Stata "program" to generate these data is available. Paste it into the Stata Do-file editor (start the Do-file editor by clicking on Window and then on Do-file Editor) and execute it by highlighting the commands and clicking the Run button. Before running the program, open the textbook to Table B, close your eyes, and point to a number in the table; then look for the line in the program that reads "set seed 53" and replace the 53 with the number you are pointing at (use at least three digits). If you look closely at the commands, you will see a line that includes "gen X=uniform()>(1-.5)"; this tells Stata to generate a random variable from a Bernoulli distribution with a probability of .5. Recall that the Bernoulli distribution has only two values, 0 and 1; pretend a 1 means the National League won and a 0 means the American League won. For each series, determine which team won. How many series lasted only four games? How many lasted five games? Six games? Seven games?
(2) Repeat (1), but now pretend the National League had a probability of .6 of winning each game; you will need to change the line in the Stata program that generates the Bernoulli variables so that the probability is .6 rather than .5. How do these results compare with those in (1)?
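For anyone who wants to see the logic of the simulation outside Stata, here is a minimal Python sketch of the same idea. It is our illustration, not the course's Stata program: the function names are ours, Python's random.Random stands in for Stata's uniform(), and the seed 53 simply echoes the default line in the program.

```python
import random
from collections import Counter


def simulate_series(p_nl, rng):
    """Play one best-of-seven series and return (winner, games played).

    Each game is an independent Bernoulli trial: the National League wins
    when a uniform [0, 1) draw exceeds 1 - p_nl, the same comparison as
    Stata's gen X=uniform()>(1-p)."""
    nl = al = 0
    while nl < 4 and al < 4:
        if rng.random() > 1 - p_nl:
            nl += 1
        else:
            al += 1
    return ("NL" if nl == 4 else "AL"), nl + al


def simulate_many(n_series, p_nl, seed):
    """Simulate n_series World Series; tally winners and series lengths."""
    rng = random.Random(seed)
    results = [simulate_series(p_nl, rng) for _ in range(n_series)]
    winners = Counter(w for w, _ in results)
    lengths = Counter(g for _, g in results)
    return winners, lengths


if __name__ == "__main__":
    for p in (0.5, 0.6):                  # parts (1) and (2)
        winners, lengths = simulate_many(20, p, seed=53)
        print(f"p = {p}: winners {dict(winners)}, "
              f"lengths {dict(sorted(lengths.items()))}")
```

With only 20 series the tallies bounce around a lot from seed to seed; that variability is part of what the exercise is meant to show.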
C. A total of 16 mice are sent down a maze, one by one. From previous experience, it is believed that the probability a mouse turns right is .38. Suppose their turning pattern follows a binomial distribution. Using the display command with the Binomial function in Stata, answer each of the following:
(1) What is the probability that exactly seven of the 16 mice turn right?
(2) That eight or fewer turn right?
(3) That more than eight turn right?
(4) That eight or more turn right?
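The four questions differ only in which outcomes are counted, which is easy to get wrong with an upper-tail function. The Python sketch below (our illustration; helper names are ours) writes the pmf out from the binomial formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k), and builds the other answers from it. As we understand it, Stata's Binomial(n,k,p) returns the upper tail P(X ≥ k), which corresponds to binom_upper here; please confirm with Stata's help for the version you are using.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k): add up the pmf from 0 through k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def binom_upper(k, n, p):
    """P(X >= k): one minus P(X <= k - 1)."""
    return 1.0 - binom_cdf(k - 1, n, p)


n, p = 16, 0.38
exactly_7 = binom_pmf(7, n, p)          # part (1)
at_most_8 = binom_cdf(8, n, p)          # part (2): 0, 1, ..., 8
more_than_8 = 1.0 - at_most_8           # part (3): 9, 10, ..., 16
at_least_8 = binom_upper(8, n, p)       # part (4): 8, 9, ..., 16

if __name__ == "__main__":
    for label, value in [("exactly 7", exactly_7), ("8 or fewer", at_most_8),
                         ("more than 8", more_than_8), ("8 or more", at_least_8)]:
        print(f"{label:12} {value:.4f}")
```

Note how parts (3) and (4) differ by exactly P(X = 8).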
D. Acceptance Sampling. When a company buys a large lot of materials, it usually does not check every single item to see that all are satisfactory. Instead, some companies pick a sample of items and check these; if they do not find many defective items, they accept the whole lot. In this problem we'll look at the kind of risk they run when they do this. Suppose the inspection plan consists of looking at 40 items chosen at random from a large shipment, then accepting the entire shipment if there are 0 or 1 defective items and rejecting it if there are 2 or more defective items.
(1) If 25% of the items in the entire shipment are defective, what is the probability the shipment will be accepted? [HINT: Use Stata's Binomial function.]
(2) Compute the probability of acceptance if the shipment is 8%, 5%, 3%, 2%, 1%, .5%, and .1% defective. Then sketch a plot of the probability of acceptance versus the percent defective.
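Because the shipment is large, the number of defectives in a sample of 40 can be treated as binomial, and the lot is accepted when that count is 0 or 1, so P(accept) = (1 - p)^40 + 40p(1 - p)^39. A short Python sketch (the name prob_accept is ours) tabulates this over the defect rates in the problem; the printed values are the points you would plot.

```python
from math import comb


def prob_accept(p_defective, n=40, max_defects=1):
    """P(at most max_defects defectives in a sample of n), treating the
    count as Binomial(n, p_defective) because the shipment is large."""
    return sum(comb(n, k) * p_defective ** k * (1 - p_defective) ** (n - k)
               for k in range(max_defects + 1))


rates = [0.25, 0.08, 0.05, 0.03, 0.02, 0.01, 0.005, 0.001]

if __name__ == "__main__":
    for r in rates:
        print(f"{r:6.1%} defective -> P(accept) = {prob_accept(r):.4f}")
```

Changing n or max_defects shows how a different inspection plan shifts the curve.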
E. Suppose X has a binomial distribution with p = .8 and n = 25. Use the Binomial function in Stata to calculate each of the probabilities below exactly. Also compute the normal approximation to these probabilities, remembering to apply the continuity correction (go .5 below the lower value and .5 above the upper value). Compare the binomial results with the normal approximations.
(1) Prob (X = 21)
(2) Prob (X less than or equal to 21)
(3) Prob (X greater than or equal to 24)
(4) Prob (21 less than or equal to X less than or equal to 24)
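Here X has mean np = 20 and standard deviation sqrt(np(1 - p)) = 2. The Python sketch below (helper names are ours) computes each exact probability by summing the binomial pmf, and each normal approximation as Φ((b + .5 - 20)/2) - Φ((a - .5 - 20)/2), using math.erf for the normal cdf Φ. One-sided events are handled with an infinite endpoint.

```python
import math

N, P = 25, 0.8
MU = N * P                              # mean np = 20
SIGMA = math.sqrt(N * P * (1 - P))      # sd sqrt(np(1 - p)) = 2


def binom_exact(a, b):
    """Exact P(a <= X <= b) for X ~ Binomial(N, P); a or b may be infinite."""
    lo, hi = max(a, 0), min(b, N)
    return sum(math.comb(N, k) * P ** k * (1 - P) ** (N - k)
               for k in range(lo, hi + 1))


def normal_cdf(x):
    """Normal cdf with mean MU and sd SIGMA, written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - MU) / (SIGMA * math.sqrt(2.0))))


def normal_approx(a, b):
    """Normal approximation to P(a <= X <= b) with the continuity
    correction: go .5 below a and .5 above b."""
    return normal_cdf(b + 0.5) - normal_cdf(a - 0.5)


cases = {
    "P(X = 21)": (21, 21),
    "P(X <= 21)": (-math.inf, 21),
    "P(X >= 24)": (24, math.inf),
    "P(21 <= X <= 24)": (21, 24),
}

if __name__ == "__main__":
    for label, (a, b) in cases.items():
        print(f"{label:17} exact {binom_exact(a, b):.4f}   "
              f"normal {normal_approx(a, b):.4f}")
```

With p = .8 the binomial is somewhat skewed, so expect the approximation to be worse in the upper tail than near the center.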
F. If you have a binomial distribution with a large value of n and a small value of p, its probabilities can be closely approximated by a Poisson distribution with a mean equal to np.
(1) Use n = 30 and p = .01 and compute the corresponding binomial and Poisson probabilities (pdfs). Do they seem to be pretty close?
(2) Compare the binomial and Poisson cdfs. How well do they agree?
(3) Repeat (1), but use n = 30 and p = .5. How good is the approximation now?
(4) Compare the binomial and Poisson cdfs. How well do they agree?
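The Poisson pmf with mean λ = np is P(X = k) = e^(-λ) λ^k / k!. The Python sketch below (names are ours) lines up the binomial and Poisson pdfs and running cdfs for k = 0..n and reports the largest pdf gap, so the two settings in this problem can be compared directly. The Poisson cdf is only accumulated up to k = n here, so it can fall slightly short of 1 when np is large.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def compare(n, p):
    """Rows (k, binomial pdf, Poisson pdf, binomial cdf, Poisson cdf) for
    k = 0..n, with the Poisson mean set to lam = n * p."""
    lam = n * p
    rows, b_cdf, p_cdf = [], 0.0, 0.0
    for k in range(n + 1):
        b, q = binom_pmf(k, n, p), poisson_pmf(k, lam)
        b_cdf += b
        p_cdf += q
        rows.append((k, b, q, b_cdf, p_cdf))
    return rows


def max_pdf_gap(n, p):
    """Largest absolute difference between the two pdfs."""
    return max(abs(b - q) for _, b, q, _, _ in compare(n, p))


if __name__ == "__main__":
    for p in (0.01, 0.5):
        print(f"n = 30, p = {p}")
        for k, b, q, bc, pc in compare(30, p):
            if max(b, q) >= 0.001:        # skip rows where both pdfs are tiny
                print(f"  k={k:2}  pdf {b:.4f} vs {q:.4f}   cdf {bc:.4f} vs {pc:.4f}")
        print(f"  largest pdf gap: {max_pdf_gap(30, p):.4f}")
```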
G. I have provided a Stata program that uses Stata's random number generator to create a dataset of 100 rows and 10 variables, named case1 to case10, whose values are randomly drawn from a normal distribution with a mean of 50 and a standard deviation of 6. Download this file and paste it into Stata's Do-file editor (the editor can be opened from the Window menu in Stata or with Ctrl-8). Look closely at the program and find the line that sets "obs 100" (setting the number of observations to 100) and the line that contains 50+6*invnorm(uniform()), which tells Stata to generate random numbers with a mean of 50 and a standard deviation of 6. The last line in the program asks Stata to generate a histogram of the 100 means it obtains.
(1) Run the program (you can do this from the Tools menu in the Do-file editor), and print the resulting histogram. Describe the histogram.
(2) Repeat (1) twice, each time changing the value in the SET SEED statement to a different random number. Describe the differences among the three histograms. How do you explain the differences?
(3) Repeat (1) using a sample size of 2 (i.e., use only the first two columns of random values by changing the "egen" statement as described in the Stata program). Compare your results.
(4) Repeat for a sample size of 5 (use only the first five columns of random values). Compare your results.
(5) Increase the number of samples from 100 to 1,000 by modifying the "obs" statement in the do-file, and repeat parts (1), (3), and (4). Compare your results.
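The Stata program runs the experiment; the Python sketch below is our own analogue (function names are hypothetical) of the same computation: draw rows of normal(50, 6) values, average each row, and look at the spread and shape of those means. Theory says their standard deviation should be near 6/√k for samples of size k, and the crude text histogram stands in for Stata's histogram.

```python
import random
import statistics


def sample_means(n_samples, sample_size, seed, mu=50.0, sigma=6.0):
    """Return the mean of each of n_samples rows, where every row holds
    sample_size draws from a normal(mu, sigma) population. This plays the
    role of averaging case1..caseK across each observation."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(sample_size))
            for _ in range(n_samples)]


def text_histogram(values, bins=10, width=40):
    """Print a crude sideways histogram: one row of '#' marks per bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0          # guard against identical values
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    top = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:6.1f} | {'#' * round(width * c / top)}")


if __name__ == "__main__":
    for n_samples in (100, 1000):           # part (5) raises 100 to 1,000
        for k in (10, 2, 5):                # parts (1), (3), (4)
            means = sample_means(n_samples, k, seed=53)
            print(f"\n{n_samples} samples of size {k}: "
                  f"mean {statistics.fmean(means):.2f}, "
                  f"sd {statistics.stdev(means):.2f} "
                  f"(theory: 6/sqrt({k}) = {6 / k ** 0.5:.2f})")
            text_histogram(means)
```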
Problem G used a normal distribution, and you should have found that the shape of the distribution of sample means is close to normal. The central limit theorem states that as the sample size increases, the sampling distribution of the mean approaches normality for any population distribution with a finite variance. Modify the Stata program from Problem G to draw numbers from a uniform distribution with a minimum of 35 and a maximum of 65; to do this, replace the expression 50+6*invnorm(uniform()), which now appears on the right of the = sign, with 35+30*uniform().
(1) Repeat parts (1), (3), (4), and (5) from Problem G. Compare the results.
(2) Without doing any additional calculations, what difference would it have made if you had used RSUM (which computes the sum rather than the mean) rather than RMEAN?
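A Python analogue of the modified program, assuming the Stata expression is changed to 35+30*uniform() so that draws are uniform between 35 and 65. The share of means lying within one standard deviation of their average gives a rough numerical check on shape: it is about .58 for a uniform distribution and about .68 for a normal one, so watching it climb as the sample size grows is one way to see the central limit theorem at work. Function names are ours.

```python
import random
import statistics

LOW, HIGH = 35.0, 65.0
POP_SD = (HIGH - LOW) / 12 ** 0.5       # sd of a uniform(35, 65), about 8.66


def uniform_sample_means(n_samples, sample_size, seed):
    """Means of sample_size draws from a uniform(LOW, HIGH) population.
    LOW + (HIGH - LOW) * U, with U uniform on [0, 1), mirrors 35+30*uniform()."""
    rng = random.Random(seed)
    return [statistics.fmean(LOW + (HIGH - LOW) * rng.random()
                             for _ in range(sample_size))
            for _ in range(n_samples)]


def share_within_one_sd(values):
    """Fraction of values within one sd of their mean: about .58 for a
    uniform distribution and about .68 for a normal one."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return sum(abs(v - m) <= s for v in values) / len(values)


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        means = uniform_sample_means(1000, k, seed=53)
        print(f"size {k:2}: sd {statistics.stdev(means):.2f} "
              f"(theory {POP_SD / k ** 0.5:.2f}), "
              f"share within 1 sd {share_within_one_sd(means):.3f}")
```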
J. The New York Times recently carried an article on homicide rates under the title "Curse of the South: Behind American Homicide." The article showed that the homicide rate in the South was 9.0 per 100,000 population, compared to 5.4 for the Northeast, 6.4 for the Midwest, and 7.7 for the West. The data from the article are in the Stata file murder.dta, which can be downloaded by clicking on its name or found in the course home directory. Using those data, perform a two-sample t-test comparing the South to the rest of the country. Discuss your results.
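Stata's ttest command does this in one line; for those who want to see the arithmetic, the Python sketch below (function names are ours) computes the pooled, equal-variance two-sample t statistic, t = (x̄1 - x̄2) / (s_p sqrt(1/n1 + 1/n2)), and for comparison Welch's unequal-variance version (which, as far as we know, corresponds to Stata's unequal option). The murder.dta data are not reproduced here, so no results are shown; pass the South's values as x and all other values as y, and compare t with a t table at the returned degrees of freedom.

```python
import math
import statistics


def pooled_t(x, y):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)    # n - 1 in denominators
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)       # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.fmean(x) - statistics.fmean(y)) / se, n1 + n2 - 2


def welch_t(x, y):
    """Unequal-variance (Welch) t statistic with Satterthwaite degrees of freedom."""
    a = statistics.variance(x) / len(x)
    b = statistics.variance(y) / len(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
    return t, df
```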
K. Use Stata to do a simple analysis of variance of the murder rate data comparing the four regions identified in the Times article. Discuss your results.
L. Segal and his colleagues have published an update of their ideology measures for Supreme Court justices that covers considerably more justices. Replicate the regression analysis reported in the Segal and Cover article in the readings using the data in the file justices.dta, which can be downloaded by clicking on the link or accessed from the course home directory. How do your results compare to what was reported in the article?
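If it helps to check a replication by hand, the least-squares estimates for a bivariate regression are b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b x̄, the same coefficients Stata's regress reports for one predictor. The small Python sketch below (function name ours) assumes the analysis is a regression of one justice-level outcome on one ideology score; we don't know the variable names in justices.dta, so no data are shown.

```python
import statistics


def simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, R-squared)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx                     # slope
    a = y_bar - b * x_bar               # intercept
    ss_resid = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1 - ss_resid / ss_total
```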
M. Three candidates ran for the state legislature. In their reports on campaign contributions, they reported the following figures:
Complete a one-way analysis of variance testing the hypothesis that the mean contribution varied by candidate. Present the results using an ANOVA table, and state your conclusion about the hypothesis you tested.
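A one-way ANOVA table splits the total sum of squares into a between-group part, Σ n_j (ȳ_j - ȳ)², and a within-group part, Σ Σ (y_ij - ȳ_j)², and F is the ratio of their mean squares. The Python sketch below (names ours) builds that table for any set of groups, so it applies to the campaign-contribution problem as well as to Problem K; Stata's oneway command reports the same quantities. The block contains no data; supply each candidate's figures as a list, and compare F with an F table at the printed degrees of freedom.

```python
import statistics


def oneway_anova(groups):
    """One-way ANOVA for a dict {group label: list of values}.
    Returns rows (SS, df, MS) for Between, Within, and Total, plus F."""
    values = [x for v in groups.values() for x in v]
    grand_mean = statistics.fmean(values)
    means = {g: statistics.fmean(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    return {
        "Between": (ss_between, df_between, ms_between),
        "Within": (ss_within, df_within, ms_within),
        "Total": (ss_between + ss_within, len(values) - 1, None),
        "F": ms_between / ms_within,
    }


def print_anova_table(table):
    """Print the table in the usual Source / SS / df / MS layout."""
    print(f"{'Source':<9}{'SS':>14}{'df':>6}{'MS':>14}")
    for source in ("Between", "Within", "Total"):
        ss, df, ms = table[source]
        ms_text = f"{ms:14.2f}" if ms is not None else ""
        print(f"{source:<9}{ss:14.2f}{df:6d}{ms_text}")
    print(f"F = {table['F']:.3f} with ({table['Between'][1]}, {table['Within'][1]}) df")
```

With only two groups, F equals the square of the pooled two-sample t statistic, which is a handy cross-check against Problem J.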
Last modified, October 18, 2006