Problem Set 2
(due February 17, 2004)
- Suppose that a random sample of five families had the following annual incomes
and savings (in thousands of dollars):
Family   Income (X)   Savings (Y)
A             8           0.6
B            11           1.2
C             9           1.0
D             6           0.7
E             6           0.3
- Plot Y on X for each of the families.
- Calculate by hand the regression line predicting Savings by Income ("regress
Savings on Income"); verify the results using Stata (or your preferred statistical package).
- Graph the line on the plot of Y on X.
- Interpret the intercept b0.
- Compute a 95% confidence interval for β1.
- Using your results above, compute the 95% confidence interval for average
savings of families with each of the following incomes (in thousands of dollars):
- Which of the intervals above is least precise? Most precise? Why?
- Using the values listed in (f), compute the 95% prediction interval
for a single family for each of the values listed (i.e., a forecast interval).
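For checking the hand calculations in Problem 1, here is a minimal Python sketch, assuming the usual simple-regression formulas: b1 = Sxy/Sxx, b0 = Ȳ − b1X̄, MSE = SSE/(n − 2), and intervals of the form estimate ± t × standard error. The helper names (fit_simple_ols, slope_ci, mean_response_ci, prediction_interval) and the table value T_975_DF3 = 3.182 for t with 3 df are illustrative assumptions, not part of the assignment; if SciPy is installed, scipy.stats.t.ppf(0.975, 3) gives the same critical value.

```python
import math

# Problem 1 data, in thousands of dollars
income = [8, 11, 9, 6, 6]             # X
savings = [0.6, 1.2, 1.0, 0.7, 0.3]   # Y

# Two-tailed 95% critical value of t with n - 2 = 3 df (table value, assumed here)
T_975_DF3 = 3.182


def fit_simple_ols(x, y):
    """Least squares fit of y on x, returning the pieces used in the hand calculation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # unbiased estimate of sigma^2
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1, "sse": sse, "mse": mse}


def slope_ci(fit, t_crit):
    """Confidence interval for beta1: b1 +/- t * sqrt(MSE / Sxx)."""
    half = t_crit * math.sqrt(fit["mse"] / fit["sxx"])
    return fit["b1"] - half, fit["b1"] + half


def _interval(fit, x_h, t_crit, extra):
    # extra = 0 for the mean response, 1 adds sigma^2 for a single new observation
    y_hat = fit["b0"] + fit["b1"] * x_h
    var_factor = extra + 1 / fit["n"] + (x_h - fit["x_bar"]) ** 2 / fit["sxx"]
    half = t_crit * math.sqrt(fit["mse"] * var_factor)
    return y_hat - half, y_hat + half


def mean_response_ci(fit, x_h, t_crit):
    """CI for average savings of all families with income x_h."""
    return _interval(fit, x_h, t_crit, extra=0)


def prediction_interval(fit, x_h, t_crit):
    """Prediction (forecast) interval for one new family with income x_h."""
    return _interval(fit, x_h, t_crit, extra=1)


fit = fit_simple_ols(income, savings)
print(f"b0 = {fit['b0']:.4f}, b1 = {fit['b1']:.4f}")  # b0 = -0.3956, b1 = 0.1444
print("95% CI for beta1:", slope_ci(fit, T_975_DF3))
```

For the interval items, call mean_response_ci(fit, x_h, T_975_DF3) and prediction_interval(fit, x_h, T_975_DF3) for each income listed in (f). The (x_h − X̄)² term is why intervals widen as x_h moves away from the mean income, and the extra 1 is why a forecast for a single family is always wider than the interval for the average.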
- Using the data from Problem 1 above, compute Consumption C:
C = X - Y
- Calculate the regression line predicting Consumption by Income.
- Compare the regression slope for this line to the slope you obtained
in Problem 1.
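A quick numerical check for Problem 2, sketched in Python under the assumption that Consumption is defined exactly as C = X − Y. Because Σ(X − X̄)(C − C̄) = Σ(X − X̄)² − Σ(X − X̄)(Y − Ȳ), the slope of C on X should equal 1 minus the slope of Y on X, and the two intercepts should sum to 0. The helper ols_line is a hypothetical name used only for this illustration.

```python
# Problem 1 data (thousands of dollars) and the derived consumption C = X - Y
income = [8, 11, 9, 6, 6]
savings = [0.6, 1.2, 1.0, 0.7, 0.3]
consumption = [x - y for x, y in zip(income, savings)]


def ols_line(x, y):
    """Return (b0, b1) for the least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0_s, b1_s = ols_line(income, savings)
b0_c, b1_c = ols_line(income, consumption)
print(f"savings slope {b1_s:.4f}, consumption slope {b1_c:.4f}")  # savings slope 0.1444, consumption slope 0.8556
print(f"sum of slopes {b1_s + b1_c:.4f}")  # sum of slopes 1.0000
```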
- Suppose that a study of campaign expenditures by Republican candidates
and election outcomes in competitive districts produced the following results:
Expenditure per Voter   Republican Proportion of the Vote
- Graph the data.
- Compute the least squares regression line predicting outcome by expenditure
("regress outcome on expenditure").
- Suppose you were in a hurry and decided to estimate the regression line
by simply connecting the first and last points, yielding an estimator of β1
that we will call b~1.
- Draw the line connecting the first and last points.
- Write out a formula for b~1 in terms of X1,
Y1, X7, and Y7.
- Is b~1 a linear estimator of β1?
- Is b~1 an unbiased estimator of β1?
- Without doing any calculations, can you say how the variance of b~1
compares to the variance of the least squares estimator b1?
On what basis do you say this?
- Verify your answer in (v) by actually calculating the variances of
b1 and b~1 for the data above, expressing
your answer in terms of the unknown σ² (i.e., the variance
around the population regression line).
- Consider an alternative estimator, b*1, that uses a pair of less extreme
values, say X2 and X6, instead of the most extreme values.
- Is b*1 linear and/or unbiased?
- Calculate the variance of b*1 and compare it to the variances of b1 and b~1.
- Answer true or false; if false, indicate how the statement should be corrected:
b~1 has less variance than b*1,
which illustrates the general principle that the more a pair of observations
are spread out, the more statistical leverage they exert, and hence
the more efficient they are.
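The expenditure data for Problem 3 are not reproduced in this copy, so the Python sketch below uses HYPOTHETICAL X values (1 through 7) purely to illustrate the variance comparison; substitute the seven expenditures from the table. It assumes the standard model with independent errors of common variance σ², under which Var(b1) = σ²/Σ(Xi − X̄)², and, since b~1 = (Y7 − Y1)/(X7 − X1), Var(b~1) = 2σ²/(X7 − X1)²; the same argument with X2 and X6 gives Var(b*1). Results are reported as multiples of σ². The function name variance_multipliers is hypothetical.

```python
def variance_multipliers(x):
    """Return Var/sigma^2 for b1 (least squares), b~1 (X1, X7) and b*1 (X2, X6).

    x must hold the seven X values in the order they are listed (X1 ... X7).
    """
    if len(x) != 7:
        raise ValueError("expected seven X values")
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    var_b1 = 1 / sxx                    # Var(b1) = sigma^2 / Sxx
    var_tilde = 2 / (x[6] - x[0]) ** 2  # b~1 = (Y7 - Y1) / (X7 - X1)
    var_star = 2 / (x[5] - x[1]) ** 2   # b*1 = (Y6 - Y2) / (X6 - X2)
    return var_b1, var_tilde, var_star


# HYPOTHETICAL X values for illustration only; replace with the table's expenditures
x_hypothetical = [1, 2, 3, 4, 5, 6, 7]
v_b1, v_tilde, v_star = variance_multipliers(x_hypothetical)
print(f"{v_b1:.4f} {v_tilde:.4f} {v_star:.4f}")  # 0.0357 0.0556 0.1250
```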
- Suppose that a random sample of four states had the following per capita
incomes (in $1,000s) and per capita expenditures on police services (in $100s):
State   Per Capita Income ($1,000s)   Per Capita Police Expenditures ($100s)
A                 12                              1.0
B                  8                              1.0
C                  7                              0.6
D                 17                              2.2
- Estimate the regression line
Y = β0 + β1X
- What is the 95% interval estimate of β1?
- Graph the 4 points and the fitted line; then graph, as well as you
can, the acceptable slopes (lines passing through the point formed by the means
of X and Y) given by the interval estimate in (b).
- What is the 95% interval estimate of β0?
- Which of the following hypotheses do these data allow you to reject at
the .05 level (two-tailed)?
- β1 = 0.00
- β1 = 0.05
- β1 = 0.10
- β1 = 0.50
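For Problem 4, a minimal Python sketch of the same calculations, assuming the standard formulas SE(b1) = sqrt(MSE/Sxx) and SE(b0) = sqrt(MSE(1/n + X̄²/Sxx)) with n − 2 = 2 degrees of freedom. The critical value T_975_DF2 = 4.303 is a t-table value assumed here (scipy.stats.t.ppf(0.975, 2) returns it if SciPy is available), and reject_slope is a hypothetical helper name. It relies on the duality between tests and intervals: a two-tailed .05 test rejects β1 = c exactly when c falls outside the 95% interval for β1.

```python
import math

# Problem 4 data
income = [12, 8, 7, 17]          # X: per capita income, $1,000s
police = [1.0, 1.0, 0.6, 2.2]    # Y: per capita police expenditures, $100s

# Two-tailed .05 critical value of t with n - 2 = 2 df (table value, assumed here)
T_975_DF2 = 4.303

n = len(income)
x_bar = sum(income) / n
y_bar = sum(police) / n
sxx = sum((x - x_bar) ** 2 for x in income)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, police))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(income, police))
mse = sse / (n - 2)  # estimate of sigma^2

se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
ci_b1 = (b1 - T_975_DF2 * se_b1, b1 + T_975_DF2 * se_b1)
ci_b0 = (b0 - T_975_DF2 * se_b0, b0 + T_975_DF2 * se_b0)


def reject_slope(beta1_null):
    """Two-tailed t test of H0: beta1 = beta1_null at the .05 level."""
    t_stat = (b1 - beta1_null) / se_b1
    return abs(t_stat) > T_975_DF2


print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")  # b0 = -0.3613, b1 = 0.1419
print("95% CI for beta1:", ci_b1)
print("95% CI for beta0:", ci_b0)
for h in (0.00, 0.05, 0.10, 0.50):
    print(f"H0: beta1 = {h:.2f} ->", "reject" if reject_slope(h) else "do not reject")
```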
- Carry out project 1.43 in Kutner et al. The data are on the disk
provided with Kutner; they can also be downloaded from the Web and saved
to a file.
- Carry out project 1.44 in Kutner (this uses the same data as 1.43).
- Provide a one paragraph statement of the research question you propose to
serve as the basis of your research paper; identify one or more sources of
data (i.e., ICPSR study number and title, or other published data source).
Bert Kritzer, 608-263-2277, Kritzer@PoliSci.Wisc.Edu
Last modified, February 26, 2004