Problem Set 2

(due February 17, 2004)

  1. Suppose that a random sample of five families had the following annual incomes and savings (in thousands of dollars):
    
                     Family   Income X    Savings Y
                  ----------------------------------
                        A       8          .6
                        B      11         1.2
                        C       9         1.0
                        D       6          .7
                        E       6          .3
    
    1. Plot Y on X for each of the families.
    2. Calculate by hand the regression line predicting Savings by Income ("regress Savings on Income"); verify the results using Stata (or your preferred software).
    3. Graph the line on the plot of Y on X.
    4. Interpret the intercept b0.
    5. Compute a 95% confidence interval for b1.
    6. Using your results above, compute the 95% confidence interval for average savings of families with each of the following incomes in thousands of dollars:
      1. 6
      2. 8
      3. 10
      4. 12
    7. Which of the intervals above is least precise? Most precise? Why?
    8. Using the income values listed in part 6, compute the 95% prediction interval (i.e., a forecast interval) for a single family at each value.
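
    For those using Python as their preferred software, the following is a minimal sketch for checking the hand calculations in this problem. It is not a prescribed method. The function names `fit_line` and `interval` are illustrative, and the t critical value is hard-coded from a standard t table rather than computed.

```python
import math

# Data from the table in Problem 1 (thousands of dollars).
income = [8, 11, 9, 6, 6]
savings = [0.6, 1.2, 1.0, 0.7, 0.3]

# Assumption: two-sided 95% critical value of Student's t with
# n - 2 = 3 degrees of freedom, copied from a standard t table.
T_CRIT_DF3 = 3.182


def fit_line(x, y):
    """Return (b0, b1, sxx, mse) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)  # estimate of the variance around the line
    return b0, b1, sxx, mse


def interval(x, y, x_new, t_crit, predict=False):
    """Interval at x_new for the mean response, or for a single new
    observation when predict=True."""
    n = len(x)
    x_bar = sum(x) / n
    b0, b1, sxx, mse = fit_line(x, y)
    extra = 1.0 if predict else 0.0  # added variance of one new case
    se = math.sqrt(mse * (extra + 1 / n + (x_new - x_bar) ** 2 / sxx))
    centre = b0 + b1 * x_new
    return centre - t_crit * se, centre + t_crit * se


b0, b1, sxx, mse = fit_line(income, savings)
se_b1 = math.sqrt(mse / sxx)
slope_ci = (b1 - T_CRIT_DF3 * se_b1, b1 + T_CRIT_DF3 * se_b1)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"95% CI for the slope: ({slope_ci[0]:.4f}, {slope_ci[1]:.4f})")
for x_new in (6, 8, 10, 12):
    mean_ci = interval(income, savings, x_new, T_CRIT_DF3)
    pred_pi = interval(income, savings, x_new, T_CRIT_DF3, predict=True)
    print(x_new, [round(v, 3) for v in mean_ci], [round(v, 3) for v in pred_pi])
```

    Comparing the printed widths across the four income values is one way to check your reasoning about which intervals are more or less precise.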


  2. Using the data from Problem 1 above, compute Consumption C:
    C = X - Y
    1. Calculate the regression line predicting Consumption by Income.
    2. Compare the regression slope for this line to the slope you obtained in Problem 1.
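
    As a quick numerical check on the comparison asked for above, the sketch below fits both lines with the same least squares formulas and prints the two sets of coefficients side by side. The helper name `slope_intercept` is illustrative only.

```python
# Data from the table in Problem 1 (thousands of dollars).
income = [8, 11, 9, 6, 6]
savings = [0.6, 1.2, 1.0, 0.7, 0.3]

# Consumption as defined in this problem: C = X - Y.
consumption = [x - y for x, y in zip(income, savings)]


def slope_intercept(x, y):
    """Return (b0, b1) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0_s, b1_s = slope_intercept(income, savings)
b0_c, b1_c = slope_intercept(income, consumption)

print(f"Savings on Income:     b0 = {b0_s:.4f}, b1 = {b1_s:.4f}")
print(f"Consumption on Income: b0 = {b0_c:.4f}, b1 = {b1_c:.4f}")
print(f"Sum of the two slopes: {b1_s + b1_c:.4f}")
```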



  3. Suppose that a study of expenditures by Republican candidates and election outcomes in competitive districts produced the following results:
    
                    X                         Y
               Expenditure per       Republican Proportion
                  Voter                   of the Vote
              -------------------------------------------------
                    $1                        40
                    $2                        50
                    $3                        50
                    $4                        70
                    $5                        65
                    $6                        65
                    $7                        80
    
    1. Graph the data.
    2. Compute the least squares regression line predicting outcome by expenditure ("regress outcome on expenditure").
    3. Suppose you were in a hurry and decided to compute the regression line by just connecting the first and last points, giving an estimator of β1 that we will call b~1.
      1. Draw the line connecting the first and last points.
      2. Write out a formula for b~1 in terms of X1, Y1, X7, and Y7.
      3. Is b~1 a linear estimator of β1?
      4. Is b~1 an unbiased estimator of β1?
      5. Without doing any calculations, can you say how the variance of b~1 compares to the variance of the least squares estimator b1? On what basis do you say this?
      6. Verify your answer in part 5 by actually calculating the variances of b1 and b~1 for the data above, expressing your answer in terms of the unknown σ² (i.e., the variance around the population regression line).
    4. Consider an alternative estimator, b*1, that uses a pair of less extreme values, say X2 and X6, instead of the most extreme values.
      1. Is b*1 linear and/or unbiased?
      2. Calculate the variance of b*1 and compare it to the variance of b~1.
      3. Answer true or false; if false, indicate how the statement should be corrected.
        b~1 has less variance than b*1, which illustrates the general principle that the more a pair of observations are spread out, the more statistical leverage they exert, and hence the more efficient they are.
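
    The variance calculations in this problem can be checked with the sketch below. It relies on a standard fact: if an estimator can be written as a weighted sum of the Y values, and the Y values are uncorrelated with common variance σ², then its variance is σ² times the sum of the squared weights. The function names are illustrative, and positions are 0-based (so the first and last points are positions 0 and 6).

```python
from fractions import Fraction

# Expenditure per voter from the table in Problem 3.
x = [1, 2, 3, 4, 5, 6, 7]


def ols_weights(x):
    """Weights k_i such that the least squares slope is sum(k_i * Y_i)."""
    n = len(x)
    x_bar = Fraction(sum(x), n)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / sxx for xi in x]


def two_point_weights(x, i, j):
    """Weights for the slope (Y_j - Y_i) / (X_j - X_i) through two points."""
    w = [Fraction(0)] * len(x)
    gap = Fraction(x[j] - x[i])
    w[i] = -1 / gap
    w[j] = 1 / gap
    return w


def variance_factor(w):
    """Var(sum(w_i * Y_i)) divided by sigma^2, i.e. the sum of squared weights."""
    return sum(wi ** 2 for wi in w)


def is_unbiased_for_slope(w, x):
    """A linear estimator is unbiased for the slope when its weights sum
    to zero and the weighted sum of the X values equals one."""
    return sum(w) == 0 and sum(wi * xi for wi, xi in zip(w, x)) == 1


estimators = {
    "b1 (least squares)": ols_weights(x),
    "b~1 (points 1 and 7)": two_point_weights(x, 0, 6),
    "b*1 (points 2 and 6)": two_point_weights(x, 1, 5),
}
for name, w in estimators.items():
    print(name, "| variance / sigma^2 =", variance_factor(w),
          "| unbiased:", is_unbiased_for_slope(w, x))
```

    Exact fractions are used instead of floats so the variance multipliers can be compared directly with hand calculations.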


  4. Suppose that a random sample of four states had the following per capita incomes (in $1,000's) and per capita expenditures on police services (in $100's):
                   Per Capita        Per Capita
             State  Income      Police Expenditures
                      (X)                (Y)
                   ($1,000's)          ($100's)
           -----------------------------------------
               A      12                1.0
               B       8                1.0
               C       7                 .6
               D      17                2.2
    
    1. Estimate the regression line
      Y = β0 + β1X
    2. What is the 95% interval estimate of β1?
    3. Graph the 4 points and the fitted line, and then graph as well as you can the acceptable slopes (passing through the point formed by the means of X and Y) given by the interval estimate in part 2.
    4. What is the 95% interval estimate of β0?
    5. Which of the following hypotheses do these data allow you to reject at the .05 level (two-tailed)?
      1. β1 = 0.00
      2. β1 = 0.05
      3. β1 = 0.10
      4. β1 = 0.50
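
    The interval estimates and two-tailed tests in this problem can be checked with the sketch below, which uses the usual standard error formulas for the intercept and slope. The function name `rejects` is illustrative, and the t critical value is hard-coded from a standard t table rather than computed.

```python
import math

# Data from the table in Problem 4.
income = [12, 8, 7, 17]       # per capita income, $1,000's
police = [1.0, 1.0, 0.6, 2.2]  # per capita police expenditures, $100's

# Assumption: two-sided 95% critical value of Student's t with
# n - 2 = 2 degrees of freedom, copied from a standard t table.
T_CRIT_DF2 = 4.303

n = len(income)
x_bar = sum(income) / n
y_bar = sum(police) / n
sxx = sum((xi - x_bar) ** 2 for xi in income)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(income, police))

# Least squares estimates of the slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual variance estimate and standard errors of the coefficients.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(income, police))
mse = sse / (n - 2)
se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))

ci_b1 = (b1 - T_CRIT_DF2 * se_b1, b1 + T_CRIT_DF2 * se_b1)
ci_b0 = (b0 - T_CRIT_DF2 * se_b0, b0 + T_CRIT_DF2 * se_b0)


def rejects(beta1_null):
    """Two-tailed t test of H0: slope = beta1_null at the .05 level."""
    return abs(b1 - beta1_null) / se_b1 > T_CRIT_DF2


print(f"b0 = {b0:.4f}, 95% CI ({ci_b0[0]:.4f}, {ci_b0[1]:.4f})")
print(f"b1 = {b1:.4f}, 95% CI ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
for null_value in (0.00, 0.05, 0.10, 0.50):
    print(f"H0: slope = {null_value:.2f} -> reject: {rejects(null_value)}")
```

    A hypothesized slope is rejected by the two-tailed test exactly when it falls outside the 95% interval for the slope, so the two printed results can be cross-checked against each other.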


  6. Carry out project 1.43 in Kutner et al. The data are on the disk provided with Kutner. They can also be downloaded from the Web by clicking here and saving the data to a file.

  7. Carry out project 1.44 in Kutner (this uses the same data as 1.43).

  8. Provide a one paragraph statement of the research question you propose to serve as the basis of your research paper; identify one or more sources of data (i.e., ICPSR study number and title, or other published data source).

Bert Kritzer, 608-263-2277, Kritzer@PoliSci.Wisc.Edu
Last modified, February 26, 2004