RAND Health Insurance – Biddle and Hamermesh
The RAND Health Insurance Experiment (HIE)
What causal questions was the RAND HIE designed to answer?
Go to http://masteringmetrics.com/resources/ and download the Stata data associated with Tables 1.3 and 1.4 in MM. The person_years.dta dataset contains information on the RAND HIE sample, including demographic characteristics and the treatment assigned. The annual_spend.dta dataset contains information on annual hospital expenditures. To link these together, merge person_years.dta with annual_spend.dta using the variables person and year. Keep only those person/year observations that appear in both datasets. Generate a variable for total hospital spending, equal to the sum of dollars spent on inpatient care (inpdol) and outpatient care (outsum). Calculate the difference in average hospital spending between people who report being in excellent health (exc_health) and those who report being in bad health (bad_health). Is this difference statistically significant at the 5% level?
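The assignment itself is meant to be done in Stata. Purely to illustrate the logic of the steps (match on person and year, keep only matched rows, sum the two spending variables, compare group means), here is a minimal Python sketch. All records are invented toy values, not RAND HIE data, and the unequal-variance (Welch) standard error is an assumption of this sketch rather than something the assignment prescribes.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical toy records keyed by (person, year). The real .dta files
# hold many more rows and columns; every value here is invented.
person_years = {
    (1, 1): {"exc_health": 1, "bad_health": 0},
    (1, 2): {"exc_health": 1, "bad_health": 0},
    (2, 1): {"exc_health": 0, "bad_health": 1},
    (2, 2): {"exc_health": 0, "bad_health": 1},
    (3, 1): {"exc_health": 1, "bad_health": 0},
    (4, 1): {"exc_health": 0, "bad_health": 1},
    (5, 1): {"exc_health": 0, "bad_health": 0},  # no spending record
}
annual_spend = {
    (1, 1): {"inpdol": 0.0, "outsum": 100.0},
    (1, 2): {"inpdol": 0.0, "outsum": 140.0},
    (2, 1): {"inpdol": 500.0, "outsum": 300.0},
    (2, 2): {"inpdol": 900.0, "outsum": 260.0},
    (3, 1): {"inpdol": 0.0, "outsum": 120.0},
    (4, 1): {"inpdol": 700.0, "outsum": 340.0},
    (6, 1): {"inpdol": 50.0, "outsum": 50.0},  # no demographic record
}

# Keep only the person/year keys that appear in both tables (an inner join).
merged = {
    key: {**person_years[key], **annual_spend[key]}
    for key in person_years.keys() & annual_spend.keys()
}

# Total hospital spending = inpatient dollars + outpatient dollars.
for row in merged.values():
    row["total_spend"] = row["inpdol"] + row["outsum"]

excellent = [r["total_spend"] for r in merged.values() if r["exc_health"] == 1]
bad = [r["total_spend"] for r in merged.values() if r["bad_health"] == 1]

# Difference in means with an unequal-variance (Welch) standard error.
diff = mean(excellent) - mean(bad)
se = sqrt(variance(excellent) / len(excellent) + variance(bad) / len(bad))
t_stat = diff / se

print(f"matched rows: {len(merged)}, difference: {diff:.2f}, t: {t_stat:.2f}")
```

With a large sample, an absolute t-statistic above roughly 1.96 rejects the null of equal means at the 5% level. The toy sample here is far too small for that approximation to be trustworthy; it only shows the mechanics.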
As described in MM Chapter 1, the RAND HIE had many small treatment groups; in fact, the variable plan in your dataset shows that there were 24 different groups. Define a new variable plantype that divides these into 4 larger categories as follows. Plan Type 1 (Free) is plan 24; Plan Type 2 (Individual Deductible) is plans 1 and 5; Plan Type 3 (Cost-Sharing) is plans 9-23, inclusive; and Plan Type 4 (Catastrophic) is plans 2-4 and plans 6-8, both inclusive. What is the average hospital spending in each group? Is the difference in hospital spending between Plan Types 1 and 4 significant at the 5% level?
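The grouping rule above can be written as a small lookup. The following Python sketch is only an illustration of the mapping (the function name plan_type is hypothetical); in Stata the same rule would be expressed with generate and replace statements.

```python
from collections import Counter

def plan_type(plan: int) -> int:
    """Map one of the 24 RAND HIE plan codes to the 4 broader plan types."""
    if plan == 24:
        return 1  # Free
    if plan in (1, 5):
        return 2  # Individual Deductible
    if 9 <= plan <= 23:
        return 3  # Cost-Sharing
    if plan in (2, 3, 4, 6, 7, 8):
        return 4  # Catastrophic
    raise ValueError(f"unexpected plan code: {plan}")

# Sanity check: every plan code from 1 to 24 lands in exactly one category.
counts = Counter(plan_type(p) for p in range(1, 25))
print(dict(sorted(counts.items())))
```

Checking that the four categories together cover all 24 codes is a cheap way to catch an off-by-one error in the ranges before computing any group averages.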
Clear your Stata session and read in rand_initial_sample_2.dta. The four plan types have already been defined in this dataset, which also contains the variable ghindx, a general health index. Is the difference in average health between Plan Types 1 and 4 significant at the 5% level? How do your results from parts 3 and 4 relate to the HIE findings discussed in MM Chapter 1?
Regression on time spent sleeping
Biddle and Hamermesh (1990) study the determinants of time spent sleeping (outside the classroom!). Using a sample from the 1975-1976 Time Use Study, they run regressions like this:
$\mathit{sleep} = \beta_0 + \beta_1\,\mathit{totwrk} + \beta_2\,\mathit{educ} + \beta_3\,\mathit{age} + u$
where sleep and totwrk (total work) are measured in minutes per week and educ (education) and age are measured in years. Means and standard deviations of these variables are as follows:
What would you expect such a regression to show and why?
A friend who took the Applied Econometrics class last year proposes to add work experience to the model. His experience variable is defined as years since graduation. Does the addition of such a variable to this model make sense?
One set of estimates looks like this:
$\mathit{sleep} = 3638.25 - 0.148\,\mathit{totwrk} - 11.13\,\mathit{educ} + 2.20\,\mathit{age} + e$; $n = 706$, $R^2 = 0.113$
Interpret the coefficients on work and education.
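As an aid to reading the estimates, the fitted equation can be evaluated for a hypothetical respondent. The sketch below is Python rather than Stata, and the chosen values (2,400 minutes of work per week, 12 years of education, age 30) are illustrative assumptions, not figures from the study.

```python
def predicted_sleep(totwrk: float, educ: float, age: float) -> float:
    """Fitted minutes of sleep per week, using the estimates reported above."""
    return 3638.25 - 0.148 * totwrk - 11.13 * educ + 2.20 * age

# Hypothetical respondent: 40 hours (2,400 minutes) of work per week.
base = predicted_sleep(totwrk=2400, educ=12, age=30)

# Same respondent working one extra hour (60 minutes) per week.
more_work = predicted_sleep(totwrk=2460, educ=12, age=30)

change = more_work - base
print(f"baseline: {base:.2f} min/week, change from one more hour of work: {change:.2f}")
```

Holding education and age fixed, the change equals the coefficient on total work multiplied by 60, because the work variable is measured in minutes.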
Sleep is negatively correlated with both hours of work and schooling. Are these likely to be causal relationships? Explain.
Interpret the $R^2$. Suppose the Time Use Study collected data on whether the survey respondent had an alarm clock at home. What will happen to the $R^2$ if you add this covariate to the model? Should you add it?
Hypothesis testing and regression in practice
Table 1.1 in MM compares the health and demographic characteristics of insured and uninsured couples in the NHIS. Panel A compares health across husbands in this sample with and without health insurance (HI). Calculate by hand the t-statistic for the null hypothesis that there is no difference between the health of husbands with and without HI in this sample. Construct a 95% confidence interval for the difference. Show your work. Is the difference significantly different from zero?
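The by-hand calculation needs only group means, standard deviations, and sample sizes. The Python sketch below shows one common version of it: an unequal-variance standard error and the large-sample critical value 1.96. Both are assumptions of this sketch, and the input numbers are hypothetical, not the values printed in Table 1.1.

```python
from math import sqrt

def diff_in_means(mean_1, sd_1, n_1, mean_0, sd_0, n_0, z=1.96):
    """Return the difference, its standard error, the t-statistic, and a CI.

    z=1.96 is the large-sample critical value for a 95% interval.
    """
    diff = mean_1 - mean_0
    se = sqrt(sd_1 ** 2 / n_1 + sd_0 ** 2 / n_0)
    t_stat = diff / se
    ci = (diff - z * se, diff + z * se)
    return diff, se, t_stat, ci

# Hypothetical summary statistics for husbands with (1) and without (0) HI.
diff, se, t_stat, ci = diff_in_means(
    mean_1=4.0, sd_1=0.9, n_1=900,
    mean_0=3.7, sd_0=1.0, n_0=100,
)
print(f"diff={diff:.3f}, se={se:.4f}, t={t_stat:.2f}, CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

If the confidence interval excludes zero, the difference is significant at the 5% level; this is the same conclusion as comparing the absolute t-statistic with 1.96.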
Download the Stata data and .do file used to produce MM Table 1.1 from the MM Resources page. Execute the Stata code in the .do file through line 35 to make sure that you use the same selection criteria that were used to produce Table 1.1.
(a) Use the sum command to calculate average health separately for husbands with and without health insurance. What is the difference in average health by insurance status? Is this difference statistically significant at the 5% level? Construct a 95% confidence interval for the difference. Do you notice any discrepancies between your answers here and those in Part 1? If so, why might there be discrepancies?
(b) Use the NHIS data to construct a variable such that a regression of health on this variable reproduces the difference calculated in question (a), above. Compare the difference, t-statistic, and confidence interval for your regression estimate of differences in health with those you computed in (a).
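The link between a difference in means and a regression can be checked numerically. The following Python sketch uses invented toy data (not NHIS values) and a hand-rolled one-regressor OLS. It illustrates that, when the only regressor is a 0/1 indicator, the slope equals the gap between the two group means and the intercept equals the mean of the group coded 0.

```python
from statistics import mean

# Hypothetical toy data: a health index and a 0/1 insurance indicator.
health = [4.0, 5.0, 3.0, 4.0, 2.0, 3.0, 4.0]
insured = [1, 1, 1, 1, 0, 0, 0]

def ols_simple(x, y):
    """Intercept and slope from a least-squares regression of y on one x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = ols_simple(insured, health)

# Direct comparison of group means.
mean_insured = mean(h for h, d in zip(health, insured) if d == 1)
mean_uninsured = mean(h for h, d in zip(health, insured) if d == 0)
gap = mean_insured - mean_uninsured

print(f"slope={slope:.4f}, gap={gap:.4f}, intercept={intercept:.4f}")
```

The point estimates coincide, but the standard errors need not: a regression's conventional standard error assumes equal variances in both groups, which can differ from an unequal-variance calculation.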
Panel B of Table 1.1 shows that husbands with and without HI differ along many demographic dimensions. It is possible that the difference in health between the Some HI and No HI groups is smaller if we compare across groups that are more homogeneous. Let’s use the same data as in part 2 to investigate this.
(a) Is the difference between the health of husbands with Some HI and No HI significantly different from zero if you restrict to men who:
i. are employed and have at least 12 years of education?
ii. are employed, have at least 12 years of education, and earn income of at least $80,000?
(b) We can also examine this using regressions. Starting with your regression from part 2 (b) above, sequentially add controls for age ( age ), years of education ( yedu ), and income ( inc ).
i. Does any set of controls eliminate the difference in health between insured and uninsured? Explain how the results change as you add controls, and what the changes in the estimates might mean.
ii. In the last regression, are the three controls (age, years of education, and income) jointly significant?
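One standard way to assess joint significance is an F-test that compares the sum of squared residuals from the regression with the three controls (unrestricted) against the regression without them (restricted). The formula below assumes homoskedastic errors; the symbols $SSR_r$, $SSR_{ur}$, $q$, $k$, and $n$ are notation introduced here, not taken from the assignment. In Stata, the test command run after regress reports a test of this kind.

```latex
F = \frac{(SSR_r - SSR_{ur}) / q}{SSR_{ur} / (n - k - 1)}
\qquad
\begin{aligned}
q &= 3 \quad \text{(restrictions: the coefficients on age, yedu, and inc are all zero)} \\
k &= 4 \quad \text{(regressors in the unrestricted model: the insurance dummy plus three controls)} \\
n &= \text{number of observations}
\end{aligned}
```

Under the null hypothesis the statistic follows an $F_{q,\,n-k-1}$ distribution, so the controls are jointly significant at the 5% level when $F$ exceeds the corresponding critical value.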
Reload the RAND HIE dataset rand_initial_sample_2.dta used in Question A4. Define a dummy variable called anydum, which is equal to 1 for individuals with Plan Types 1-3 (any insurance) and equal to 0 for individuals with Plan Type 4 (only catastrophic insurance). Regress the general health index ghindx on a dummy for any insurance (as a reminder, ghindx is a general health index similar to that in the NHIS, but scaled differently).
(a) Interpret your estimates of this model.
(b) Sequentially add controls for age (age), education (educper), and income (income1).
i. Do these controls have much of an effect on your estimates? Why is the effect of adding these demographic controls so different from what you saw in question 3(b)? (Hint: Think about the differences between the NHIS and the RAND HIE data.)
ii. In the last regression, are the three controls jointly significant?