4 Empirical Strategy

We now turn toward estimating the causal impact of the Adams Scholarship on students’ college

outcomes. Comparing outcomes of those eligible and ineligible for the Adams Scholarship would

confound the impact of the scholarship with the fact that eligible students have higher academic

skill than ineligible ones. We eliminate this source of omitted variable bias by using a regression

discontinuity design that compares students just above and below the eligibility thresholds. Students just above and just below these thresholds should be similar to each other except for receipt

of the scholarship. Though the scholarship may incentivize students to raise their test scores and

qualify for the aid, there is little scope for manipulation of test scores around eligibility thresholds

18In Section 6, we describe the average cost trade-off calculations and show that despite initial savings, there are

earnings losses even larger than the savings.

11

for three reasons. First, the earliest cohorts of students took their MCAS exams prior to the announcement of the Adams Scholarship. Second, at the time of test administration, the district-level

75th percentile threshold is impossible for individual students to know precisely. Third, exams are

centrally scored and raw scores transformed into scaled scores via an algorithm unknown to students, their families or teachers.

Figure 2 provides a graphical interpretation of scholarship eligibility in three types of school

districts. In each type of district, the straight line with a slope of negative one represents the cutoff

that determines whether a student’s total MCAS scores (math + ELA) place her in the top 25%

of her school district. The W-shaped boundary defines the region in which students have scored

“advanced” in one subject and “proficient” or “advanced” in the other. In low-performing districts

with 25% cutoff scores of at most 500, that cutoff is so low that passing the proficient/advanced

threshold is sufficient (and necessary) to win a scholarship. In medium-scoring districts with 25%

cutoff scores between 502 and 518, that cutoff and proficient/advanced threshold interact in a

complex way. In high-performing districts with 25% cutoff scores of at least 520, that cutoff is

so high that passing it is sufficient to win. Scholarship winners are those students whose test

scores thus fall in the shaded region of the graph. We note here that MCAS scores have risen

dramatically since the inception of the program, as shown in Figure A.5. Because so many students

pass the proficient/advanced threshold, relatively few districts in our sample are low-performing

as defined by Figure 2. In other words, it is the top 25% boundary that is generally of the greatest

importance, which can be seen by the fact that a full 25% of students qualify for the scholarship

each year.
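The eligibility rule just described can be expressed compactly. The sketch below is illustrative, not the state's official algorithm: it assumes the published MCAS scaled-score category cutoffs of 240 (proficient) and 260 (advanced), and takes the district's top-25% total-score cutoff as an input.

```python
# Illustrative sketch of the Adams eligibility rule described in the text.
# 240 (proficient) and 260 (advanced) are the MCAS scaled-score category
# cutoffs; district_cutoff is the district's top-25% total-score threshold.
PROFICIENT, ADVANCED = 240, 260

def adams_eligible(math: int, ela: int, district_cutoff: int) -> bool:
    """A student wins if she (a) scores advanced in one subject and at least
    proficient in the other, and (b) her total score clears the district's
    top-25% cutoff."""
    category_ok = (max(math, ela) >= ADVANCED) and (min(math, ela) >= PROFICIENT)
    top_quartile = (math + ela) >= district_cutoff
    return category_ok and top_quartile
```

The sketch makes the three district types transparent: with a cutoff of at most 500, any student who meets the category condition automatically clears the total-score cutoff (advanced plus proficient already sums to 500); with a cutoff of at least 520, any total clearing the cutoff forces at least 240 in the weaker subject and at least 260 in the stronger, so the category condition is automatic; in between, both conditions bind.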

There are many strategies for dealing with multidimensional regression discontinuities, as

discussed by Reardon and Robinson (2012). Examples of such situations in the economics of education include Papay et al. (2010, 2011a,b). We collapse the discontinuity into a single dimension

by defining for each student the distance of her math score from the minimum math score that defines eligibility, given her school district and ELA score. In Figure 2, this can be thought of as the

horizontal distance between the point defined by each student’s pair of test scores and the dark


line defining the eligibility threshold in her school district.19 We use raw scores rather than scaled

scores in defining the running variable for two reasons. First, the raw scores are a finer measure

of skill than the scaled score bins into which they are collapsed. Second, we observed extreme

bunching in values of the scaled scores, particularly around the values that define the proficient

and advanced thresholds.

This bunching is driven entirely by the way that Massachusetts assigns groups of raw scores into scaled score bins, as the raw scores themselves have the extremely

smooth distributions seen in Figures A.6 and A.7.20
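Concretely, the running variable can be computed by scanning up the grid of possible math raw scores for the smallest score that wins, holding fixed the student's ELA score and district. A generic sketch (function names are ours; the eligibility rule is passed in as a black box):

```python
def min_qualifying_math(ela, district, eligible, scores):
    """Smallest math score on the grid `scores` (sorted ascending) that makes
    a student with this ELA score eligible in this district; None if no
    score on the grid qualifies."""
    for m in scores:
        if eligible(m, ela, district):
            return m
    return None

def running_variable(math, ela, district, eligible, scores):
    """Distance of the student's math score from the minimum qualifying math
    score; non-negative exactly when the student wins the scholarship."""
    m_star = min_qualifying_math(ela, district, eligible, scores)
    return None if m_star is None else math - m_star
```

For example, under a toy rule in which a total of 500 wins, a student with an ELA score of 240 needs a math score of 260, so a math score of 255 yields a running variable of -5.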

As a result, the density of the running variable shown in Figure 3 looks largely smooth, suggesting little scope for endogenous manipulation that would violate the assumptions underlying

the regression discontinuity design (McCrary, 2008). We do, however, see a small spike at zero itself, which is driven by the fact that a district’s 75% threshold is mechanically more likely to fall on

test scores that are more common in that district. Figure A.8 is consistent with this fact, showing

that no such spike occurs in the low-performing districts for which only the proficient/advanced

threshold, and not the 75% threshold, defines the boundary.21 Though the spike is small and not

driven by endogenous manipulation of the running variable itself, we later show that our central

results are robust to and even strengthened by excluding students directly on the boundary, in a

so-called “doughnut hole” regression discontinuity.

To estimate the causal effect of the Adams Scholarship, we use local linear regression to estimate linear probability models of the form:

Yijt = β0 + β1 Adamsijt + β2 Gapijt + β3 Gapijt × Adamsijt + εijt, (1)

where Gapijt is the running variable described above and Adamsijt is an indicator for Adams Scholarship eligibility (Gapijt ≥ 0).22 The causal effect of winning the Adams Scholarship on an outcome, Yijt, should be estimated by β1 if the usual assumptions underlying the validity of the regression discontinuity design are not violated. Assuming that treatment effects are homogeneous along different parts of the eligibility threshold, this coefficient measures a local average treatment effect for students near the threshold, weighted by the probability of a given student being near the threshold itself (Reardon and Robinson, 2012).

19 Our results are robust to defining the running variable as the vertical distance, the distance of each student’s ELA score from the minimum ELA score that defines eligibility, given her school district and math score.

20 Goodman (2008) characterized each student by the minimum of her scaled score distances from the proficient/advanced and top 25% thresholds. Distance to the top 25% threshold is not an easily defined quantity when raw scores are used because the straight-line boundary observed in Figure 2 becomes quite jagged. We therefore prefer the running variable described in the text above. Estimates using the running variable as defined in Goodman (2008) are, nonetheless, quite similar to those presented here and are available by request from the authors.

21 Figure A.9 shows very similar patterns for the 2005-08 sample.

22 We use linear probability models here and in our later IV regressions rather than limited dependent variable models for the reasons discussed by Angrist (2001). In particular, we are interested in directly interpretable causal effects, not in structural parameters generated by non-linear models. We also note that estimates generated by probit and logit models turn out to be extremely similar to those generated by the linear probability model above.

Our preferred implementation uses local linear regression with a triangular kernel that weights

points near the threshold more heavily than those far from the threshold. We compute optimal bandwidths following the procedure developed by Imbens and Kalyanaraman (2012), which

trades off precision for bias generated by deviations from linearity away from the threshold.

Across nearly all of our outcomes and samples,

the optimal bandwidth generated by this procedure falls somewhere between 10 and 15 raw score points. For simplicity and ease of defining a

single sample across outcomes, we choose as our default specification a bandwidth of 12. We then

show that our results are quite robust to a wider set of bandwidths, to inclusion of demographic

controls, to inclusion of school district by cohort fixed effects, and to use of parametric specifications, including polynomials of various degrees. We cluster standard errors by 12th grade school

district in all specifications in order to account for within district correlations in the error term �ijt.
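A minimal version of this estimator can be sketched as follows. This is our simplification, not the paper's code: it applies the triangular kernel and the default bandwidth of 12 to a weighted least squares fit of equation (1), solving the normal equations in pure Python, and omits the cluster-robust standard errors.

```python
def triangular_weight(gap, h):
    """Triangular kernel: weight 1 at the threshold, falling to 0 at distance h."""
    return max(0.0, 1.0 - abs(gap) / h)

def rd_estimate(gaps, outcomes, h=12.0):
    """Weighted least squares for
       Y = b0 + b1*Adams + b2*Gap + b3*Gap*Adams + e,
    with triangular-kernel weights; returns b1, the RD estimate of the jump
    at the eligibility threshold."""
    X, y, w = [], [], []
    for g, out in zip(gaps, outcomes):
        wt = triangular_weight(g, h)
        if wt == 0.0:
            continue                      # drop observations outside the bandwidth
        a = 1.0 if g >= 0 else 0.0        # Adams eligibility indicator
        X.append([1.0, a, g, g * a]); y.append(out); w.append(wt)
    k = 4
    # Normal equations A b = c with A = X'WX, c = X'Wy.
    A = [[sum(wi * xi[r] * xi[s] for wi, xi in zip(w, X)) for s in range(k)]
         for r in range(k)]
    c = [sum(wi * xi[r] * yi for wi, xi, yi in zip(w, X, y)) for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]; c[i], c[p] = c[p], c[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for s in range(i, k):
                A[r][s] -= f * A[i][s]
            c[r] -= f * c[i]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][s] * b[s] for s in range(i + 1, k))) / A[i][i]
    return b[1]
```

Because the model fits separate intercepts and slopes on each side of the threshold, data generated with a true jump of 0.3 at Gap = 0 return an estimate of exactly 0.3.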

As further reassurance of the validity of the discontinuity design employed here, Table 3 tests

whether observed covariates vary discontinuously at the eligibility threshold. The first eight

columns test the basic covariates, including gender, race, low income, limited English proficiency

and special education status.

With the exception of marginally significant but small differences

in the probability of being black or “other” race for the 2005-06 sample, none of those covariates

shows a statistically significant discontinuity in either the 2005-06 or the 2005-08 sample. The

estimates are precise enough to rule out economically significant discontinuities as well. To test

whether these covariates are jointly discontinuous, we generate in columns 9 and 10 predicted

math and ELA z-scores by regressing scores from the class of 2004 on the demographic controls

listed in the previous eight columns. We then use the resulting regression estimates to predict scores for students in subsequent classes. The estimates in columns 9 and 10 suggest no discontinuity in predicted test scores, and the estimates are precise enough to rule out differences around

the eligibility threshold of more than 0.02 standard deviations in academic skill. Figure 4 shows

graphically the average predicted scores of students in each bin defined by distance from the eligibility threshold, confirming the lack of any clear difference in academic skill between students

just above and just below the threshold in the 2005-06 sample.23
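The logic of this predicted-score check can be sketched in a few lines. The names and coefficients below are hypothetical stand-ins for the class-of-2004 regression estimates, and a simple mean comparison within the bandwidth replaces the regression-based test reported in Table 3:

```python
def predicted_score(covariates, coefs):
    """Linear prediction of the MCAS z-score from demographic dummies, using
    coefficients estimated on the prior (2004) cohort."""
    return coefs["const"] + sum(coefs[k] * v for k, v in covariates.items())

def balance_gap(students, coefs, window=12):
    """Difference in mean predicted score just above vs. just below the
    eligibility threshold -- a crude version of the smoothness check in
    Table 3; values near zero indicate covariate balance."""
    above = [predicted_score(s["x"], coefs) for s in students
             if 0 <= s["gap"] < window]
    below = [predicted_score(s["x"], coefs) for s in students
             if -window <= s["gap"] < 0]
    return sum(above) / len(above) - sum(below) / len(below)
```

If the demographic mix of students is the same on both sides of the threshold, the predicted scores balance and the gap is zero, which is the pattern the paper reports.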