4 Empirical Strategy
We now turn to estimating the causal impact of the Adams Scholarship on students’ college
outcomes. Comparing outcomes of those eligible and ineligible for the Adams Scholarship would
confound the impact of the scholarship with the fact that eligible students have higher academic
skill than ineligible ones. We eliminate this source of omitted variable bias by using a regression
discontinuity design that compares students just above and below the eligibility thresholds. Students just above and just below these thresholds should be similar to each other except for receipt
of the scholarship. Though the scholarship may incentivize students to raise their test scores and
qualify for the aid, there is little scope for manipulation of test scores around eligibility thresholds for three reasons. First, the earliest cohorts of students took their MCAS exams prior to the announcement of the Adams Scholarship. Second, at the time of test administration, the district-level 75th percentile threshold is impossible for individual students to know precisely. Third, exams are centrally scored, and raw scores are transformed into scaled scores via an algorithm unknown to students, their families, or teachers.

18 In Section 6, we describe the average cost trade-off calculations and show that despite initial savings, there are earnings losses even larger than the savings.
Figure 2 provides a graphical interpretation of scholarship eligibility in three types of school
districts. In each type of district, the straight line with a slope of negative one represents the cutoff
that determines whether a student’s total MCAS score (math + ELA) places her in the top 25%
of her school district. The W-shaped boundary defines the region in which students have scored
“advanced” in one subject and “proficient” or “advanced” in the other. In low-performing districts
with 25% cutoff scores of at most 500, that cutoff is so low that passing the proficient/advanced
threshold is sufficient (and necessary) to win a scholarship. In medium-scoring districts with 25%
cutoff scores between 502 and 518, that cutoff and the proficient/advanced threshold interact in a
complex way. In high-performing districts with 25% cutoff scores of at least 520, that cutoff is
so high that passing it is sufficient to win. Scholarship winners are those students whose test
scores thus fall in the shaded region of the graph. We note here that MCAS scores have risen
dramatically since the inception of the program, as shown in Figure A.5. Because so many students
pass the proficient/advanced threshold, relatively few districts in our sample are low-performing
as defined by Figure 2. In other words, the top 25% boundary is generally the most important one, as can be seen from the fact that a full 25% of students qualify for the scholarship each year.
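To make the two-part rule concrete, the sketch below encodes it in Python. It is our illustration only, not the program's official implementation: the function name, argument conventions, and category labels are placeholders for the official MCAS performance levels and district cutoffs.

```python
# Illustrative sketch of the Adams Scholarship eligibility rule described above.
# Names and argument conventions are hypothetical.

def adams_eligible(math_scaled, ela_scaled, district_cutoff,
                   math_level, ela_level):
    """Return True if a student's scores fall in the shaded region of Figure 2.

    math_scaled, ela_scaled : scaled MCAS scores in math and ELA
    district_cutoff         : 75th-percentile total-score cutoff in her district
    math_level, ela_level   : performance levels, e.g. 'advanced' or 'proficient'
    """
    # Condition 1: total score at or above the district's top-25% cutoff
    # (the straight line with slope -1 in Figure 2).
    top_quartile = (math_scaled + ela_scaled) >= district_cutoff

    # Condition 2: 'advanced' in one subject and at least 'proficient' in the
    # other (the W-shaped boundary in Figure 2).
    levels = {math_level, ela_level}
    prof_adv = 'advanced' in levels and levels <= {'advanced', 'proficient'}

    # A student must satisfy both conditions to win the scholarship.
    return top_quartile and prof_adv
```

As the text notes, in low-performing districts the category condition effectively implies the cutoff condition, while in high-performing districts the cutoff condition implies the category condition, so only one boundary binds at either extreme.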
There are many strategies for dealing with multidimensional regression discontinuities, as
discussed by Reardon and Robinson (2012). Examples of such situations in the economics of education include Papay et al. (2010, 2011a,b). We collapse the discontinuity into a single dimension
by defining for each student the distance of her math score from the minimum math score that defines eligibility, given her school district and ELA score. In Figure 2, this can be thought of as the
horizontal distance between the point defined by each student’s pair of test scores and the dark
line defining the eligibility threshold in her school district.19 We use raw scores rather than scaled
scores in defining the running variable for two reasons. First, the raw scores are a finer measure
of skill than the scaled score bins into which they are collapsed. Second, we observed extreme
bunching in values of the scaled scores, particularly around the values that define the proficient
and advanced thresholds.
This bunching is driven entirely by the way that Massachusetts assigns groups of raw scores into scaled score bins, as the raw scores themselves have the extremely
smooth distributions seen in Figures A.6 and A.7.20
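The running-variable construction can be summarized in a few lines of code. The sketch below is ours, not the authors': it assumes a helper `eligible(math_raw, ela_raw, district)` that applies the eligibility rule after mapping raw scores to performance levels and district cutoffs, and the maximum raw score is a placeholder value.

```python
# Illustrative construction of the running variable Gap: the horizontal distance
# from a student's math raw score to the lowest math raw score that would have
# made her eligible, holding her district and ELA raw score fixed.

def gap(math_raw, ela_raw, district, eligible, max_math_raw=60):
    """Return math_raw minus the minimum eligible math raw score, or None if no
    math score would yield eligibility at this ELA score.

    eligible     : callable (math_raw, ela_raw, district) -> bool
    max_math_raw : highest attainable math raw score (placeholder value)
    """
    for m in range(max_math_raw + 1):      # scan raw scores from lowest to highest
        if eligible(m, ela_raw, district):
            return math_raw - m            # >= 0 means the student wins the scholarship
    return None
```

Students with Gap ≥ 0 thus hold scholarship-eligible scores, matching the Adams indicator defined below.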
As a result, the density of the running variable shown in Figure 3 looks largely smooth, suggesting little scope for endogenous manipulation that would violate the assumptions underlying
the regression discontinuity design (McCrary, 2008). We do, however, see a small spike at zero itself, which is driven by the fact that a district’s 75th percentile threshold is mechanically more likely to fall on test scores that are more common in that district. Figure A.8 is consistent with this fact, showing that no such spike occurs in the low-performing districts for which only the proficient/advanced threshold, and not the 75th percentile threshold, defines the boundary.21 Though the spike is small and not driven by endogenous manipulation of the running variable itself, we later show that our central results are robust to, and even strengthened by, excluding students directly on the boundary, in a
so-called “doughnut hole” regression discontinuity.
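An informal version of this density check, along with the doughnut-hole sample restriction, might look as follows. This is a sketch under assumed column names (`df`, `gap`); a formal McCrary (2008) test would use a dedicated density-test routine rather than a raw frequency plot.

```python
# Informal check of the smoothness of the running variable's density at zero,
# in the spirit of McCrary (2008), plus the "doughnut hole" sample restriction.
import pandas as pd
import matplotlib.pyplot as plt

def plot_gap_density(df: pd.DataFrame) -> None:
    counts = df['gap'].value_counts().sort_index()   # frequency of each Gap value
    plt.bar(counts.index, counts.values, width=0.9)
    plt.axvline(0, linestyle='--')                   # eligibility threshold
    plt.xlabel('Distance from eligibility threshold (raw score points)')
    plt.ylabel('Number of students')
    plt.show()

# Doughnut-hole sample: drop students sitting exactly on the boundary.
def doughnut_sample(df: pd.DataFrame) -> pd.DataFrame:
    return df[df['gap'] != 0]
```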
To estimate the causal effect of the Adams Scholarship, we use local linear regression to estimate linear probability models of the form:
Y_{ijt} = β_0 + β_1 Adams_{ijt} + β_2 Gap_{ijt} + β_3 (Gap_{ijt} × Adams_{ijt}) + ε_{ijt},     (1)

where Gap_{ijt} is the running variable described above and Adams_{ijt} is an indicator for Adams Scholarship eligibility (Gap_{ijt} ≥ 0).22 The causal effect of winning the Adams Scholarship on an outcome, Y_{ijt}, should be estimated by β_1 if the usual assumptions underlying the validity of the regression discontinuity design are not violated; at Gap_{ijt} = 0 the slope and interaction terms drop out, so β_1 measures the jump in the outcome at the eligibility threshold. Assuming that treatment effects are homogeneous along different parts of the eligibility threshold, this coefficient measures a local average treatment effect for students near the threshold, weighted by the probability of a given student being near the threshold itself (Reardon and Robinson, 2012).

19 Our results are robust to defining the running variable as the vertical distance: the distance of each student’s ELA score from the minimum ELA score that defines eligibility, given her school district and math score.

20 Goodman (2008) characterized each student by the minimum of her scaled score distance from the proficient/advanced and top 25% thresholds. Distance to the top 25% threshold is not an easily defined quantity when raw scores are used because the straight-line boundary observed in Figure 2 becomes quite jagged. We therefore prefer the running variable described in the text above. Estimates using the running variable as defined in Goodman (2008) are, nonetheless, quite similar to those presented here and are available by request from the authors.

21 Figure A.9 shows very similar patterns for the 2005-08 sample.

22 We use linear probability models here and in our later IV regressions rather than limited dependent variable models for the reasons discussed by Angrist (2001). In particular, we are interested in directly interpretable causal effects and not in structural parameters generated by non-linear models. We also note that estimates generated by probit and logit models turn out to be extremely similar to those generated by the linear probability model above.
Our preferred implementation uses local linear regression with a triangular kernel that weights
points near the threshold more heavily than those far from the threshold. We compute optimal bandwidths following the procedure developed by Imbens and Kalyanaraman (2012), which
trades off precision for bias generated by deviations from linearity away from the threshold.
Across nearly all of our outcomes and samples,
the optimal bandwidth generated by this procedure falls somewhere between 10 and 15 raw score points. For simplicity and ease of defining a
single sample across outcomes, we choose as our default specification a bandwidth of 12. We then
show that our results are quite robust to a wider set of bandwidths, to inclusion of demographic
controls, to inclusion of school district by cohort fixed effects, and to use of parametric specifications, including polynomials of various degrees. We cluster standard errors by 12th-grade school district in all specifications in order to account for within-district correlation in the error term ε_{ijt}.
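A minimal sketch of this estimator, written in Python with statsmodels rather than the authors' own software: a linear probability model of equation (1) on a window around the threshold, with triangular kernel weights, the default bandwidth of 12, and standard errors clustered by district. Column names ('y', 'gap', 'district') are placeholders.

```python
# Sketch of the preferred specification: local linear estimation of equation (1)
# with a triangular kernel and district-clustered standard errors.
import pandas as pd
import statsmodels.formula.api as smf

def estimate_rd(df: pd.DataFrame, outcome: str = 'y', bandwidth: float = 12.0):
    sample = df[df['gap'].abs() < bandwidth].copy()
    sample['adams'] = (sample['gap'] >= 0).astype(int)    # eligibility indicator
    sample['w'] = 1.0 - sample['gap'].abs() / bandwidth   # triangular kernel weight

    model = smf.wls(f'{outcome} ~ adams + gap + gap:adams',
                    data=sample, weights=sample['w'])
    # Cluster standard errors by 12th-grade school district.
    return model.fit(cov_type='cluster', cov_kwds={'groups': sample['district']})

# The coefficient on `adams` estimates β_1, the jump in the outcome at the threshold:
# result = estimate_rd(df, outcome='y')
# print(result.params['adams'], result.bse['adams'])
```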
As further reassurance of the validity of the discontinuity design employed here, Table 3 tests
whether observed covariates vary discontinuously at the eligibility threshold. The first eight
columns test the basic covariates, including gender, race, low income, limited English proficiency
and special education status.
With the exception of marginally significant but small differences
in the probability of being black or “other” race for the 2005-06 sample, none of those covariates
shows a statistically significant discontinuity in either the 2005-06 or the 2005-08 sample. The
estimates are precise enough to rule out economically significant discontinuities as well. To test
whether these covariates are jointly discontinuous, we generate in columns 9 and 10 predicted
math and ELA z-scores by regressing scores from the class of 2004 on the demographic controls
listed in the previous eight columns. We then use the resulting regression estimates to predict
scores for students in subsequent classes. The estimates in columns 9 and 10 suggest no discontinuity in predicted test scores and are precise enough to rule out differences around the eligibility threshold of more than 0.02 standard deviations in academic skill. Figure 4 shows
graphically the average predicted scores of students in each bin defined by distance from the eligibility threshold, confirming the lack of any clear difference in academic skill between students
just above and just below the threshold in the 2005-06 sample.23
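For concreteness, the predicted-score exercise could be implemented along the following lines. This is our sketch under assumed column names ('math_z', the demographic indicators, 'cohort', 'gap', 'district'), not the authors' code.

```python
# Sketch of the predicted-score balance test: fit scores on demographics in the
# class of 2004, predict scores for later classes, and test for a discontinuity
# in the prediction at the eligibility threshold.
import pandas as pd
import statsmodels.formula.api as smf

DEMOGRAPHICS = ('female + black + hispanic + asian + other_race + '
                'low_income + limited_english + special_ed')

def predicted_score_discontinuity(df: pd.DataFrame, bandwidth: float = 12.0):
    # Step 1: score-demographics relationship in the class of 2004.
    fit_2004 = smf.ols(f'math_z ~ {DEMOGRAPHICS}',
                       data=df[df['cohort'] == 2004]).fit()

    # Step 2: predicted scores for subsequent classes.
    later = df[df['cohort'] > 2004].copy()
    later['pred_math_z'] = fit_2004.predict(later)

    # Step 3: RD regression of the predicted score on scholarship eligibility.
    sample = later[later['gap'].abs() < bandwidth].copy()
    sample['adams'] = (sample['gap'] >= 0).astype(int)
    rd = smf.ols('pred_math_z ~ adams + gap + gap:adams', data=sample)
    return rd.fit(cov_type='cluster', cov_kwds={'groups': sample['district']})
```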