Guaranteed validity for empirical approaches to adaptive data analysis
Rogers, Ryan; Roth, Aaron; Smith, Adam; Srebro, Nathan; Thakkar, Om; Woodworth, Blake
We design a general framework for answering
adaptive statistical queries that focuses on
providing explicit confidence intervals along
with point estimates. Prior work in this area
has focused either on providing tight confidence
intervals for specific analyses or on providing
general worst-case bounds for point estimates.
Unfortunately, as we observe, these
worst-case bounds are loose in many settings,
often not even beating simple baselines like
sample splitting. Our main contribution is
to design a framework for providing valid,
instance-specific confidence intervals for point
estimates that can be generated by heuristics.
When paired with good heuristics, this
method gives guarantees that are orders of
magnitude better than the best worst-case
bounds. We provide a Python library implementing
our method.
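
To make the sample-splitting baseline concrete, here is a minimal sketch. It is an illustration only, not the API of the library mentioned above. The function name `sample_splitting`, the `next_query` callback, and the parameters `k` and `beta` are all hypothetical. It assumes each query maps a record to a value in [0, 1], answers each of `k` adaptively chosen queries on its own disjoint chunk of the data, and attaches a Hoeffding confidence interval with a union bound over the `k` queries.

```python
import math


def sample_splitting(data, next_query, k, beta=0.05):
    """Answer k adaptively chosen queries, each on a fresh chunk of data.

    Hypothetical baseline sketch: every query maps a record to [0, 1].
    Returns a list of (estimate, half_width) pairs; with probability at
    least 1 - beta all k intervals cover their population means.
    """
    m = len(data) // k  # records reserved for each query
    if m == 0:
        raise ValueError("need at least one record per query")
    # Hoeffding bound for one query at level beta / k, then a union bound.
    half_width = math.sqrt(math.log(2 * k / beta) / (2 * m))
    answers = []
    for i in range(k):
        # The analyst may pick the next query using all earlier answers.
        query = next_query(answers)
        chunk = data[i * m:(i + 1) * m]  # never reused by later queries
        estimate = sum(query(x) for x in chunk) / m
        answers.append((estimate, half_width))
    return answers
```

Because each chunk is used only once, the query asked on it cannot depend on that chunk, so the usual non-adaptive bound applies. The cost is that each query sees only `len(data) // k` records, which is why the interval width grows with `k`.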