Differentially private simple linear regression
Alabi, Daniel; McMillan, Audra; Sarathy, Jayshree; Smith, Adam; Vadhan, Salil
Economics and social science research often
requires analyzing datasets of sensitive personal information
at fine granularity, with models fit to small subsets
of the data. Unfortunately, such fine-grained analysis
can easily reveal sensitive individual information. We
study regression algorithms that satisfy differential privacy,
a constraint which guarantees that an algorithm’s
output reveals little about any individual input data
record, even to an attacker with side information about
the dataset. Motivated by the Opportunity Atlas, a high-profile,
small-area analysis tool in economics research,
we perform a thorough experimental evaluation of differentially
private algorithms for simple linear regression
on small datasets with tens to hundreds of records—a
particularly challenging regime for differential privacy.
In contrast, prior work on differentially private linear
regression focused on multivariate linear regression on
large datasets or asymptotic analysis. Through a range
of experiments, we identify key factors that affect the
relative performance of the algorithms. We find that algorithms
based on robust estimators—in particular, the
median-based estimator of Theil and Sen—perform best
on small datasets (e.g., hundreds of datapoints), while
algorithms based on Ordinary Least Squares or Gradient
Descent perform better for large datasets. However,
we also discuss regimes in which this general finding
does not hold. Notably, the differentially private analogues
of Theil–Sen (one of which was suggested in a
theoretical work of Dwork and Lei) have not been studied
in any prior experimental work on differentially private
linear regression.
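
For readers unfamiliar with the median-based estimator of Theil and Sen, the following is a minimal sketch of the classical, non-private version only. It adds no noise and offers no privacy guarantee; it is not one of the differentially private algorithms evaluated in this work. The function name `theil_sen`, the choice of intercept (the median of y - slope * x), and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Non-private Theil-Sen fit (illustrative): returns (slope, intercept)."""
    # Slope between every pair of points with distinct x-values.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x1 != x2
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x-values")
    # The slope estimate is the median of the pairwise slopes.
    slope = median(slopes)
    # Assumed intercept rule: median offset of the points from the fitted slope.
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Made-up data: ten points on y = 2x + 1, with the last record corrupted.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[-1] = 100
slope, intercept = theil_sen(xs, ys)
print(slope, intercept)
```

Because the estimate is a median over pairwise slopes, the single corrupted record does not move it away from the underlying line. This robustness to individual records is the property that makes the estimator a natural starting point for private analogues, which would replace the exact median with a differentially private one.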