When is memorization of irrelevant training data necessary for high-accuracy learning?
Brown, Gavin; Bun, Mark; Feldman, Vitaly; Smith, Adam; Talwar, Kunal
Modern machine learning models are complex and frequently encode surprising amounts
of information about individual inputs. In extreme cases, complex models appear to
memorize entire input examples, including seemingly irrelevant information (social security
numbers from text, for example). In this paper, we aim to understand whether this
sort of memorization is necessary for accurate learning. We describe natural prediction
problems in which every sufficiently accurate training algorithm must encode, in the prediction
model, essentially all the information about a large subset of its training examples.
This remains true even when the examples are high-dimensional and have entropy much
higher than the sample size, and even when most of that information is ultimately irrelevant
to the task at hand. Further, our results do not depend on the training algorithm
or the class of models used for learning.
Our problems are simple and fairly natural variants of the next-symbol prediction
and the cluster labeling tasks. These tasks can be seen as abstractions of image- and
text-related prediction problems. To establish our results, we reduce from a family of
one-way communication problems for which we prove new information complexity lower
bounds.
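To give a concrete sense of the next-symbol prediction flavor referenced above, the sketch below is a hypothetical toy illustration only: the binary alphabet, the prefix-as-subpopulation convention, and the dictionary-based learner are illustrative assumptions, not the paper's formal construction or its lower-bound argument. It merely shows how an accurate next-symbol predictor over long, mostly irrelevant random strings can end up storing training examples essentially verbatim.

```python
import random

# Toy sketch (hypothetical, not the paper's construction): each training
# example is a long random string; its short prefix implicitly names a
# "subpopulation".  A predictor asked to complete arbitrary prefixes of
# training strings with high accuracy has little choice but to retain
# essentially the whole string, even though most symbols are irrelevant
# to any single query.

ALPHABET = "01"
PREFIX_LEN = 8      # symbols treated as the subpopulation identifier
STRING_LEN = 64     # far longer than needed for any one prediction

def sample_example(rng):
    """One training example: a uniformly random string of STRING_LEN symbols."""
    return "".join(rng.choice(ALPHABET) for _ in range(STRING_LEN))

def train(examples):
    """A trivially 'accurate' learner: store every example verbatim, keyed by
    its prefix (prefix collisions are ignored in this toy)."""
    return {ex[:PREFIX_LEN]: ex for ex in examples}

def predict_next(model, partial):
    """Next-symbol prediction: given the first k symbols of some training
    string, return symbol k+1, or guess if the subpopulation is unseen."""
    stored = model.get(partial[:PREFIX_LEN])
    if stored is None or len(partial) >= len(stored):
        return random.choice(ALPHABET)   # unseen subpopulation: pure guess
    return stored[len(partial)]

if __name__ == "__main__":
    rng = random.Random(0)
    train_set = [sample_example(rng) for _ in range(8)]
    model = train(train_set)
    query = train_set[0][:10]            # first 10 symbols of a training string
    print(query, "->", predict_next(model, query))
```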