Membership inference attacks (MIAs) aim to determine whether a specific
sample was used to train a predictive model. Knowing this may indeed lead to a
privacy breach. Most MIAs, however, make use of the model’s prediction scores –
the probability of each output given some input – following the intuition that
the trained model tends to behave differently on its training data. We argue
that this is a fallacy for many modern deep network architectures.
Consequently, MIAs will fail miserably, since overconfidence leads to high
false-positive rates not only on known domains but also on out-of-distribution
data, and thus implicitly acts as a defense against MIAs. Specifically, using
generative adversarial networks, we are able to produce a potentially infinite
number of samples falsely classified as part of the training data. In other
words, the threat of MIAs is overestimated, and less information is leaked than
previously assumed. Moreover, there is a trade-off between the overconfidence
of models and their susceptibility to MIAs: the more classifiers know when they
do not know and make low-confidence predictions, the more they reveal about
their training data.
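
For readers unfamiliar with how score-based MIAs operate, below is a minimal, self-contained sketch of a maximum-softmax-confidence thresholding attack. The function names, the threshold calibration at a fixed false-positive rate, and the synthetic logits are illustrative assumptions, not the paper's exact attack setup. The toy run mirrors the abstract's argument: an overconfident model assigns equally high confidence to samples it never trained on (e.g., out-of-distribution or generated inputs), so the attack falsely flags them as training members.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confidence_scores(logits):
    """Maximum softmax probability per sample -- the 'prediction score'
    that score-based MIAs typically rely on."""
    return softmax(logits).max(axis=-1)

def calibrate_threshold(nonmember_logits, target_fpr=0.05):
    """Pick a confidence threshold so that only `target_fpr` of known
    non-member samples would be (falsely) flagged as members."""
    scores = confidence_scores(nonmember_logits)
    return np.quantile(scores, 1.0 - target_fpr)

def score_based_mia(candidate_logits, threshold):
    """Flag a candidate as a training member iff the model is more
    confident on it than the calibrated threshold."""
    return confidence_scores(candidate_logits) >= threshold

# --- Toy demonstration with synthetic logits (no real model involved). ---
# An overconfident model yields sharply peaked outputs even on samples it
# never saw, e.g. generated or out-of-distribution inputs.
rng = np.random.default_rng(0)
num_classes = 10

member_logits = rng.normal(0, 1, (1000, num_classes))
member_logits[:, 0] += 8.0      # very confident on training members
nonmember_logits = rng.normal(0, 1, (1000, num_classes))
nonmember_logits[:, 0] += 3.0   # moderately confident on held-out data
ood_logits = rng.normal(0, 1, (1000, num_classes))
ood_logits[:, 0] += 8.0         # overconfident on out-of-distribution data

threshold = calibrate_threshold(nonmember_logits)
print("members flagged:     ", score_based_mia(member_logits, threshold).mean())
print("non-members flagged: ", score_based_mia(nonmember_logits, threshold).mean())
print("OOD samples flagged: ", score_based_mia(ood_logits, threshold).mean())
# The last rate is close to 1.0: overconfidence turns unseen samples into
# false positives, which is exactly why the attack's signal breaks down.
```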
