Machine Learning Models that Remember Too Much

Song, Congzheng; Ristenpart, Thomas; Shmatikov, Vitaly

Computer Science > Cryptography and Security

arXiv:1709.07886 (cs)

[Submitted on 22 Sep 2017]

Abstract: Machine learning (ML) is becoming a commodity. Numerous ML frameworks and services are available to data holders who are not ML experts but want to train predictive models on their data. It is important that ML models trained on sensitive inputs (e.g., personal images or documents) not leak too much information about the training data.
We consider a malicious ML provider who supplies model-training code to the data holder, does not observe the training, but then obtains white- or black-box access to the resulting model. In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model while keeping the model as accurate and predictive as a conventionally trained one. We then explain how the adversary can extract the memorized information from the model.
We evaluate our techniques on standard ML tasks for image classification (CIFAR10), face recognition (LFW and FaceScrub), and text analysis (20 Newsgroups and IMDB). In all cases, we show how our algorithms create models that have high predictive power yet allow accurate extraction of subsets of their training data.
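
To make the regularization-like memorization idea concrete, the following toy sketch adds an extra loss term that rewards correlation between the model parameters and a secret vector derived from the training data. It is an illustration under stated assumptions, not the authors' code: the quadratic "task loss", the names `pearson`, `penalty_grad`, and `train`, and the values of `lam`, `lr`, and `steps` are all hypothetical stand-ins chosen for this example.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists of floats."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ca = [x - ma for x in a]
    cb = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in ca)) * math.sqrt(sum(y * y for y in cb))
    return sum(x * y for x, y in zip(ca, cb)) / den if den else 0.0


def penalty_grad(theta, secret, lam):
    """Gradient w.r.t. theta of the hypothetical term -lam * |pearson(theta, secret)|."""
    n = len(theta)
    mt, ms = sum(theta) / n, sum(secret) / n
    t = [x - mt for x in theta]
    s = [y - ms for y in secret]
    nt = math.sqrt(sum(x * x for x in t))
    ns = math.sqrt(sum(y * y for y in s))
    if nt == 0 or ns == 0:
        return [0.0] * n
    c = sum(x * y for x, y in zip(t, s)) / (nt * ns)
    sign = 1.0 if c >= 0 else -1.0
    return [-lam * sign * (si / (nt * ns) - c * ti / (nt * nt))
            for ti, si in zip(t, s)]


def train(theta0, secret, lam, steps=300, lr=0.05):
    """Gradient descent on 0.5 * ||theta - theta0||^2 - lam * |corr(theta, secret)|.

    The quadratic term is an assumed stand-in for a real task loss whose
    optimum is theta0; lam = 0 corresponds to conventional training.
    """
    theta = list(theta0)
    for _ in range(steps):
        g_pen = penalty_grad(theta, secret, lam)
        theta = [th - lr * ((th - t0) + gp)
                 for th, t0, gp in zip(theta, theta0, g_pen)]
    return theta


random.seed(0)
n = 20
theta0 = [random.uniform(-1, 1) for _ in range(n)]  # assumed task optimum
secret = [random.uniform(0, 1) for _ in range(n)]   # e.g. pixel intensities

benign = train(theta0, secret, lam=0.0)
malicious = train(theta0, secret, lam=5.0)
corr_benign = pearson(benign, secret)
corr_malicious = pearson(malicious, secret)
print("benign correlation:   ", corr_benign)
print("malicious correlation:", corr_malicious)
```

In this sketch, an adversary with white-box access could read the parameters and rescale them to approximate the secret values, since a high correlation means the parameters are close to an affine function of the secret. How well this carries over to real models and datasets is exactly what the paper evaluates; the toy setup here makes no such claim.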

Subjects:	Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:1709.07886 [cs.CR]
	(or arXiv:1709.07886v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.1709.07886

Submission history

From: Congzheng Song
[v1] Fri, 22 Sep 2017 18:00:19 UTC (4,143 KB)
