SynthEval: A Framework for Detailed Utility and Privacy Evaluation of Tabular Synthetic Data

Lautrup, Anton Danholt; Hyrup, Tobias; Zimek, Arthur; Schneider-Kamp, Peter

arXiv:2404.15821 (cs)

[Submitted on 24 Apr 2024]

Abstract: With the growing demand for synthetic data to address contemporary issues in machine learning, such as data scarcity, data fairness, and data privacy, having robust tools for assessing the utility and potential privacy risks of such data becomes crucial. SynthEval, a novel open-source evaluation framework, distinguishes itself from existing tools by treating categorical and numerical attributes with equal care, without assuming any special kind of preprocessing steps. This makes it applicable to virtually any synthetic dataset of tabular records. Our tool leverages statistical and machine learning techniques to comprehensively evaluate synthetic data fidelity and privacy-preserving integrity. SynthEval integrates a wide selection of metrics that can be used independently or in highly customisable benchmark configurations, and can easily be extended with additional metrics. In this paper, we describe SynthEval and illustrate its versatility with examples. The framework facilitates better benchmarking and more consistent comparisons of model capabilities.
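
To make the idea of evaluating mixed-type tabular data concrete, the following is a minimal sketch in plain Python. It is not SynthEval's API and does not reproduce its metric implementations; all function names are hypothetical and chosen for illustration. It assumes a Kolmogorov-Smirnov statistic for numerical columns, total variation distance for categorical columns, and a Gower-style distance to the closest real record as a simple privacy indicator.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(real, synth):
    """Two-sample Kolmogorov-Smirnov statistic for one numerical column."""
    real_sorted, synth_sorted = sorted(real), sorted(synth)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so check those.
    for x in sorted(set(real_sorted) | set(synth_sorted)):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def total_variation(real, synth):
    """Total variation distance between category frequencies of one column."""
    freq_real, freq_synth = Counter(real), Counter(synth)
    categories = set(freq_real) | set(freq_synth)
    return 0.5 * sum(
        abs(freq_real[c] / len(real) - freq_synth[c] / len(synth))
        for c in categories
    )


def mixed_distance(a, b, numeric_ranges):
    """Gower-style distance between two records given as dicts.

    Numerical columns contribute a range-normalised absolute difference,
    categorical columns contribute 0 on a match and 1 otherwise.
    """
    total = 0.0
    for col in a:
        if col in numeric_ranges:
            span = numeric_ranges[col]
            total += abs(a[col] - b[col]) / span if span else 0.0
        else:
            total += 0.0 if a[col] == b[col] else 1.0
    return total / len(a)


def distance_to_closest_record(real_rows, synth_rows, numeric_cols):
    """For each synthetic record, the distance to its nearest real record."""
    ranges = {
        c: max(r[c] for r in real_rows) - min(r[c] for r in real_rows)
        for c in numeric_cols
    }
    return [
        min(mixed_distance(s, r, ranges) for r in real_rows)
        for s in synth_rows
    ]


# Toy data, invented purely for illustration.
real_rows = [{"age": 20, "sex": "F"}, {"age": 40, "sex": "M"}, {"age": 60, "sex": "F"}]
synth_rows = [{"age": 20, "sex": "F"}, {"age": 50, "sex": "M"}]

utility_age = ks_statistic([r["age"] for r in real_rows], [s["age"] for s in synth_rows])
utility_sex = total_variation([r["sex"] for r in real_rows], [s["sex"] for s in synth_rows])
privacy_dcr = distance_to_closest_record(real_rows, synth_rows, numeric_cols=["age"])
```

In this sketch, lower utility scores indicate closer agreement between real and synthetic column distributions, while a distance of zero to the closest record flags a synthetic row that copies a real one exactly.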

Subjects:	Machine Learning (cs.LG); Performance (cs.PF)
Cite as:	arXiv:2404.15821 [cs.LG]
	(or arXiv:2404.15821v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2404.15821

Submission history

From: Anton D. Lautrup
[v1] Wed, 24 Apr 2024 11:49:09 UTC (1,565 KB)
