On the Reduction of Biases in Big Data Sets for the Detection of Irregular Power Usage

Glauner, Patrick; State, Radu; Valtchev, Petko; Duarte, Diogo

arXiv:1801.05627 (cs)

[Submitted on 17 Jan 2018 (v1), last revised 3 Apr 2018 (this version, v2)]

Abstract: In machine learning, a bias occurs whenever training sets are not representative of the test data, which results in unreliable models. The most common biases in data are arguably class imbalance and covariate shift. In this work, we aim to shed light on this topic in order to draw more attention to this issue in the field of machine learning. We propose a novel, scalable framework for reducing multiple biases in high-dimensional data sets in order to train more reliable predictors. We apply our methodology to the detection of irregular power usage from real, noisy industrial data. In emerging markets, irregular power usage, and electricity theft in particular, may account for up to 40% of the total electricity distributed. Biased data sets are a particular issue in this domain. We show that reducing these biases increases the accuracy of the trained predictors. Our models have the potential to generate significant economic value in a real-world application, as they are being deployed in commercial software for the detection of irregular power usage.
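The two biases named in the abstract can be illustrated with a minimal sketch. It is not the framework proposed in the paper, whose details are not given on this page. It only shows one common, generic way to counteract each bias by reweighting training samples: inverse class-frequency weights for class imbalance, and a density ratio p_test(x) / p_train(x) for covariate shift on a single categorical feature. The function names, the toy labels, and the "region" feature are all hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights so that each class carries equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def covariate_shift_weights(train_values, test_values):
    """Density-ratio weights p_test(x) / p_train(x) for one categorical feature.

    Only values seen in the training data receive a weight.
    """
    train_counts = Counter(train_values)
    test_counts = Counter(test_values)
    n_train, n_test = len(train_values), len(test_values)
    return {
        x: (test_counts.get(x, 0) / n_test) / (cnt / n_train)
        for x, cnt in train_counts.items()
    }


# Hypothetical toy data: 1 = irregular usage, 0 = regular usage.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical categorical feature (for example a region code),
# distributed differently in the training and test data.
train_region = ["A"] * 8 + ["B"] * 2
test_region = ["A"] * 5 + ["B"] * 5

cw = class_weights(labels)
sw = covariate_shift_weights(train_region, test_region)

print("class weights:", cw)
print("covariate-shift weights:", sw)
```

In both cases the under-represented group (the minority class, or the region that is rarer in training than in testing) receives a weight above 1. Such weights could then be passed as per-sample weights to a learner that supports them.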

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:1801.05627 [cs.LG]
	(or arXiv:1801.05627v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1801.05627
Journal reference:	Proceedings of the 13th International FLINS Conference on Data Science and Knowledge Engineering for Sensing Decision Support (FLINS 2018)

Submission history

From: Patrick Glauner
[v1] Wed, 17 Jan 2018 11:48:18 UTC (26 KB)
[v2] Tue, 3 Apr 2018 09:06:42 UTC (26 KB)
