Similarity Learning for High-Dimensional Sparse Data

Liu, Kuan; Bellet, Aurélien; Sha, Fei

arXiv:1411.2374v1 (cs)

[Submitted on 10 Nov 2014 (this version), latest version 9 Sep 2019 (v3)]

Abstract: A good measure of similarity between data points is crucial to the performance of many machine learning, data mining and information retrieval tasks. Metric and similarity learning methods have emerged as a powerful way to learn such measures automatically from data, but they do not scale well with the feature dimensionality. In this paper, we propose a similarity learning method that can efficiently deal with high-dimensional sparse data. The method assumes that the similarity parameters are decomposable as a convex combination of rank-one matrices with a specific sparsity structure, and uses an approximate Frank-Wolfe procedure to learn these parameters from relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys strong convergence guarantees, and its time and memory complexity depend on the sparsity of the data instead of the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.
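
Illustrative sketch (not taken from the paper's text on this page): the abstract does not give the exact form of the rank-one matrices, the loss, or the step size, so the Python code below fills these in with assumptions. It assumes bases of the form lam * (e_i +/- e_j)(e_i +/- e_j)^T, a smoothed hinge loss on triplets (a, b, c) meaning "a should be more similar to b than to c", and the standard Frank-Wolfe step size 2 / (k + 2). The names fw_similarity, lam and n_iters are hypothetical. For readability it uses dense matrices and an exact search over feature pairs, so it does not reproduce the approximate, sparsity-dependent procedure the abstract describes.

```python
import numpy as np


def smoothed_hinge_grad(u):
    """Derivative of a smoothed hinge: 0 for u <= 0, u on (0, 1), 1 for u >= 1."""
    return np.clip(u, 0.0, 1.0)


def fw_similarity(X, triplets, lam=1.0, n_iters=50):
    """Frank-Wolfe over the convex hull of lam * (e_i +/- e_j)(e_i +/- e_j)^T.

    X        : (n, d) data matrix with d >= 2 (dense here, for clarity only).
    triplets : (T, 3) integer array of row indices (a, b, c).
    Returns a (d, d) matrix M defining the bilinear similarity x^T M x'.
    """
    n, d = X.shape
    a, b, c = np.asarray(triplets).T
    Xa = X[a]                      # anchors
    D = X[b] - X[c]                # "similar" minus "dissimilar" points
    M = np.zeros((d, d))
    for k in range(n_iters):
        # Margin x_a^T M (x_b - x_c) for every triplet.
        margins = np.einsum("td,de,te->t", Xa, M, D)
        w = smoothed_hinge_grad(1.0 - margins)
        # Gradient of the mean loss with respect to M.
        G = -(Xa * w[:, None]).T @ D / len(a)

        # Linear minimization: <B, G> = lam * (G_ii + G_jj + s * (G_ij + G_ji)),
        # so the best sign s for a pair (i, j) makes the last term negative.
        diag = np.diag(G)
        off = G + G.T
        scores = diag[:, None] + diag[None, :] - np.abs(off)
        scores[np.tril_indices(d)] = np.inf      # keep pairs with i < j only
        i, j = np.unravel_index(np.argmin(scores), scores.shape)
        s = 1.0 if off[i, j] < 0 else -1.0

        v = np.zeros(d)
        v[i], v[j] = 1.0, s
        B = lam * np.outer(v, v)   # touches one pair of features

        gamma = 2.0 / (k + 2.0)    # equals 1 at k = 0, so M starts at a basis
        M = (1.0 - gamma) * M + gamma * B
    return M


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((20, 6))
    triplets = rng.integers(0, 20, size=(30, 3))
    M = fw_similarity(X, triplets, lam=2.0, n_iters=10)
    print(np.count_nonzero(M), "nonzero entries after 10 iterations")
```

Under these assumptions each iteration adds at most one new pair of features, so the number of nonzero entries in M is bounded by four times the iteration count. Stopping early therefore limits the number of active features, which is the control mechanism the abstract refers to.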

Comments:	14 pages
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1411.2374 [cs.LG]
	(or arXiv:1411.2374v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1411.2374

Submission history

From: Aurélien Bellet
[v1] Mon, 10 Nov 2014 10:40:47 UTC (843 KB)
[v2] Wed, 21 Oct 2015 13:45:00 UTC (870 KB)
[v3] Mon, 9 Sep 2019 16:53:40 UTC (870 KB)
