Neural PLDA Modeling for End-to-End Speaker Verification

Ramoji, Shreyas; Krishnan, Prashant; Ganapathy, Sriram

arXiv:2008.04527 (eess)

[Submitted on 11 Aug 2020]

Abstract: While deep learning models have made significant advances in supervised classification problems, their application to out-of-set verification tasks such as speaker recognition has been limited to deriving feature embeddings. State-of-the-art x-vector PLDA speaker verification systems use a generative model based on probabilistic linear discriminant analysis (PLDA) to compute the verification score. We recently proposed a neural network approach to backend modeling in speaker verification, called neural PLDA (NPLDA), in which the likelihood ratio score of the generative PLDA model is posed as a discriminative similarity function and the learnable parameters of the score function are optimized using a verification cost. In this paper, we extend that work to jointly optimize the embedding neural network (x-vector network) and the NPLDA network in an end-to-end (E2E) fashion. The proposed end-to-end model is optimized directly from acoustic features with a verification cost function, and during testing it directly outputs the likelihood ratio score. In experiments on the NIST speaker recognition evaluation (SRE) 2018 and 2019 datasets, we show that the proposed E2E model improves significantly over the x-vector PLDA baseline speaker verification system.
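The idea of treating a PLDA likelihood ratio as a trainable similarity function can be illustrated with a minimal NumPy sketch. It assumes the quadratic form that a Gaussian PLDA log-likelihood ratio takes for a pair of embeddings, and a sigmoid-smoothed detection cost as one possible verification cost. The names `P`, `Q`, `alpha`, `beta`, and `threshold`, along with both function names, are illustrative assumptions and are not taken from the abstract or from the authors' repositories.

```python
import numpy as np


def plda_style_score(enroll, test, P, Q):
    """Quadratic pairwise score e'Qe + t'Qt + 2 e'Pt for each trial.

    enroll, test: arrays of shape (n_trials, dim), one embedding pair per row.
    P, Q: (dim, dim) matrices; in a neural backend these would be learnable
    (assumption: this sketch only evaluates the score, it does not train).
    """
    return (
        np.einsum("id,de,ie->i", enroll, Q, enroll)
        + np.einsum("id,de,ie->i", test, Q, test)
        + 2.0 * np.einsum("id,de,ie->i", enroll, P, test)
    )


def soft_detection_cost(scores, is_target, threshold=0.0, alpha=10.0, beta=1.0):
    """Differentiable stand-in for a miss / false-alarm detection cost.

    A sigmoid replaces the hard threshold decision so that the cost varies
    smoothly with the scores. alpha controls the sharpness of the sigmoid
    and beta weights false alarms against misses (both hypothetical values).
    """
    is_target = np.asarray(is_target, dtype=float)
    accept = 1.0 / (1.0 + np.exp(-alpha * (scores - threshold)))
    p_miss = np.sum(is_target * (1.0 - accept)) / np.sum(is_target)
    p_fa = np.sum((1.0 - is_target) * accept) / np.sum(1.0 - is_target)
    return p_miss + beta * p_fa


# Toy trials: the first pair matches, the second pair is orthogonal.
enroll = np.array([[1.0, 0.0], [0.0, 1.0]])
test = np.array([[1.0, 0.0], [1.0, 0.0]])
identity = np.eye(2)

scores = plda_style_score(enroll, test, P=identity, Q=identity)
cost = soft_detection_cost(scores, is_target=[1, 0], threshold=3.0)
```

In an end-to-end setting of the kind the abstract describes, the embeddings would come from the x-vector network and the cost gradient would flow back through both the score parameters and the embedding network; this sketch covers only the forward computation.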

Comments:	Accepted in Interspeech 2020. GitHub Implementation Repos: this https URL and this https URL
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2008.04527 [eess.AS]
	(or arXiv:2008.04527v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2008.04527

Submission history

From: Shreyas Ramoji [view email]
[v1] Tue, 11 Aug 2020 05:54:54 UTC (312 KB)
