First Attempt at Boltzmann Machines for Speaker Recognition

From LRDE


Abstract

The Speaker Recognition Evaluations (SRE) regularly organized by NIST show high accuracy rates, which indicates that this field of research is mature. The latest progress came from the introduction of the low-dimensional i-vector representation and of new classifiers such as Probabilistic Linear Discriminant Analysis (PLDA) and the cosine distance classifier. In this paper, we study some variants of Boltzmann Machines (BM). BMs are used in image processing but remain unexplored in Speaker Recognition (SR). Given two utterances, the SR task consists in deciding whether or not they come from the same speaker. Based on this definition, SR can be framed as a two-class classification problem (same speaker vs. different speakers). Our first attempt at using BMs models each class with one generative Restricted Boltzmann Machine (RBM) and uses the symmetric log-likelihood ratio of the two models as the decision score. This new approach achieved an Equal Error Rate (EER) of 7% and a minimum Detection Cost Function (DCF) of 0.035 on the female portion of NIST SRE 2008. The main objective of this research is to explore a new paradigm, i.e. BMs, without necessarily outperforming the state-of-the-art system.
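
The abstract reports results as an Equal Error Rate, the operating point at which the miss rate and the false-alarm rate coincide. As a rough illustration of how such a figure can be estimated from trial scores, here is a minimal sketch. It is not the NIST scoring tool or the authors' code; the function name `equal_error_rate` and the toy scores are hypothetical, and the sketch assumes that higher scores mean "same speaker".

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER from same-speaker and different-speaker trial scores.

    Assumption: a trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Candidate thresholds: every observed score, in increasing order.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # Miss rate: fraction of same-speaker trials rejected at each threshold.
    miss = np.array([(target < t).mean() for t in thresholds])
    # False-alarm rate: fraction of different-speaker trials accepted.
    false_alarm = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    i = np.argmin(np.abs(miss - false_alarm))
    return (miss[i] + false_alarm[i]) / 2.0


# Toy scores, invented for illustration only.
eer = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 1.5])
print(f"EER = {eer:.2%}")  # prints "EER = 25.00%"
```

On real evaluations the two error curves rarely cross exactly at an observed score, so tools usually interpolate between neighbouring thresholds; the nearest-point estimate above is a simplification.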


Bibtex (lrde.bib)

@InProceedings{	  sennoussaoui.12.odyssey,
  author	= {M. Sennoussaoui and Najim Dehak and P. Kenny and R\'eda
		  Dehak and P. Dumouchel},
  title		= {First Attempt at {Boltzmann} Machines for Speaker
		  Recognition},
  booktitle	= {Odyssey Speaker and Language Recognition Workshop},
  year		= 2012,
  address	= {Singapore},
  month		= jun,
  abstract	= {The Speaker Recognition Evaluations (SRE) regularly
		  organized by NIST show high accuracy rates, which indicates
		  that this field of research is mature. The latest progress
		  came from the introduction of the low-dimensional i-vector
		  representation and of new classifiers such as Probabilistic
		  Linear Discriminant Analysis (PLDA) and the cosine distance
		  classifier. In this paper, we study some variants of
		  Boltzmann Machines (BM). BMs are used in image processing
		  but remain unexplored in Speaker Recognition (SR). Given
		  two utterances, the SR task consists in deciding whether or
		  not they come from the same speaker. Based on this
		  definition, SR can be framed as a two-class classification
		  problem (same speaker vs. different speakers). Our first
		  attempt at using BMs models each class with one generative
		  Restricted Boltzmann Machine (RBM) and uses the symmetric
		  log-likelihood ratio of the two models as the decision
		  score. This new approach achieved an Equal Error Rate (EER)
		  of 7\% and a minimum Detection Cost Function (DCF) of 0.035
		  on the female portion of NIST SRE 2008. The main objective
		  of this research is to explore a new paradigm, i.e. BMs,
		  without necessarily outperforming the state-of-the-art
		  system.}
}