Kernel Combination for SVM Speaker Verification

Abstract

We present a new approach for constructing the kernels used to build support vector machines for speaker verification. The idea is to construct new kernels by taking a linear combination of several kernels, such as the GLDS and GMM supervector kernels. In this kernel combination, the combination weights are speaker dependent, rather than universal weights as in score-level fusion, and no extra data is needed to estimate them. An experiment on the NIST 2006 speaker recognition evaluation dataset (all trials) was carried out using three different kernel functions (the GLDS kernel and the linear and Gaussian GMM supervector kernels). We compared our kernel combination to the optimal linear score fusion obtained using logistic regression. This optimal score fusion was trained on the same test data. We obtained an equal error rate of approximately 5.9% using the kernel combination technique, which is better than the optimal score fusion system (approximately 6.0%).
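
As a rough illustration of the general idea of building one kernel as a linear combination of several, the following minimal sketch combines a linear kernel and a Gaussian kernel with non-negative weights. It is not the method from the paper: the function names, the `gamma` parameter, and the weights 0.7 and 0.3 are all assumptions chosen for the example, and the speaker-dependent weight estimation described in the abstract is not reproduced here.

```python
import math


def linear_kernel(x, y):
    """Plain dot product between two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative value, not from the paper."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def combine_kernels(kernels, weights):
    """Return k(x, y) = sum_i weights[i] * kernels[i](x, y).

    Non-negative weights keep the combination a valid (positive
    semi-definite) kernel when every base kernel is one.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def combined(x, y):
        return sum(w * k(x, y) for w, k in zip(weights, kernels))

    return combined


# Hypothetical fixed weights; in the paper the weights are speaker dependent.
combined_kernel = combine_kernels([linear_kernel, gaussian_kernel], [0.7, 0.3])

u = [1.0, 2.0]
v = [0.0, 1.0]
similarity = combined_kernel(u, v)
```

Because the combination is done at the kernel level, a single SVM can be trained on the combined kernel, instead of training one SVM per kernel and fusing their scores afterwards.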


Bibtex (lrde.bib)

@InProceedings{	  dehak.08.odysseya,
  author	= {R\'eda Dehak and Najim Dehak and Patrick Kenny and Pierre
		  Dumouchel},
  title		= {Kernel Combination for {SVM} Speaker Verification},
  booktitle	= {Proceedings of the Speaker and Language Recognition
		  Workshop (IEEE-Odyssey 2008)},
  year		= 2008,
  address	= {Stellenbosch, South Africa},
  month		= jan,
  abstract	= {We present a new approach for constructing the kernels
		  used to build support vector machines for speaker
		  verification. The idea is to construct new kernels by
		  taking a linear combination of several kernels, such as the
		  GLDS and GMM supervector kernels. In this kernel
		  combination, the combination weights are speaker dependent,
		  rather than universal weights as in score-level fusion, and
		  no extra data is needed to estimate them. An experiment on
		  the NIST 2006 speaker recognition evaluation dataset (all
		  trials) was carried out using three different kernel
		  functions (the GLDS kernel and the linear and Gaussian GMM
		  supervector kernels). We compared our kernel combination to
		  the optimal linear score fusion obtained using logistic
		  regression. This optimal score fusion was trained on the
		  same test data. We obtained an equal error rate of $\simeq
		  5.9\%$ using the kernel combination technique, which is
		  better than the optimal score fusion system ($\simeq
		  6.0\%$).}
}