Kernel Combination for SVM Speaker Verification
- Authors
- Réda Dehak, Najim Dehak, Patrick Kenny, Pierre Dumouchel
- Where
- Proceedings of the Speaker and Language Recognition Workshop (IEEE-Odyssey 2008)
- Place
- Stellenbosch, South Africa
- Type
- inproceedings
- Date
- 2007-09-25
Abstract
We present a new approach for constructing the kernels used to build support vector machines for speaker verification. The idea is to construct new kernels by taking a linear combination of several kernels, such as the GLDS and GMM supervector kernels. In this kernel combination, the combination weights are speaker dependent, rather than universal weights as in score-level fusion, and no extra data is needed to estimate them. An experiment on the NIST 2006 speaker recognition evaluation dataset (all trials) was carried out using three different kernel functions (the GLDS kernel and the linear and Gaussian GMM supervector kernels). We compared our kernel combination to the optimal linear score fusion obtained using logistic regression. This optimal score fusion was trained on the same test data. The kernel combination technique achieved an equal error rate of about 5.9%, which is better than the optimal score fusion system (about 6.0%).
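The core idea in the abstract, building a new kernel as a weighted sum of existing kernels, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy 2-D vectors, the fixed weights, and the generic linear and Gaussian kernels standing in for the GLDS and GMM supervector kernels. It does not reproduce how the paper estimates its speaker-dependent weights.

```python
import math

def linear_kernel(x, y):
    # Plain dot product between two feature vectors.
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, gamma=0.5):
    # Gaussian (RBF) kernel; gamma is an arbitrary illustrative value.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def combine_kernels(kernels, weights):
    # A non-negative weighted sum of valid kernels is itself a valid kernel.
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def combined(x, y):
        return sum(w * k(x, y) for w, k in zip(weights, kernels))

    return combined

def gram_matrix(kernel, points):
    # Pairwise kernel values, the form an SVM solver typically consumes.
    return [[kernel(p, q) for q in points] for p in points]

# Hypothetical toy data and weights (placeholders, not learned per speaker).
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernel = combine_kernels([linear_kernel, gaussian_kernel], [0.7, 0.3])
G = gram_matrix(kernel, points)
```

The resulting Gram matrix `G` could be passed to any SVM implementation that accepts precomputed kernels. In the setting the abstract describes, the weights would differ for each target speaker instead of being fixed as they are here.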
Bibtex (lrde.bib)
@InProceedings{ dehak.08.odysseya,
  author    = {R\'eda Dehak and Najim Dehak and Patrick Kenny and Pierre Dumouchel},
  title     = {Kernel Combination for {SVM} Speaker Verification},
  booktitle = {Proceedings of the Speaker and Language Recognition Workshop (IEEE-Odyssey 2008)},
  year      = 2008,
  address   = {Stellenbosch, South Africa},
  month     = jan,
  abstract  = {We present a new approach for constructing the kernels used to build support vector machines for speaker verification. The idea is to construct new kernels by taking a linear combination of several kernels, such as the GLDS and GMM supervector kernels. In this kernel combination, the combination weights are speaker dependent, rather than universal weights as in score-level fusion, and no extra data is needed to estimate them. An experiment on the NIST 2006 speaker recognition evaluation dataset (all trials) was carried out using three different kernel functions (the GLDS kernel and the linear and Gaussian GMM supervector kernels). We compared our kernel combination to the optimal linear score fusion obtained using logistic regression. This optimal score fusion was trained on the same test data. The kernel combination technique achieved an equal error rate of $\simeq 5.9\%$, which is better than the optimal score fusion system ($\simeq 6.0\%$).}
}