Difference between revisions of "Publications/dehak.09.interspeech"

From LRDE

 

Latest revision as of 00:09, 17 July 2016

Abstract

This paper presents a new speaker verification system architecture based on Joint Factor Analysis (JFA) as a feature extractor. In this model, JFA is used to define a new low-dimensional space named the total variability factor space, instead of the separate channel and speaker variability spaces of classical JFA. The main contribution of this approach is the use of the cosine kernel in the new total factor space to design two different systems: the first is based on Support Vector Machines, and the second uses this kernel directly as a decision score. This last scoring method makes the process faster and less computationally complex than other classical methods. We tested several intersession compensation methods on the total factors, and we found that the combination of Linear Discriminant Analysis and Within Class Covariance Normalization achieved the best performance.
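
The fast-scoring idea in the abstract, using the cosine kernel directly as a decision score, can be illustrated with a minimal sketch. It assumes the total-factor vectors have already been extracted and compensated (for example with LDA and WCCN), which is not shown here. The function names and the threshold value are hypothetical and are not taken from the paper.

```python
import math


def cosine_score(w_target, w_test):
    """Cosine similarity between two total-factor vectors (plain lists of floats)."""
    if len(w_target) != len(w_test):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_target = math.sqrt(sum(a * a for a in w_target))
    norm_test = math.sqrt(sum(b * b for b in w_test))
    if norm_target == 0.0 or norm_test == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (norm_target * norm_test)


def accept(w_target, w_test, threshold):
    """Accept the trial when the cosine score reaches the threshold.

    The threshold is an assumed, tunable value chosen on development data.
    """
    return cosine_score(w_target, w_test) >= threshold
```

Because the score is a single normalized dot product, no per-speaker classifier needs to be trained, which is consistent with the abstract's claim that this scoring is faster than the SVM-based system.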


Bibtex (lrde.bib)

@InProceedings{	  dehak.09.interspeech,
  author	= {Najim Dehak and R\'eda Dehak and Patrick Kenny and Niko
		  Brummer and Pierre Ouellet and Pierre Dumouchel},
  title		= {Support Vector Machines versus Fast Scoring in the
		  Low-Dimensional Total Variability Space for Speaker
		  Verification},
  booktitle	= {Interspeech},
  year		= 2009,
  month		= sep,
  abstract	= {This paper presents a new speaker verification system
		  architecture based on Joint Factor Analysis (JFA) as a
		  feature extractor. In this model, JFA is used to define a
		  new low-dimensional space named the total variability
		  factor space, instead of the separate channel and speaker
		  variability spaces of classical JFA. The main contribution
		  of this approach is the use of the cosine kernel in the
		  new total factor space to design two different systems:
		  the first is based on Support Vector Machines, and the
		  second uses this kernel directly as a decision score. This
		  last scoring method makes the process faster and less
		  computationally complex than other classical methods. We
		  tested several intersession compensation methods on the
		  total factors, and we found that the combination of Linear
		  Discriminant Analysis and Within Class Covariance
		  Normalization achieved the best performance.}
}