Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach

From LRDE

Revision as of 17:57, 4 January 2018 by Bot

Abstract

In speaker diarization, standard approaches typically perform speaker clustering on some initial segmentation before refining the segment boundaries in a re-segmentation step to obtain a final diarization hypothesis. In this paper, we integrate an improved clustering method with an existing re-segmentation algorithm and, in iterative fashion, optimize both speaker cluster assignments and segmentation boundaries jointly. For clustering, we extend our previous research using factor analysis for speaker modeling. In continuing to take advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features (i.e., i-vectors), we develop a probabilistic approach to speaker clustering by applying a Bayesian Gaussian Mixture Model (GMM) to principal component analysis (PCA)-processed i-vectors. We then utilize information at different temporal resolutions to arrive at an iterative optimization scheme that, in alternating between clustering and re-segmentation steps, demonstrates the ability to improve both speaker cluster assignments and segmentation boundaries in an unsupervised manner. Our proposed methods attain results that are comparable to those of a state-of-the-art benchmark set on the multi-speaker CallHome telephone corpus. We further compare our system with a Bayesian nonparametric approach to diarization and attempt to reconcile their differences in both methodology and performance.


Bibtex (lrde.bib)

@Article{	  shum.13.taslp,
  author	= {S. Shum and Najim Dehak and R{\'e}da Dehak and J. Glass},
  title		= {Unsupervised Methods for Speaker Diarization: An
		  Integrated and Iterative Approach},
  journal	= {IEEE Transactions on Audio, Speech, and Language
		  Processing},
  year		= 2013,
  volume	= 21,
  number	= 10,
  pages		= {2015--2028},
  month		= oct,
  abstract	= {In speaker diarization, standard approaches typically
		  perform speaker clustering on some initial segmentation
		  before refining the segment boundaries in a re-segmentation
		  step to obtain a final diarization hypothesis. In this
		  paper, we integrate an improved clustering method with an
		  existing re-segmentation algorithm and, in iterative
		  fashion, optimize both speaker cluster assignments and
		  segmentation boundaries jointly. For clustering, we extend
		  our previous research using factor analysis for speaker
		  modeling. In continuing to take advantage of the
		  effectiveness of factor analysis as a front-end for
		  extracting speaker-specific features (i.e., i-vectors), we
		  develop a probabilistic approach to speaker clustering by
		  applying a Bayesian Gaussian Mixture Model (GMM) to
		  principal component analysis (PCA)-processed i-vectors. We
		  then utilize information at different temporal resolutions
		  to arrive at an iterative optimization scheme that, in
		  alternating between clustering and re-segmentation steps,
		  demonstrates the ability to improve both speaker cluster
		  assignments and segmentation boundaries in an unsupervised
		  manner. Our proposed methods attain results that are
		  comparable to those of a state-of-the-art benchmark set on
		  the multi-speaker CallHome telephone corpus. We further
		  compare our system with a Bayesian nonparametric approach
		  to diarization and attempt to reconcile their differences
		  in both methodology and performance.}
}