Mel Frequency Cepstral Coefficients based Speaker Diarization

Authors: Fanny Riols
Type: techreport
Year: 2016
Number: 1515

Abstract

Speaker diarization has emerged as an increasingly important and dedicated domain of speech research. It addresses the problem of determining "who spoke when?", that is, finding the intervals during which each speaker is active. We compute Mel Frequency Cepstral Coefficient (MFCC) features from a given speech signal, apply Independent Component Analysis (ICA) to these features, and then segment the speech with the help of a Hidden Markov Model (HMM) and a Gaussian Mixture Model (GMM). We will use this algorithm for speaker diarization in a verification system, on multi-speaker audio data such as the interview or microphone segments of the NIST Speaker Recognition Evaluation.

From LRDE