Mel Frequency Cepstral Coefficients based Speaker Diarization

From LRDE

Revision as of 16:39, 25 July 2018 by Fanny Riols

Abstract

Speaker diarization has emerged as an increasingly important and dedicated domain of speech research. It addresses the question "who spoke when?", that is, finding the intervals during which each speaker is active. We compute Mel Frequency Cepstral Coefficient (MFCC) features from a given speech signal and apply Independent Component Analysis (ICA) to these features; we are then able to segment the speech with the help of a Hidden Markov Model (HMM) and a Gaussian Mixture Model (GMM). We will use this algorithm for speaker diarization in a verification system on multi-speaker audio data, such as the interview or microphone segments of the NIST Speaker Recognition Evaluation.
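As an illustration of the first stage only (MFCC feature extraction), here is a minimal sketch using NumPy. It is not the implementation behind this report: the function names (hz_to_mel, mel_filterbank, mfcc) and all parameter values (16 kHz sampling rate, 25 ms frames with a 10 ms step, 26 mel filters, 13 coefficients) are assumptions chosen because they are common defaults. The ICA and HMM/GMM stages are not shown.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel-scale formula (an assumption, not from the report).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an array of shape (n_frames, n_ceps) of MFCC features."""
    signal = np.asarray(signal, dtype=float)
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(signal) < flen:
        raise ValueError("signal is shorter than one frame")
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - flen) // fstep
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Power spectrum of each frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Log mel filterbank energies; the small constant avoids log(0).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II over the filter axis, keeping the first n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T

# Usage on one second of synthetic noise standing in for real speech.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
print(features.shape)  # (98, 13)
```

Each row of the result describes one 25 ms frame. In a pipeline like the one outlined above, such frame-level vectors would be the input to the ICA step and then to the HMM/GMM segmentation.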