Time Delay Neural Networks-Based Universal Background Model for Speaker Recognition

From LRDE

Abstract

In speaker recognition, deep neural networks (DNNs) have recently proved more effective than traditional Gaussian mixture models (GMMs) for collecting the Baum-Welch statistics used for i-vector extraction. However, this type of architecture can be too slow at evaluation time, requiring a GPU to achieve real-time performance. We show how triphone posteriors produced by a time delay neural network (TDNN) can be used to create a more lightweight supervised GMM that serves as a universal background model (UBM) inside the i-vector framework. The equal error rate (EER) obtained with this approach is compared to that obtained with a traditional GMM-based UBM.
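
As an illustration of the idea of a supervised GMM, the sketch below estimates one Gaussian component per class directly from frame-level posteriors, instead of running unsupervised EM. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `supervised_gmm`, the diagonal covariances, the variance floor, and the toy data are all hypothetical, and the posteriors are assumed to have been produced beforehand by an acoustic model such as a TDNN.

```python
def supervised_gmm(features, posteriors, var_floor=1e-6):
    """Estimate a diagonal-covariance GMM with one component per class.

    features:   list of T frames, each a list of D floats.
    posteriors: list of T rows, each a list of C class posteriors
                (assumed to come from an acoustic model, e.g. a TDNN).
    Returns (weights, means, variances).
    Diagonal covariances and var_floor are simplifying assumptions.
    """
    dim = len(features[0])
    n_classes = len(posteriors[0])

    # Zeroth-, first- and second-order statistics accumulated per class.
    occ = [0.0] * n_classes
    first = [[0.0] * dim for _ in range(n_classes)]
    second = [[0.0] * dim for _ in range(n_classes)]

    for x, gamma in zip(features, posteriors):
        for c, g in enumerate(gamma):
            occ[c] += g
            for d, xd in enumerate(x):
                first[c][d] += g * xd
                second[c][d] += g * xd * xd

    # Component weights are the normalized class occupancies.
    total = sum(occ)
    weights = [n / total for n in occ]

    means, variances = [], []
    for c in range(n_classes):
        n = max(occ[c], 1e-10)  # guard against classes that never occur
        mu = [f / n for f in first[c]]
        var = [max(s / n - m * m, var_floor) for s, m in zip(second[c], mu)]
        means.append(mu)
        variances.append(var)
    return weights, means, variances


# Toy usage: four 1-D frames, two classes with hard (one-hot) posteriors.
toy_features = [[0.0], [2.0], [10.0], [12.0]]
toy_posteriors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weights, means, variances = supervised_gmm(toy_features, toy_posteriors)
```

Once such a model is built, frame alignments at evaluation time can be computed from the GMM alone, which is what makes this kind of UBM lighter than running the neural network on every test utterance.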