Time Delay Neural Networks-Based Universal Background Model for Speaker Recognition

From LRDE

Valentin Iovene, CSI report no. 1703, 2017.

Abstract

In speaker recognition, deep neural networks (DNNs) have recently proved more effective than traditional Gaussian mixture models (GMMs) at collecting the Baum-Welch statistics used for i-vector extraction. However, this type of architecture can be too slow at evaluation time and may require a GPU to reach real-time performance. We show how triphone posteriors produced by a time delay neural network (TDNN) can be used to build a more lightweight supervised GMM, which serves as the universal background model (UBM) inside the i-vector framework. The equal error rate (EER) obtained with this approach is compared to that obtained with a traditional GMM-based UBM.
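The following is a minimal sketch, not the report's implementation, of how frame posteriors from any model (for example TDNN triphone-state posteriors) yield Baum-Welch statistics and a "supervised" GMM. It assumes that `posteriors[t][c]` is the occupancy of component c at frame t, that `features[t]` is the acoustic feature vector paired with that frame, and that covariances are diagonal for brevity. The function names `baum_welch_stats` and `supervised_gmm` and the `var_floor` parameter are hypothetical. The GMM parameters come from a single M-step over the statistics N_c = Σ_t γ_t(c), F_c = Σ_t γ_t(c) x_t and S_c = Σ_t γ_t(c) x_t², with no EM iterations over GMM posteriors.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def baum_welch_stats(posteriors: Matrix, features: Matrix) -> Tuple[List[float], Matrix, Matrix]:
    """Zeroth-, first- and (diagonal) second-order statistics per component.

    posteriors[t][c]: occupancy of component c at frame t (hypothetical layout).
    features[t][d]:   feature dimension d at frame t.
    """
    num_comp = len(posteriors[0])
    dim = len(features[0])
    n = [0.0] * num_comp                          # N_c = sum_t gamma_t(c)
    f = [[0.0] * dim for _ in range(num_comp)]    # F_c = sum_t gamma_t(c) * x_t
    s = [[0.0] * dim for _ in range(num_comp)]    # S_c = sum_t gamma_t(c) * x_t**2
    for gamma_t, x_t in zip(posteriors, features):
        for c, g in enumerate(gamma_t):
            n[c] += g
            for d, x in enumerate(x_t):
                f[c][d] += g * x
                s[c][d] += g * x * x
    return n, f, s


def supervised_gmm(posteriors: Matrix, features: Matrix, var_floor: float = 1e-3):
    """One M-step: weights, means and diagonal variances from external posteriors."""
    eps = 1e-10                                   # avoids division by zero for unused components
    n, f, s = baum_welch_stats(posteriors, features)
    total = sum(n)
    weights = [n_c / total for n_c in n]
    means = [[f_cd / max(n_c, eps) for f_cd in f_c] for n_c, f_c in zip(n, f)]
    variances = [
        # E[x^2] - E[x]^2, floored to keep the UBM numerically usable
        [max(s_cd / max(n_c, eps) - m_cd * m_cd, var_floor) for s_cd, m_cd in zip(s_c, m_c)]
        for n_c, s_c, m_c in zip(n, s, means)
    ]
    return weights, means, variances


if __name__ == "__main__":
    # Toy data: hard assignments of 4 one-dimensional frames to 2 components.
    post = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    feats = [[0.0], [2.0], [10.0], [14.0]]
    print(supervised_gmm(post, feats))
```

With these hard assignments, the two components get equal weights, means 1.0 and 12.0, and variances 1.0 and 4.0. With a real TDNN, each row of `posteriors` would be a soft distribution over triphone states, so every frame contributes to several components.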