Scale-Covariant Features for Automatic Speech Recognition

From CNBH Acoustic Scale Wiki


MFCC values for vowels produced with different vocal tract lengths, plotted in a three-dimensional space defined by three of the MFCCs. The values for the different vowels do not separate well.

Clustering of the auditory features for ASR, plotted in a three-dimensional space defined by the weights of the three Gaussians. The clusters for different vowels are well separated.

Vowel spectra from a man and a child, and their MFCC reconstructions. The reconstructions are good, but the individual MFCC values are not scale-shift covariant.

Gaussians fitted to the spectral distribution from an auditory filterbank in response to the vowel /i/ from different-sized speakers. The positions of the Gaussians shift, but their amplitudes remain the same.
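
The contrast drawn in the captions above can be illustrated with a small synthetic sketch. It is not the CNBH implementation: the spectral profile is an invented sum of three Gaussians on a channel axis assumed to be logarithmic in frequency, a change in vocal tract length is modelled as a pure shift of 10 channels along that axis, the Gaussians are fitted with a plain EM loop, and a DCT of the floored log spectrum stands in for MFCCs. All function names and parameter values (`synthetic_vowel`, `fit_gaussians`, `cepstrum_like`, the means, widths, and weights) are hypothetical.

```python
import numpy as np


def synthetic_vowel(n_channels=128, means=(40.0, 60.0, 80.0), sd=4.0,
                    weights=(0.5, 0.3, 0.2)):
    """Invented spectral profile: three Gaussian bumps over filterbank channels."""
    x = np.arange(n_channels, dtype=float)
    spec = np.zeros(n_channels)
    for w, m in zip(weights, means):
        spec += w * np.exp(-0.5 * ((x - m) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return spec


def fit_gaussians(spectrum, n_components=3, n_iter=300):
    """Fit a Gaussian mixture to a spectral profile with EM (illustrative only)."""
    x = np.arange(spectrum.size, dtype=float)
    p = spectrum / spectrum.sum()  # treat the profile as a distribution
    centre = np.sum(p * x)
    spread = np.sqrt(np.sum(p * (x - centre) ** 2))
    # Initialise relative to the profile itself, so the fit follows any shift.
    means = centre + spread * np.linspace(-1.0, 1.0, n_components)
    sds = np.full(n_components, spread / n_components)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each channel.
        z = (x[None, :] - means[:, None]) / sds[:, None]
        dens = weights[:, None] * np.exp(-0.5 * z ** 2) / (sds[:, None] * np.sqrt(2 * np.pi))
        resp = dens / (dens.sum(axis=0, keepdims=True) + 1e-300)
        # M-step: update weights, means, and widths from the weighted channels.
        mass = resp @ p
        means = (resp @ (p * x)) / mass
        var = ((resp * (x[None, :] - means[:, None]) ** 2) @ p) / mass
        sds = np.sqrt(np.maximum(var, 1e-6))
        weights = mass
    return weights, means, sds


def cepstrum_like(spectrum, n_coeffs=4, floor=1e-3):
    """DCT-II of the floored log spectrum, skipping coefficient 0 (MFCC stand-in)."""
    n = spectrum.size
    k = np.arange(1, n_coeffs + 1)[:, None]
    idx = np.arange(n)[None, :]
    basis = np.cos(np.pi * (idx + 0.5) * k / n)
    return basis @ np.log(spectrum + floor)


shift = 10                          # assumed scale change, in channels
spec_a = synthetic_vowel()          # "larger" speaker
spec_b = np.roll(spec_a, shift)     # same vowel, profile shifted along the axis

w_a, mu_a, _ = fit_gaussians(spec_a)
w_b, mu_b, _ = fit_gaussians(spec_b)
cep_a = cepstrum_like(spec_a)
cep_b = cepstrum_like(spec_b)

print("Gaussian weights:", w_a, w_b)   # unchanged by the shift
print("Gaussian means:  ", mu_a, mu_b) # displaced by the shift
print("cepstrum-like:   ", cep_a, cep_b)  # change with the shift
```

In this toy setting the fitted weights are the same for both profiles and only the means move, whereas every cepstrum-like coefficient changes, because each cosine basis function is tied to fixed channel positions.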
