Category:Application of HSR to ASR

From CNBH Acoustic Scale Wiki

Introduction to the content of the wiki

Perceptual experiments with communication sounds confirm what everyone knows intuitively: auditory perception is singularly robust to changes in both the resonance rate and the pulse rate of a communication sound (Smith et al., 2005; Smith and Patterson, 2005; Ives et al., 2005; van Dinther and Patterson, 2006; Smith et al., 2007). The experiments show that we have no difficulty recognizing that a child and an adult have spoken the same speech sounds (syllables or words), despite substantial differences in the pulse rate and resonance rate of the waves carrying the message. We can also tell which speaker has the higher pitch and which speaker is bigger (i.e., which speaker has the longer vocal tract). Such experiments have been performed with vowels (Smith et al., 2005), syllables (Ives et al., 2005), musical notes (van Dinther and Patterson, 2006) and animal calls; they all lead to the conclusion that auditory perception is robust to the scale variability in communication sounds. This robustness also extends to speech sounds and musical sounds scaled well beyond the range of normal experience (Smith et al., 2005; van Dinther and Patterson, 2006), which suggests that it is based on automatic adaptation or normalization mechanisms rather than on learning. A description of how the auditory system might perform the necessary normalization is presented in The robustness of bio-acoustic communication and the role of normalization.

The robustness of auditory perception stands in contrast to the lack of robustness in automatic speech recognition systems: a speech recognizer trained on the speech of a man is typically unable to recognize the speech of a woman, let alone the speech of a child. Thus the robustness that we take for granted and think of as trivial poses a very difficult problem if the recognition system that follows the pre-processor is left to learn about pulse-rate and resonance-rate variability from a time-frequency representation like the spectrogram.
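To make the resonance-rate problem concrete, the following minimal sketch assumes a uniform tube closed at one end as a stand-in for the vocal tract, so that the resonances are F_n = (2n - 1)c / 4L. This model, the speed-of-sound constant and the two vocal tract lengths are illustrative assumptions and are not taken from the wiki or the cited papers. The sketch suggests why a recognizer working from a linear-frequency spectrogram sees a different pattern for each speaker, whereas on a logarithmic frequency axis a change in vocal tract length reduces to a single constant shift.

```python
import math

# Assumption: approximate speed of sound in the vocal tract, in cm/s.
SPEED_OF_SOUND_CM_PER_S = 35000.0


def tube_formants(vtl_cm, n_formants=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * vtl_cm)
        for n in range(1, n_formants + 1)
    ]


# Hypothetical vocal tract lengths, chosen only for illustration.
adult = tube_formants(17.5)  # 500, 1500, 2500 Hz
child = tube_formants(12.5)  # 700, 2100, 3500 Hz

# On a linear frequency axis every formant moves by a different amount ...
linear_shifts_hz = [c - a for a, c in zip(adult, child)]

# ... but on a log-frequency axis all formants move by the same amount,
# log2(17.5 / 12.5), i.e. a pure translation of the spectral pattern.
log_shifts_octaves = [math.log2(c / a) for a, c in zip(adult, child)]

if __name__ == "__main__":
    print("adult formants (Hz):", adult)
    print("child formants (Hz):", child)
    print("linear shifts (Hz):", linear_shifts_hz)
    print("log shifts (octaves):", log_shifts_octaves)
```

In this toy model, normalizing for vocal tract length amounts to estimating and removing one shift value, whereas a system trained on linear-frequency patterns from a single speaker would have to learn each shifted pattern separately.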

The category HSR for ASR focuses on the application of knowledge about Human Speech Recognition (HSR) to Automatic Speech Recognition (ASR).


Project Reports

Establishing Norms for the Robustness of Automatic Speech Recognition

Performance deteriorates as the VTL of the speaker deviates from that of the training speaker

Establishing Norms for the Robustness of Human Speech Recognition

Performance is largely unaffected when the VTL of the speaker differs from that of the training speaker

Estimating Vocal Tract Length from a Stream of Vowel Sounds

Estimates of vocal tract length derived from individual vowel sounds

Excerpts from published papers

Low-Dimensional, Auditory Feature Vectors that Improve VTL Normalization in Automatic Speech Recognition

Auditory features that improve VTL normalization in automatic speech recognition

Research projects

Scale-Covariant Features for Automatic Speech Recognition

Published papers for the Category: Application of HSR to ASR

Comparing the Robustness of HSR and ASR: Monaghan et al. (2008)

