Category:Application of HSR to ASR

From CNBH Acoustic Scale Wiki

Perceptual experiments with communication sounds show what everyone intuitively knows: auditory perception is singularly robust to changes in both the resonance rate and the pulse rate of a communication sound (Smith et al., 2005; Smith and Patterson, 2005; Ives et al., 2005; van Dinther and Patterson, 2006; Smith et al., 2007). The experiments show that we have no difficulty whatsoever recognizing that a child and an adult have spoken the same speech sounds (syllables or words), despite substantial differences in the pulse rate and resonance rate of the waves carrying the message. At the same time, we can tell which speaker has the higher pitch and which speaker is bigger (i.e., which speaker has the longer vocal tract). Perceptual experiments have been performed with vowels (Smith et al., 2005), syllables (Ives et al., 2005), musical notes (van Dinther and Patterson, 2006) and animal calls; they all lead to the conclusion that auditory perception is singularly robust to the scale variability in communication sounds. Moreover, the robustness of human perception extends to speech sounds and musical sounds scaled well beyond the range of normal experience (Smith et al., 2005; van Dinther and Patterson, 2006), which suggests that the robustness is based on automatic adaptation or normalization mechanisms rather than on learning. A description of how the auditory system might perform the necessary normalization is presented on the page The robustness of bio-acoustic communication and the role of normalization.

The robustness of auditory perception stands in contrast to the lack of robustness in automatic speech recognition systems: a speech recognizer trained on the speech of a man is typically unable to recognize the speech of a woman, let alone that of a child. Thus, the robustness that we take for granted and think of as trivial poses a very difficult problem if the recognition system that follows the pre-processor is left to learn about pulse-rate and resonance-rate variability from a time-frequency representation like the spectrogram.
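
The resonance-rate half of this problem can be made concrete with a toy calculation. Under the common simplifying assumption that a change in vocal-tract length multiplies every formant frequency by the same factor, the change is a dilation on a linear frequency axis (each formant moves by a different number of Hz) but a uniform shift on a logarithmic frequency axis, so in principle it can be removed by estimating a single shift. The Python/NumPy sketch below is a hypothetical illustration only: the formant values, the scale factor of 1.5, the Gaussian envelope model and the helper names `envelope` and `estimate_scale` are all assumptions, pulse rate is ignored, and it is not the method used in the papers listed on this page.

```python
import numpy as np

# Hypothetical formant frequencies (Hz) for one adult vowel; illustrative
# values only, not data from the experiments cited on this page.
adult_formants = np.array([730.0, 1090.0, 2440.0])

# Assumed resonance-scale factor: a vocal tract two-thirds as long raises
# every formant by a factor of 1.5 (uniform-scaling simplification).
scale = 1.5
child_formants = scale * adult_formants

# Linear frequency: the formant displacement grows with formant frequency.
linear_shift = child_formants - adult_formants
# Log frequency: every formant moves by the same amount, log(scale).
log_shift = np.log(child_formants) - np.log(adult_formants)


def envelope(freqs_hz, formants_hz, bandwidth_oct=0.1):
    """Toy spectral envelope: one Gaussian bump per formant on a log2 axis."""
    logf = np.log2(freqs_hz)[:, None]
    centres = np.log2(formants_hz)[None, :]
    return np.exp(-0.5 * ((logf - centres) / bandwidth_oct) ** 2).sum(axis=1)


def estimate_scale(env_ref, env_test, step_oct):
    """Estimate the resonance-scale factor between two envelopes sampled on
    the same log2-frequency grid from the lag of their cross-correlation peak."""
    xc = np.correlate(env_test, env_ref, mode="full")
    lag = np.argmax(xc) - (len(env_ref) - 1)  # shift in grid steps
    return 2.0 ** (lag * step_oct)            # steps -> octaves -> ratio


# Log-spaced frequency grid: 100 Hz upwards in steps of 0.01 octave.
step_oct = 0.01
freqs = 100.0 * 2.0 ** (step_oct * np.arange(630))

env_adult = envelope(freqs, adult_formants)
env_child = envelope(freqs, child_formants)
estimated = estimate_scale(env_adult, env_child, step_oct)

print("linear shifts (Hz):", linear_shift)
print("log shifts:", log_shift)
print("estimated scale factor:", round(float(estimated), 3))
```

On the log-frequency grid a single cross-correlation lag recovers the scale factor to within the grid resolution, whereas no single shift would align the two envelopes on a linear-frequency spectrogram. Real vocal tracts do not scale perfectly uniformly, so this should be read only as an illustration of why scale is easier to handle on a logarithmic axis.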

This category, Application of HSR to ASR, focuses on applying knowledge about Human Speech Recognition (HSR) to Automatic Speech Recognition (ASR).

Excerpts from published papers

Low-Dimensional, Auditory Feature Vectors that Improve VTL Normalization in Automatic Speech Recognition

Auditory features that improve VTL normalization in automatic speech recognition

Research projects

Scale-Covariant Features for Automatic Speech Recognition

Published papers for the Category: Application of HSR to ASR

Comparing the Robustness of HSR and ASR: Monaghan et al. (2008)

References

Ives, D.T., Smith, D.R.R. and Patterson, R.D. (2005). “Discrimination of speaker size from syllable phrases.” J. Acoust. Soc. Am., 118, pp. 3816-3822.
Monaghan, J.J., Feldbauer, C., Walters, T.C. and Patterson, R.D. (2008). “Low-dimensional, auditory feature vectors that improve vocal-tract-length normalization in automatic speech recognition.” J. Acoust. Soc. Am., 123, p. 3066.
Smith, D.R.R., Patterson, R.D., Turner, R.E., Kawahara, H. and Irino, T. (2005). “The processing and perception of size information in speech sounds.” J. Acoust. Soc. Am., 117, pp. 305-318.
Smith, D.R.R., Walters, T.C. and Patterson, R.D. (2007). “Discrimination of speaker sex and size when glottal-pulse rate and vocal-tract length are controlled.” J. Acoust. Soc. Am., 122, pp. 3628-3639.
Smith, D.R.R. and Patterson, R.D. (2005). “The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex, and age.” J. Acoust. Soc. Am., 118, pp. 3177-3186.
van Dinther, R. and Patterson, R.D. (2006). “Perception of acoustic scale and size in musical instrument sounds.” J. Acoust. Soc. Am., 120, pp. 2158-2176.

Pages in category "Application of HSR to ASR"

The following 4 pages are in this category, out of 4 total.

Low-Dimensional, Auditory Feature Vectors that Improve VTL Normalization in Automatic Speech Recognition

Project Reports

Establishing Norms for the Robustness of Automatic Speech Recognition

Establishing Norms for the Robustness of Human Speech Recognition

Estimating Vocal Tract Length from a Stream of Vowel Sounds