Category:Auditory Processing of Communication Sounds
From CNBH Acoustic Scale Wiki
Auditory perceptions are constructed in the brain from sounds entering the ear canal, in conjunction with current context and information from memory. It is not possible to measure perceptions directly, so all descriptions of perceptions involve explicit or implicit models of how perceptions are constructed. The category Auditory Processing of Communication Sounds focuses on how the auditory system might construct your initial experience of a sound, referred to as the 'auditory image'. It describes a computational model of how the construction might be accomplished, the Auditory Image Model (AIM). The category Perception of Communication Sounds focuses on the structures that appear in the auditory image and how we perceive them. These categories are intended to work as a pair, with the reader going back and forth as their interest shifts between the perceptions themselves and how the auditory system might construct them.
Roy Patterson, Tom Walters
Introduction
In this auditory processing category, it is assumed that the sub-cortical auditory system creates a perceptual space, in which an initial auditory image of a sound is assembled by the cochlea and mid-brain using largely data-driven processes. The auditory image and the space it occupies are analogous to the visual image and space that appear when you open your eyes in the morning. If the sound arriving at the ears is a noise, the auditory image is filled with activity, but it lacks organization and the details fluctuate continually. If the sound has a pulse-resonance form, an auditory figure appears in the auditory image with an elaborate structure that reflects the phase-locked neural firing pattern produced by the sound in the cochlea. Extended segments of sound, like syllables or musical notes, cause auditory figures to emerge, evolve, and decay in what might be referred to as auditory events. All of the processing up to the level of auditory figures and events can proceed without the top-down processing associated with context or attention. For example, if you are presented with the call of an animal that you have never encountered before, the early stages of auditory processing will still produce an auditory event for the sound, even though you have no context for the sound and might be puzzled by the event. It also seems likely that the initial stages of processing operate as normal during sleep, so auditory figures and events are still produced in the auditory pathway when you are asleep. The Auditory Image Model is intended to simulate the neural processing involved in constructing our initial auditory images of sounds, without reference to the context in which they occur or our memory of similar events.
The main focus of the category Auditory Processing of Communication Sounds is the form of our initial auditory images of sounds: how they are constructed from the neural activity pattern (NAP) flowing from the cochlea, and the events that arise in this space of auditory perception in response to communication sounds. It appears that the space of auditory perception is rather different from the {time, frequency} space normally used to represent speech and musical sounds, and the auditory events that appear in the auditory space are very different from the smooth energy envelopes that represent events in the spectrogram. Briefly, it appears that the space of auditory perception has three dimensions: linear time, logarithmic scale and logarithmic cycles, which will be explained below. The {log-scale, log-cycles} plane of the space is obtained through a unitary transform of the traditional {linear-time, linear-frequency} plane, and the auditory figures that appear in this new plane have the property of being scale-shift covariant (ssc) with regard to both resonance rate and pulse rate. Moreover, the three forms of information in communication sounds are largely orthogonal in this plane. The advantage of the auditory space is that the message of a communication sound appears in a form that is essentially fixed, independent of the pulse rate and the resonance rate of the sound that conveys the message. It also appears that scale-shift covariance of this form is not mathematically possible in a {linear-time, linear-frequency} representation like the spectrogram. If this is the case, then it is important to understand scale-shift covariance and the space of auditory perception in order to improve the robustness of computer-based sound processors such as speech recognition machines and music classifiers.
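The idea that a change of scale becomes a simple shift on a logarithmic axis can be illustrated with a small numerical sketch. The Python code below is not part of AIM and does not implement the transform described above; it assumes a toy spectral envelope with three hypothetical resonances (500, 1500 and 2500 Hz), raises the resonance rate by half an octave, and compares the result with the original on a logarithmic and on a linear frequency axis. All names, bin counts and frequencies are assumptions chosen for illustration, and the sketch covers only the resonance-rate (log-scale) dimension.

```python
import numpy as np

BINS_PER_OCTAVE = 24
N_BINS = 8 * BINS_PER_OCTAVE  # eight octaves of bins starting at 50 Hz


def envelope(freq_hz, scale=1.0):
    """Toy spectral envelope: three Gaussian bumps in log-frequency.

    The resonance frequencies are hypothetical. `scale` multiplies all of
    them, which mimics a change in resonance rate with the shape held fixed.
    """
    resonances_hz = scale * np.array([500.0, 1500.0, 2500.0])
    distance_oct = np.log2(freq_hz)[:, None] - np.log2(resonances_hz)[None, :]
    return np.exp(-0.5 * (distance_oct / 0.1) ** 2).sum(axis=1)


def best_shift(reference, scaled, max_shift):
    """Return (shift, error) for the integer bin shift that best aligns
    `scaled` with `reference`; error is the largest remaining mismatch."""
    errors = [
        np.max(np.abs(scaled[s:] - reference[: len(reference) - s]))
        for s in range(max_shift + 1)
    ]
    s = int(np.argmin(errors))
    return s, float(errors[s])


# The same frequency range sampled on a logarithmic and on a linear axis.
log_axis = 50.0 * 2.0 ** (np.arange(N_BINS) / BINS_PER_OCTAVE)
lin_axis = np.linspace(50.0, 12800.0, N_BINS)

scale = 2.0 ** 0.5  # resonance rate raised by half an octave

log_shift, log_err = best_shift(envelope(log_axis), envelope(log_axis, scale), 48)
lin_shift, lin_err = best_shift(envelope(lin_axis), envelope(lin_axis, scale), 48)

print(f"log axis:    best shift {log_shift} bins, residual {log_err:.2e}")
print(f"linear axis: best shift {lin_shift} bins, residual {lin_err:.2e}")
```

On the logarithmic axis the scaled envelope is the original envelope moved by half an octave (12 bins at 24 bins per octave), so the residual is at the level of floating-point rounding. On the linear axis the bumps move by different amounts and change width, so no single shift aligns them and a large residual remains.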
Papers in preparation
The processing of Temporal Fine Structure (access to this page is currently restricted)
Excerpts from published papers
Research projects
The Pole-Zero Filter Cascade (access to this page is currently restricted)
AIM2006 Documentation
Published papers for the Category: Auditory Processing of Communication Sounds
Auditory filter banks: Patterson et al. (1995), Irino and Patterson (1997), Irino and Patterson (2001), Patterson et al. (2003), Unoki et al. (2006), Irino and Patterson (2006)
The construction of auditory images: Patterson et al. (1992), Patterson (1994a), Patterson (1994b), Patterson et al. (1995), Patterson and Holdsworth (1996), Bleeck et al. (2004), Patterson et al. (2006)
Invariant and scale-shift-covariant versions of the auditory image: Irino and Patterson (2002), Patterson et al. (2007), Irino et al. (2007)
Damped and ramped sounds in the auditory image: Patterson (1994a), Patterson (1994b), Irino and Patterson (1996), Patterson and Irino (1998), Akeroyd and Patterson (1995), Akeroyd and Patterson (1997), Uppenkamp et al. (2001), Lorenzi et al. (1997), Lorenzi et al. (1998), Pressnitzer et al. (2000), Neuert et al. (2001)
Pitch producing sounds in the auditory image: Patterson et al. (1996), Yost et al. (1996), Yost et al. (1998), Patterson et al. (2000), Handel and Patterson (2000), Winter et al. (2001), Wiegrebe et al. (2000), Wiegrebe and Patterson (1999), Wiegrebe et al. (1998), Stein et al. (2005), Krumbholz et al. (2000), Pressnitzer et al. (2001), Krumbholz et al. (2001), Krumbholz et al. (2003), Ives and Patterson (2008)
References
- Akeroyd, M.A. and Patterson, R.D. (1995). “Discrimination of wideband noises modulated by a temporally asymmetric function.” J. Acoust. Soc. Am., 98, p.2466-2474.
- Akeroyd, M.A. and Patterson, R.D. (1997). “A comparison of detection and discrimination of temporal asymmetry in amplitude modulation.” J. Acoust. Soc. Am., 101, p.430-439.
- Bleeck, S., Ives, T. and Patterson, R.D. (2004). “Aim-mat: The Auditory Image Model in MATLAB.” Acta Acustica, 90, p.781-787.
- Handel, S. and Patterson, R.D. (2000). “The perceptual tone/noise ratio of merged, iterated rippled noises with octave, harmonic, and nonharmonic delay ratios.” J. Acoust. Soc. Am., 108, p.692-695.
- Irino, T., Walters, T.C. and Patterson, R.D. (2007). “A computational auditory model with a nonlinear cochlea and acoustic scale normalization”, in Proceedings of the 19th International Congress on Acoustics, p.07-003.
- Irino, T. and Patterson, R.D. (1996). “Temporal asymmetry in the auditory system.” J. Acoust. Soc. Am., 99, p.2316-2331.
- Irino, T. and Patterson, R.D. (1997). “A time-domain, level-dependent auditory filter: The gammachirp.” J. Acoust. Soc. Am., 101, p.412-419.
- Irino, T. and Patterson, R.D. (2001). “A compressive gammachirp auditory filter for both physiological and psychophysical data.” J. Acoust. Soc. Am., 109, p.2008-2022.
- Irino, T. and Patterson, R.D. (2002). “Segregating Information about the Size and Shape of the Vocal Tract using a Time-Domain Auditory Model: The Stabilised Wavelet-Mellin Transform.” Speech Commun., 36, p.181-203.
- Irino, T. and Patterson, R.D. (2006). “A Dynamic Compressive Gammachirp Auditory Filterbank.” IEEE Transactions on Audio, Speech, and Language Processing, 14, p.2222-2232.
- Ives, D.T. and Patterson, R.D. (2008). “Pitch strength decreases as F0 and harmonic resolution increase in complex tones composed exclusively of high harmonics.” J. Acoust. Soc. Am., 123, p.2670-9.
- Krumbholz, K., Patterson, R.D., Nobbe, A. and Fastl, H. (2003). “Microsecond temporal resolution in monaural hearing without spectral cues?” J. Acoust. Soc. Am., 113, p.2790-2800.
- Krumbholz, K., Patterson, R.D. and Nobbe, A. (2001). “Asymmetry of masking between noise and iterated rippled noise: Evidence for time-interval processing in the auditory system.” J. Acoust. Soc. Am., 110, p.2096-2107.
- Krumbholz, K., Patterson, R.D. and Pressnitzer, D. (2000). “The lower limit of pitch as determined by rate discrimination.” J. Acoust. Soc. Am., 108, p.1170-1180.
- Lorenzi, C., Gallégo, S. and Patterson, R.D. (1997). “Discrimination of temporal asymmetry in cochlear implantees.” J. Acoust. Soc. Am., 102, p.482-5.
- Lorenzi, C., Gallégo, S. and Patterson, R.D. (1998). “Amplitude compression in cochlear implants artificially restricts the perception of temporal asymmetry.” Br. J. Audiol., 32, p.367-74.
- Neuert, V., Pressnitzer, D., Patterson, R.D. and Winter, I.M. (2001). “The responses of single units in the inferior colliculus of the guinea pig to damped and ramped sinusoids.” Hear. Res., 159, p.36-52.
- Patterson, R.D. (1994a). “The sound of a sinusoid: Spectral models.” J. Acoust. Soc. Am., 96, p.1409-1418.
- Patterson, R.D. (1994b). “The sound of a sinusoid: Time-interval models.” J. Acoust. Soc. Am., 96, p.1419-1428.
- Patterson, R.D., Allerhand, M.H. and Giguère, C. (1995). “Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform.” J. Acoust. Soc. Am., 98, p.1890-1894.
- Patterson, R.D., Anderson, T.R. and Francis, K. (2006). “Binaural auditory images for noise-resistant speech recognition”, in Listening to Speech, An Auditory Perspective, Greenberg, S. and Ainsworth, W. editors, p.257-269 (Lawrence Erlbaum Associates, Routledge).
- Patterson, R.D., Handel, S., Yost, W.A. and Datta, A.J. (1996). “The relative strength of the tone and noise components in iterated rippled noise.” J. Acoust. Soc. Am., 100, p.3286-3294.
- Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C. and Allerhand, M. (1992). “Complex Sounds and Auditory Images”, in Auditory Physiology and Perception, Cazals, Y., Demany, L. and Horner, K. editors (Pergamon Press, Oxford).
- Patterson, R.D., Unoki, M. and Irino, T. (2003). “Extending the domain of center frequencies for the compressive gammachirp auditory filter.” J. Acoust. Soc. Am., 114, p.1529-1542.
- Patterson, R.D., van Dinther, R. and Irino, T. (2007). “The robustness of bio-acoustic communication and the role of normalization”, in Proceedings of the 19th International Congress on Acoustics, p.07-011.
- Patterson, R.D., Yost, W.A., Handel, S. and Datta, A.J. (2000). “The perceptual tone/noise ratio of merged iterated rippled noises.” J. Acoust. Soc. Am., 107, p.1578-1588.
- Patterson, R.D. and Holdsworth, J. (1996). “A Functional Model of Neural Activity Patterns and Auditory Images.” Advances in Speech, Hearing and Language Processing, Vol 3. Part B. JAI Press, London., p.547-563.
- Patterson, R.D. and Irino, T. (1998). “Modeling temporal asymmetry in the auditory system.” J. Acoust. Soc. Am., 104, p.2967-2979.
- Pressnitzer, D., Patterson, R.D. and Krumbholz, K. (2001). “The lower limit of melodic pitch.” J. Acoust. Soc. Am., 109, p.2074-2084.
- Pressnitzer, D., Winter, I.M. and Patterson, R.D. (2000). “The responses of single units in the ventral cochlear nucleus of the guinea pig to damped and ramped sinusoids.” Hear. Res., 149, p.155-66.
- Stein, A., Ewert, S.D. and Wiegrebe, L. (2005). “Perceptual interaction between carrier periodicity and amplitude modulation in broadband stimuli: a comparison of the autocorrelation and modulation-filterbank model.” J. Acoust. Soc. Am., 118, p.2470-2481.
- Unoki, M., Irino, T., Glasberg, B., Moore, B.C. and Patterson, R.D. (2006). “Comparison of the roex and gammachirp filters as representations of the auditory filter.” J. Acoust. Soc. Am., 120, p.1474-1492.
- Uppenkamp, S., Fobel, S. and Patterson, R.D. (2001). “The effects of temporal asymmetry on the detection and perception of short chirps.” Hear. Res., 158, p.71-83.
- Wiegrebe, L., Hirsch, H.S., Patterson, R.D. and Fastl, H. (2000). “Temporal dynamics of pitch strength in regular-interval noises: effect of listening region and an auditory model.” J. Acoust. Soc. Am., 107, p.3343-3350.
- Wiegrebe, L., Patterson, R.D., Demany, L. and Carlyon, R.P. (1998). “Temporal dynamics of pitch strength in regular interval noises.” J. Acoust. Soc. Am., 104, p.2307-2313.
- Wiegrebe, L. and Patterson, R.D. (1999). “The role of envelope modulation in spectrally unresolved iterated rippled noise.” Hear. Res., 132, p.94-108.
- Winter, I.M., Wiegrebe, L. and Patterson, R.D. (2001). “The temporal representation of the delay of iterated rippled noise in the ventral cochlear nucleus of the guinea-pig.” J. Physiol., 537, p.553-66.
- Yost, W.A., Patterson, R. and Sheft, S. (1996). “A time domain description for the pitch strength of iterated rippled noise.” J. Acoust. Soc. Am., 99, p.1066-1078.
- Yost, W.A., Patterson, R. and Sheft, S. (1998). “The role of the envelope in processing iterated rippled noise.” J. Acoust. Soc. Am., 104, p.2349-2361.