Introduction to the content of the wiki
This wiki is primarily concerned with bio-acoustic communication – how the sounds of animals, humans and instruments are produced, and how we perceive them – and with the role of source size in that communication. It is argued that:
- Communication sounds have a special, pulse-resonance form that makes them informative, and makes them stand out from background noise in the natural world.
- The size of a communication sound, or its acoustic scale, varies with the size of the animal or instrument producing the sound, simply because larger objects vibrate more slowly than smaller objects (other things being equal).
- There are two aspects to acoustic-scale variation in communication sounds: (1) the pulse rate decreases as the size of the animal or instrument increases, and (2) the rate of oscillation in the resonance decreases as the size of the animal or instrument increases. In mammals, for example, the pulses are produced by the vocal cords, which become longer and heavier as the individual grows, so the glottal pulse rate decreases with growth. The resonances are produced by the vocal tract above the larynx, which lengthens as the individual grows, so the resonant frequencies of the sounds also decrease with growth. (A minimal synthesis sketch illustrating these two parameters appears after this list.)
- The variation in source size produces a discrimination-generalization problem for the listener. For example, animals need to know whether two distinct calls come from individuals of two different species, or from two members of the same species that happen to differ in size. The fact that humans are very good at solving this discrimination-generalization problem suggests that the peripheral auditory system constructs a size-invariant representation of these sounds; that is, a version of the information in the sound that is independent of both aspects of acoustic scale.
- We need to understand the relationship between the production and perception of communication sounds if we are to understand bio-acoustic communication in general, and the perception of speech and music in particular. If we can characterize acoustic-scale normalization in signal-processing terms, the resulting algorithms should help improve the robustness of speech recognition machines, which currently have great difficulty with size variation. Normalization algorithms would also assist in the development of audio search engines for the internet.
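To make the two aspects of acoustic scale concrete, the following minimal Python/NumPy sketch (an illustration added here, not code from the wiki) synthesizes a pulse-resonance sound from just two parameters: the pulse rate and the resonance frequency. Halving both parameters simulates a source roughly twice as large; the function name, sampling rate and damping constant are illustrative assumptions.

```python
import numpy as np

def pulse_resonance_sound(pulse_rate_hz, resonance_hz, duration_s=0.5,
                          fs=16000, damping_s=0.004):
    """Synthesize a simple pulse-resonance sound.

    A regular stream of pulses (the source, e.g. glottal pulses) is
    convolved with a damped sinusoid (the resonance, e.g. a vocal-tract
    mode). Lowering pulse_rate_hz and resonance_hz together mimics a
    larger source: both aspects of acoustic scale decrease.
    """
    n = int(duration_s * fs)
    # Pulse train: one unit impulse per glottal cycle.
    pulses = np.zeros(n)
    period = int(round(fs / pulse_rate_hz))
    pulses[::period] = 1.0
    # Resonance: exponentially damped sinusoid.
    t = np.arange(int(5 * damping_s * fs)) / fs
    resonance = np.exp(-t / damping_s) * np.sin(2 * np.pi * resonance_hz * t)
    return np.convolve(pulses, resonance)[:n]

# A 'small' and a 'large' source: same message, different acoustic scale.
small = pulse_resonance_sound(pulse_rate_hz=200.0, resonance_hz=1200.0)
large = pulse_resonance_sound(pulse_rate_hz=100.0, resonance_hz=600.0)
```

The message (one damped resonance per pulse) is the same in both sounds; only the acoustic scale differs, which is exactly the property a size-invariant representation has to expose.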
The material on the production and perception of communication sounds was originally divided into seven categories on the acoustic scale wiki. The sections are listed below, and the categories are maintained for historical reference; the route to the information in these categories is now through the categories listed in the sidebar.
- An Introduction to Communication Sounds, which provides an overview of the important concepts in a relatively compact form. Specifically, it explains the concept of acoustic scale, which is the form of size information conveyed by the sound to the listener.
- The Information in Communication Sounds, which describes the modes of production and the effect of variation in body size on the two aspects of acoustic scale in these sounds, that is, the pulse rate and the resonance rate.
- The Perception of Communication Sounds, which describes behavioural experiments designed to show that humans can (a) extract the message of a communication sound without being confused by the size information in the sound, and (b) extract the size information in communication sounds without being confused by the message of the sound. There is also a discussion paper concerning what we mean by the terms auditory image, auditory figure, auditory object, auditory event and auditory scene.
- Auditory Processing of Communication Sounds, which describes a computational model of auditory perception. The model is intended to illustrate the representations of sound produced in the early stages of auditory processing: namely, basilar membrane motion, the neural activity pattern observed in the auditory nerve, and the auditory image – a hypothetical representation of the neural activity that supports our initial perception of sound. (A rough sketch of the first two of these stages appears after this list.)
- HSR for ASR, which describes how knowledge about the early stages of Human Speech Recognition (HSR) might be used to improve the early stages of Automatic Speech Recognition (ASR).
- The Evolution of Communication Sounds, which considers the signal processing at the heart of the evolution of bio-acoustic communication, and why the auditory analysis of communication sounds is so different from the spectrographic signal processing used by engineers to analyse speech and music. Specifically, the aim is to explain why auditory processing begins with a wavelet analysis, why hair cells phase lock to basilar membrane motion, and why auditory compression is so fast acting.
- Brain Imaging of Communication Sound Processing, which contains summaries of brain imaging studies designed to locate regions of activation associated with the early stages of communication sound processing.
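As a rough illustration of the staged structure described in Auditory Processing of Communication Sounds above, the sketch below approximates the first two representations: a bank of bandpass filters standing in for basilar membrane motion, followed by half-wave rectification and compression as a crude neural activity pattern. This is not the CNBH model itself; the Butterworth filters, channel spacing and compression exponent are stand-in assumptions chosen only to show the shape of the pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def crude_auditory_stages(signal, fs, n_channels=16, fmin=100.0, fmax=6000.0):
    """Approximate the first two stages of an auditory model.

    Stage 1 ('basilar membrane motion'): split the signal into bandpass
    channels on a log-frequency axis (simple Butterworth filters stand in
    for the gammatone/wavelet filterbank used in auditory models).
    Stage 2 ('neural activity pattern'): half-wave rectify and compress
    each channel, a crude stand-in for hair-cell transduction.
    """
    centres = np.geomspace(fmin, fmax, n_channels)
    bmm = []
    for fc in centres:
        lo, hi = fc / 1.2, min(fc * 1.2, 0.45 * fs)
        sos = butter(2, [lo, hi], btype='bandpass', fs=fs, output='sos')
        bmm.append(sosfiltfilt(sos, signal))
    bmm = np.array(bmm)                    # basilar membrane motion
    nap = np.maximum(bmm, 0.0) ** 0.5      # crude neural activity pattern
    return centres, bmm, nap
```

A pulse-resonance sound like the ones synthesized earlier could be passed through these stages with `centres, bmm, nap = crude_auditory_stages(small, fs=16000)`; constructing the auditory image from the neural activity pattern is the subject of the section itself.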
An extended description of auditory perception is provided in Now See Hear! – a parallel stream of the wiki with videos that illustrate how the auditory system separates temporal fine structure, which we hear as pitch and timbre, from the temporal envelope, which we hear as the dynamics of auditory events.
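The separation described above can also be sketched in a few lines of code: the Hilbert envelope of a bandpass channel gives its temporal envelope, and the cosine of the instantaneous phase gives its temporal fine structure. This is a standard textbook decomposition offered only as an illustration; it is not the analysis behind the Now See Hear! videos.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_and_fine_structure(channel):
    """Split one bandpass channel into temporal envelope and fine structure.

    The analytic signal gives the Hilbert envelope (slow amplitude
    modulation, heard as the dynamics of auditory events) and the
    instantaneous phase, whose cosine is the temporal fine structure
    (rapid oscillation, heard as pitch and timbre).
    """
    analytic = hilbert(channel)
    envelope = np.abs(analytic)
    fine_structure = np.cos(np.angle(analytic))
    return envelope, fine_structure
```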
There are pages that list some of the Sounds and Videos used to illustrate the form of information in communication sounds.
There is also a page describing the Sound Tools used to make or modify some of the sounds described in the wiki, like Musical Rain or The syllable database.