Overview
From CNBH Acoustic Scale Wiki
Roy Patterson
The animals and instruments that produce communication sounds come in families, for example, the string family, the brass family, the woodwinds and the family of human voices. Within a family, the members are distinguished by their register, often referred to in vocal terms as soprano, alto, tenor, baritone and bass. This overview focuses on the perception of register within a family of animals or instruments and the form of the register information in four domains:
- the resonant structures of the animal or instrument that produces the sound,
- the medium that transmits the sound to the listener,
- the physiological structures that analyse the sound in the auditory system of the listener, and
- the ultimate perception of the sound by a sentient being.
The domains of sound
The domains and the relationships between the variables in the different domains are illustrated for the string family of instruments in Figure 0.1: The panels in the first row contain pictures that illustrate the domains. The panels in the second row list some of the properties that determine the form of register information in the different domains.
In the instrument
The picture in the first column (panel 1a) shows a family of eight violins which all have the same shape and structure, and which all produce the "string" sound of the violin family. The instrument maker varies the properties of the instrument (panel 2a) to produce the different members of the family and so produce the string sound in a wide range of registers. As the length and volume of the body increase, and the length and mass of the strings increase, the register goes from above that of a violin, through the violin, viola, cello and contrabass, and on to an even larger member of the family, the octobass.
In the air
The picture in the second column (panel 1b) illustrates the waveforms produced by large (upper) and small (lower) members of the string family. Communication sounds typically have the pulse-resonance form shown in this panel. The relevant variables in this acoustic domain (panel 2b) are wavelength, density of the medium and the speed of sound. The message of the communication (the sound of a string instrument) is represented by the complex shape of the resonance that follows each pulse. The register information appears in the "acoustic scale" of the components of the waveform: the period between pulses is the "acoustic scale of the source of excitation," and the average period of the cycles of the resonance that follows each pulse is the "acoustic scale of the body resonance." Together, these acoustic scale variables transmit the information concerning the register of the instrument through the air to the listener.
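The pulse-resonance form can be illustrated with a minimal synthesis sketch. It assumes, purely for illustration, that the resonance is a single exponentially damped cosine that restarts at every pulse; real string tones have far more complex resonances, and the function name, the decay parameter and the numerical values are hypothetical.

```python
import math


def pulse_resonance(pulse_period_s, resonance_period_s, decay_s,
                    duration_s, sample_rate=16000):
    """Return samples of a toy pulse-resonance wave.

    Each pulse restarts a damped cosine, so the wave has two time scales:
    the pulse period (source) and the resonance period (body).
    """
    n_samples = int(round(duration_s * sample_rate))
    pulse_step = int(round(pulse_period_s * sample_rate))
    samples = [0.0] * n_samples
    for start in range(0, n_samples, pulse_step):
        for i in range(start, min(start + pulse_step, n_samples)):
            t = (i - start) / sample_rate  # time since the most recent pulse
            envelope = math.exp(-t / decay_s)
            samples[i] = envelope * math.cos(2 * math.pi * t / resonance_period_s)
    return samples


# A "large" and a "small" family member: the same waveform shape,
# with both time scales halved for the small one (illustrative values).
large = pulse_resonance(0.010, 0.0010, 0.002, 0.05)
small = pulse_resonance(0.005, 0.0005, 0.001, 0.05)
```

In this sketch the shape of the resonance is the same for both waves; only the two time scales differ, which is the sense in which the register information is carried by acoustic scale.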
In the auditory system
When a sound arrives at the ear, the cochlea performs a wavelet transform on it and arranges the components of the sound along the basilar membrane in accordance with their wavelength. It also records the time intervals between peaks in the basilar membrane motion at each wavelength, and computes something like a time-interval histogram showing which interpeak intervals appear most frequently at that wavelength. The array of histograms for all wavelengths is often referred to as an auditory image; the picture in the third column (panel 1c) illustrates what the auditory image might look like for the larger of the string sounds in panel 1b. The dimensions of this auditory representation of sound are the tonotopic axis of the cochlea (y) and inter-pulse interval (x). Despite the complexity of auditory processing, this internal representation of sound preserves instrument family and register information in the form of an auditory figure, which is the activity attached to the vertical ridge in panel 1c. The shape of the figure is the family information. The time-interval between repeats of the auditory figure is the period of the source of excitation, that is, the pulse period. The vertical position of the figure specifies the acoustic scale of the resonant filter in the body of the instrument.
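The time-interval histogram idea can be sketched for a single channel as follows. This is a simplified illustration, not the auditory model itself: it assumes the input is already the output of one cochlear channel, it only counts intervals between successive peaks, and the function name and threshold are hypothetical.

```python
from collections import Counter


def interpeak_interval_histogram(samples, sample_rate, threshold=0.5):
    """Count the intervals (in ms) between successive peaks above threshold."""
    peaks = [
        i for i in range(1, len(samples) - 1)
        if samples[i] > threshold
        and samples[i - 1] < samples[i] >= samples[i + 1]
    ]
    intervals_ms = [
        round(1000 * (later - earlier) / sample_rate, 3)
        for earlier, later in zip(peaks, peaks[1:])
    ]
    return Counter(intervals_ms)


# Toy channel output: one unit spike every 80 samples at 8000 Hz,
# that is, a pulse period of 10 ms.
channel = [1.0 if i % 80 == 40 else 0.0 for i in range(800)]
histogram = interpeak_interval_histogram(channel, sample_rate=8000)
most_common_interval_ms = histogram.most_common(1)[0][0]
```

Repeating this for every channel and stacking the histograms along the tonotopic axis would give an array of the kind the text calls an auditory image.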
In perception
The picture in the fourth column (panel 1d) shows a stave of notes and the image of a violin. Together they are meant to represent the extended perception of a melody and the recognition that it is a string instrument playing the melody and that the instrument is a violin rather than a cello, or some other member of the family. The question in this domain is how the two acoustic scale values combine to produce the specific perception of the mid-high register associated with the violin.
Summary of domains with regard to register information
The important variables for the discussion of register within a family are those that specify
- (column a) the physical size of components in the instrument that produce the sound,
- (column b) the scale of the corresponding properties of the sound as they exist in the air,
- (column c) the scale of the properties as they exist physiologically in the auditory system of the listener, and ultimately,
- (column d) the size or register component of our perception of the instrument.
The progression of register information through the domains of sound
The lower two rows of Figure 0.1 show the relationships that define the register of an instrument, and how this information progresses through the domains of sound to produce a register perception in the brain of a listener. The third row presents the relationships for the register information derived from the source of excitation, the strings; the fourth row presents the relationships for the register information derived from the body resonances. The relationships are written in terms of wavelength rather than frequency to emphasize the role of size in the perception of instrument register, and more generally in auditory communication. Wavelength is directly related to resonator size, and it is possible that the cochlea originally evolved to enable mammals to determine the wavelengths of the resonances in animal calls. This would have enabled them to determine the relative size of an individual with respect to the distribution of sizes for that species.
The register component provided by the excitation source
In the string family of instruments, the source of excitation is, of course, the string. The period of a vibrating string, ps, is directly related (panel 3a) to the length, Ls, and the mass, Ms, of the string, and it is inversely related to its tension, Ts. The strings on a contrabass are much longer and much more massive than those on a violin, and as a direct result, its tones have much longer periods than the tones of a violin and it has a lower register. For a given instrument, string length is fixed and the instrument maker varies the mass and tension of the string to get the appropriate period for the lowest note on the string. Mass and tension are also used to produce the progression of lowest-note periods across the strings of a given instrument. Together the lowest-note periods determine the set of notes that the instrument can play and, as a result, they determine the register information that the source can produce. From the perspective of the listener, however, the form of the register information is somewhat complex.
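The dependence of string period on length, mass and tension can be made concrete with the textbook formula for an ideal flexible string, p = 2·sqrt(M·L/T). This formula is not taken from panel 3a; it is a standard idealisation used here as a sketch, and in it the period varies with the square root of mass times length and with the inverse square root of tension. The numerical values are illustrative only.

```python
import math


def string_period(length_m, mass_kg, tension_n):
    """Fundamental period (s) of an ideal string fixed at both ends.

    Assumes the ideal-string relation p = 2 * sqrt(M * L / T), where M is
    the total mass of the vibrating length L and T is the tension.
    """
    return 2.0 * math.sqrt(mass_kg * length_m / tension_n)


# Illustrative (not measured) values for a short, light string
# and a long, heavy string.
short_light = string_period(length_m=0.33, mass_kg=0.0002, tension_n=55.0)
long_heavy = string_period(length_m=1.05, mass_kg=0.02, tension_n=150.0)
```

Under these assumptions, doubling the length of a string of the same material (which also doubles its mass) doubles the period, and quadrupling the tension halves it.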
The musician varies string length, Ls, (by changing finger position) to produce the changes in note period, ps, that we perceive as melody. In so doing, the musician illustrates the range of notes provided by the instrument, and it is the distribution of notes, rather than the melody, that is the source component of the register information conveyed to the listener. The melody information is the sequence of period ratios between successive notes, and as such, is separate from the register information; you can play a given melody on any member of the string family and it is still the "same" melody. Thus, the concept of register is an abstract property of the sound of an instrument, associated with the distribution of notes played on the strings over time. These notes define a weighted geometric average period for the instrument, and it is this geometric average period that is the register information provided by the instrument's source of excitation. The concept of register appears to assume that the brain has a model of each instrument built up from extended experience with the sound of the instrument -- a model that includes some representation of the distribution of notes to be expected from each instrument and the statistics of that distribution.
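A weighted geometric average period can be sketched as below. The weighting scheme is an assumption (for example, how often or how long each note is played); the text does not specify it, and the function name and note values are illustrative.

```python
import math


def geometric_mean_period(periods_s, weights):
    """Weighted geometric mean of note periods (s).

    Computed as exp of the weighted arithmetic mean of log-periods.
    """
    total_weight = sum(weights)
    mean_log = sum(w * math.log(p) for p, w in zip(periods_s, weights)) / total_weight
    return math.exp(mean_log)


# Illustrative note set: periods of 196, 293.66, 440 and 659.26 Hz tones,
# weighted equally (hypothetical weights).
periods = [1 / 196.0, 1 / 293.66, 1 / 440.0, 1 / 659.26]
weights = [1, 1, 1, 1]
register_period = geometric_mean_period(periods, weights)

# Playing the same notes an octave lower doubles every period: the period
# ratios (the melody) are unchanged, but the average period doubles.
octave_lower = geometric_mean_period([2 * p for p in periods], weights)
```

The geometric mean is the natural choice here because it scales with the notes: multiplying every period by a constant multiplies the average by the same constant, leaving the ratios that define the melody untouched.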
In the air (panel 3b), the distance between pulse peaks is the wavelength, λs, of the source component of the tone; numerically it is the period of the wave, ps, times the speed of sound in air, c. The excitation wavelength, λs, has the same units (e.g. cm) as string length, Ls, and the wavelength of the sound scales linearly with string length. This is the form of the register information associated with the excitation component of a string tone as it exists in the air between the instrument and the listener. The acoustic scale of the excitation component of the sound is its wavelength, λs, relative to a calibration wavelength, λs0, which for musical instruments is typically $c/(2\pi \cdot 440)$. In source-filter terms, it is the acoustic scale of the source, Ss.
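The relations λs = ps·c and Ss = λs/λs0 can be written out as a short sketch. The speed of sound is an assumed round value for air at room temperature, and the reference wavelength is left as an explicit argument rather than fixed, since the usage below picks a reference purely for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for dry air near 20 degrees C


def wavelength_m(period_s, c=SPEED_OF_SOUND_M_PER_S):
    """Wavelength in air (m) of a component with the given period (s)."""
    return period_s * c


def acoustic_scale(wavelength, reference_wavelength):
    """Dimensionless acoustic scale: wavelength relative to a reference."""
    return wavelength / reference_wavelength


# Two tones an octave apart (periods of 440 Hz and 220 Hz tones).
lambda_high = wavelength_m(1 / 440.0)
lambda_low = wavelength_m(1 / 220.0)

# Hypothetical reference chosen only for this example.
reference = lambda_high
scale_high = acoustic_scale(lambda_high, reference)
scale_low = acoustic_scale(lambda_low, reference)
```

Whatever reference is used, the ratio of the two scale values is the ratio of the wavelengths, so relative register is independent of the calibration choice.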
The auditory image in panel 1c gives the impression that the physiological representation of time interval, and thus of ps, is a spatial distance, and this may well be the case. The orderly layering of cells in the inferior colliculus of the brainstem and in primary auditory cortex has in both cases been interpreted as a spatial map, giving rise to "tonotopic" and "periodotopic" dimensions in the auditory pathway (Langner refs). If the representation of ps in the auditory system is spatial, then the register information regarding the source of excitation is length information, Xs, in the domain of auditory physiology, just as it is in the air and the instrument. With regard to the auditory image (1c), Xs is the position of the vertical ridge along the X dimension in the image. It is possible that Xs is a simple proportion of Ls, ps, λs or Ss. However, the musical scale is based on the octave, because notes an octave apart are heard as similar, and particularly consonant when played concurrently. This suggests that the internal representation of pitch at the point of perception is a logarithmic function, base 2, of Ss, λs, ps, or Ls.
In panel 3c, the component of the register information associated with the source, Xs, is written as a log2 function of λs, and in panel 3d, the register component of the perception, Rs, is a linear function of Xs. Alternatively, Xs may be directly proportional to Ss in the mid-brain, and the log transformation to the perceptual form, Rs, may occur between the physiological representation in the mid-brain and the perceptual representation in auditory cortex. The log transformation is shown at the earlier stage between λs and Xs for symmetry with the filter component of register (described below) where a quasi-log transform is known to occur in the cochlea.
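The effect of a base-2 logarithmic mapping can be seen in a small sketch: each doubling of wavelength moves the representation by one equal step. The function name, the reference wavelength and the numerical values are hypothetical, and the sketch says nothing about where in the auditory pathway the transformation occurs.

```python
import math


def log2_position(wavelength_m, reference_wavelength_m):
    """Position on a log2 axis, in octaves relative to the reference."""
    return math.log2(wavelength_m / reference_wavelength_m)


# Hypothetical reference wavelength and three wavelengths, each double the last.
reference = 0.78
wavelengths = [reference, 2 * reference, 4 * reference]
positions = [log2_position(w, reference) for w in wavelengths]

# Successive differences are equal: one unit per octave.
steps = [later - earlier for earlier, later in zip(positions, positions[1:])]
```

On a linear axis the same three wavelengths would be separated by unequal distances; the log2 axis is what makes every octave the same size.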
With regard to the perception of register (panel 3d), the physiological representation of the scale of the source, Xs, has to be interpreted through stored knowledge concerning the correspondence between the distribution of Xs values produced by a given instrument and the size of this string instrument, as it exists in the external world. It is this learned relationship which allows us to say whether a melody is being played by a violin, viola, cello or contrabass.
In summary, the relationship between the production, transmission, analysis and perception of string tones shows the importance of size and acoustic scale in determining Rs, the component of the register perception associated with the source of excitation.
Pitch versus temporal regularity in the domains of sound
The inverse of the period of a wave is, of course, a frequency in Hz, and this frequency value is commonly used to specify the pitch that we perceive in response to a musical tone. Indeed, the simplicity of the relationship between pitch and temporal regularity in the variables of production, transmission and physiological analysis prompts many people to use the repetition rate of the source in air (fs in Hertz) as the value for this source property in all of the domains of sound, including perception, where the units could not possibly be Hertz.
From the perspective of instrument register, it is perhaps unfortunate that the repetition rate of the sound was chosen rather than its wavelength. The number of times per second that the wave repeats is a rather less intuitive measure than wavelength when it comes to understanding the production, transmission and analysis of the information in bio-acoustic communication. The inverse of the period goes down as period, string length, wavelength and physiological time-interval go up.
We use instrument names to distinguish the different perceptions of instrument register within a family and we readily associate register with instrument size. Unfortunately, register terminology is frequency based, so small instruments are considered to play in a "high register" and large instruments in a "low register", which obscures the relationship between the perception of register and the form of size information in the domains of sound. We could think of small instruments as producing small notes and large instruments as producing large notes, which would fit with our perception of the volume of a sound. In this case, the relationship between register (as average volume) and the size variables would be direct.
The register component provided by the resonant filter
The filter in string instruments is a set of resonances involving the volume of air enclosed by the body, the vibration of the bridge and the vibration of the face plate (van Dinther and Patterson, 2006). The expression in panel 4a is meant to indicate that, in each case, the period of the resonance is directly related to the size of the components of the specific resonator. So, for example, the period of the body resonance, pf, is directly related to the length, Lf, of the body. The body, bridge and face plate of the contrabass are all much larger than those of a violin, and as a direct result, the periods of its resonances are much greater than those of the violin. The set of resonances collectively provides the filter component of the register information in the form of pf, Lf or Mf. The filter component of register is simpler than the source component inasmuch as the musician does not vary the value of the information in the course of playing the instrument. At the same time, the filter component of register is more complex than the source component, in the sense that there are always multiple resonators in the body and the sounds they produce are superimposed in the acoustic wave that the instrument produces, which complicates the specification and analysis of the filter information. Thus, the filter component of register is an abstract property of the production process, as it was with the source component of register. With regard to production, we can think of the filter component of register as the average period of the resonance that follows the abrupt onset of each source cycle. We are, however, assuming that the brain has a model of each instrument, built up from experience, that interprets the pattern of resonances in terms of musical instruments, and that it knows that the position of the pattern along the cochlea is size, or register, information.
In the air (panel 4b), the average time between peaks in the resonance that follows the onset of each cycle provides an estimate of the average wavelength, λf, of the filter elements in the body of the instrument. λf has the same units (e.g. cm) as the spatial length dimensions of the resonant structures in the body of the instrument; numerically it is the period of the wave, pf, times the speed of sound in air, c. This is the form of the register information associated with the resonant filter of the instrument, as it exists in the air between the instrument and the listener. The acoustic scale of the filter component of the sound, Sf, is its wavelength, λf, relative to a calibration wavelength, λf0, which for the filter component of an instrument is just the wavelength of one Hertz, $c/(2\pi)$.
The cochlea segregates the resonant components of sound on the basis of their wavelength and, in so doing, creates a quasi-logarithmic wavelength axis along the length of the basilar membrane. So the "place" associated with wavelength, logλf, is proportional to "distance from the oval window" in the cochlea (panel 4c), that is, distance from the entrance to the cochlea. This is the "tonotopic" dimension of the cochlea. Auditory nerve fibres preserve this position information as do all of the main neural centres up to and including primary auditory cortex. This tonotopic representation of filter wavelength is what is portrayed by the vertical, or Y, dimension of the auditory image (panel 1c). Thus, wavelength is represented by length in the sense of position along the basilar membrane, or position within a neural array, all the way up the auditory pathway to auditory cortex. This means that in the domain of auditory physiology, the filter component of register information, Yf, is a length that is logarithmically related to the size of the instrument. With regard to the auditory image (1c), Yf is like the centroid of the activity along the Y dimension in the image. Since a doubling in the size of the instrument produces an octave shift in Yf, the relationship is written as a log2 function of λf in panel 4c. It could also be written as a log2 function of Sf, or pf, or Lf.
Note that, as the cochlea sorts the resonances by wavelength, it simplifies the shape of individual resonances as they appear in the auditory image. The simplification of the resonance presumably facilitates the analysis of the resonances by the auditory system, and makes it possible to determine whether they are relatively simple damped resonances, as with the second and third formants of the voice, or more complicated resonances, as is the case with string instruments. It is these segregated resonances that produce the shape of the auditory figure in the auditory image (panel 1c).
With regard to the perception of register (panel 4d), the physiological representation of the scale of the filter, Yf, has to be interpreted through stored knowledge concerning the pattern of resonances in the auditory image and stored knowledge about the patterns produced by different instrument families. The shape of the pattern is the family information in the auditory image. The position of the pattern along the Y dimension of the auditory image is the register information, Rf, provided by the instrument filter.
In summary, the relationship between the production, transmission, analysis and perception of string tones shows the importance of size and acoustic scale in determining Rf, the component of the register perception associated with the resonances that define the filter properties of the instrument.