AIM2006ModulesMMI
From CNBH Acoustic Scale Wiki
Aim2006 has a Development module for research on the auditory image and for modelling more central auditory processes based on auditory images. At the CNBH, there are currently three development projects, which appear in the Development column of the aim2006 GUI. One is concerned with pitch perception; the other two are concerned with the invariance problem in speech and music, that is, the fact that human perception of speech and music sources is largely immune to changes in pulse rate and resonance rate (Ives, Smith, & Patterson, 2005; Smith & Patterson, 2005; Smith, Patterson, Turner, Kawahara, & Irino, 2005).
The default option is the Mellin transform, mt, which produces a Mellin Magnitude Image (MMI), a size-invariant representation of the resonance structure of a source. In the MMI, vowels scaled in glottal pulse rate (GPR) and vocal tract length (VTL) produce the same pattern in the same position within the image. The processing is a two-dimensional form of the Mellin transform in which both the time-interval dimension and the tonotopic frequency dimension are transformed to produce the scale-invariant representation.
The three current Development modules convert the stabilised auditory image (SAI) into three new representations:
- mt: the Mellin Transform produces a size-invariant representation of the resonance structure of the source.
- sst: the Size Shape Transform produces a size-covariant representation of the resonance structure of the source.
- dpp: the Dual Pitch Profile is a coordinated log-frequency representation of the time-interval profile and the tonotopic profile of the auditory image.
Background
Human perception of speech and music sources is largely immune to changes in the pulse rate and the resonance rate of the sounds, that is, to changes in the parameters associated with source size (Ives et al., 2005; Smith & Patterson, 2005; Smith et al., 2005).
The Mellin transform module, mt, converts the SAI into a Mellin Magnitude Image (MMI), a size-invariant representation of the resonance structure of a source. In the MMI, vowels scaled in GPR and VTL produce the same pattern in the same position within the image. The processing is a two-dimensional form of the Mellin transform in which both the time-interval dimension and the tonotopic frequency dimension are transformed to produce the size-invariant representation. Versions of AIM that combine a gammatone or gammachirp filterbank with strobed temporal integration and the Mellin transform all produce variants of what Irino and Patterson (2002) refer to as Stabilised Wavelet Mellin Transforms, and all of them yield size-invariant representations of the resonance structure of a sound.
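The size invariance can be illustrated in one dimension. The sketch below is not the AIM implementation, which transforms both dimensions of the SAI. It assumes a Mellin transform evaluated on s = -jc, which reduces to a Fourier transform over u = ln t, and it uses a hypothetical two-peak test function, `resonance`, that is not taken from the original. Scaling t by a factor `a` becomes a translation along u, which changes only the phase of the transform, so the magnitude should be unchanged.

```python
import numpy as np

def mellin_magnitude(f, u):
    """Magnitude of the Mellin transform of f on s = -jc (an assumption),
    computed as an FFT over the logarithmic variable u = ln t."""
    du = u[1] - u[0]
    samples = f(np.exp(u))              # f sampled on a logarithmic t axis
    return np.abs(np.fft.rfft(samples)) * du

def resonance(t):
    """Hypothetical two-peak profile, smooth in ln t (illustration only)."""
    x = np.log(t)
    return np.exp(-(x + 1.0) ** 2 / 0.5) + 0.6 * np.exp(-(x - 1.0) ** 2 / 0.2)

u = np.linspace(-10.0, 10.0, 2048, endpoint=False)   # u = ln t
a = 1.5                                               # arbitrary scale factor

def scaled(t):
    """The same function with its argument scaled by a."""
    return resonance(a * t)

mag = mellin_magnitude(resonance, u)
mag_scaled = mellin_magnitude(scaled, u)

# Largest difference between the two magnitude spectra.
max_diff = float(np.max(np.abs(mag - mag_scaled)))
# Largest difference between the two sampled functions themselves.
shape_diff = float(np.max(np.abs(resonance(np.exp(u)) - scaled(np.exp(u)))))
```

With these settings `max_diff` should be close to floating-point rounding error, even though the two sampled functions differ substantially (`shape_diff`). The test function is chosen to decay to zero well inside the sampled range; a function that is truncated by the range would not show the invariance as cleanly.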
Figure 11a shows the MMIs for the four example vowels using the dcgc/hl/sti03/mt model. The envelopes of the patterns for the four vowels are much more similar than in previous representations. That is, the portions of the plane that are occupied are highly correlated and the structure of the pattern within regions is similar. Thus, the variability due to GPR and VTL has essentially been removed from the representation. There are differences in the fine structure of the patterns which result from the degree to which the resonance fits between adjacent glottal pulses in each of the vowels.
Figure 11b shows the MMIs for the four example vowels using the gt/hcl/sti03/mt model. Again, there is good correlation between the areas of the planes in the subfigures. However, the time-interval fine structure is not as well-defined and regular as in Figure 11a in which the dcgc/hl/sti03/mt model was used. Qualitatively, the plots in Figure 11b seem to be more invariant to scale changes (VTL) than to pitch changes (GPR); this is in contrast to Figure 11a.
The Size Shape Transform is an alternative transform of the auditory image that is size covariant, rather than size invariant. The distinction derives from the operator mathematics used to derive the MMI. In the scale-covariant image (SSI), vowels scaled in GPR and VTL produce the same pattern, but the position of the pattern moves vertically along the tonotopic dimension as scale changes, shifting linearly with scale. This representation has several advantages that are currently being explored: the pattern of resonances has an intuitive form, it does not change with size, and the size information appears in a compact form that is independent of the resonance pattern.
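The difference between covariance and invariance can be shown with a toy example. The sketch below is not the sst module. It assumes a log2-frequency axis, on which multiplying all resonance frequencies by a factor translates the pattern without changing its shape, and it uses a hypothetical three-peak `profile`. The translation is then recovered by cross-correlation, which separates the size information from the pattern.

```python
import numpy as np

def profile(log_f):
    """Hypothetical resonance pattern on a log2-frequency axis (octaves)."""
    return (np.exp(-(log_f - 9.0) ** 2 / 0.02)
            + 0.7 * np.exp(-(log_f - 10.2) ** 2 / 0.02)
            + 0.5 * np.exp(-(log_f - 11.3) ** 2 / 0.02))

step = 0.01                                           # octaves per sample
log_f = np.linspace(7.0, 14.0, 700, endpoint=False)   # about 128 Hz to 16 kHz
scale = 1.25                                          # all resonances 25% higher

reference = profile(log_f)
# Same pattern with every resonance frequency multiplied by `scale`:
# on a log2 axis this is a translation by log2(scale) octaves.
scaled = profile(log_f - np.log2(scale))

# Cross-correlate to find the translation that aligns the two patterns.
xcorr = np.correlate(scaled, reference, mode="full")
lag = int(np.argmax(xcorr)) - (len(reference) - 1)    # shift in samples
estimated_scale = 2.0 ** (lag * step)

# Undo the shift: the pattern itself is (nearly) unchanged.
aligned = np.roll(scaled, -lag)
residual = float(np.max(np.abs(aligned - reference)))
```

In this example the pattern is preserved and only its position carries the scale factor; `estimated_scale` is accurate to within the 0.01-octave grid spacing. The axes of the actual SSI may differ from the log2 axis assumed here.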
The Dual Pitch Profile is intended for research into the relationship between pitch information in the time-interval profile and in the tonotopic profile of the auditory image. It provides a frequency-coordinated combination of the two profiles to reveal which form of pitch information dominates the perception of a given sound.
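One plausible way to coordinate the two profiles is sketched below. This is an assumption, not the dpp module: the tonotopic profile is taken as the sum of the image over time intervals, the time-interval profile as the sum over channels, and a time interval of tau seconds is mapped to a frequency of 1/tau Hz so that both profiles can be resampled onto one log-frequency axis. The function `dual_profile` and the synthetic image are hypothetical.

```python
import numpy as np

def dual_profile(sai, centre_freqs_hz, intervals_s, n_points=200):
    """Resample both profiles of an image onto one shared log-frequency axis."""
    tonotopic = sai.sum(axis=1)            # sum over intervals: one value per channel
    temporal = sai.sum(axis=0)             # sum over channels: one value per interval
    interval_freqs = 1.0 / intervals_s     # interval of tau seconds <-> 1/tau Hz

    f_min = min(centre_freqs_hz.min(), interval_freqs.min())
    f_max = max(centre_freqs_hz.max(), interval_freqs.max())
    axis_hz = np.geomspace(f_min, f_max, n_points)

    order = np.argsort(interval_freqs)     # np.interp needs increasing x values
    temporal_on_axis = np.interp(np.log(axis_hz), np.log(interval_freqs[order]),
                                 temporal[order], left=0.0, right=0.0)
    tonotopic_on_axis = np.interp(np.log(axis_hz), np.log(centre_freqs_hz),
                                  tonotopic, left=0.0, right=0.0)
    return axis_hz, temporal_on_axis, tonotopic_on_axis

# Synthetic stand-in for an SAI: ridges every 8 ms (125 Hz pulse rate),
# decaying with interval, strongest in channels near 1 kHz.
centre_freqs_hz = np.geomspace(100.0, 6000.0, 50)
intervals_s = np.linspace(0.0005, 0.035, 500)
ridges = sum(np.exp(-0.5 * ((intervals_s - k * 0.008) / 0.0003) ** 2)
             for k in range(1, 5))
ridges = ridges * np.exp(-intervals_s / 0.02)
channel_weight = np.exp(-0.5 * (np.log(centre_freqs_hz / 1000.0) / 0.3) ** 2)
sai = np.outer(channel_weight, ridges)     # shape: (channels, intervals)

axis_hz, temporal, tonotopic = dual_profile(sai, centre_freqs_hz, intervals_s)
pitch_hz = axis_hz[np.argmax(temporal)]        # peak of the time-interval profile
resonance_hz = axis_hz[np.argmax(tonotopic)]   # peak of the tonotopic profile
```

Plotting `temporal` and `tonotopic` against `axis_hz` on a logarithmic axis puts the temporal peak near 125 Hz and the tonotopic peak near 1 kHz on the same scale, which is the kind of comparison the paragraph above describes. The two profiles are left unnormalised here, so only their peak positions, not their heights, are comparable.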
The user can also define new Development modules and add them to the menu. Indeed, they can add modules to any of the columns. There is web-based documentation (http://www.pdn.cam.ac.uk/cnbh/aim2006) with examples to explain the process of adding a new module.