The role of GPR and VTL in the definition of speaker identity

From CNBH Acoustic Scale Wiki

This page is currently under construction.
You can check the latest changes in the history.

Etienne Gaudrain, Su Li, Vin Shen Ban, Roy D. Patterson

The aim of this project is to study the influence of glottal pulse rate (GPR) and vocal tract length (VTL) on the identification of a speaker.

The role of glottal pulse rate and vocal tract length in the perception of speaker identity (Interspeech 2009)

Method

Figure 1. Speaker positions in the GPR-VTL plane. Both axes are in semitones re the original speaker. The left panel shows an example of the speaker conditions for one subject in the Anonymous experiment, which involves roving. The right panel shows the speakers used for the Named experiment.

[Figure 1 speaker labels: Dwarf, Child, Woman, Man, Castrato]

Five target speakers were chosen: a man, a woman, a child, a dwarf and a 'castrato'. Each of these speakers is at the centre of a spoke pattern in the right panel of Figure 1. For each target speaker, 8 spokes were drawn, and 6 new speakers were defined along each spoke. Each target speaker was then compared with each of these 48 comparison speakers, or with itself, giving 49 conditions per speaker, i.e. a total of 245 conditions. If the spokes and the points on each spoke are numbered as in Figure 2, then for each target speaker the comparison speakers can be located using the pair of numbers as coordinates. For example, (1,6) is the 6th point on the 1st spoke.
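The condition count above can be checked with a short enumeration. This is only an illustrative sketch: the names `TARGETS`, `comparison_coordinates` and `all_conditions` are hypothetical and are not taken from the experiment software.

```python
from itertools import product

# Target speakers as listed in the text (labels are illustrative).
TARGETS = ["man", "woman", "child", "dwarf", "castrato"]
N_SPOKES = 8   # spokes per target speaker
N_POINTS = 6   # comparison speakers per spoke


def comparison_coordinates():
    """Return the 49 (spoke, point) coordinates for one target speaker."""
    coords = [(0, 0)]  # the target compared with itself
    coords += list(product(range(1, N_SPOKES + 1), range(1, N_POINTS + 1)))
    return coords


def all_conditions():
    """Return every (target, spoke, point) condition."""
    return [(t, s, p) for t in TARGETS for (s, p) in comparison_coordinates()]


print(len(comparison_coordinates()))  # 49
print(len(all_conditions()))          # 245
```

Here `("man", 1, 6)` denotes the 6th point on the 1st spoke around the man, and `("man", 0, 0)` the man compared with himself.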

Figure 2. Numbering of conditions on the spoke pattern. The number in a circle is the number of the spoke. The points within each spoke are numbered from 1 to 6. Each point can then be located by specifying its spoke and its position on the spoke, e.g. (2,6) is the furthest point on the second spoke. The centre of the spoke pattern is denoted (0,0).

[Figure 2 point labels: Man (0,0) at the centre; (1,6), (2,6), (3,6), (4,6), (5,6), (6,6), (7,6) and (8,6) at the ends of the eight spokes]

Two experiments were designed. In the Anonymous experiment, the subjects were presented with two sets of three syllables, one from a speaker A and one from a speaker B, and were asked: “Is it possible that A and B were uttered by the same speaker?” One of the two speakers was a target speaker and the other was one of the comparison speakers associated with that target. In the Named experiment, the target speakers were given names: James, Mary, Ethan, Tony and Alessandro for the man, the woman, the child, the dwarf and the castrato, respectively. The subjects were presented with a set of three syllables from one of these named speakers and then another set from one of the comparison speakers. They were then asked: “Is it possible that the second set of syllables was also uttered by name?”, where name was replaced by the name of the target speaker.

Each experiment was divided into 10 blocks, and all 245 conditions were tested in each block. The blocks were grouped into sessions of no more than 2 hours. Both experiments included a training procedure. In the Anonymous experiment, the training consisted of a short block with 3 random conditions to allow the subjects to familiarize themselves with the GUI and the procedure. In the Named experiment, the training was more thorough and was repeated at the beginning of each session. In this training block, each target speaker was presented twice: first compared with itself (0,0), then compared with the furthest point on the second spoke (2,6).
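The block structure described above can be sketched as follows. This is an assumption-laden illustration, not the experiment code: the text does not say how trials were ordered within a block, so the shuffling here is an assumption, as are the function names `make_schedule` and `named_training_block` and the order of targets in the training block.

```python
import random
from itertools import product

TARGETS = ["man", "woman", "child", "dwarf", "castrato"]

# 49 coordinates per target: the centre plus 8 spokes x 6 points.
COORDS = [(0, 0)] + list(product(range(1, 9), range(1, 7)))
CONDITIONS = [(t, s, p) for t in TARGETS for (s, p) in COORDS]


def make_schedule(n_blocks=10, seed=0):
    """Return n_blocks lists, each containing all 245 conditions once.

    ASSUMPTION: conditions are shuffled within each block; the source
    only states that every block tests all 245 conditions.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = list(CONDITIONS)
        rng.shuffle(block)
        schedule.append(block)
    return schedule


def named_training_block():
    """Training trials for the Named experiment: (0,0) then (2,6) per target."""
    trials = []
    for target in TARGETS:  # ASSUMPTION: targets taken in this fixed order
        trials.append((target, 0, 0))
        trials.append((target, 2, 6))
    return trials


schedule = make_schedule()
print(sum(len(block) for block in schedule))  # 2450 trials per experiment
print(len(named_training_block()))            # 10 training trials
```

With 10 blocks, each condition is judged 10 times per subject, which is what makes a percentage of "yes" responses per condition meaningful.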

Results

Figure 3. Similarity judgement (percentage of "yes" responses) as a function of the radial distance from the target speaker. Each curve represents the average over the two experiments, all the speakers and all the subjects along one pair of spokes.

Figure 4. Similarity judgement as a function of VTL difference. Each curve represents a set of spokes. For the two diagonal spokes, the VTL distance is the projection of the radial distance on the VTL axis, i.e. the VTL component of the radial distance.
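The projection mentioned in the Figure 4 caption can be illustrated numerically. The sketch below assumes that spoke 1 lies along the GPR axis and that successive spokes are 45 degrees apart, so that spokes 1 and 5 change GPR only, spokes 3 and 7 change VTL only, and the remaining spokes are the diagonals. The exact angles and the direction of numbering are assumptions, and the function names are hypothetical.

```python
import math


def spoke_angle_deg(spoke):
    """ASSUMED angle of a spoke from the GPR axis: spoke 1 at 0, 45 per step."""
    return (spoke - 1) * 45.0


def components(spoke, radial_distance):
    """Split a radial distance (semitones) into (GPR, VTL) components."""
    angle = math.radians(spoke_angle_deg(spoke))
    gpr = radial_distance * math.cos(angle)
    vtl = radial_distance * math.sin(angle)
    return gpr, vtl


# On an assumed 45-degree diagonal, the VTL component is r / sqrt(2).
gpr, vtl = components(2, 6.0)
print(round(vtl, 3))  # 4.243
```

Under these assumptions a point 6 semitones out along a diagonal spoke differs from the target by about 4.24 semitones in VTL, which is the value that would be plotted on the VTL axis.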

Figure 5. Similarity judgement (percentage of "yes" responses) as a function of the radial distance from the target speaker. Each curve represents the average over all the speakers and all the subjects along one pair of spokes.

Figure 6. Similarity judgement along pairs of spokes. The Anonymous experiment data are in the left column, and the Named experiment data are in the right column. The similarity is the percentage of "yes" responses. The radial distance is the Euclidean distance in the GPR-VTL plane, expressed in semitones. Spokes 1-5 are changes in GPR only. Spokes 3-7 are changes in VTL only. Spokes 2-6 and 4-8 are changes along the diagonals. Each speaker is plotted in a different color. Error bars show the standard error.
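The similarity measure and its error bars can be sketched as below. This is a minimal illustration with made-up responses, assuming that the standard error is computed across subjects from per-subject percentages; the figure caption does not state this explicitly, and the helper names are hypothetical.

```python
import math
from statistics import mean, stdev


def percent_yes(responses):
    """Percentage of "yes" (1) responses among repeated trials of a condition."""
    return 100.0 * sum(responses) / len(responses)


def mean_and_standard_error(per_subject_scores):
    """Mean and standard error across subjects (ASSUMED error-bar definition)."""
    n = len(per_subject_scores)
    se = stdev(per_subject_scores) / math.sqrt(n)
    return mean(per_subject_scores), se


# Made-up data: 10 repetitions (one per block) for three hypothetical subjects.
subject_responses = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
]
scores = [percent_yes(r) for r in subject_responses]
avg, se = mean_and_standard_error(scores)
print(scores)  # [80.0, 60.0, 70.0]
```

The values in `subject_responses` are placeholders chosen for the example and do not come from the experiments.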

Figure 7. Similarity judgement along pairs of spokes. The data for the male subjects are in the left column, and the data for the female subjects are in the right column. The similarity is the percentage of "yes" responses. The radial distance is the Euclidean distance in the GPR-VTL plane, expressed in semitones. Spokes 1-5 are changes in GPR only. Spokes 3-7 are changes in VTL only. Spokes 2-6 and 4-8 are changes along the diagonals. Each speaker is plotted in a different color. Error bars show the standard error. The effect appears to be driven mainly by the female listeners in the Named experiment.

Figure 8. Radial plot of the similarity threshold for the Anonymous experiment. The axes represent the distance in semitones from the target speaker to the comparison speaker. Each speaker is plotted in a different color. The dashed lines show the standard error across subjects. Thresholds greater than 1.5 times the spoke length have been ignored.

Figure 9. Radial plot of the similarity threshold for the Named experiment. The axes represent the distance in semitones from the target speaker to the comparison speaker. Each speaker is plotted in a different color. The dashed lines show the standard error across subjects. Thresholds greater than 1.5 times the spoke length have been ignored.
