Effects of manipulating fundamental frequency and speech rate on synthetic voice recognition performance and perceived speaker identity, sex, and age

Gous, GE (ORCID: https://orcid.org/0000-0002-7199-7333), 2017. Effects of manipulating fundamental frequency and speech rate on synthetic voice recognition performance and perceived speaker identity, sex, and age. PhD, Nottingham Trent University.

Georgina Gous 2017.pdf - Published version

Abstract

Vocal fundamental frequency (F0) and speech rate provide the listener with important information relating to the identity, sex, and age of the speaker. It has also been demonstrated that manipulations in F0 or speech rate can lead to accentuation effects in voice memory. As a result, listeners appear to exaggerate the representation of a target voice in terms of F0 or speech rate, and mistakenly remember it as being higher or lower in F0, or faster or slower in speech rate, than the voice originally heard. The aim of this thesis was to understand the effect of manipulations/shifts in F0 or speech rate on voice matching performance and perceived speaker identity, sex, and age. Synthesised male and female voices speaking prescribed sentences were generated and shifted in either F0 or speech rate. In the first set of experiments (Experiments 2, 3, and 4), male and female listeners made judgements about the perceived identity, sex, or age of the speaker. In the second set of experiments (Experiments 5, 6, and 7), male and female listeners made target matching responses for voices presented with and without a delay, and with different spoken sentences. The results of Experiments 2, 3, and 4 indicated the following: (1) Shifts in either F0 or speech rate increased uncertainty about the identity of the speaker, though identity judgements were more robust to shifts in speech rate than to shifts in F0. (2) Shifts in F0 also increased uncertainty about speaker sex, but shifts in speech rate did not. Male voices were accurately perceived as male irrespective of the direction of manipulation in F0. However, for female voices, decreasing F0 increased the uncertainty of speaker sex (i.e., the voices were more likely to be perceived as male rather than female). (3) Increasing either F0 or speech rate resulted in both male and female voices being perceived as sounding younger, whereas decreasing either F0 or speech rate led to listeners perceiving the voices as sounding older.
The results of Experiments 5, 6, and 7 indicated the following: (4) Shifts in either F0 or speech rate did increase matching errors for the target voice; however, there was no evidence of an accentuation effect. Specifically, for voices shifted in F0, there was an increase in the selection of voices higher in F0 compared to voices lower in F0. For voices shifted in speech rate, there was an increase in the selection of voices faster in speech rate compared to voices slower in speech rate, but only for slow speech rate target voices. (5) Accentuation errors were no more likely to occur when the inter-stimulus interval was increased, or (6) when the sentence spoken in the sequential voice pair differed from the one previously spoken by the target voice.

The findings have theoretical and applied relevance. The work has provided a clearer understanding of how shifts in F0 or speech rate are likely to affect perceptions about the identity, sex, and age of the speaker than was possible to establish from previous studies. It has also contributed further to our understanding of the effect of shifts in F0 or speech rate on voice matching performance, and their importance in accurate recognition. This information might be useful to the police in helping to determine the accuracy of descriptions made about a voice and of decisions made during a voice lineup, particularly if a suspect of a crime was likely to be disguising their voice.

Item Type: Thesis
Creators: Gous, G.E.
Date: September 2017
Rights: This work is the intellectual property of the author. You may copy up to 5% of this work for private study, or personal, non-commercial research. Any re-use of the information contained within this document should be fully referenced, quoting the author, title, university, degree level and pagination. Queries or requests for any other use, or if a more substantial copy is required, should be directed to the owner(s) of the Intellectual Property Rights.
Divisions: Schools > School of Social Sciences
Record created by: Linda Sullivan
Date Added: 25 Jun 2018 08:56
Last Modified: 25 Jun 2018 08:56
URI: https://irep.ntu.ac.uk/id/eprint/33899
