Support Vector Learning for Gender Classification Using Audio and Visual Cues: A Comparison
SVM '02 Proceedings of the First International Workshop on Pattern Recognition with Support Vector Machines
The performance of a sex identification system operating on 16 ms segments of speech is discussed. Consideration is given to: the effectiveness of individual cepstral coefficients, and of frame-to-frame differences of those coefficients, for sex identification; the performance of the Gaussian classifier as a function of the amount of training data; the results of simplifications to the Gaussian classifier; performance when training and testing are done by phoneme, by phoneme class, and on all speech; the effects of training on one phoneme and testing on speech from different phonemes; and the distribution of sex identification errors by speaker. It is concluded that, if the speech signal is of high quality, it should not be difficult to build a practical system that uses successive outputs from the single-frame Gaussian classifier to accurately classify a speaker as male or female. Additional testing is needed on data of lower quality.
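The idea of combining successive outputs from a single-frame Gaussian classifier can be illustrated with a minimal sketch. This is not the system evaluated in the abstract: the diagonal covariance, the synthetic 4-dimensional feature vectors (standing in for cepstral coefficients), the summed log-likelihood ratio as the combination rule, and all function names are assumptions made for illustration only.

```python
import math
import random


def fit_diag_gaussian(frames):
    """Estimate per-dimension mean and variance (diagonal covariance, an assumption)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)  # floor avoids zero variance
        for d in range(dim)
    ]
    return mean, var


def log_likelihood(frame, model):
    """Log density of one frame under a diagonal-covariance Gaussian."""
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def classify_frame(frame, male, female):
    """Single-frame decision: pick the class with the higher likelihood."""
    if log_likelihood(frame, male) >= log_likelihood(frame, female):
        return "male"
    return "female"


def classify_speaker(frames, male, female):
    """Combine successive frames by summing per-frame log-likelihood ratios."""
    llr = sum(log_likelihood(f, male) - log_likelihood(f, female) for f in frames)
    return "male" if llr >= 0 else "female"


def synthetic_frames(rng, center, n, dim=4):
    """Hypothetical stand-in for real cepstral features of one class."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
male_model = fit_diag_gaussian(synthetic_frames(rng, -0.5, 500))
female_model = fit_diag_gaussian(synthetic_frames(rng, 0.5, 500))

# 50 successive frames from one (synthetic) speaker of the second class.
test_frames = synthetic_frames(rng, 0.5, 50)
decision = classify_speaker(test_frames, male_model, female_model)
print(decision)
```

Summing log-likelihood ratios treats frames as independent, which real speech frames are not; it is used here only because it is the simplest way to show how many weak single-frame decisions can add up to a confident speaker-level decision.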