Detecting stress in spoken English using Decision Trees and Support Vector Machines

Authors:
Huayang Xie; Peter Andreae; Mengjie Zhang; Paul Warren
Affiliations:
Victoria University of Wellington, Wellington, New Zealand (all four authors)
Venue:
ACSW Frontiers '04 Proceedings of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32
Year:
2004

Abstract

This paper describes an approach to detecting stress in spoken New Zealand English. After identifying the vowel segments of the speech signal, the approach extracts two sets of features from each vowel segment: prosodic features and vowel quality features. These features are then normalised and scaled to obtain speaker-independent values, which are used to classify each vowel segment as stressed or unstressed. We used Decision Trees (C4.5) and Support Vector Machines (LIBSVM) to learn stress-detecting classifiers with various combinations of the features. The approach was evaluated on 60 utterances by adult female speakers, containing 703 vowels, and achieved a maximum accuracy of 84.72%. The results showed that a combination of features derived from duration and amplitude gave the best performance, but the vowel quality features also performed reasonably well.
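The abstract does not give the exact normalisation and scaling formulas. The sketch below assumes one common scheme: each raw feature (for example, vowel duration) is z-scored against the mean and standard deviation of the same speaker's vowels, so that a slow and a fast speaker become comparable, and all values are then linearly rescaled to [-1, 1], the default range of LIBSVM's svm-scale tool. The function names normalise_per_speaker and scale_to_range are hypothetical and are not the authors' code.

```python
from statistics import mean, pstdev


def normalise_per_speaker(values, speakers):
    """Z-score each value against the statistics of its own speaker.

    Assumed scheme for illustration: (x - speaker_mean) / speaker_std.
    A speaker whose values are all identical gets 0.0 for every vowel.
    """
    by_speaker = {}
    for value, spk in zip(values, speakers):
        by_speaker.setdefault(spk, []).append(value)
    # Per-speaker (mean, population standard deviation)
    stats = {spk: (mean(v), pstdev(v)) for spk, v in by_speaker.items()}
    result = []
    for value, spk in zip(values, speakers):
        mu, sigma = stats[spk]
        result.append((value - mu) / sigma if sigma > 0 else 0.0)
    return result


def scale_to_range(values, lo=-1.0, hi=1.0):
    """Linearly rescale values into [lo, hi] (similar in spirit to svm-scale)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [(lo + hi) / 2] * len(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


if __name__ == "__main__":
    # Hypothetical vowel durations (ms): speaker B speaks more slowly than A.
    durations = [80, 120, 160, 100, 150, 200]
    speakers = ["A", "A", "A", "B", "B", "B"]
    z = normalise_per_speaker(durations, speakers)
    print([round(v, 3) for v in z])
    print(scale_to_range(z))
```

With these made-up durations, both speakers map to the same normalised values (up to floating-point rounding), which is the point of making the features speaker-independent before training a classifier such as C4.5 or an SVM.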