Sung language recognition relies on both effective feature extraction and acoustic modeling. In this paper, we study rhythm-based music segmentation, in which the frame size equals the duration of the smallest note in the music, as opposed to the fixed-length segmentation used in spoken language recognition. We find that acoustic features extracted with the rhythm-based segmentation scheme outperform those from fixed-length segmentation. We also study the effectiveness of a musically motivated acoustic feature, octave-scale cepstral coefficients (OSCCs), by comparing it with other acoustic features: log-frequency cepstral coefficients, linear prediction coefficients (LPCs), and LPC-derived cepstral coefficients. Finally, we examine the modeling capabilities of Gaussian mixture models and support vector machines in sung language recognition experiments. In experiments on a corpus of 400 popular songs sung in English, Chinese, German, and Indonesian, the OSCC feature outperformed the other features. A sung language recognition accuracy of 64.9% was achieved when Gaussian mixture models were trained on shifted-delta OSCC features extracted via rhythm-based segmentation.
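As a minimal sketch of the shifted-delta step mentioned above: shifted-delta cepstra stack k delta vectors, spaced p frames apart, onto each frame, where each delta is the difference c[t+d] - c[t-d]. The parameter values below (d=1, p=3, k=7, i.e. the common 7-1-3-7 configuration from spoken language identification) and the function name are illustrative assumptions, not the paper's reported configuration; the same computation applies whether the input cepstra are OSCCs or any other per-frame cepstral feature.

```python
import numpy as np

def shifted_delta(cep, d=1, p=3, k=7):
    """Compute shifted-delta features from a (T, N) cepstral matrix.

    Illustrative parameters (not from the paper):
      d: half-width of the delta difference, delta[t] = c[t+d] - c[t-d]
      p: frame shift between stacked delta blocks
      k: number of delta blocks stacked per frame
    Returns a (T, k*N) matrix.
    """
    T = cep.shape[0]
    # Edge-pad so delta is defined at the sequence boundaries.
    padded = np.pad(cep, ((d, d), (0, 0)), mode="edge")
    delta = padded[2 * d:] - padded[:-2 * d]            # shape (T, N)
    # Pad the tail so blocks at offsets p, 2p, ... stay in range.
    padded_delta = np.pad(delta, ((0, (k - 1) * p), (0, 0)), mode="edge")
    # Stack delta[t], delta[t+p], ..., delta[t+(k-1)p] for every frame t.
    return np.hstack([padded_delta[i * p : i * p + T] for i in range(k)])
```

With a 39-dimensional input cepstrum this 7-1-3-7 configuration would yield 273-dimensional frame vectors; in practice only a subset of the base coefficients is usually stacked to keep the dimensionality manageable.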