We present an approach to music identification based on weighted finite-state transducers and Gaussian mixture models, inspired by techniques used in large-vocabulary speech recognition. Our modeling approach is based on learning a set of elementary music sounds in a fully unsupervised manner. Although the space of possible music sound sequences is very large, our method yields a compact and efficient finite-state transducer representation of the song collection. This paper gives a novel and substantially faster algorithm for constructing factor transducers, the key representation of song snippets underlying our music identification technique. The complexity of our algorithm is linear in the size of the suffix automaton constructed. Our experiments further show that, in our task, it speeds up the construction of the weighted suffix automaton by a factor of 17 relative to our previous method, which relied on the intermediate steps of determinization and minimization. We show that, using these techniques, a large-scale music identification system can be built for a database of over 15 000 songs, achieving an identification accuracy of 99.4% on undistorted test data while performing robustly in the presence of noise and distortions.
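The abstract's efficiency claims rest on the suffix automaton, whose paths from the initial state spell exactly the factors (contiguous substrings) of the input. As an illustration only, the sketch below implements the classical online construction of the unweighted suffix automaton of a single sequence, which runs in time linear in the input length if dictionary operations are treated as constant time. It is not the paper's weighted, multi-song factor-transducer algorithm. The symbols are assumed to be hypothetical music-unit labels (here small integers or characters), and the names `build_suffix_automaton`, `is_factor`, and `count_factors` are illustrative, not from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    length: int                                # length of the longest string reaching this state
    link: int                                  # suffix link; -1 for the initial state
    trans: dict = field(default_factory=dict)  # symbol -> target state index


def build_suffix_automaton(seq):
    """Classical online suffix automaton construction (illustrative sketch)."""
    states = [State(length=0, link=-1)]
    last = 0
    for sym in seq:
        cur = len(states)
        states.append(State(length=states[last].length + 1, link=0))
        p = last
        # Add transitions on `sym` to the new state along the suffix-link chain.
        while p != -1 and sym not in states[p].trans:
            states[p].trans[sym] = cur
            p = states[p].link
        if p != -1:
            q = states[p].trans[sym]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q by cloning it with the shorter length.
                clone = len(states)
                states.append(State(length=states[p].length + 1,
                                    link=states[q].link,
                                    trans=dict(states[q].trans)))
                while p != -1 and states[p].trans.get(sym) == q:
                    states[p].trans[sym] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states


def is_factor(states, pattern):
    """Treating every state as final turns the automaton into a factor automaton."""
    s = 0
    for sym in pattern:
        nxt = states[s].trans.get(sym)
        if nxt is None:
            return False
        s = nxt
    return True


def count_factors(states):
    """Number of distinct non-empty factors encoded by the automaton."""
    return sum(st.length - states[st.link].length for st in states[1:])
```

For example, with a hypothetical "song" encoded as unit labels `[3, 7, 7, 1, 3, 7]`, `is_factor(sa, [7, 1, 3])` returns `True` and `is_factor(sa, [1, 7])` returns `False`. The number of states stays below 2n - 1 for an input of length n ≥ 3, which is why a size-linear construction, as claimed in the abstract, is attractive for large song collections.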