This work deals with automatic lexical acquisition and topic discovery from a speech stream. The proposed algorithm builds a lexicon enriched with topic information in three steps: transcription of an audio stream into phone sequences with a speaker- and task-independent phone recogniser; automatic lexical acquisition based on approximate string matching; and hierarchical topic clustering of the lexical entries based on a knowledge-poor co-occurrence approach. The resulting semantic lexicon is then used to automatically cluster the incoming speech stream into topics. The main advantages of this algorithm are its very low computational requirements and its independence from predefined linguistic resources, which make it easy to port to new languages and to adapt to new tasks. It is evaluated both qualitatively and quantitatively on two corpora and on two tasks related to topic clustering. The results of these evaluations are encouraging and point to future directions of research for the proposed algorithm, such as automatically building orthographic labels for the lexical items.
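To make the lexical acquisition step more concrete, the following is a minimal sketch of approximate string matching over phone sequences. It assumes plain Levenshtein distance with unit costs as the matching criterion, and a greedy grouping of candidate segments into lexical entries; the function names `phone_edit_distance` and `group_segments`, the `max_ratio` threshold, and the example phone strings are all hypothetical illustrations, not the authors' actual procedure or parameters.

```python
from typing import List, Sequence


def phone_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs).

    Assumption: insertions, deletions and substitutions all cost 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, pa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr[j] = min(
                prev[j] + 1,         # delete pa
                curr[j - 1] + 1,     # insert pb
                prev[j - 1] + cost,  # substitute (or match)
            )
        prev = curr
    return prev[-1]


def group_segments(
    segments: List[List[str]], max_ratio: float = 0.34
) -> List[List[List[str]]]:
    """Greedily group phone segments into hypothetical lexical entries.

    Each segment joins the first group whose representative (its first
    member) is within max_ratio * (longer length) edits; otherwise it
    starts a new group. max_ratio is an illustrative value only.
    """
    groups: List[List[List[str]]] = []
    for seg in segments:
        for group in groups:
            rep = group[0]
            limit = max_ratio * max(len(seg), len(rep))
            if phone_edit_distance(seg, rep) <= limit:
                group.append(seg)
                break
        else:
            groups.append([seg])
    return groups


if __name__ == "__main__":
    # Hypothetical recogniser output: two noisy variants of two "words".
    segs = [s.split() for s in ["s p iy ch", "s p iy t", "t ao p ih k", "t aa p ih k"]]
    for g in group_segments(segs):
        print([" ".join(s) for s in g])
```

Tolerating a small, length-relative number of edits is one way to let phone recognition errors (such as "ao" versus "aa") map to the same lexical item; a real system would likely need tuned costs and a more robust clustering than this first-match greedy pass.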