Annotating a large corpus with thousands of distinct topics is costly, and human annotators usually fail to mark every relevant topic for each document. It would be desirable to determine the topics in any new domain or language automatically, given only a large corpus in that domain and language. We present an algorithm called Unsupervised Topic Discovery, which creates topics from a collection of news stories, provides a human-understandable label for each topic, and then assigns the topics to the news stories. We report results on a collection of broadcast news stories in English and Arabic.