We address the problem of skewed training data and propose a method for estimating effective training stories for the topic tracking task. To compensate for the small number of labelled positive stories, we extract story pairs, each consisting of a positive story and its associated story, from bilingual comparable corpora. To overcome the problem of the large number of labelled negative stories, we group the negative stories into clusters using k-means combined with EM. Experimental results on the TDT corpora demonstrate the effectiveness of the method.
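The abstract does not say exactly how k-means and EM are combined. As a hedged illustration only, the sketch below treats k-means as hard-assignment EM: the E-step assigns each negative story to its most similar centroid, and the M-step recomputes each centroid as the mean of its members. It assumes that stories are sparse term-weight vectors compared by cosine similarity. The function names, the random seeding, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import Dict, List

# Assumption: each story is a sparse term -> weight vector (e.g. tf-idf).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of sparse vectors."""
    total: Vector = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans_hard_em(stories: List[Vector], k: int,
                   iters: int = 20, seed: int = 0) -> List[int]:
    """Hypothetical sketch: cluster negative stories with k-means viewed as hard EM.

    Returns one cluster label per story. Not the paper's exact procedure.
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct stories chosen at random.
    centroids = [dict(v) for v in rng.sample(stories, k)]
    labels = [-1] * len(stories)
    for _ in range(iters):
        # E-step (hard): assign each story to its most similar centroid.
        new_labels = [max(range(k), key=lambda c: cosine(s, centroids[c]))
                      for s in stories]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # M-step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [s for s, lab in zip(stories, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


if __name__ == "__main__":
    # Toy negative stories on two unrelated themes (illustrative data only).
    negatives = [
        {"stock": 1.0, "market": 0.8},
        {"stock": 0.9, "trade": 0.5},
        {"earthquake": 1.0, "rescue": 0.7},
        {"earthquake": 0.8, "damage": 0.6},
    ]
    print(kmeans_hard_em(negatives, k=2))
```

On the toy data, the two finance stories and the two earthquake stories end up in separate clusters. How the paper then uses the clusters to train the tracker is not stated in the abstract. One plausible option is to let each cluster stand in for its many member stories, which reduces the imbalance against the few positives.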