In this paper, we present a system that clusters the search results of a news search engine in a fast and scalable manner. The news search interface organizes the articles relevant to a given query into related news stories: each cluster corresponds to one news story. The clustering system consists of three components: offline clustering, incremental clustering, and realtime clustering. We propose novel techniques for clustering the search results in realtime. Experiments on large collections of news documents show that the system is scalable and achieves good accuracy in clustering news search results.
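The abstract does not say how the incremental component works, so the following is only a minimal sketch of one common way such a step could look, not the authors' method. It assumes bag-of-words term counts, cosine similarity against a running cluster centroid, and a fixed similarity threshold; the names `vectorize`, `cosine`, `assign_incrementally`, and the value `threshold=0.3` are all hypothetical.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_incrementally(clusters, text, threshold=0.3):
    """Add one article to the most similar story cluster, or start a new one.

    `clusters` is a list of dicts with keys "centroid" (summed term counts)
    and "articles" (list of texts). The threshold is an assumed value.
    Returns the index of the cluster the article was placed in.
    """
    vec = vectorize(text)
    best_idx, best_sim = -1, 0.0
    for i, cluster in enumerate(clusters):
        sim = cosine(vec, cluster["centroid"])
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_idx >= 0 and best_sim >= threshold:
        # Fold the article into the existing story and update its centroid.
        clusters[best_idx]["centroid"].update(vec)
        clusters[best_idx]["articles"].append(text)
        return best_idx
    # No sufficiently similar story: open a new cluster.
    clusters.append({"centroid": Counter(vec), "articles": [text]})
    return len(clusters) - 1


clusters = []
first = assign_incrementally(clusters, "earthquake hits city center")
second = assign_incrementally(clusters, "earthquake damages city buildings")
third = assign_incrementally(clusters, "team wins football final")
```

In this toy run the two earthquake articles share enough terms to fall into the same story, while the football article opens a second cluster. A production system would presumably use weighted terms and a faster nearest-cluster lookup, which this sketch leaves out.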