The effectiveness and efficiency of agglomerative hierarchic clustering in document retrieval
The effectiveness and efficiency of agglomerative hierarchic clustering in document retrieval
The nature of statistical learning theory
The nature of statistical learning theory
Reexamining the cluster hypothesis: scatter/gather on retrieval results
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Feature Selection: Evaluation, Application, and Small Sample Performance
IEEE Transactions on Pattern Analysis and Machine Intelligence
Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Making large-scale support vector machine learning practical
Advances in kernel methods
Grouper: a dynamic clustering interface to Web search results
WWW '99 Proceedings of the eighth international conference on World Wide Web
An investigation of linguistic features and clustering algorithms for topical document clustering
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Information Retrieval
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms
The effectiveness of query-specific hierarchic clustering in information retrieval
Information Processing and Management: an International Journal
IEEE Intelligent Systems
Cluster-based retrieval using language models
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Learning to cluster web search results
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
HLT '01 Proceedings of the first international conference on Human language technology research
Tracking and summarizing news on a daily basis with Columbia's Newsblaster
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Automated multi-document summarization in NeATS
HLT '02 Proceedings of the second international conference on Human Language Technology Research
Test collection management and labeling system
Proceedings of the 9th ACM symposium on Document engineering
Natural language processing tool to support web search
Proceedings of the 13th International Conference on Humans and Computers
Hi-index | 0.00 |
News portals gather and organize news articles published daily on the Internet. Typically, news articles are clustered into 'events' and each cluster is displayed with a short description of its contents. A particularly interesting choice for describing the contents of a cluster is a machine-generated multi-document summary of the articles in the cluster. Such summaries are informative and help news readers to identify and explore only clusters of interest. Naturally, multi-document clusters and summaries are also valuable to help users navigate the results of keyword-search queries. Unfortunately, current document summarizers are still slow; as a result, search strategies that define document clusters and their multi-document summaries online, in a query-specific manner, are prohibitively expensive. In contrast, search strategies that only return offline, query-independent document clusters are efficient, but might return clusters whose (query-independent) summaries are of little relevance to the queries. In this paper, we present an efficient Hybrid search strategy to address the limitations of fully online and fully offline summarization-aware search approaches. Extensive experiments involving user relevance judgments and real news articles show that the quality of our Hybrid results is high, and that these results are computed in substantially less time than with the fully online strategy. We have implemented our strategy and made it available on the Newsblaster news summarization system, which crawls and summarizes news articles from a variety of web sources on a daily basis.