Scatter/Gather systems are becoming increasingly useful for browsing document corpora. The usability of present-day systems is restricted to monolingual corpora, and their methods for clustering and labeling do not easily extend to the multilingual setting, especially in the absence of dictionaries or machine translation. In this paper, we study the cluster labeling problem for multilingual corpora in the absence of machine translation, but using comparable corpora. Using a variational approach, we show that multilingual topic models can effectively handle the cluster labeling problem, which in turn allows us to design a novel Scatter/Gather system, ShoBha. Experimental results on three datasets, namely the Canadian Hansards corpus, the complete set of overlapping English, Hindi and Bengali Wikipedia articles, and a trilingual news corpus containing 41,000 articles, confirm the utility of the proposed system.