Document clustering offers the potential to support users in interactive retrieval, especially when they have difficulty specifying their information need precisely. In this paper, we present a theoretical foundation for optimum document clustering. The key idea is to base cluster analysis and evaluation on a set of queries, defining documents as similar if they are relevant to the same queries. Our optimum clustering framework (OCF) rests on three essential components: (1) a set of queries, (2) a probabilistic retrieval method, and (3) a document similarity metric. After introducing an appropriate validity measure, we define optimum clustering with respect to the estimates of the relevance probability for the query-document pairs under consideration. Moreover, we show that well-known clustering methods are implicitly based on these three components, but make heuristic design decisions for some of them. We argue that our framework makes more targeted research on better document clustering methods possible. Experimental results demonstrate the potential of this approach.
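The sketch below illustrates the query-based notion of document similarity. It assumes that the probabilistic retrieval method has already produced a query-by-document matrix `rel`, where `rel[q][d]` estimates the probability that document `d` is relevant to query `q`. The co-relevance similarity is the expected number of queries for which both documents are relevant, assuming relevance is independent across documents. The `validity` score is mean intra-cluster similarity minus mean inter-cluster similarity. Both functions, and all names here, are hypothetical illustrations, not the metric or validity measure defined in the paper.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def co_relevance_similarity(rel: Matrix, i: int, j: int) -> float:
    """Hypothetical component (3): expected number of queries for which
    documents i and j are both relevant, assuming per-document independence."""
    return sum(row[i] * row[j] for row in rel)


def similarity_matrix(rel: Matrix) -> List[List[float]]:
    """Pairwise co-relevance similarities for all documents (columns of rel)."""
    n = len(rel[0])
    return [[co_relevance_similarity(rel, i, j) for j in range(n)] for i in range(n)]


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def validity(sim: List[List[float]], clusters: List[List[int]]) -> float:
    """Hypothetical validity score: mean intra-cluster similarity minus
    mean inter-cluster similarity over all document pairs."""
    label = {d: c for c, members in enumerate(clusters) for d in members}
    intra: List[float] = []
    inter: List[float] = []
    for a, b in combinations(sorted(label), 2):
        (intra if label[a] == label[b] else inter).append(sim[a][b])
    return _mean(intra) - _mean(inter)


# Illustrative estimates: 3 queries (rows) x 4 documents (columns).
rel = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.2, 0.1],
    [0.0, 0.1, 0.9, 0.8],
]
sim = similarity_matrix(rel)
good = validity(sim, [[0, 1], [2, 3]])  # groups documents relevant to the same queries
bad = validity(sim, [[0, 2], [1, 3]])   # mixes documents relevant to different queries
```

With these made-up estimates, documents 0 and 1 are relevant mostly to the first two queries and documents 2 and 3 to the third, so the grouping `[[0, 1], [2, 3]]` scores higher than `[[0, 2], [1, 3]]`. Swapping in a different retrieval method changes only `rel`, which is the kind of design decision the framework makes explicit.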