Document clustering is useful in many information retrieval tasks, including document browsing, organization and viewing of retrieval results, and generation of Yahoo-like hierarchies of documents. The general goal of clustering is to group data elements such that intra-group similarities are high and inter-group similarities are low. We present a clustering algorithm called CBC (Clustering By Committee) that is shown to produce higher-quality clusters in document clustering tasks than several well-known clustering algorithms. It first discovers a set of tight clusters (high intra-group similarity), called committees, that are well scattered in the similarity space (low inter-group similarity). The committees together cover only a subset of the elements. The algorithm then proceeds by assigning each element to its most similar committee. Evaluating cluster quality has always been a difficult task. We present a new evaluation methodology based on the edit distance between the output clusters and manually constructed classes (the answer key). This evaluation measure is more intuitive and easier to interpret than previous evaluation measures.
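The assignment step described above can be illustrated with a minimal sketch. It covers only the final phase (placing each element in its most similar committee) and takes the committees as given; how CBC discovers them is not shown. The abstract does not specify the similarity measure or the document representation, so the cosine similarity to a committee centroid, the sparse term-weight dictionaries, and all names (`cosine`, `centroid`, `assign_to_committees`, the toy documents) are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Average a list of sparse vectors into one sparse vector."""
    total = {}
    for vec in vectors:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(vectors) for term, w in total.items()}


def assign_to_committees(docs, committees):
    """Assign every document to the committee whose centroid is most similar.

    docs:       dict mapping document id -> sparse term-weight vector
    committees: dict mapping committee name -> list of member document ids
    Assumption: similarity to a committee is the cosine to its centroid.
    """
    centroids = {
        name: centroid([docs[d] for d in members])
        for name, members in committees.items()
    }
    clusters = {name: [] for name in committees}
    for doc_id, vec in docs.items():
        best = max(centroids, key=lambda name: cosine(vec, centroids[name]))
        clusters[best].append(doc_id)
    return clusters


# Toy data (hypothetical): two tight committees plus two unassigned documents.
docs = {
    "d1": {"stock": 2.0, "market": 1.0},
    "d2": {"stock": 1.0, "market": 2.0},
    "d3": {"goal": 2.0, "match": 1.0},
    "d4": {"goal": 1.0, "match": 2.0},
    "d5": {"market": 1.0, "trade": 1.0},
    "d6": {"match": 1.0, "team": 1.0},
}
committees = {"finance": ["d1", "d2"], "sports": ["d3", "d4"]}

clusters = assign_to_committees(docs, committees)
print(clusters)
```

In this toy example the committees cover only four of the six documents, mirroring the observation that the committees together are a subset of all elements; the remaining documents `d5` and `d6` are placed by similarity alone.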