Algorithms for clustering data
Algorithms for clustering data
Term-weighting approaches in automatic text retrieval
Information Processing and Management: an International Journal
A probabilistic learning approach for document indexing
ACM Transactions on Information Systems (TOIS) - Special issue on research and development in information retrieval
Information retrieval: data structures and algorithms
Information retrieval: data structures and algorithms
Scatter/Gather: a cluster-based approach to browsing large document collections
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Compression and fast indexing for multi-gigabyte text databases
Australian Computer Journal
HyPursuit: a hierarchical network search engine that exploits content-link hypertext clustering
Proceedings of the the seventh ACM conference on Hypertext
Dynamic itemset counting and implication rules for market basket data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Bayesian classification (AutoClass): theory and results
Advances in knowledge discovery and data mining
Multilevel hypergraph partitioning: application in VLSI domain
DAC '97 Proceedings of the 34th annual Design Automation Conference
The quest for correct information on the Web: hyper search engines
Selected papers from the sixth international conference on World Wide Web
Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Efficient mining of association rules in text databases
Proceedings of the eighth international conference on Information and knowledge management
On Relevance, Probabilistic Indexing and Information Retrieval
Journal of the ACM (JACM)
Document Categorization and Query Generation on the World Wide WebUsing WebACE
Artificial Intelligence Review - Special issue on data mining on the Internet
ACM SIGKDD Explorations Newsletter
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Data Mining: An Overview from a Database Perspective
IEEE Transactions on Knowledge and Data Engineering
Using a Hash-Based Method with Transaction Trimming for Mining Association Rules
IEEE Transactions on Knowledge and Data Engineering
Text-Learning and Related Intelligent Agents: A Survey
IEEE Intelligent Systems
PKDD '98 Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Frequent term-based text clustering
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Attribute (Feature) Completion - The Theory of Attributes from Data Mining Prospect
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Automatic Information Discovery from the "Invisible Web"
ITCC '02 Proceedings of the International Conference on Information Technology: Coding and Computing
Databases and the geometry of knowledge
Data & Knowledge Engineering
Granular Computing and Modeling the Human Thoughts in Web Documents
IFSA '07 Proceedings of the 12th international Fuzzy Systems Association world congress on Foundations of Fuzzy Logic and Soft Computing
Knowledge Based Search Engine: Granular Computing on the Web
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Granular computing: modeling human thoughts in the web by polyhedron
WImBI'06 Proceedings of the 1st WICI international conference on Web intelligence meets brain informatics
Semantic based real-time clustering for PubMed literatures
DS'07 Proceedings of the 10th international conference on Discovery science
Hi-index | 0.00 |
This paper presents a novel approach to document clustering based on some geometric structure in Combinatorial Topology. Given a set of documents, the set of associations among frequently co-occurring terms in documents forms naturally a simplicial complex. Our general thesis is each connected component of this simplicial complex represents a concept in the collection. Based on these concepts, documents can be clustered into meaningful classes. However, in this paper, we attack a softer notion, instead of connected components, we use maximal simplexes of highest dimension as representative of connected components, the concept so defined is called maximal primitive concepts. Experiments with three different data sets from Web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms, such as k-means, AutoClass and Hierarchical Clustering (HAG). This abstract geometric model seems have captured the latent semantic structure of documents.