A simplicial complex, a hypergraph, structure in the latent semantic space of document clustering

Authors:
Tsau Young Lin; I-Jen Chiang
Affiliations:
Department of Computer Science, San Jose State University, One Washington Square, San Jose, CA 95192-0249, USA; Graduate Institute of Medical Informatics, Taipei Medical University, 205 Wu-Hsien Street, Taipei 110, Taiwan, ROC
Venue:
International Journal of Approximate Reasoning
Year:
2005

Abstract

This paper presents a novel approach to document clustering based on a geometric structure from combinatorial topology. Given a set of documents, the associations among terms that frequently co-occur in those documents naturally form a simplicial complex. Our general thesis is that each connected component of this simplicial complex represents a concept in the collection. Based on these concepts, documents can be clustered into meaningful classes. In this paper, however, we address a softer notion: instead of the connected components themselves, we use the maximal simplexes of highest dimension as representatives of the connected components. The concepts so defined are called maximal primitive concepts. Experiments with three different data sets from Web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms, such as k-means, AutoClass and Hierarchical Clustering (HAG). This abstract geometric model seems to have captured the latent semantic structure of documents.
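
The construction outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "frequently co-occurring" means plain document-frequency support above a threshold, that two simplexes belong to the same connected component when they share a term, and that a document joins a concept's cluster when it contains every term of that concept. All function names, the max_size cap, and the toy documents are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_simplexes(docs, min_support, max_size=3):
    """Term sets (up to max_size terms) found in at least min_support documents.

    Support can only grow when a term is dropped, so every subset of a
    frequent set is frequent too: the result is closed under taking faces,
    which is what makes it a simplicial complex.
    """
    counts = Counter()
    for doc in docs:
        terms = sorted(doc)
        for k in range(1, max_size + 1):
            for combo in combinations(terms, k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= min_support}


def maximal_simplexes(simplexes):
    """Simplexes that are not a proper face of any other simplex."""
    return [s for s in simplexes if not any(s < t for t in simplexes)]


def connected_components(maximal):
    """Group maximal simplexes that are linked through shared terms."""
    groups = []  # pairs of (vertex set, member simplexes), vertex sets disjoint
    for s in maximal:
        vertices, members, untouched = set(s), [s], []
        for group_vertices, group_members in groups:
            if group_vertices & vertices:
                vertices |= group_vertices
                members += group_members
            else:
                untouched.append((group_vertices, group_members))
        groups = untouched + [(vertices, members)]
    return [members for _, members in groups]


def maximal_primitive_concepts(docs, min_support, max_size=3):
    """Per component, keep the maximal simplexes with the most terms."""
    maximal = maximal_simplexes(frequent_simplexes(docs, min_support, max_size))
    concepts = []
    for members in connected_components(maximal):
        top = max(len(s) for s in members)
        concepts += [s for s in members if len(s) == top]
    return concepts


def cluster_documents(docs, concepts):
    """Map each concept to the indices of documents containing all its terms."""
    return {c: [i for i, d in enumerate(docs) if c <= d] for c in concepts}


# Toy data, invented for this sketch (not taken from the paper's experiments).
toy_docs = [
    {"wall", "street", "stock"},
    {"wall", "street", "stock", "market"},
    {"wall", "street", "stock"},
    {"gene", "protein"},
    {"gene", "protein", "cell"},
    {"wall", "paint"},
]
toy_concepts = maximal_primitive_concepts(toy_docs, min_support=2)
toy_clusters = cluster_documents(toy_docs, toy_concepts)
```

On the toy data, the sketch finds two concepts, {wall, street, stock} and {gene, protein}, and leaves the last document unassigned because it contains neither. The brute-force enumeration of term combinations is only practical for small vocabularies; a level-wise association-rule miner would be the natural replacement at scale.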