Discover the semantic topology in high-dimensional data

Authors:
I-Jen Chiang
Affiliations:
Graduate Institute of Medical Informatics, Taipei Medical University, Taipei 110, Taiwan, ROC and Graduate Institute of Biomedical Engineering, National Taiwan University, Taipei 100, Taiwan, ROC
Venue:
Expert Systems with Applications: An International Journal
Year:
2007

Quantified Score

Hi-index	12.05

Abstract

Discovering homogeneous concept groups in high-dimensional data sets and clustering the data accordingly is a contemporary challenge. Conventional clustering techniques are often based on a Euclidean metric. However, such a metric is ad hoc and not intrinsic to the semantics of the documents. In this paper, we propose a novel approach in which the semantic space of high-dimensional data is structured as a simplicial complex in Euclidean space (a hypergraph, but with a different focus). Such a simplicial structure intrinsically captures the semantics of the data; for example, the coherent topics of documents appear in the same connected component. Finally, we cluster the data by the structure of concepts, which is organized by this geometry.
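
The idea of reading topics off the connected components of a simplicial complex can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes, purely for illustration, that each document is a set of terms, that a simplex is any term set co-occurring in at least `min_support` documents, and that a document is assigned to the component it shares the most terms with. The function names, parameters, and toy documents are all hypothetical. Because every subset of a frequent term set is itself frequent, the collection built this way is closed under taking faces, as a simplicial complex requires.

```python
from itertools import combinations
from collections import Counter


def frequent_simplices(docs, min_support=2, max_dim=2):
    """Return term tuples (simplices) that co-occur in >= min_support documents.

    Assumption for this sketch: co-occurrence support defines the simplices.
    A q-simplex has q + 1 vertices, so sizes run from 1 to max_dim + 1.
    """
    counts = Counter()
    for terms in docs:
        for size in range(1, max_dim + 2):
            for combo in combinations(sorted(terms), size):
                counts[combo] += 1
    return {s for s, c in counts.items() if c >= min_support}


def connected_components(simplices):
    """Group vertices (terms) that are linked through shared simplices (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for simplex in simplices:
        root = find(simplex[0])
        for v in simplex[1:]:
            parent[find(v)] = root

    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    # Sort components by their sorted term lists for a deterministic order.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def assign(doc, components):
    """Index of the component sharing the most terms with doc (ties: first)."""
    return max(range(len(components)), key=lambda i: len(doc & components[i]))


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"gene", "protein", "cell"},
    {"gene", "protein"},
    {"market", "stock"},
    {"market", "stock", "trade"},
    {"cell", "trade"},
]

components = connected_components(frequent_simplices(docs, min_support=2))
labels = [assign(d, components) for d in docs]
```

In this toy corpus, "gene" and "protein" end up in one component and "market" and "stock" in another, while "cell" and "trade" remain isolated vertices because none of their pairings reaches the support threshold. No distance between document vectors is computed at any point; the grouping comes only from the co-occurrence structure.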