Integration of semantic-based bipartite graph representation and mutual refinement strategy for biomedical literature clustering

Authors:
Illhoi Yoo; Xiaohua Hu; Il-Yeol Song
Affiliations:
University of Missouri-Columbia, Columbia, MO; Drexel University, Philadelphia, PA; Drexel University, Philadelphia, PA
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006


Abstract

We introduce a novel document clustering approach that overcomes those problems by combining a semantic-based bipartite graph representation with a mutual refinement strategy. The primary contributions of this paper are the following. First, we introduce a new representation of documents as a bipartite graph between documents and the co-occurrence concepts they contain. Second, we show how to enhance clustering quality by applying the mutual refinement strategy to the initial clustering results. Third, through experiments on MEDLINE documents, we show that our integrated method significantly enhances cluster quality and clustering reliability compared to existing clustering methods. On average, our approach improves cluster quality by 29.5% and clustering reliability by 26.3%, in terms of misclassification index, over Bisecting K-means with the best parameters.
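
The abstract does not spell out how the bipartite graph is built or how refinement is computed, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that concepts have already been extracted for each document, that a "co-occurrence concept" can be modeled as an unordered pair of concepts appearing in the same document, and that refinement can be approximated by letting concept pairs and document labels update each other until the labels stop changing. The function names `build_bipartite_graph` and `mutual_refinement`, the voting rule, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_bipartite_graph(doc_concepts):
    """Link each document to its concept pairs, and each pair back to its documents.

    Assumption: a co-occurrence concept is an unordered pair of concepts
    found in the same document.
    """
    doc_to_pairs = {}
    pair_to_docs = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        pairs = {tuple(sorted(p)) for p in combinations(set(concepts), 2)}
        doc_to_pairs[doc_id] = pairs
        for pair in pairs:
            pair_to_docs[pair].add(doc_id)
    return doc_to_pairs, dict(pair_to_docs)


def mutual_refinement(doc_to_pairs, initial_labels, max_iter=10):
    """Alternately re-score concept pairs and re-label documents (hypothetical rule)."""
    labels = dict(initial_labels)
    for _ in range(max_iter):
        # Step 1: count, for each concept pair, how many documents of each
        # cluster currently contain it.
        pair_scores = defaultdict(lambda: defaultdict(int))
        for doc_id, pairs in doc_to_pairs.items():
            for pair in pairs:
                pair_scores[pair][labels[doc_id]] += 1

        # Step 2: move each document to the cluster its pairs support most.
        new_labels = {}
        for doc_id, pairs in doc_to_pairs.items():
            votes = defaultdict(int)
            for pair in pairs:
                for cluster, count in pair_scores[pair].items():
                    votes[cluster] += count
            if votes:
                # Ties resolve to the smallest cluster label.
                new_labels[doc_id] = max(sorted(votes), key=votes.get)
            else:
                # A document with fewer than two concepts keeps its label.
                new_labels[doc_id] = labels[doc_id]

        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data (invented for illustration): d5 starts in the wrong cluster.
docs = {
    "d1": ["asthma", "inhaler", "steroid"],
    "d2": ["asthma", "inhaler"],
    "d3": ["diabetes", "insulin", "glucose"],
    "d4": ["diabetes", "insulin", "glucose"],
    "d5": ["diabetes", "insulin"],
}
initial = {"d1": 0, "d2": 0, "d3": 1, "d4": 1, "d5": 0}

doc_to_pairs, pair_to_docs = build_bipartite_graph(docs)
refined = mutual_refinement(doc_to_pairs, initial)
print(refined)
```

In this toy run the pair (diabetes, insulin) is shared mostly with documents in cluster 1, so `d5` is moved there while the other labels stay put. A real system would need a concept extractor, weighted edges, and an initial clustering step, none of which are specified in the text above.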