Dynamic hybrid clustering of bioinformatics by incorporating text mining and citation analysis

Authors:
Frizo Janssens;Wolfgang Glänzel;Bart De Moor
Affiliations:
Katholieke Universiteit Leuven;Katholieke Universiteit Leuven;Katholieke Universiteit Leuven
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Citing 13

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics.
Algorithms for clustering data.
Using linear algebra for intelligent information retrieval. SIAM Review.
The anatomy of a large-scale hypertextual Web search engine. WWW7 Proceedings of the seventh international conference on World Wide Web 7.
Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM).
Clustering hypertext with applications to web searching. HYPERTEXT '00 Proceedings of the eleventh ACM on Hypertext and hypermedia.
Modern Information Retrieval.
Introduction to Modern Information Retrieval.
Evaluating contents-link coupled web page clustering for web search results. Proceedings of the eleventh international conference on Information and knowledge management.
Automatic Topic Identification Using Webpage Clustering. ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining.
Lucene in Action (In Action series).
Discovering evolutionary theme patterns from text: an exploration of temporal text mining. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
Essential Dimensions of Latent Semantic Indexing (LSI). HICSS '07 Proceedings of the 40th Annual Hawaii International Conference on System Sciences.

Cited 7

Hybrid clustering for validation and improvement of subject-classification schemes. Information Processing and Management: an International Journal.
STORIES in Time: A Graph-Based Interface for News Tracking and Discovery. WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03.
Analyzing knowledge communities using foreground and background clusters. ACM Transactions on Knowledge Discovery from Data (TKDD).
From bursty patterns to bursty facts: The effectiveness of temporal text mining for news. Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence.
Publication activity, citation impact and bi-directional links between publications and patents in biotechnology. Scientometrics.
Detecting the knowledge structure of bioinformatics by mining full-text collections. Scientometrics.
Story graphs: Tracking document set evolution using dynamic graphs. Intelligent Data Analysis - Dynamic Networks and Knowledge Discovery.

Abstract

To unravel the concept structure and dynamics of the bioinformatics field, we analyze a set of 7401 publications from the Web of Science and MEDLINE databases, publication years 1981-2004. For delineating this complex, interdisciplinary field, a novel bibliometric retrieval strategy is used. Given that the performance of unsupervised clustering and classification of scientific publications is significantly improved by deeply merging textual contents with the structure of the citation graph, we proceed with a hybrid clustering method based on Fisher's inverse chi-square. The optimal number of clusters is determined by a compound semiautomatic strategy comprising a combination of distance-based and stability-based methods. We also investigate the relationship between the number of Latent Semantic Indexing factors, the number of clusters, and clustering performance. The HITS and PageRank algorithms are used to determine representative publications in each cluster. Next, we develop a methodology for dynamic hybrid clustering of evolving bibliographic data sets. The same clustering methodology is applied to consecutive periods defined by time windows on the set, and in a subsequent phase chains are formed by matching and tracking clusters through time. Term networks for the eleven resulting cluster chains present the cognitive structure of the field. Finally, we provide a view of how much attention the bioinformatics community has devoted to the different subfields through time.
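
The abstract names Fisher's inverse chi-square as the basis of the hybrid clustering method but does not spell out the computation. As a rough illustration only, the general statistical technique can be sketched as follows: k independent p-values are combined through the statistic X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function name `fisher_combine` and the idea of feeding it one text-based and one citation-based p-value per document pair are assumptions made for this sketch; how the paper derives its p-values and weights the two sources is not stated here.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's inverse chi-square method.

    Returns (statistic, combined_p). Hypothetical helper for illustration;
    it is not taken from the paper.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic: X = -2 * sum(ln p_i), chi-square with 2k d.o.f.
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    combined_p = math.exp(-half) * total
    return statistic, combined_p


# Assumed inputs: one p-value from a text-based comparison of two papers
# and one from a citation-based comparison (both values are made up).
text_p, citation_p = 0.05, 0.20
statistic, combined_p = fisher_combine([text_p, citation_p])
print(f"statistic={statistic:.4f}, combined p={combined_p:.4f}")
```

A small combined p-value here would indicate that the two sources jointly suggest the pair is more similar than chance, so it could serve as an integrated distance-like score for a clustering algorithm. That reading is an interpretation of the general technique, not a description of the authors' pipeline.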