Text Mining Biomedical Literature for Discovering Gene-to-Gene Relationships: A Comparative Study of Algorithms

Authors:
Ying Liu; Shamkant B. Navathe; Jorge Civera; Venu Dasigi; Ashwin Ram; Brian J. Ciliax; Ray Dingledine
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2005

Abstract

Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as the basis for clustering in a new approach called BEA-PARTITION. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and the hierarchical clustering algorithm outperformed k-means clustering and the self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and the self-organizing map. Whereas BEA-PARTITION and hierarchical clustering produced clusters of similar quality, BEA-PARTITION provides clearer cluster boundaries than hierarchical clustering.
BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.
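
The abstract compares algorithms by purity and entropy without defining them. As an illustration only, the following minimal Python sketch computes both under commonly used definitions: purity as the fraction of items that fall in the majority class of their cluster, and entropy as the size-weighted average of per-cluster class entropy, in bits. These definitions are assumptions; the paper's exact formulas and normalization may differ. The function names and the gene identifiers `g1` to `g6` are hypothetical.

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Fraction of items that belong to the majority class of their cluster.

    clusters: list of lists of item ids (hypothetical gene identifiers).
    labels:   dict mapping each item id to its known class.
    """
    total = sum(len(c) for c in clusters)
    majority = sum(
        max(Counter(labels[item] for item in c).values())
        for c in clusters
        if c  # skip empty clusters
    )
    return majority / total


def entropy(clusters, labels):
    """Size-weighted average of per-cluster class entropy, in bits.

    Assumption: no normalization by the number of classes is applied.
    """
    total = sum(len(c) for c in clusters)
    result = 0.0
    for c in clusters:
        if not c:
            continue
        counts = Counter(labels[item] for item in c)
        h = -sum((n / len(c)) * math.log2(n / len(c)) for n in counts.values())
        result += (len(c) / total) * h
    return result


# Toy data (made up for illustration, not taken from the paper).
labels = {"g1": "A", "g2": "A", "g3": "A", "g4": "B", "g5": "B", "g6": "A"}
clusters = [["g1", "g2", "g3"], ["g4", "g5", "g6"]]

print(round(purity(clusters, labels), 3))
print(round(entropy(clusters, labels), 3))
```

Under these definitions a perfect clustering has purity 1 and entropy 0, which matches the direction of the comparison in the abstract: higher purity and lower entropy indicate better agreement with the known gene groups.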