Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes

Authors:
Ying Liu;Shamkant B. Navathe;Alex Pivoshenko;Venu G. Dasigi;Ray Dingledine;Brian J. Ciliax
Affiliations:
Laboratory for Bioinformatics and Medical Informatics, Department of Computer Science, The University of Texas at Dallas, Richardson, TX 75083-0688, USA.;Georgia Institute of Technology, College of Computing, 801 Atlantic Drive, Atlanta, GA 30322, USA.;Georgia Institute of Technology, College of Computing, 801 Atlantic Drive, Atlanta, GA 30322, USA.;Department of Computer Science, School of Computing and Software Engineering, Southern Polytechnic State University, Marietta, GA 30060, USA.;Department of Pharmacology, Emory University School of Medicine, Atlanta, GA 30322, USA.;Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
Venue:
International Journal of Data Mining and Bioinformatics
Year:
2006

Citing (7):
Vertical partitioning algorithms for database design. ACM Transactions on Database Systems (TODS)
Term-weighting approaches in automatic text retrieval. Information Processing and Management: an International Journal
Bayesian classification (AutoClass): theory and results. Advances in knowledge discovery and data mining
Data clustering: a review. ACM Computing Surveys (CSUR)
Principal curves: learning, design, and applications
Relationship-based clustering and cluster ensembles for high-dimensional data mining
Text Mining Biomedical Literature for Discovering Gene-to-Gene Relationships: A Comparative Study of Algorithms. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Cited by (2):
Integrating flexibility and interactivity in bioinformatics visual programming tools with Focus+Context algorithm. International Journal of Data Mining and Bioinformatics
A Knowledge Mining Approach for Effective Customer Relationship Management. International Journal of Knowledge-Based Organizations

Abstract

One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.
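
The two weighting schemes compared in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption: the TFIDF variant is the common relative-frequency times log inverse document frequency form, and the z-score is a plain per-term standardisation across genes, without whatever normalisation step the authors applied. The function names and the toy token lists are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Weight each term per gene as (relative term frequency) * log(N / df).

    docs maps a gene name to the list of tokens pooled from its abstracts.
    Assumed textbook TFIDF variant, not necessarily the paper's formula.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per gene
    weights = {}
    for gene, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[gene] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def zscore_weights(docs):
    """Weight each term per gene as (f - mean) / std of its relative frequency.

    Mean and standard deviation are taken across all genes in docs.
    Assumed plain z-score; the paper's normalisation is not reproduced.
    """
    genes = list(docs)
    rel_freq = {}
    for gene in genes:
        counts = Counter(docs[gene])
        total = len(docs[gene]) or 1
        rel_freq[gene] = {t: c / total for t, c in counts.items()}
    vocab = set().union(*(r.keys() for r in rel_freq.values()))
    weights = {gene: {} for gene in genes}
    for term in vocab:
        values = [rel_freq[g].get(term, 0.0) for g in genes]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        if std == 0:
            continue  # term is equally frequent everywhere: no signal
        for gene, value in zip(genes, values):
            weights[gene][term] = (value - mean) / std
    return weights


def top_keywords(term_weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(term_weights, key=lambda t: (-term_weights[t], t))[:k]


# Toy input: hypothetical tokens, one pooled list per gene.
example_docs = {
    "geneA": ["kinase", "kinase", "receptor", "cell"],
    "geneB": ["receptor", "cell", "cell", "glutamate"],
    "geneC": ["cell", "glutamate", "synapse", "synapse"],
}
example_tfidf = tfidf_weights(example_docs)
example_z = zscore_weights(example_docs)
```

In this sketch a term that occurs for every gene (here "cell") receives a TFIDF weight of zero, which is the property that suppresses uninformative keywords before genes are clustered on their keyword vectors. The z-score variant instead ranks a term by how far its frequency for one gene departs from its average across genes.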