Motivation: A major challenge in the interpretation of high-throughput genomic data is understanding the functional associations between genes. Previously, several approaches have been described to extract gene relationships from various biological databases using term-matching methods. However, more flexible automated methods are needed to identify functional relationships (both explicit and implicit) between genes from the biomedical literature. In this study, we explored the utility of Latent Semantic Indexing (LSI), a vector space model for information retrieval, to automatically identify conceptual gene relationships from titles and abstracts in MEDLINE citations.

Results: We found that LSI identified gene-to-gene and keyword-to-gene relationships with high average precision. In addition, LSI identified implicit gene relationships based on word usage patterns in the gene abstract documents. Finally, we demonstrate here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering. Our results provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature. These features make LSI particularly useful for the analysis of novel associations discovered in genomic experiments.

Availability: The 50-gene document collection used in this study can be interactively queried at http://shad.cs.utk.edu/sgo/sgo.html

Contact: rhomayouni@utmem.edu

Supplementary information: http://shad.cs.utk.edu/sgo/pubs.html
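
The LSI steps named in the Results (a vector space built from gene abstract documents, keyword-to-gene ranking, and pairwise distances from vector angles) can be illustrated with a minimal sketch. This is not the study's implementation: the five toy "gene documents", the raw-count weighting, the rank k = 2, and the helper names `cosine_matrix` and `rank_genes` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy "gene documents"; the words are placeholders, not data from the study.
docs = {
    "gene1": "apoptosis caspase",
    "gene2": "caspase mitochondria",
    "gene3": "mitochondria cytochrome",
    "gene4": "transcription promoter",
    "gene5": "promoter chromatin",
}
names = list(docs)
vocab = sorted({w for text in docs.values() for w in text.split()})
row = {w: i for i, w in enumerate(vocab)}

# Term-by-document matrix of raw counts (rows: terms, columns: gene documents).
# Raw counts are an assumption; a real system would likely apply term weighting.
A = np.zeros((len(vocab), len(names)))
for j, name in enumerate(names):
    for w in docs[name].split():
        A[row[w], j] += 1.0


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0.0, 1.0, norms)
    return np.clip(Xn @ Xn.T, -1.0, 1.0)


# Rank-k truncated SVD, A ~ U_k S_k V_k^T. k = 2 is an arbitrary choice for this toy.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]
doc_vecs = (U_k.T @ A).T            # each row: one gene document in the k-dim space

raw_cos = cosine_matrix(A.T)        # similarity from shared terms only
lsi_cos = cosine_matrix(doc_vecs)   # similarity in the reduced space
angles = np.arccos(lsi_cos)         # pairwise distances as vector angles (radians)


def rank_genes(keyword):
    """Rank gene documents against a one-word query projected into the same space.

    The keyword must be in the toy vocabulary; unknown words raise KeyError.
    """
    q = np.zeros(len(vocab))
    q[row[keyword]] = 1.0
    q_k = U_k.T @ q
    sims = doc_vecs @ q_k / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_k))
    order = np.argsort(-sims)
    return [(names[i], float(sims[i])) for i in order]


i, j = names.index("gene1"), names.index("gene3")
print("raw cosine gene1-gene3:", raw_cos[i, j])
print("LSI cosine gene1-gene3:", round(float(lsi_cos[i, j]), 3))
print(rank_genes("apoptosis"))
```

In this toy collection gene1 and gene3 share no words, so term matching alone gives them zero similarity, yet both share vocabulary with gene2 and end up closely aligned in the reduced space. That is the kind of implicit relationship the abstract refers to. The `angles` matrix could then be handed to a hierarchical clustering routine such as `scipy.cluster.hierarchy.linkage` (in condensed form, with the diagonal zeroed); the linkage method used in the study is not stated here, so any choice would be an assumption.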