An application of text categorization methods to gene ontology annotation

Authors:
Kazuhiro Seki;Javed Mostafa
Affiliations:
Indiana University, Bloomington, Bloomington, Indiana;Indiana University, Bloomington, Bloomington, Indiana
Venue:
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2005

Abstract

This paper describes an application of IR and text categorization methods to a highly practical problem in biomedicine: Gene Ontology (GO) annotation. GO annotation, a major activity in most model organism database projects, describes gene functions using a controlled vocabulary. As a first step toward automatic GO annotation, we aim to assign GO domain codes given a specific gene and an article in which the gene appears, which is one of the tasks at the TREC 2004 Genomics Track. We approached the task with careful consideration of the specialized terminology, paying special attention to the various forms of gene synonyms so as to exhaustively locate occurrences of the target gene. We extracted the words around the gene occurrences and used them to represent the gene for GO domain code annotation. As a classifier, we adopted a variant of k-Nearest Neighbor (kNN) with supervised term weighting schemes to improve performance, which placed our method among the top-performing systems in the TREC official evaluation. Moreover, we demonstrate that the proposed framework can be successfully applied to another task of the Genomics Track, with results comparable to those of the best-performing system.
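
The pipeline outlined in the abstract (locate gene mentions, collect nearby words, classify with kNN) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the window size, the value of k, the threshold, and the toy training data are all hypothetical. Raw term counts stand in for the supervised term weighting schemes, which the abstract does not specify, and synonym matching is simplified to exact single-token matches. The labels BP, MF, and CC are assumed abbreviations for the three GO domains.

```python
import math
import re
from collections import Counter


def context_words(text, synonyms, window=3):
    """Count words within `window` tokens of any gene synonym occurrence."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    names = {s.lower() for s in synonyms}  # single-token synonyms only (assumption)
    words = []
    for i, tok in enumerate(tokens):
        if tok in names:
            lo, hi = max(0, i - window), i + window + 1
            words.extend(t for t in tokens[lo:hi] if t not in names)
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_domains(query, training, k=3, threshold=0.5):
    """Score each label by summed similarity over the k nearest neighbors,
    then keep every label whose score reaches the threshold."""
    ranked = sorted(
        ((cosine(query, vec), labels) for vec, labels in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, labels in ranked:
        for label in labels:
            scores[label] += sim
    return {label for label, score in scores.items() if score >= threshold}


# Toy training set: (context vector, set of GO domain codes). Hypothetical data.
training = [
    (Counter({"binds": 1, "dna": 1}), {"MF"}),
    (Counter({"nucleus": 1, "localized": 1}), {"CC"}),
    (Counter({"regulates": 1, "apoptosis": 1}), {"BP"}),
]

query = context_words("The protein BRCA1 binds DNA in vitro.", ["BRCA1"], window=2)
predicted = knn_domains(query, training)
print(predicted)
```

Because a gene and article pair may belong to more than one GO domain, the sketch thresholds each label's score independently rather than returning a single majority class.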