Gene ontology annotation as text categorization: An empirical study
Information Processing and Management: an International Journal
This paper describes an application of IR and text categorization methods to a highly practical problem in biomedicine, specifically, Gene Ontology (GO) annotation. GO annotation is a major activity in most model organism database projects; it describes gene functions using a controlled vocabulary. As a first step toward automatic GO annotation, we aim to assign GO domain codes given a specific gene and an article in which the gene appears, which was one of the challenge tasks in the TREC 2004 Genomics Track. We approached the task with careful consideration of the specialized terminology and paid special attention to the various forms of gene synonyms, so as to locate the occurrences of the target gene exhaustively. We extracted the words around the gene occurrences and used them to represent the gene for GO domain code annotation. As a classifier, we adopted a variant of k-Nearest Neighbor (kNN) with supervised term weighting schemes to improve performance, placing our method among the top-performing systems in the TREC official evaluation. Moreover, we demonstrate that the proposed framework can be successfully applied to another task of the Genomics Track, with results comparable to those of the best-performing system.
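The abstract gives no implementation details, so the following is only a minimal sketch of the pipeline it outlines. The sketch rests on several assumptions of our own:

- Gene synonyms are matched after collapsing case, spaces, and hyphens, so "IL-2" and "IL 2" both match.
- A fixed window of words on each side of every occurrence represents the gene.
- The supervised term weight is the maximum chi-square statistic of a term over the categories. This is one of the feature-selection scores that supervised term weighting can substitute for idf.

The helper names (`gene_contexts`, `chi2_weights`, `knn_scores`), the window size, and k are hypothetical. The plain similarity-weighted kNN shown here is not necessarily the variant the paper uses.

```python
# Hypothetical sketch: synonym normalization, window size, chi-square weighting
# and k are illustrative assumptions, not the paper's actual settings.
import math
import re
from collections import Counter, defaultdict


def normalize(name: str) -> str:
    """Collapse case and non-alphanumerics so 'IL-2', 'IL 2' and 'il2' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def tokenize(text: str) -> list[str]:
    """Split on anything except letters, digits and word-internal hyphens."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)


def gene_contexts(text: str, synonyms: list[str], window: int = 5) -> list[str]:
    """Collect up to `window` tokens on each side of every synonym occurrence."""
    tokens = tokenize(text)
    targets = {normalize(s) for s in synonyms}
    # A synonym such as 'IL-2' may be written as up to this many tokens ('IL 2').
    max_span = max(len(re.findall(r"[a-z0-9]+", s.lower())) for s in synonyms)
    context, i = [], 0
    while i < len(tokens):
        matched = 0
        for n in range(max_span, 0, -1):  # prefer the longest match
            if i + n <= len(tokens) and normalize("".join(tokens[i:i + n])) in targets:
                matched = n
                break
        if matched:
            left = tokens[max(0, i - window):i]
            right = tokens[i + matched:i + matched + window]
            context.extend(t.lower() for t in left + right)
            i += matched
        else:
            i += 1
    return context


def chi2_weights(docs: list[list[str]], labels: list[set[str]]) -> dict[str, float]:
    """Supervised global term weight: max over categories of the chi-square statistic."""
    n = len(docs)
    df = Counter()                 # documents containing term t
    df_cat = defaultdict(Counter)  # documents containing t and labelled c
    cat_count = Counter()          # documents labelled c
    for toks, labs in zip(docs, labels):
        cat_count.update(labs)
        for t in set(toks):
            df[t] += 1
            for c in labs:
                df_cat[t][c] += 1
    weights = {}
    for t in df:
        best = 0.0
        for c, nc in cat_count.items():
            a = df_cat[t][c]       # t present, labelled c
            b = df[t] - a          # t present, not labelled c
            cc = nc - a            # t absent, labelled c
            d = n - a - b - cc     # t absent, not labelled c
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:
                best = max(best, n * (a * d - cc * b) ** 2 / denom)
        weights[t] = best
    return weights


def vectorize(tokens: list[str], weights: dict[str, float]) -> dict[str, float]:
    """(1 + log tf) times the supervised weight, L2-normalized; zero-weight terms dropped."""
    vec = {t: (1 + math.log(tf)) * weights.get(t, 0.0) for t, tf in Counter(tokens).items()}
    vec = {t: w for t, w in vec.items() if w > 0}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def knn_scores(query: list[str], train_docs: list[list[str]],
               train_labels: list[set[str]], k: int = 3) -> dict[str, float]:
    """Sum the cosine similarities of the k nearest training examples per category."""
    weights = chi2_weights(train_docs, train_labels)
    q = vectorize(query, weights)
    sims = [sum(q.get(t, 0.0) * w for t, w in vectorize(d, weights).items())
            for d in train_docs]
    nearest = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)[:k]
    scores = defaultdict(float)
    for j in nearest:
        if sims[j] > 0:  # neighbours sharing no weighted term contribute nothing
            for c in train_labels[j]:
                scores[c] += sims[j]
    return dict(scores)
```

For example, `gene_contexts("Mutations in IL-2 alter T cell growth; IL 2 levels rise.", ["IL-2"], window=2)` finds both spellings of the gene and returns the words around them. In a setting like the GO domain task, each training example would be the context words of an annotated gene-article pair, labelled with its GO domains (biological process, molecular function, cellular component). `knn_scores` would then rank those domains for a new pair.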