Multistage Gene Normalization and SVM-Based Ranking for Protein Interactor Extraction in Full-Text Articles

Authors:
Hong-Jie Dai;Po-Ting Lai;Richard Tzong-Han Tsai
Affiliations:
National Tsing-Hua University, Hsinchu, Taiwan;Yuan Ze University, Chung-Li, Taiwan;Yuan Ze University, Chung-Li, Taiwan
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2010

Abstract

The interactor normalization task (INT) is to identify genes that play the interactor role in protein-protein interactions (PPIs), to map these genes to unique IDs, and to rank them according to their normalized confidence. INT has two subtasks: gene normalization (GN) and interactor ranking. The main difficulties of INT GN are identifying genes across species and using full papers instead of abstracts. To tackle these problems, we developed a multistage GN algorithm and a ranking method, which exploit information in different parts of a paper. Our system achieved a promising AUC of 0.43471. Using the multistage GN algorithm, we have been able to improve system performance (AUC) by 1.719 percent compared to a one-stage GN algorithm. Our experimental results also show that with full text, versus abstract only, INT AUC performance was 22.6 percent higher.
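
The ranking and AUC evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not say which AUC definition was used, so this example assumes the area under the interpolated precision/recall curve, one common choice for ranked outputs. The function names, gene IDs, and confidence values are all hypothetical and are not taken from the paper.

```python
def rank_by_confidence(scores):
    """Return gene IDs sorted by descending confidence (ties broken by ID)."""
    return [gid for gid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def interpolated_pr_auc(ranked_ids, gold_ids):
    """Area under the interpolated precision/recall curve of a ranked list.

    Assumption: this is one possible AUC definition, not necessarily
    the one used by the authors.
    """
    gold = set(gold_ids)
    if not gold:
        return 0.0

    # Record (recall, precision) each time a correct ID is encountered.
    points = []
    hits = 0
    for rank, gid in enumerate(ranked_ids, start=1):
        if gid in gold:
            hits += 1
            points.append((hits / len(gold), hits / rank))

    # Interpolated precision at a recall level is the best precision
    # achieved at that recall level or any higher one.
    area = 0.0
    prev_recall = 0.0
    for i, (recall, _) in enumerate(points):
        interp_precision = max(p for _, p in points[i:])
        area += (recall - prev_recall) * interp_precision
        prev_recall = recall
    return area


# Hypothetical normalized confidences for four candidate interactor IDs.
scores = {"P1": 0.9, "P2": 0.8, "P3": 0.4, "P4": 0.1}
gold = {"P1", "P3"}  # hypothetical gold-standard interactors

ranked = rank_by_confidence(scores)
print(ranked)                                      # ['P1', 'P2', 'P3', 'P4']
print(round(interpolated_pr_auc(ranked, gold), 4)) # 0.8333
```

In this toy case the first correct ID is found at rank 1 (precision 1.0) and the second at rank 3 (precision 2/3), so the area is 0.5 × 1.0 + 0.5 × 2/3 ≈ 0.8333. Ranking a correct interactor lower in the list reduces the score, which is why the confidence ordering matters as much as the normalization itself.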