Learning information extraction rules for protein annotation from unannotated corpora

Authors:
Jee-Hyub Kim;Melanie Hilario
Affiliations:
Artificial Intelligence Lab, University of Geneva, Geneva 4, Switzerland;Artificial Intelligence Lab, University of Geneva, Geneva 4, Switzerland
Venue:
CICLing'05: Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2005

Abstract

As the number of published papers on proteins grows rapidly, manual protein annotation for biological sequence databases struggles to keep pace with publication. Automated information extraction for protein annotation offers a solution to this problem. Information extraction tasks have generally relied on the availability of pre-defined templates and annotated corpora. In many real-world applications, however, this requirement is difficult to meet; only sentences relevant to the target domain can be collected easily. At the same time, other resources can be harnessed to compensate for this difficulty: natural language processing provides reliable tools for syntactic text analysis, and in biomedical domains a large amount of background knowledge is available, e.g., in the form of ontologies. In this paper, we present a method for learning information extraction rules without pre-defined templates or labor-intensive pre-annotation by exploiting various types of background knowledge in an inductive logic programming framework.