Data & Knowledge Engineering
A key task in information retrieval is identifying sentences that state important relationships between key concepts. In this work, we propose a novel approach that automatically extracts sentence patterns describing interactions between concepts in molecular biology. A pattern is defined here as a sequence of specialized Part-of-Speech (POS) tags that captures the structure of key sentences in the scientific literature. Each candidate sentence for the classification task is encoded as a POS array and then aligned to a collection of pre-extracted patterns. The quality of each alignment is expressed as a pairwise alignment score. The most innovative component of this work is the use of a genetic algorithm (GA) to maximize the classification performance of the alignment scoring scheme. The system achieves an average F-score of 0.796 in identifying sentences that describe interactions between co-occurring biological concepts. This performance depends mostly on the quality of preprocessing steps such as term identification and POS tagging.
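The alignment-and-classify step described above can be illustrated with a minimal sketch. The abstract does not specify the alignment algorithm, the tag set, or the scoring parameters, so everything here is an assumption: a standard global (Needleman-Wunsch) alignment is used as a stand-in, the tags `PROTEIN`, `VB_INTERACT`, `DT`, and `IN` are hypothetical, and `match`, `mismatch`, `gap`, and `threshold` are illustrative values of the kind a genetic algorithm could tune to maximize the F-score.

```python
from typing import List, Sequence


def alignment_score(
    sentence: Sequence[str],
    pattern: Sequence[str],
    match: float = 2.0,
    mismatch: float = -1.0,
    gap: float = -1.0,
) -> float:
    """Global alignment score between two POS-tag sequences.

    Assumption: plain Needleman-Wunsch with a linear gap penalty;
    the original scoring scheme may differ.
    """
    n, m = len(sentence), len(pattern)
    # prev[j] holds the best score for aligning the first i-1 sentence
    # tags with the first j pattern tags.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            sub = match if sentence[i - 1] == pattern[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align the two tags
                prev[j] + gap,      # skip a sentence tag
                curr[j - 1] + gap,  # skip a pattern tag
            )
        prev = curr
    return prev[m]


def is_interaction_sentence(
    sentence: Sequence[str],
    patterns: List[Sequence[str]],
    threshold: float,
    **scoring: float,
) -> bool:
    """Classify a sentence by its best alignment score against any pattern."""
    best = max(alignment_score(sentence, p, **scoring) for p in patterns)
    return best >= threshold


# Hypothetical tags and pattern, for illustration only.
patterns = [["PROTEIN", "VB_INTERACT", "PROTEIN"]]
sentence = ["DT", "PROTEIN", "VB_INTERACT", "IN", "PROTEIN"]

print(alignment_score(sentence, patterns[0]))                    # 4.0
print(is_interaction_sentence(sentence, patterns, threshold=3.0))  # True
```

In this sketch, a candidate parameter set for the GA would be the tuple (`match`, `mismatch`, `gap`, `threshold`), and its fitness would be the F-score obtained on labeled sentences when classifying with those values.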