Information Processing and Management: an International Journal
An important task in information retrieval is to identify sentences that contain important relationships between key concepts. In this work, we propose a novel approach to automatically extract sentence patterns that describe interactions involving concepts of molecular biology. A pattern is defined here as a sequence of specialized Part-of-Speech (POS) tags that captures the structure of key sentences in the scientific literature. Each candidate sentence for the classification task is encoded as an array of POS tags and aligned to a collection of pre-extracted patterns, and the quality of each alignment is expressed as a pairwise alignment score. The most innovative component of this work is the use of a Genetic Algorithm (GA) to tune the alignment scoring scheme so that it maximizes classification performance. The system achieves an F-score of 0.834 in identifying sentences that describe interactions between biological entities. This performance depends mainly on the quality of preprocessing steps such as term identification and POS tagging.
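The pipeline in the abstract (POS-tag encoding, pairwise alignment against stored patterns, GA tuning of the scoring scheme, F-score evaluation) can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state. The alignment is taken to be global (Needleman-Wunsch) with a simple three-weight scheme (match, mismatch, gap). A sentence is counted as an interaction sentence when its best score against any pattern reaches a threshold. The GA then searches over those four numbers to maximize F-score. The tag names `PTN` and `IVERB`, the helper names, and the toy data are all hypothetical, and the paper's actual scoring scheme and GA operators may differ.

```python
import random

# ASSUMPTION: params = [match, mismatch, gap, threshold]. The source does not
# specify its scoring scheme; these four numbers are illustrative only.


def align_score(seq_a, seq_b, match, mismatch, gap):
    """Global (Needleman-Wunsch) alignment score between two POS tag sequences."""
    m = len(seq_b)
    prev = [j * gap for j in range(m + 1)]  # row 0: seq_b prefix aligned to gaps
    for i in range(1, len(seq_a) + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,  # align tag with tag
                         prev[j] + gap,      # gap in seq_b
                         cur[j - 1] + gap)   # gap in seq_a
        prev = cur
    return prev[m]


def classify(tags, patterns, params):
    """Positive if the best score against any pattern reaches the threshold (assumed rule)."""
    match, mismatch, gap, threshold = params
    best = max(align_score(tags, p, match, mismatch, gap) for p in patterns)
    return best >= threshold


def f_score(params, patterns, data):
    """F1 of the classifier over (tags, is_interaction) pairs."""
    tp = fp = fn = 0
    for tags, label in data:
        pred = classify(tags, patterns, params)
        if pred and label:
            tp += 1
        elif pred:
            fp += 1
        elif label:
            fn += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evolve(patterns, data, pop_size=20, generations=30, seed=0):
    """Elitist GA over the four scoring parameters (a sketch, not the paper's GA)."""
    rng = random.Random(seed)
    fitness = lambda p: f_score(p, patterns, data)
    pop = [[rng.uniform(0, 2), rng.uniform(-2, 0), rng.uniform(-2, 0), rng.uniform(-5, 5)]
           for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # keep fitter half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]      # uniform crossover
            child[rng.randrange(len(child))] += rng.gauss(0, 0.3)  # mutate one gene
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


# Toy data with HYPOTHETICAL specialized tags: PTN = protein name, IVERB = interaction verb.
patterns = [["PTN", "IVERB", "PTN"], ["PTN", "VBZ", "IVERB", "IN", "PTN"]]
data = [
    (["DT", "PTN", "IVERB", "PTN"], True),
    (["PTN", "IVERB", "IN", "PTN"], True),
    (["DT", "NN", "VBZ", "JJ"], False),
    (["PTN", "VBZ", "DT", "NN"], False),
]
best = evolve(patterns, data)
print("tuned params:", [round(x, 2) for x in best])
print("F-score:", f_score(best, patterns, data))
```

Because the fitter half of each generation is carried over unchanged, the best parameter vector is never lost, so the training F-score cannot decrease from one generation to the next. On real data the fitness would more sensibly be measured on held-out sentences, so that the tuned scoring scheme does not simply overfit the training set.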