We are developing an information extraction system for life science literature. We currently focus on PubMed abstracts and aim to extract named entities and their relationships, especially protein names and protein-protein interactions. Our approach combines natural language processing, machine learning, and text processing, but we do not develop a new tagging or parsing technique. Developing a tagger or parser specialized for life science literature is a very complex task, and it is not easy to obtain good results by tuning an existing parser or by training it without a sufficient corpus. These are separate research topics; our goal is to extract information, not to build tools that support the extraction task. In this paper, we introduce a method that uses an existing full parser without training or tuning. After tagging sentences and extracting protein names, we simplify each sentence by replacing certain expressions, such as named entities and nouns, with a single word. This sentence simplification reduces parsing errors and increases parsing precision. We then parse the simplified sentences with an existing syntactic parser and extract protein-protein interactions from its output. We show the effects of sentence simplification and syntactic parsing.
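The substitution step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say what the replacement word looks like or how names are matched, so the `PROTEIN<n>` placeholder format, the function names `simplify_sentence` and `restore`, and the assumption that protein names are already available as a list of strings are all hypothetical choices made for illustration.

```python
import re


def _whole_token(text):
    # Match `text` only when it is not embedded in a longer word
    # (so "PROTEIN1" does not match inside "PROTEIN10").
    return re.compile(r"(?<!\w)" + re.escape(text) + r"(?!\w)")


def simplify_sentence(sentence, entity_names, prefix="PROTEIN"):
    """Replace each known entity name with a one-word placeholder.

    Returns the simplified sentence and a placeholder -> name mapping.
    The placeholder format is an assumption, not taken from the paper.
    """
    mapping = {}
    simplified = sentence
    # Longer names first, so a short name nested inside a longer one
    # is not substituted on its own.
    for name in sorted(set(entity_names), key=len, reverse=True):
        pattern = _whole_token(name)
        if not pattern.search(simplified):
            continue
        placeholder = f"{prefix}{len(mapping) + 1}"
        simplified = pattern.sub(placeholder, simplified)
        mapping[placeholder] = name
    return simplified, mapping


def restore(text, mapping):
    """Put the original names back, e.g. after reading the parser output."""
    for placeholder, name in mapping.items():
        # A function replacement avoids backslash escapes in `name`
        # being interpreted by re.sub.
        text = _whole_token(placeholder).sub(lambda _m, n=name: n, text)
    return text
```

With the names `"heat shock protein 90"` and `"Akt"`, the sentence "We found that heat shock protein 90 interacts with Akt." becomes "We found that PROTEIN1 interacts with PROTEIN2.", which a general-purpose parser is likely to handle more easily than the multi-word original. The mapping lets the extracted interaction be reported with the original protein names.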