Publishers of biomedical journals increasingly use XML as the underlying document format. We present a modular text-processing pipeline that inserts XML markup into such documents at every processing step, producing multi-dimensional markup. The markup is used to identify and disambiguate named entities of several semantic types (proteins/genes, Gene Ontology terms, drugs, and species) and to pass data from one module to the next. Each module independently adds, changes, or removes markup, which keeps the pipeline modular and its setup flexible. We also describe how this cascaded approach is embedded in EBIMed, a large-scale XML-based application for online access to biomedical literature. We discuss the lessons learnt so far and the open problems that remain. In particular, we argue that pragmatic, tailored solutions reduce the need for overlapping annotations, although not entirely without cost.
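As a rough illustration of the cascaded idea, the sketch below chains small modules that each read an XML document, add or change inline markup, and hand the result to the next module. It is a minimal sketch under stated assumptions, not the pipeline described in the abstract: the element name `e`, the attributes `sem` and `ids`, the sentence element `s`, the toy lexicons, and all function names are hypothetical and chosen only for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy lexicons (term -> identifier); a real system would use
# curated resources rather than hard-coded dictionaries.
PROTEINS = {"p53": "P04637", "BRCA1": "P38398"}
SPECIES = {"human": "9606", "mouse": "10090"}


def _wrap_matches(sent, pattern, sem):
    """Wrap pattern matches in the plain-text parts of one sentence element."""
    # Flatten mixed content into a list of strings and existing child elements.
    pieces = [sent.text or ""]
    for child in list(sent):
        pieces.extend([child, child.tail or ""])
        child.tail = ""
        sent.remove(child)
    # Split each string piece around matches; existing elements pass through
    # untouched, so earlier markup is never nested inside new markup.
    new_pieces = []
    for piece in pieces:
        if not isinstance(piece, str):
            new_pieces.append(piece)
            continue
        pos = 0
        for m in pattern.finditer(piece):
            new_pieces.append(piece[pos:m.start()])
            ent = ET.Element("e", {"sem": sem})
            ent.text = m.group(0)
            new_pieces.append(ent)
            pos = m.end()
        new_pieces.append(piece[pos:])
    # Rebuild the sentence: strings become text or the tail of the last element.
    sent.text = ""
    last = None
    for piece in new_pieces:
        if isinstance(piece, str):
            if last is None:
                sent.text += piece
            else:
                last.tail = (last.tail or "") + piece
        else:
            sent.append(piece)
            last = piece


def tag_terms(root, lexicon, sem):
    """Module: mark lexicon terms with <e sem="..."> (entity identification)."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, lexicon)) + r")\b")
    for sent in root.iter("s"):
        _wrap_matches(sent, pattern, sem)
    return root


def add_ids(root, lexicon, sem):
    """Module: attach identifiers to entities tagged by an earlier module."""
    for ent in root.iter("e"):
        if ent.get("sem") == sem:
            ent.set("ids", lexicon[ent.text])
    return root


# The pipeline is just an ordered list of modules; each one can be added,
# removed, or reordered without touching the others.
PIPELINE = [
    lambda r: tag_terms(r, PROTEINS, "protein"),
    lambda r: tag_terms(r, SPECIES, "species"),
    lambda r: add_ids(r, PROTEINS, "protein"),
    lambda r: add_ids(r, SPECIES, "species"),
]


def run_pipeline(xml_text, modules=PIPELINE):
    """Parse the document, apply each module in turn, and serialize it again."""
    root = ET.fromstring(xml_text)
    for module in modules:
        root = module(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(run_pipeline("<doc><s>BRCA1 interacts with p53 in human cells.</s></doc>"))
```

In this sketch the markup itself carries the data between modules: `add_ids` knows nothing about how entities were found and relies only on the `sem` attribute left by `tag_terms`. Because each tagging module skips text already wrapped by an earlier one, entities never overlap, which sidesteps overlapping annotations at the cost of missing terms nested inside other terms.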