Tagging Sentence Boundaries in Biomedical Literature

Authors:
Weijian Xuan;Stanley J. Watson;Fan Meng
Affiliations:
Molecular and Behavioral Neuroscience Institute and Department of Psychiatry, University of Michigan, Ann Arbor, Michigan 48109 (all authors)
Venue:
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

Identifying sentence boundaries is an indispensable task for most natural language processing (NLP) systems. Although extensive effort has been devoted to mining biomedical text with NLP techniques, few attempts specifically target sentence boundary disambiguation in biomedical literature, which has a number of unique features that can significantly reduce the accuracy of algorithms designed for general English text. To increase the accuracy of sentence boundary identification in biomedical literature, we developed a method that combines heuristic and statistical strategies. Our approach requires neither part-of-speech taggers nor training procedures. Experiments with biomedical test corpora show that our system significantly outperforms existing sentence boundary determination algorithms, particularly on full-text biomedical literature. Our system is very fast and should be easily adaptable to sentence boundary determination in scientific literature from non-biomedical fields.
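
To illustrate the kind of ambiguity the abstract refers to, the following is a minimal sketch of a purely heuristic sentence splitter. It is not the authors' algorithm, whose rules and statistical component are not given here. The abbreviation list, the function names `is_boundary` and `split_sentences`, and the specific rules are all assumptions chosen for illustration.

```python
# Hypothetical abbreviation list; a practical system would need a far larger one.
ABBREVIATIONS = {"e.g.", "i.e.", "al.", "fig.", "vs.", "dr."}


def is_boundary(text, i):
    """Guess whether the punctuation mark at text[i] ends a sentence."""
    rest = text[i + 1:]
    # A mark followed directly by a non-space character is treated as
    # internal, as in "0.05" or "e.g.".
    if rest and not rest[0].isspace():
        return False
    # Extract the token that ends at position i, ignoring opening brackets.
    start = text.rfind(" ", 0, i) + 1
    token = text[start:i + 1].lstrip("([").lower()
    if token in ABBREVIATIONS:
        return False
    # A single letter plus a period is treated as an initial or an
    # abbreviated genus name, as in "E. coli" (assumed rule; it will
    # miss sentences that really end in a single letter).
    if len(token) == 2 and token[0].isalpha():
        return False
    nxt = rest.lstrip()
    if not nxt:
        return True  # end of text
    # Assumed rule: a new sentence starts with an uppercase letter or a digit.
    # Sentences starting with lowercase gene names (e.g. "p53") are missed.
    return nxt[0].isupper() or nxt[0].isdigit()


def split_sentences(text):
    """Split text at every candidate mark that is_boundary accepts."""
    sentences, start = [], 0
    for i, ch in enumerate(text):
        if ch in ".!?" and is_boundary(text, i):
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = ("Expression of p53 was reduced (p < 0.05) in E. coli cells. "
          "Similar results were reported by Smith et al. in 2005. "
          "See Fig. 2 for details.")
for sentence in split_sentences(sample):
    print(sentence)
```

On the sample text this sketch produces three sentences, leaving the periods in "0.05", "E. coli", "et al." and "Fig. 2" unsplit. Its known failure modes, such as sentences that begin with a lowercase gene name, are the sort of domain-specific cases that motivate combining heuristics with statistical evidence.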