In this paper we tackle sentence boundary disambiguation through a part-of-speech (POS) tagging framework. We describe the changes to text tokenization that this framing requires and the implementation of a POS tagger, and we report an evaluation of the system on two corpora. We also describe an extension of traditional POS tagging that combines it with the document-centered approach to proper name identification and abbreviation handling. This extension made the resulting system robust to domain and topic shifts.
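The two ideas above can be illustrated with a small sketch. It is not the paper's POS tagger. It is a hypothetical, rule-based stand-in that assumes (1) a tokenizer that keeps a trailing period attached to its word, so that every period-final token receives a boundary decision, and (2) a simplified form of document-centered reasoning: a period-final word seen before a lowercase word anywhere in the document is treated as an abbreviation, and a capitalized word seen in lowercase elsewhere in the document is treated as a common word capitalized because it starts a sentence. The names `TOKEN_RE`, `SEED_ABBREVIATIONS`, `collect_evidence` and `sentence_boundaries` are illustrative only.

```python
import re

# Hypothetical tokenizer: a word keeps its trailing period ("Dr.", "more."),
# so each period-final token becomes a unit that needs a boundary decision.
# Every other non-space character becomes its own token (numbers are not handled).
TOKEN_RE = re.compile(r"[A-Za-z]+\.?|\S")

# Hypothetical seed list; a real system would draw on a lexicon.
SEED_ABBREVIATIONS = {"dr.", "mr.", "mrs."}


def is_period_word(tok):
    return len(tok) > 1 and tok.endswith(".")


def collect_evidence(tokens):
    """Scan the whole document once and gather simplified,
    document-centered evidence about abbreviations and common words."""
    abbreviations = set(SEED_ABBREVIATIONS)
    lowercased = set()
    for i, tok in enumerate(tokens):
        if tok.isalpha() and tok.islower():
            lowercased.add(tok)  # seen as an ordinary lowercase word
        if is_period_word(tok) and i + 1 < len(tokens) and tokens[i + 1][0].islower():
            # A new sentence would not start in lowercase, so this period
            # must belong to an abbreviation.
            abbreviations.add(tok.lower())
    return abbreviations, lowercased


def sentence_boundaries(text):
    """Return the tokens and the indices of tokens that end a sentence."""
    tokens = TOKEN_RE.findall(text)
    abbreviations, lowercased = collect_evidence(tokens)
    boundaries = []
    for i, tok in enumerate(tokens):
        if not is_period_word(tok):
            continue
        if i + 1 == len(tokens):
            boundaries.append(i)  # last period in the text
            continue
        nxt = tokens[i + 1]
        if nxt[0].islower():
            continue  # lowercase continuation: the period ends an abbreviation only
        if tok.lower() in abbreviations:
            # An abbreviation ends the sentence only if the next word is a
            # common word that is capitalized here because it starts a sentence.
            if nxt.rstrip(".").lower() in lowercased:
                boundaries.append(i)
            continue
        boundaries.append(i)  # ordinary word + period + capitalized word
    return tokens, boundaries


text = "Dr. Lee sells pears, plums, etc. and more. We like fruit etc. Fruit is healthy, said Dr. Lee."
tokens, boundaries = sentence_boundaries(text)
print([tokens[i] for i in boundaries])  # ['more.', 'etc.', 'Lee.']
```

In this example, "etc." is learned as an abbreviation from "etc. and", yet its second occurrence is still marked as a sentence end because "Fruit" appears in lowercase elsewhere in the same document. "Dr. Lee" is never split, because "Lee" is never seen in lowercase. The point of the sketch is that the evidence comes from the document itself rather than from a fixed training domain.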