Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. This work demonstrates the feasibility of using prior probabilities of part-of-speech assignments, as opposed to words or definite part-of-speech assignments, as contextual information. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.
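The core idea, using prior part-of-speech probabilities of the surrounding words as input to a feed-forward network, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the abstract: the tag set `POS_TAGS`, the toy `LEXICON`, the window of two tokens on each side, the uniform fallback for unknown words, and the single hidden layer with sigmoid units. The weights would normally be learned from labeled text; training is omitted.

```python
import math

# Hypothetical coarse tag set; the abstract does not specify one.
POS_TAGS = ["noun", "verb", "abbreviation", "proper_noun", "other"]

# Hypothetical toy lexicon: word -> prior probability of each tag.
LEXICON = {
    "dr": {"abbreviation": 1.0},
    "smith": {"proper_noun": 0.9, "noun": 0.1},
    "left": {"verb": 0.7, "other": 0.3},
    "the": {"other": 1.0},
    "meeting": {"noun": 0.8, "verb": 0.2},
}


def pos_priors(word):
    """Return the prior tag distribution for a word as a list ordered like POS_TAGS."""
    entry = LEXICON.get(word.lower())
    if entry is None:
        # Assumption: unknown words get a uniform distribution.
        return [1.0 / len(POS_TAGS)] * len(POS_TAGS)
    return [entry.get(tag, 0.0) for tag in POS_TAGS]


def context_vector(tokens, index, window=2):
    """Concatenate tag priors for `window` tokens on each side of tokens[index]."""
    features = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the punctuation mark itself
        pos = index + offset
        if 0 <= pos < len(tokens):
            features.extend(pos_priors(tokens[pos]))
        else:
            # Pad with zeros where the window runs past the text.
            features.extend([0.0] * len(POS_TAGS))
    return features


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, hidden_weights, output_weights):
    """One hidden layer, one output unit; output near 1 is read as 'sentence boundary'."""
    hidden = [sigmoid(sum(w * f for w, f in zip(row, features)))
              for row in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))


tokens = ["Dr", ".", "Smith", "left", ".", "The", "meeting"]
features = context_vector(tokens, 1)  # context of the period after "Dr"
```

Because the input is a distribution over tags and not a single tag or the word itself, the network does not depend on a tagger having already run, which matters since tagging usually presupposes sentence boundaries.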