The sentence is a standard textual unit in natural language processing applications. In many languages the punctuation mark that indicates the end-of-sentence boundary is ambiguous; thus the tokenizers of most NLP systems must be equipped with special sentence boundary recognition rules for every new text collection. As an alternative, this article presents an efficient, trainable system for sentence boundary disambiguation. The system, called Satz, makes simple estimates of the parts of speech of the tokens immediately preceding and following each punctuation mark, and uses these estimates as input to a machine learning algorithm that then classifies the punctuation mark. Satz is very fast both in training and sentence analysis, and its combined robustness and accuracy surpass existing techniques. The system needs only a small lexicon and training corpus, and has been shown to transfer quickly and easily from English to other languages, as demonstrated on French and German.
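The pipeline described above (part-of-speech estimates for the tokens around each punctuation mark, fed to a trainable classifier) can be illustrated with a minimal sketch. This is not the Satz implementation: the toy lexicon, the category list, the uniform part-of-speech estimates, the one-token context window, and the perceptron standing in for the machine learning algorithm are all assumptions made for illustration, as are the function names.

```python
import re

# Hypothetical toy lexicon (an assumption, not from the article):
# word -> possible part-of-speech categories.
LEXICON = {
    "dr": ["abbrev"], "mr": ["abbrev"],
    "he": ["pron"], "she": ["pron"],
    "smith": ["proper"],
    "arrived": ["verb"], "left": ["verb"],
    "works": ["verb", "noun"],
    "late": ["adv", "adj"], "today": ["adv", "noun"],
}
CATEGORIES = ["abbrev", "pron", "proper", "verb", "noun", "adv", "adj", "unknown"]
MARKS = {".", "!", "?"}


def tokenize(text):
    """Split text into word tokens and candidate boundary marks."""
    return re.findall(r"\w+|[.!?]", text)


def descriptor(token):
    """Estimate a token's part-of-speech distribution, plus a capitalization flag."""
    if token is None:  # padding beyond the start or end of the text
        return [0.0] * (len(CATEGORIES) + 1)
    tags = LEXICON.get(token.lower(), ["unknown"])
    # Assumption: spread probability uniformly over the categories the lexicon lists.
    vec = [1.0 / len(tags) if c in tags else 0.0 for c in CATEGORIES]
    vec.append(1.0 if token[0].isupper() else 0.0)
    return vec


def context_features(tokens, i, k=1):
    """Concatenate descriptors of the k tokens before and the k tokens after position i."""
    feats = []
    for j in list(range(i - k, i)) + list(range(i + 1, i + k + 1)):
        feats.extend(descriptor(tokens[j] if 0 <= j < len(tokens) else None))
    return feats


def candidates(text, k=1):
    """Return one feature vector per punctuation mark in the text, in order."""
    tokens = tokenize(text)
    return [context_features(tokens, i, k) for i, t in enumerate(tokens) if t in MARKS]


def score(weights, bias, x):
    return sum(w * v for w, v in zip(weights, x)) + bias


def train_perceptron(samples, labels, epochs=20):
    """Fit a linear classifier; label 1 means the mark ends a sentence."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if score(weights, bias, x) > 0 else 0
            if pred != y:  # update only on mistakes
                step = y - pred
                weights = [w + step * v for w, v in zip(weights, x)]
                bias += step
    return weights, bias


def classify(text, model, k=1):
    """Label every punctuation mark in text: 1 = sentence boundary, 0 = not."""
    weights, bias = model
    return [1 if score(weights, bias, x) > 0 else 0 for x in candidates(text, k)]


TRAIN_TEXT = "Dr. Smith arrived late. He left today. Mr. Smith works late. She left."
TRAIN_LABELS = [0, 1, 1, 0, 1, 1]  # one hand-assigned label per period, in order
model = train_perceptron(candidates(TRAIN_TEXT), TRAIN_LABELS)
print(classify("Mr. Smith left. He arrived.", model))
```

On this toy data the final line prints [0, 1, 1]: the period after "Mr" is not treated as a boundary, while the other two are. The classifier never sees the words themselves, only the category estimates and capitalization of the neighbouring tokens, which is why a small lexicon can suffice and why moving to another language would mainly mean swapping the lexicon and retraining.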