Sentence boundary detection is widely used, but often with outdated tools. We discuss what makes the task difficult and which features are relevant, and we present a fully statistical system, now publicly available, that gives the best known error rate on a standard news corpus: of some 27,000 examples, our system makes 67 errors, 23 of them involving the word "U.S."
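
To put the reported counts in proportion, the arithmetic can be sketched as below. This is only an illustration using the figures quoted in the abstract. Because "some 27,000" is approximate, the resulting percentages are approximate as well, and the variable and function names are ours rather than anything from the system itself.

```python
# Figures quoted in the abstract. "some 27,000" is approximate,
# so every rate derived from it is approximate too.
TOTAL_EXAMPLES = 27_000   # candidate sentence boundaries in the news corpus
TOTAL_ERRORS = 67         # errors made by the system
US_ERRORS = 23            # errors involving the word "U.S."


def percent(count: int, total: int) -> float:
    """Return count / total expressed as a percentage."""
    return 100.0 * count / total


# Overall error rate on the corpus.
error_rate = percent(TOTAL_ERRORS, TOTAL_EXAMPLES)

# Share of all errors that involve "U.S.".
us_share = percent(US_ERRORS, TOTAL_ERRORS)

# Hypothetical error rate if every "U.S." case were handled correctly.
rate_without_us = percent(TOTAL_ERRORS - US_ERRORS, TOTAL_EXAMPLES)

print(f"overall error rate:        {error_rate:.2f}%")
print(f"share of errors on 'U.S.': {us_share:.1f}%")
print(f"error rate without 'U.S.': {rate_without_us:.2f}%")
```

Under these assumptions, a single token accounts for about a third of the remaining errors, which is why the abstract singles it out.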