In this article, we present a language-independent, unsupervised approach to sentence boundary detection. It rests on the assumption that many ambiguities in determining sentence boundaries can be eliminated once abbreviations have been identified. Instead of relying on orthographic clues, the proposed system detects abbreviations with high accuracy using three criteria that require information only about the candidate type itself and are independent of context: abbreviations can be defined as very tight collocations consisting of a truncated word and a final period, abbreviations are usually short, and abbreviations sometimes contain internal periods. We also show the potential of collocational evidence for two other important subtasks of sentence boundary disambiguation, namely the detection of initials and ordinal numbers. The proposed system has been tested extensively on eleven languages and on different text genres, and it achieves good results without any further amendments or language-specific resources. We evaluate its performance against three different baselines and compare it with other sentence boundary detection systems proposed in the literature.
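The three context-free abbreviation criteria can be illustrated with a short Python sketch. It is loosely modeled on a Dunning-style log-likelihood ratio that tests whether a type is tightly collocated with a following period, then scales that score by length and by internal periods. The alternative probability `p_alt = 0.99`, the exponential length factor, the penalty for occurrences without a period, and the `0.3` threshold are illustrative assumptions, not values stated in this abstract, and all function names are hypothetical.

```python
import math
from collections import Counter


def count_types(tokens):
    """Count each lowercased type overall and with a final period."""
    with_period, total = Counter(), Counter()
    for tok in tokens:
        word = tok.lower()
        if len(word) > 1 and word.endswith("."):
            word = word[:-1]  # strip only the final period; "u.s." -> "u.s"
            with_period[word] += 1
        total[word] += 1
    return with_period, total


def log_lambda(count_w, count_wp, count_p, n, p_alt=0.99):
    """-2 log lambda for "w is tightly collocated with a following period".

    H0: P(. | w) equals the overall period rate count_p / n.
    HA: P(. | w) = p_alt (ASSUMED 0.99, i.e. almost always followed by a period).
    Positive values favour HA. Requires 0 < count_p < n.
    """
    p_null = count_p / n
    k, miss = count_wp, count_w - count_wp
    log_h0 = k * math.log(p_null) + miss * math.log(1 - p_null)
    log_ha = k * math.log(p_alt) + miss * math.log(1 - p_alt)
    return -2.0 * (log_h0 - log_ha)


def abbreviation_score(word, count_w, count_wp, count_p, n):
    """Scale the collocation score by the three context-free criteria."""
    length = len(word) - word.count(".")       # characters excluding periods
    f_length = math.exp(-length)                # criterion: abbreviations are short
    f_periods = word.count(".") + 1             # criterion: internal periods, as in "u.s"
    f_penalty = length ** -(count_w - count_wp)  # assumed penalty: uses without a period
    return log_lambda(count_w, count_wp, count_p, n) * f_length * f_periods * f_penalty


def find_abbreviations(text, threshold=0.3):
    """Return (set of abbreviation types, score for every period-final type)."""
    tokens = text.split()
    with_period, total = count_types(tokens)
    count_p, n = sum(with_period.values()), len(tokens)
    scores = {
        w: abbreviation_score(w, total[w], with_period[w], count_p, n)
        for w in with_period
        if w.strip(".")  # skip period-only tokens such as ".."
    }
    return {w for w, s in scores.items() if s >= threshold}, scores


sample = ("Dr. Smith met Dr. Jones in the U.S. last year. They ate bread, jam, etc. "
          "The house was big. Dr. Brown saw the house. It rained, snowed, etc. "
          "We sang, danced, etc.")
abbrevs, scores = find_abbreviations(sample)
print(sorted(abbrevs))
```

On this toy text the sketch flags `dr`, `u.s` and `etc`, while the sentence-final types `year`, `big` and `house` fall below the threshold. `house` even receives a negative score, because it also occurs without a period. Counts from a handful of sentences are far too sparse for real use; the point is only to show how the three criteria combine with the collocation score without consulting any context.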