In this paper we present an approach to three important problems of text normalization: sentence boundary disambiguation, disambiguation of capitalized words in positions where capitalization is expected (for example, at the start of a sentence), and identification of abbreviations. The main feature of our approach is that it uses a minimum of pre-built resources and instead dynamically infers disambiguation clues from the entire document itself. This makes it domain-independent, closely targeted to each individual document, and portable to other languages. We evaluated the approach on several corpora, where it showed high accuracy.
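To make the idea of inferring clues from the document itself more concrete, the following is a minimal sketch of one such clue for capitalized words, and not the implementation evaluated in the paper. It assumes that a word capitalized at the start of a sentence can be labeled by looking at how the same word is written elsewhere in the same text: if it also appears capitalized in mid-sentence and never in lowercase, it is probably a proper name; if it appears only in lowercase elsewhere, it is probably a common word. The function name `classify_sentence_initial_words`, the label strings, and the naive rule that every `.`, `!` or `?` ends a sentence are illustrative assumptions.

```python
import re
from collections import Counter


def classify_sentence_initial_words(text):
    """Label sentence-initial capitalized words using only in-document evidence.

    Hypothetical helper: a simplified illustration, not the paper's method.
    """
    # Naive tokenization: alphabetic words plus sentence-final punctuation.
    tokens = re.findall(r"[A-Za-z]+|[.!?]", text)

    lower_seen = Counter()  # lowercase occurrences anywhere in the text
    upper_mid = Counter()   # capitalized occurrences in mid-sentence
    ambiguous = []          # capitalized words right after a sentence break

    at_start = True
    for tok in tokens:
        if tok in ".!?":
            # Assumption: every such mark is a sentence boundary.
            at_start = True
            continue
        if tok[0].isupper():
            if at_start:
                ambiguous.append(tok)
            else:
                upper_mid[tok] += 1
        else:
            lower_seen[tok] += 1
        at_start = False

    labels = {}
    for word in ambiguous:
        seen_upper = upper_mid[word] > 0
        seen_lower = lower_seen[word.lower()] > 0
        if seen_upper and not seen_lower:
            labels[word] = "proper"
        elif seen_lower and not seen_upper:
            labels[word] = "common"
        else:
            labels[word] = "unknown"  # no evidence, or conflicting evidence
    return labels


sample = "Rolls was late. The driver blamed Rolls for it. Late trains annoy the driver."
labels = classify_sentence_initial_words(sample)
```

The sketch deliberately sidesteps the other two problems named above: because it treats every period as a sentence boundary, an abbreviation such as "Dr." would wrongly make the following word look sentence-initial, which is one reason the three tasks are addressed together.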