While lexical statistics of formal text play a central role in many statistical Natural Language Processing (NLP) and Information Retrieval (IR) tasks, little is known about the lexical statistics of informal and short documents. To learn the unique characteristics of informal text, we conduct an N-gram study on peer-to-peer (P2P) data and present the resulting insights, the problems encountered, and the differences from formal text. Building on these findings, we apply a probabilistic model for detecting and correcting spelling problems (not necessarily errors) and propose an enrichment method that makes many P2P files more accessible to relevant user queries. Our enrichment results show an improvement in both recall and precision with only a slight increase in the collection size.
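As an illustration of the kind of N-gram counting such a study relies on, the following is a minimal sketch, not the method used in the paper. The sample titles, the whitespace tokenization, and the function names `ngrams` and `count_ngrams` are all assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(docs, n):
    """Count n-grams over a collection of short documents.

    Tokenization here is a plain lowercase whitespace split, which is an
    assumption for this sketch; real P2P file names would likely need
    extra handling for punctuation, underscores, and file extensions.
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical short, informal documents (e.g. shared file titles).
titles = [
    "britney spears toxic",
    "britny spears toxic live",
    "britney spears oops",
]

bigrams = count_ngrams(titles, 2)
for gram, freq in bigrams.most_common(3):
    print(" ".join(gram), freq)
```

In a collection like this, a rare variant such as "britny spears" sitting next to a frequent "britney spears" is the sort of signal a probabilistic spelling model could use, although how the paper's model weighs such evidence is not stated in the abstract.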