Because of the large number of legacy documents (such as old books and newspapers), improving retrieval effectiveness for OCR'ed documents remains an important problem. This article compares two approaches to retrieving OCR-degraded Arabic documents: OCR error correction, with and without language modeling, and query garbling combined with weighted structured queries. The results suggest that moderate error correction does not yield a statistically significant improvement in retrieval effectiveness when indexing and searching using n-grams. By contrast, reversing the error correction models to garble queries, in conjunction with weighted structured queries, improves retrieval effectiveness. Finally, very good error correction that uses language modeling yields the largest improvement in retrieval effectiveness.
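
To make the idea of query garbling more concrete, the following is a minimal sketch, not the article's implementation. It assumes a hypothetical character-level confusion model (the `CONFUSION` table and its probabilities are invented for illustration), generates the most probable degraded forms of a query term, and then combines their document counts into a probability-weighted term frequency, which is one plausible reading of a weighted structured query. The function names `garble` and `weighted_tf` are likewise illustrative.

```python
from itertools import product

# Hypothetical confusion model: P(OCR output character | true character).
# The values are illustrative only and are not taken from the article.
CONFUSION = {
    "a": {"a": 0.9, "o": 0.1},
    "l": {"l": 0.8, "1": 0.2},
}


def garble(term, confusion, top_k=3):
    """Return the top_k most probable degraded forms of term as (text, prob)."""
    # Characters missing from the model are assumed to be recognized correctly.
    options = [list(confusion.get(ch, {ch: 1.0}).items()) for ch in term]
    variants = []
    for combo in product(*options):
        text = "".join(ch for ch, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p  # independence between characters is an assumption
        variants.append((text, prob))
    variants.sort(key=lambda v: v[1], reverse=True)
    return variants[:top_k]


def weighted_tf(variants, doc_tokens):
    """Sum each variant's count in the document, scaled by its probability."""
    return sum(prob * doc_tokens.count(text) for text, prob in variants)
```

For instance, `garble("al", CONFUSION)` yields the variants `al`, `a1`, and `ol` in decreasing order of probability, and `weighted_tf` then credits a document that contains the misrecognized form `a1` with a partial match for the clean query term. Enumerating every combination grows exponentially with term length, so a practical system would prune low-probability edits rather than use `product` directly.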