AND '10 Proceedings of the fourth workshop on Analytics for noisy unstructured text data
The corpora of Information Retrieval (IR) systems consist of text documents that often come from heterogeneous sources, such as Web sites or OCR (Optical Character Recognition) systems. Faithfully converting these sources into flat text files is not a trivial task, since spelling or typesetting errors easily introduce noise. If the corpus is large enough, redundancy helps control the effects of noise, because the same text often appears both with and without noise throughout the corpus. Conversely, noise becomes a serious problem in restricted-domain IR, where the corpus is usually small and has little or no redundancy. Noise therefore hinders the retrieval task in restricted domains, and erroneous results are likely. To overcome this situation, this paper presents an approach that uses restricted-domain resources, such as Knowledge Organization Systems (KOS), to add noise tolerance to existing IR systems. To show the suitability of our approach in a real restricted-domain case study, a set of experiments has been carried out for the agricultural domain.
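The abstract does not describe how the KOS is actually used, so the following is only an illustrative sketch of one way a domain vocabulary could add noise tolerance: noisy tokens are mapped to the closest term in a controlled term list before indexing or querying. The term list `KOS_TERMS`, the function names, and the similarity cutoff are all assumptions made for this example and are not taken from the paper; the matching relies on Python's standard `difflib` module rather than on any method the authors report.

```python
from difflib import get_close_matches

# Hypothetical agricultural term list standing in for a real KOS vocabulary.
KOS_TERMS = ["irrigation", "fertilizer", "pesticide", "wheat", "soil"]


def normalize_token(token, vocabulary=KOS_TERMS, cutoff=0.8):
    """Return the closest vocabulary term, or the token unchanged.

    The cutoff (an assumed value) is the minimum difflib similarity ratio
    a vocabulary term must reach to replace the token.
    """
    matches = get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def normalize_text(text, vocabulary=KOS_TERMS, cutoff=0.8):
    """Normalize every whitespace-separated token of a text."""
    return " ".join(normalize_token(t, vocabulary, cutoff) for t in text.split())


if __name__ == "__main__":
    # An OCR-style misspelling is mapped back to the domain term.
    print(normalize_text("wheat irrigaton in dry soil"))
```

With this term list, the misspelled token `irrigaton` is replaced by `irrigation`, while words with no close vocabulary match, such as `dry`, pass through unchanged. A stricter cutoff reduces the risk of rewriting legitimate out-of-vocabulary words at the cost of missing heavier noise.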