A known-item search is a particular information retrieval task in which the system is asked to find a single target document in a large document set. The TREC-5 confusion track used a set of 49 known-item tasks to study the impact of data corruption on retrieval system performance. Two corrupted versions of a 55,600-document corpus whose true content was known were created by applying OCR techniques to page images. The first version of the corpus used the page images as scanned, resulting in an estimated character error rate of approximately 5%. The second version used page images that had been down-sampled, resulting in an estimated character error rate of approximately 20%. The true text and each of the corrupted versions were then searched using the same set of 49 questions. In general, retrieval methods that attempted a probabilistic reconstruction of the original clean text fared better than methods that simply accepted corrupted versions of the query text.
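
The character error rates quoted above can be made concrete with a small sketch. The abstract does not say how the track's estimates were obtained, so the following assumes one common convention: the Levenshtein edit distance between the true text and the OCR output, divided by the length of the true text. The function names `edit_distance` and `character_error_rate` are illustrative and do not come from the track itself.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    substitutions, insertions, and deletions turning reference into hypothesis."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: edit distance divided by the true-text length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    true_text = "information retrieval"
    ocr_text = "inforrnation retrieva1"  # invented example of OCR-style damage
    print(f"{character_error_rate(true_text, ocr_text):.3f}")
```

Under this assumed definition, a rate of roughly 5% corresponds to about one character edit in every twenty characters of true text, and roughly 20% to about one in every five.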