Research and development of information access technology for scanned paper documents has been hampered by the lack of public test collections of realistic scope and complexity. As part of a project to create a prototype system for search and mining of masses of document images, we are assembling a 1.5 terabyte dataset to support evaluation of both end-to-end complex document information processing (CDIP) tasks (e.g., text retrieval and data mining) and component technologies such as optical character recognition (OCR), document structure analysis, signature matching, and authorship attribution.