Building a test collection for complex document information processing
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
The effect of OCR errors on stylistic text classification
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
We developed a prototype for integrated retrieval and aggregation of diverse information contained in scanned paper documents. Such complex document information processing combines several forms of image processing with textual and linguistic processing to enable effective analysis of complex document collections, a necessity for a wide range of applications. This is the first system to attempt integrated retrieval from complex documents; we report its current capabilities.