Interest has recently been growing in non-topical text classification tasks such as genre classification, sentiment analysis, and authorship profiling. We study the extent to which OCR errors affect stylistic text classification of scanned documents. We find that even a relatively high level of errors in the OCRed documents does not substantially affect stylistic classification accuracy.