Fundamentals of speech recognition.
Information Extraction: Techniques and Challenges. SCIE '97: International Summer School on Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology.
Named entity extraction from noisy input: speech and OCR. ANLC '00: Proceedings of the sixth conference on Applied natural language processing.
Information access in the presence of OCR errors. Proceedings of the 1st ACM workshop on Hardcopy document processing.
Mining knowledge from text using information extraction. ACM SIGKDD Explorations Newsletter: Natural language processing and text mining.
Statistics for Engineering and the Sciences (5th Edition).
Name extraction and formal concept analysis. ICCS'11: Proceedings of the 19th international conference on Conceptual structures for discovering knowledge.
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
OCR error has been shown not to affect the average accuracy of text retrieval or text categorization. Recent studies, however, have indicated that information extraction is significantly degraded by OCR error. We ran information extraction software on two collections: one of OCR-ed documents and another of manually corrected versions of the same documents. We found a significant reduction in accuracy on the OCR text compared with the corrected text. Most of the errors were attributable to zoning problems rather than to OCR classification errors.
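The abstract does not say which accuracy measure was used. As a minimal sketch, assuming exact-match precision, recall and F1 scored against one shared set of hand-annotated (gold) entities, the comparison between the OCR collection and the corrected collection could be computed as below. The function name `precision_recall_f1` and all entity tuples are hypothetical illustrations, not the authors' data or tooling.

```python
from typing import Set, Tuple

# Hypothetical representation: an extracted item is (entity_type, surface_string).
Entity = Tuple[str, str]


def precision_recall_f1(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted items against gold items."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotations for one document.
gold = {
    ("PERSON", "John Smith"),
    ("DATE", "1 May 1922"),
    ("LOCATION", "Ohio"),
    ("ORGANIZATION", "Ohio State University"),
}
# Hypothetical extractor output on the manually corrected text.
from_corrected = {("PERSON", "John Smith"), ("DATE", "1 May 1922"), ("LOCATION", "Ohio")}
# Hypothetical output on the OCR text: one character error ("Smlth") and
# entities lost entirely (as can happen when zoning splits or merges text regions).
from_ocr = {("PERSON", "John Smlth"), ("DATE", "1 May 1922")}

_, _, f1_corrected = precision_recall_f1(from_corrected, gold)
_, _, f1_ocr = precision_recall_f1(from_ocr, gold)
print(f"F1 corrected: {f1_corrected:.3f}  F1 OCR: {f1_ocr:.3f}  drop: {f1_corrected - f1_ocr:.3f}")
```

Under exact matching, a single misrecognized character and a missing entity both count as full misses, so the sketch cannot by itself separate zoning errors from character-classification errors; that attribution would require inspecting each miss.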