We investigate the problem of summarizing text documents that contain errors introduced by optical character recognition (OCR). We test each stage in the process, analyze the effects of the errors, and suggest possible solutions. Our experimental results show that current approaches, which were developed for clean text, degrade significantly even with slight increases in a document's noise level. We conclude by proposing ways to improve the performance of noisy document summarization.