Digitized physical books offer access to tremendous amounts of knowledge, even for people with print-related disabilities. Various projects and standardization activities are underway to make past and present books accessible. However, digitizing books requires extensive human effort, such as correcting the results of optical character recognition (OCR) and adding structural information such as headings. Some Asian languages require extra effort to correct OCR errors because of their large and varied character sets: Japanese has used more than 10,000 characters, compared with a few hundred in English. This heavy workload inhibits the creation of accessible digital books. To facilitate digitization, we are developing a new system for processing physical books. By combining automatic inference with human capabilities, we reduce and distribute the human effort and accelerate conversion. Our system preserves the original page images throughout the digitization process to support gradual refinement, and it distributes the work as micro-tasks. We conducted trials with the Japanese National Diet Library (NDL) to evaluate the effort required to digitize books with a variety of layouts and years of publication. The results showed that old Japanese books pose specific problems for correcting OCR errors and adding structure. Drawing on these results, we discuss further workload reductions and future directions for international digitization systems.
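The idea of distributing OCR correction as micro-tasks while keeping each task tied to the preserved page image can be illustrated with a minimal sketch. It is not the system described above. The names `OcrChar`, `MicroTask`, and `build_micro_tasks`, the per-character confidence score, the 0.9 threshold, and the batch size of 3 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OcrChar:
    """One OCR result (hypothetical structure, not the authors' data model)."""
    text: str                        # character proposed by the OCR engine
    confidence: float                # assumed engine confidence in [0, 1]
    page: int                        # page number of the preserved page image
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) on that image


@dataclass
class MicroTask:
    """A small unit of review work that one person can finish quickly."""
    page: int
    chars: List[OcrChar]


def build_micro_tasks(chars: List[OcrChar],
                      threshold: float = 0.9,
                      batch_size: int = 3) -> List[MicroTask]:
    """Group low-confidence characters into small per-page review tasks.

    Characters at or above the threshold are accepted automatically;
    the rest are batched so that each task refers to a single page image.
    """
    doubtful_by_page: Dict[int, List[OcrChar]] = {}
    for ch in chars:
        if ch.confidence < threshold:
            doubtful_by_page.setdefault(ch.page, []).append(ch)

    tasks: List[MicroTask] = []
    for page in sorted(doubtful_by_page):
        doubtful = doubtful_by_page[page]
        for start in range(0, len(doubtful), batch_size):
            tasks.append(MicroTask(page=page,
                                   chars=doubtful[start:start + batch_size]))
    return tasks
```

Because every task carries a page number and bounding boxes, a reviewer could be shown the corresponding region of the original image next to the OCR guess, and corrections could be applied gradually as tasks are completed.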