Document area identification for extending books without markers

Authors:
Akihiro Miyata; Ko Fujimura
Affiliations:
NTT Corporation, Yokosuka, Japan; NTT Corporation, Yokosuka, Japan
Venue:
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Year:
2011

Abstract

We present a method of document area identification that uses consecutive characters in the non-reading direction as search keys. Using this method, we developed a prototype system called Kappan, which enables service providers and users to create hyperlinks in books without markers. Existing techniques generally require markers to be printed on the page before a hyperlink can be created. We argue that applying the concept of a search index makes markers unnecessary. Kappan associates indexed text areas in a large number of books with supporting digital content. The indexed text areas, freely defined by service providers or users, are identified by applying OCR (Optical Character Recognition) to images of small areas of the printed page and extracting highly specific and efficient search keys from the recognized text. Traditional text indexing methods must extract long character sequences from the partial image to identify the area exactly, given the sheer number of book pages. However, because the average OCR error rate exceeds 20 percent when the partial image is captured by a camera-equipped cellular phone, many characters are likely to be misrecognized, causing area identification to fail. In contrast, our indexing method extracts area-specific clues from fewer characters and can identify the area exactly even when the partial image is small and the extracted text contains misrecognized characters. An experiment shows that our method identifies the exact area among more than one million areas with accuracy rates of 99 percent and 96 percent at OCR error rates of 0 percent and 22 percent, respectively.
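
The idea of indexing characters in the non-reading direction can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the key length, the index layout, or the matching rule, so the fixed-pitch character grid, the two-character vertical keys, and the simple vote count below are all assumptions, as are the function names and the sample area identifiers.

```python
from collections import Counter, defaultdict


def vertical_keys(lines, n=2):
    """Return n-character keys read down each column of the text.

    Assumption: text is horizontal and characters sit on a fixed-pitch
    grid, so column c of adjacent lines is vertically aligned.
    """
    keys = []
    for top in range(len(lines) - n + 1):
        rows = lines[top:top + n]
        width = min(len(row) for row in rows)
        for col in range(width):
            keys.append("".join(row[col] for row in rows))
    return keys


def build_index(areas, n=2):
    """Map every vertical key to the set of area ids that contain it."""
    index = defaultdict(set)
    for area_id, lines in areas.items():
        for key in vertical_keys(lines, n):
            index[key].add(area_id)
    return index


def identify(index, ocr_lines, n=2):
    """Return the area id with the most matching keys, or None."""
    votes = Counter()
    for key in vertical_keys(ocr_lines, n):
        for area_id in index.get(key, ()):
            votes[area_id] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical indexed areas (ids and text are illustrative only).
areas = {
    "book1:p10:a": ["the quick brown", "fox jumps over", "the lazy dog"],
    "book2:p3:b": ["pack my box with", "five dozen jugs", "of liquor now"],
}
index = build_index(areas)

# Simulated OCR output of a small partial image; "i" was misread as "1".
ocr_lines = ["the qu1ck brown", "fox jumps over"]
print(identify(index, ocr_lines))
```

Because each key spans only a few characters, a single misrecognized character invalidates just the keys for its own column, while the remaining columns still vote for the correct area. A long horizontal sequence used as one key would fail as a whole in the same situation.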