Two-dimensional (2-D) plots in digital documents on the web are an important but largely under-utilized source of information. In this paper, we outline how data and text can be extracted automatically from these plots, eliminating a time-consuming manual process. Our information extraction algorithm identifies the axes of a figure, extracts text blocks such as axis labels and legends, and identifies the data points in the figure. It also extracts the units that appear in the axis labels and segments each legend to identify its lines, its symbols, and the text explanation associated with each symbol. The algorithm also handles the challenging task of separating overlapping text and data points. Our experiments indicate that these techniques are computationally efficient and provide acceptable accuracy.
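To make the first steps of the pipeline concrete, the following is a minimal sketch of axis detection and data-pixel collection on a binarized plot. It is not the algorithm of this paper: it assumes a simple projection-profile heuristic (the x-axis is the row with the most dark pixels, the y-axis the column with the most dark pixels), and the names `find_axes`, `pixels_inside_axes`, and `PLOT` are hypothetical, introduced only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image, 1 = dark pixel, 0 = background


def find_axes(image: Image) -> Tuple[int, int]:
    """Return (x_axis_row, y_axis_col) using projection profiles.

    Assumption (not from the paper): the axes are the longest straight
    dark runs, so they dominate the row and column pixel counts.
    """
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    row_counts = [sum(row) for row in image]
    col_counts = [sum(col) for col in zip(*image)]
    x_axis_row = max(range(len(row_counts)), key=row_counts.__getitem__)
    y_axis_col = max(range(len(col_counts)), key=col_counts.__getitem__)
    return x_axis_row, y_axis_col


def pixels_inside_axes(image: Image, x_axis_row: int, y_axis_col: int) -> List[Tuple[int, int]]:
    """Collect dark pixels above the x-axis and to the right of the y-axis.

    These are candidate data-point pixels; separating them from overlapping
    text would need a further step that this sketch does not attempt.
    """
    return [
        (r, c)
        for r in range(x_axis_row)
        for c in range(y_axis_col + 1, len(image[r]))
        if image[r][c]
    ]


# Toy 6x6 plot: y-axis in column 1, x-axis in row 4, two data pixels.
PLOT: Image = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    x_row, y_col = find_axes(PLOT)
    print("x-axis row:", x_row, "y-axis column:", y_col)
    print("candidate data pixels:", pixels_inside_axes(PLOT, x_row, y_col))
```

A projection profile is chosen here only because it is short and dependency-free. A real scanned figure would likely need noise tolerance, for example a line-detection method such as the Hough transform, and a text recognition step for labels and legends.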