Diagrams are a critical part of virtually all scientific and technical documents. Analyzing diagrams will be important for building comprehensive document retrieval systems. This paper focuses on the extraction and classification of diagrams from PDF documents. We study diagrams available in vector (not raster) format in online research papers.

PDF files are parsed and their vector graphics components installed in a spatial index. Subdiagrams are found by analyzing white space gaps. A set of statistics is generated for each diagram, e.g., the number of horizontal lines and vertical lines. The statistics form a feature vector description of the diagram. The vectors are used in a kernel-based machine learning system (Support Vector Machine). Separating a set of bar graphs from non-bar-graphs gathered from 20,000 biology research papers gave a classification accuracy of 91.7%. The approach is directly applicable to diagrams vectorized from images.
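The two middle steps of the pipeline, splitting on white space gaps and counting horizontal and vertical lines, can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: line segments are plain `(x0, y0, x1, y1)` tuples, gaps are searched along the x-axis only, and the names `split_on_vertical_gaps`, `line_statistics`, `min_gap`, and `tol` are hypothetical. A sorted sweep stands in for the spatial index, and the paper's full feature set is not given in the abstract.

```python
from typing import List, Tuple

# Hypothetical representation: one straight line segment per tuple.
Segment = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def split_on_vertical_gaps(segments: List[Segment], min_gap: float) -> List[List[Segment]]:
    """Group segments into sub-diagrams separated by empty vertical bands.

    A new group starts whenever the next segment begins more than
    `min_gap` units to the right of everything seen so far.
    """
    ordered = sorted(segments, key=lambda s: min(s[0], s[2]))
    groups: List[List[Segment]] = []
    right_edge = None  # rightmost x covered by the current group
    for seg in ordered:
        left, right = min(seg[0], seg[2]), max(seg[0], seg[2])
        if right_edge is None or left - right_edge > min_gap:
            groups.append([seg])
            right_edge = right
        else:
            groups[-1].append(seg)
            right_edge = max(right_edge, right)
    return groups


def line_statistics(segments: List[Segment], tol: float = 0.5) -> List[int]:
    """Return [horizontal, vertical, other] segment counts as a feature vector."""
    horizontal = vertical = other = 0
    for x0, y0, x1, y1 in segments:
        dx, dy = abs(x1 - x0), abs(y1 - y0)
        if dy <= tol and dx > tol:
            horizontal += 1
        elif dx <= tol and dy > tol:
            vertical += 1
        else:
            other += 1
    return [horizontal, vertical, other]


# Toy page: a bar-chart-like drawing on the left, two slanted strokes on the right.
page = [
    (0, 0, 50, 0), (0, 0, 0, 40),                        # axes
    (10, 0, 10, 20), (10, 20, 20, 20), (20, 20, 20, 0),  # one bar
    (100, 0, 150, 30), (150, 30, 120, 40),               # unrelated slanted strokes
]
groups = split_on_vertical_gaps(page, min_gap=20)
features = [line_statistics(g) for g in groups]
```

Each feature vector would then be passed to a classifier; the abstract names a Support Vector Machine, for which an off-the-shelf implementation such as scikit-learn's `sklearn.svm.SVC` is one possible choice.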