Creating reusable well-structured PDF as a sequence of component object graphic (COG) elements

Authors:
Steven R. Bagley; David F. Brailsford; Matthew R. B. Hardy
Affiliations:
University of Nottingham, Nottingham, UK; University of Nottingham, Nottingham, UK; University of Nottingham, Nottingham, UK
Venue:
Proceedings of the 2003 ACM symposium on Document engineering
Year:
2003

Cites (1):

PostScript language reference (3rd ed.)

Cited by (14):

Page composition using PPML as a link-editing script
Proceedings of the 2004 ACM symposium on Document engineering

Encapsulating and manipulating component object graphics (COGs) using SVG
Proceedings of the 2005 ACM symposium on Document engineering

The COG scrapbook
Proceedings of the 2005 ACM symposium on Document engineering

Towards a Canonical and Structured Representation of PDF Documents through Reverse Engineering
ICDAR '05 Proceedings of the Eighth International Conference on Document Analysis and Recognition

COG Extractor
Proceedings of the 2006 ACM symposium on Document engineering

A model for mapping between printed and digital document instances
Proceedings of the 2007 ACM symposium on Document engineering

Extracting reusable document components for variable data printing
Proceedings of the 2007 ACM symposium on Document engineering

Tracking sub-page components in document workflows
Proceedings of the eighth ACM symposium on Document engineering

Document engineering approaches toward scalable and structured multimedia, web and printable documents
Multimedia Tools and Applications

Improving XED for extracting content from Arabic PDFs
DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems

Lessons from the dragon: compiling PDF to machine code
Proceedings of the 10th ACM symposium on Document engineering

Reflowable documents composed from pre-rendered atomic components
Proceedings of the 11th ACM symposium on Document engineering

XCDF: a canonical and structured document format
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems

No need to justify your choice: pre-compiling line breaks to improve eBook readability
Proceedings of the 2013 ACM symposium on Document engineering

Abstract

Portable Document Format (PDF) is a page-oriented, graphically rich format based on PostScript semantics; it is also the format interpreted by the Adobe Acrobat viewers. Although each page in a PDF document is an independent graphic object, this property does not necessarily extend to the components (headings, diagrams, paragraphs, etc.) within a page. This, in turn, makes manipulating and extracting graphic objects on a PDF page a difficult and uncertain process.

The work described here investigates the advantages of a model in which PDF pages are created from assemblies of COGs (Component Object Graphics), each with a clearly defined graphic state. The relative positioning of COGs on a PDF page is determined by appropriate 'spacer' objects, and a traversal of the tree of COGs and spacers determines the rendering order. The enhanced revisability of PDF documents within the COG model is discussed, together with the application of the model in contexts that require easy revisability coupled with the ability to maintain and amend PDF document structure.
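
The tree-and-traversal idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class names (`Cog`, `Spacer`, `Group`), the `layout` function, and the rule that only spacers move the current position are all assumptions made for illustration. The sketch only shows how a depth-first, left-to-right traversal of a tree of COGs and spacers could yield both a rendering order and a position for each COG.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Cog:
    """A self-contained graphic component (hypothetical: identified by name only)."""
    name: str


@dataclass
class Spacer:
    """Shifts the current position before the next node is visited (hypothetical)."""
    dx: float = 0.0
    dy: float = 0.0


@dataclass
class Group:
    """Interior tree node holding COGs, spacers and nested groups (hypothetical)."""
    children: List["Node"] = field(default_factory=list)


Node = Union[Cog, Spacer, Group]


def layout(root: Node, origin: Tuple[float, float]) -> List[Tuple[str, float, float]]:
    """Traverse the tree depth-first, left to right.

    Returns (name, x, y) for each COG in the order it would be rendered.
    Assumption: COGs do not advance the position themselves; only spacers do.
    """
    x, y = origin
    placed: List[Tuple[str, float, float]] = []

    def visit(node: Node) -> None:
        nonlocal x, y
        if isinstance(node, Spacer):
            x += node.dx
            y += node.dy
        elif isinstance(node, Cog):
            placed.append((node.name, x, y))
        else:  # Group: visit children in order
            for child in node.children:
                visit(child)

    visit(root)
    return placed


# PDF default user space has its origin at the bottom-left with y increasing
# upwards, so a negative dy moves down the page.
page = Group([
    Cog("heading"),
    Spacer(dy=-30.0),
    Group([Cog("paragraph"), Spacer(dy=-80.0), Cog("diagram")]),
])

print(layout(page, origin=(72.0, 720.0)))
# [('heading', 72.0, 720.0), ('paragraph', 72.0, 690.0), ('diagram', 72.0, 610.0)]
```

In this sketch, changing the first spacer's `dy` moves every later COG in the traversal without touching the COGs themselves, which is one way to picture the revisability the abstract refers to.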