Extracting reusable document components for variable data printing

Authors:
Steven R. Bagley;David F. Brailsford;James A. Ollis
Affiliations:
University of Nottingham;University of Nottingham;University of Nottingham
Venue:
Proceedings of the 2007 ACM symposium on Document engineering
Year:
2007

Abstract

Variable Data Printing (VDP) has brought new flexibility and dynamism to the printed page. Every printed instance of a specific class of document can now have different degrees of customized content within the document template. This flexibility comes at a cost. If every printed page is potentially different from all others, it must be rasterized separately, which is a time-consuming process. Technologies such as PPML (Personalized Print Markup Language) attempt to address this problem by dividing the bitmapped page into components that can be cached at the raster level, thereby speeding up the generation of page instances. A large number of documents are stored in Page Description Languages at a higher level of abstraction than the bitmapped page. Much of this content could be reused within a VDP environment, provided that separable document components can be identified and extracted. These components then need to be individually rasterizable so that each high-level component can be related to its low-level (bitmap) equivalent. Unfortunately, the unstructured nature of most Page Description Languages makes it difficult to extract content easily. This paper outlines the problems encountered in extracting component-based content from existing page description formats, such as PostScript, PDF and SVG, and how the differences between the formats affect the ease with which content can be extracted. The techniques are illustrated with reference to a tool called COG Extractor, which extracts content from PDF and SVG and prepares it for reuse.