We discuss how to reengineer complex PDF-based documents, such as specifications and technical books, so that end users have a better experience with them. Specifications of the Object Management Group (OMG) are our initial targets. Such specifications are dense, difficult to use, and tend to have complicated structures. Our approach comprises format conversion, logical structure extraction, text extraction, and multilayer hypertext generation. Logical structure extraction is the central step, and it results in an XML document whose schema is tailored to the type of document. Many key concepts of a document are expressed in this schema, including concepts extracted from the patterns of words used in headings. For example, in OMG specifications, package relationships and class associations can often be extracted from the wording of headings. When we produce the multilayer hypertext version of the document in the final step, these extracted concepts allow a richer user experience.
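As a rough illustration of how heading wording might feed the logical structure, the sketch below turns a flat list of numbered headings into nested XML and records a package relationship when a heading ends in a "(from Package)" suffix. The heading pattern, the element names (`specification`, `section`, `fromPackage`), the function `headings_to_xml`, and the sample headings are all assumptions made for illustration; they are not the schema or the extraction rules used by the authors.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heading shape: "<number> <title> (from <Package>[, <Package>...])"
# The "(from ...)" suffix is optional. This pattern is hypothetical.
HEADING = re.compile(
    r"^(?P<num>\d+(?:\.\d+)*)\s+(?P<title>.+?)"
    r"(?:\s+\(from\s+(?P<pkgs>[^)]+)\))?$"
)


def headings_to_xml(headings):
    """Build a nested XML tree from numbered headings (illustrative only)."""
    root = ET.Element("specification")
    stack = [(0, root)]  # (nesting depth, element); the root has depth 0
    for line in headings:
        match = HEADING.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like numbered headings
        depth = match.group("num").count(".") + 1
        # Close sections at the same or a deeper level than the new heading.
        while stack[-1][0] >= depth:
            stack.pop()
        section = ET.SubElement(
            stack[-1][1],
            "section",
            number=match.group("num"),
            title=match.group("title"),
        )
        # Record each package named in the heading as a child element.
        if match.group("pkgs"):
            for pkg in match.group("pkgs").split(","):
                ET.SubElement(section, "fromPackage", name=pkg.strip())
        stack.append((depth, section))
    return root


# Hypothetical sample headings, written in the style of a UML-like specification.
sample = [
    "7 Classes",
    "7.3 Class Descriptions",
    "7.3.3 Association (from Kernel)",
    "7.3.4 AssociationClass (from AssociationClasses)",
]
print(ET.tostring(headings_to_xml(sample), encoding="unicode"))
```

A real pipeline would first have to recover the headings from the PDF layout. This sketch starts from already-extracted heading text and only shows how concepts embedded in the wording could end up as explicit elements that a hypertext generator can later link.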