Quantifying software requirements for supporting archived office documents using emulation

  • Authors:
  • Thomas Reichherzer;Geoffrey Brown

  • Affiliations:
  • Indiana University, Bloomington, IN;Indiana University, Bloomington, IN

  • Venue:
  • Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper addresses the issues associated with building software images to support a collection of archived documents using machine emulators. Emulation has been proposed as a strategy for preservation of digital documents that require their original software for access. The creation of software images is a critical component in archiving documents via emulation. The software images include the operating system, application software, and supporting software artifacts such as fonts and Codecs (Compression-Decompression algorithm). A practical emulation environment to support a digital document requires both an emulator and a software image. This paper considers the issues associated with creating such software images to support Microsoft Office documents. In particular, we discuss a set of software tools and strategies that we developed to automatically analyze the dependencies of Microsoft Office documents on software resources and supporting files. As a proof of concept, the tools and strategies have been applied to establish dependencies of Office documents from a document library containing approximately 200,000 documents and to automatically collect missing resources such as fonts. The software tools are a first step toward an interactive system that aids in the construction of robust emulation environments for preserving digital artifacts. However, they may also be used in other contexts, for example, to support screening of documents for archiving and migration to new platforms to ensure correct visualization.