xlinkit: a consistency checking and smart link generation service
ACM Transactions on Internet Technology (TOIT)
CSL '02 Proceedings of the 16th International Workshop and 11th Annual Conference of the EACSL on Computer Science Logic
Using Regular Tree Automata as XML Schemas
ADL '00 Proceedings of the IEEE Advances in Digital Libraries 2000
Managing inconsistent repositories via prioritized repairs
Proceedings of the 2004 ACM symposium on Document engineering
Preservation-centric and constraint-based migration of digital documents
Proceedings of the 2006 ACM symposium on Document engineering
Describing multistructured XML documents by means of delay nodes
Proceedings of the 2006 ACM symposium on Document engineering
Long-Term Preservation of Digital Documents: Principles and Practices
Long-Term Preservation of Digital Documents: Principles and Practices
Introduction to Automata Theory, Languages, and Computation (3rd Edition)
Introduction to Automata Theory, Languages, and Computation (3rd Edition)
ECBS '07 Proceedings of the 14th Annual IEEE International Conference and Workshops on the Engineering of Computer-Based Systems
Towards extending and using SPARQL for modular document generation
Proceedings of the eighth ACM symposium on Document engineering
Towards constraint-based preservation in systems specification
EUROCAST'07 Proceedings of the 11th international conference on Computer aided systems theory
Archivists and librarians face an ever-increasing amount of digital material. Their task is to preserve its authentic content. In the long run, this requires periodic migrations (from one format to another, or from one hardware/software platform to another). Document migrations are challenging tasks in which tool support and a high degree of automation are important. A central aspect is that documents are often mutually related; hence, a document's semantics has to be considered in its whole context. References between documents are usually formulated in graph- or tree-based query languages such as URLs or XPath. A typical scenario is web archiving, where websites are stored inside a server infrastructure that can be queried from HTML files using URLs. Migrating websites will often require link adaptation in order to preserve link consistency. Although automated and "trustworthy" preservation of link consistency is easy to postulate, it is hard to carry out, in particular if "trustworthy" means "provably working correctly". In this paper, we propose a general approach to semantically evaluating and constructing graph queries that simultaneously conform to a regular grammar, appear as part of a document's content, and access a graph structure specified using First-Order Predicate Logic (FOPL). To do so, we adapt model checking techniques by constructing suitable query automata. We integrate these techniques into our preservation framework [12] and show the feasibility of the approach using an example: we migrate a website to a specific archiving format and demonstrate the automated preservation of link consistency. The approach mainly contributes to a higher degree of automation in document migration while still maintaining a high degree of "trustworthiness", namely "provable correctness".
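The link-adaptation problem described in the abstract can be illustrated with a small sketch. It is not the paper's query-automata construction or its FOPL-based framework; it is a simplified stand-in that assumes a website is a nested dictionary, that links are absolute paths matching a hypothetical regular grammar (`LINK_GRAMMAR`), and that a migration is a plain rename map over path segments. All function names here (`resolve`, `migrate_link`, `migrate_tree`, `links_consistent`) are invented for illustration.

```python
import re

# Hypothetical regular grammar for links: absolute paths such as /docs/a.html.
LINK_GRAMMAR = re.compile(r"(/[A-Za-z0-9_.-]+)+")


def resolve(tree, link):
    """Evaluate a link against the site tree; return the target node or None."""
    if not LINK_GRAMMAR.fullmatch(link):
        return None  # link does not conform to the assumed grammar
    node = tree
    for segment in link.strip("/").split("/"):
        if not isinstance(node, dict) or segment not in node:
            return None  # dangling link
        node = node[segment]
    return node


def migrate_link(link, renames):
    """Adapt a link by renaming each path segment according to the migration."""
    segments = link.strip("/").split("/")
    return "/" + "/".join(renames.get(s, s) for s in segments)


def migrate_tree(tree, renames):
    """Apply the same renaming to the site structure itself."""
    if not isinstance(tree, dict):
        return tree  # leaf: document content is kept unchanged
    return {renames.get(k, k): migrate_tree(v, renames) for k, v in tree.items()}


def links_consistent(old_tree, new_tree, links, renames):
    """Check that every adapted link reaches the migrated image of its old target."""
    for link in links:
        old_target = resolve(old_tree, link)
        if old_target is None:
            return False
        new_target = resolve(new_tree, migrate_link(link, renames))
        if new_target != migrate_tree(old_target, renames):
            return False
    return True


# Toy website and a migration that renames a directory and a file.
site = {"index.html": "home", "docs": {"a.html": "doc A"}}
renames = {"docs": "archive", "a.html": "a.xhtml"}
archived = migrate_tree(site, renames)
links = ["/docs/a.html", "/index.html"]

consistent = links_consistent(site, archived, links, renames)
stale = resolve(archived, "/docs/a.html")  # unadapted link in the migrated site
```

In this toy setting the adapted links still reach their targets, while the unadapted link `/docs/a.html` no longer resolves in the archived tree. The check here is a per-link test on one concrete site, which is much weaker than the provable correctness the abstract aims for.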