Data genome: an abstract model for data evolution

Authors:
Deyou Tang;Jianqing Xi;Yubin Guo;Shunqi Shen
Affiliations:
School of Computer Science & Engineering, South China University of Technology, Guangzhou, China and Department of Computer Science & Technology, Hunan University of Technology, Zhuzhou, China;School of Computer Science & Engineering, South China University of Technology, Guangzhou, China;School of Computer Science & Engineering, South China University of Technology, Guangzhou, China;School of Computer Science & Engineering, South China University of Technology, Guangzhou, China
Venue:
ISICA'07 Proceedings of the 2nd international conference on Advances in computation and intelligence
Year:
2007

Abstract

Modern information systems often process data that has been transferred, transformed, or integrated from a variety of sources. In many application domains, information about the derivation of data items is crucial. A kind of metadata called data provenance is currently being investigated by many researchers, but provenance information must be collected and maintained explicitly by the dataset maintainer or by a specialized provenance management system. In this paper we investigate how to support derivation information for applications within the dataset itself. We propose that every dataset has a unique data genome that evolves as the dataset evolves. The data genome is part of the data and actively records derivation information for it. The characteristics of the data genome show that the lineage of datasets can be uncovered by analyzing their data genomes. We also present computations on data genomes, such as clone, transmit, mutate, and introject, to show how a data genome evolves to provide derivation information from the dataset itself.
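
The idea of a genome that travels with the data and records each derivation step can be illustrated with a small sketch. The abstract names the computations clone, transmit, mutate, and introject but does not define them, so everything below is an assumption made for illustration: the `Gene` and `DataGenome` types, the `ancestors` helper, and in particular the reading of introject as one genome absorbing another. It is not the authors' model or algebra.

```python
from dataclasses import dataclass
from typing import Set, Tuple
import uuid


@dataclass(frozen=True)
class Gene:
    """One derivation record (hypothetical structure, not from the paper)."""
    operation: str            # e.g. "create", "clone", "mutate"
    result: str               # ID of the genome this step produced
    sources: Tuple[str, ...]  # IDs of the genomes this step drew on


@dataclass(frozen=True)
class DataGenome:
    """Derivation history carried by a dataset (assumed representation)."""
    genome_id: str
    genes: Tuple[Gene, ...]


def new_genome() -> DataGenome:
    """Genome of a freshly created dataset with no ancestors."""
    gid = uuid.uuid4().hex
    return DataGenome(gid, (Gene("create", gid, ()),))


def _derive(parent: DataGenome, operation: str) -> DataGenome:
    # Inherit the parent's full history and append one new gene.
    gid = uuid.uuid4().hex
    gene = Gene(operation, gid, (parent.genome_id,))
    return DataGenome(gid, parent.genes + (gene,))


# Assumed semantics: each single-parent computation just records its name.
def clone(g: DataGenome) -> DataGenome:
    return _derive(g, "clone")


def transmit(g: DataGenome) -> DataGenome:
    return _derive(g, "transmit")


def mutate(g: DataGenome) -> DataGenome:
    return _derive(g, "mutate")


def introject(host: DataGenome, guest: DataGenome) -> DataGenome:
    """Assumed semantics: the host absorbs the guest and keeps both histories."""
    gid = uuid.uuid4().hex
    gene = Gene("introject", gid, (host.genome_id, guest.genome_id))
    inherited = host.genes + tuple(x for x in guest.genes if x not in host.genes)
    return DataGenome(gid, inherited + (gene,))


def ancestors(g: DataGenome) -> Set[str]:
    """IDs of every genome that contributed to g, read from g alone."""
    return {source for gene in g.genes for source in gene.sources}


a = new_genome()
b = clone(a)
c = mutate(b)
d = new_genome()
e = introject(c, d)
```

Under these assumptions, `ancestors(e)` contains the IDs of `a`, `b`, `c`, and `d`, so the lineage is recovered from the genome alone, without consulting a separate provenance store.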