Lineage Tracing for General Data Warehouse Transformations
Proceedings of the 27th International Conference on Very Large Data Bases
Data warehousing systems integrate information from operational data sources into a central repository to enable analysis and mining of the integrated information. During the integration process, source data typically undergoes a series of transformations, which may vary from simple algebraic operations or aggregations to complex “data cleansing” procedures. In a warehousing environment, the data lineage problem is that of tracing warehouse data items back to the original source items from which they were derived. We formally define the lineage tracing problem in the presence of general data warehouse transformations, and we present algorithms for lineage tracing in this environment. Our tracing procedures take advantage of known structure or properties of transformations when present, but also work in the absence of such information. Our results can be used as the basis for a lineage tracing tool in a general warehousing setting, and can also guide the design of data warehouses that enable efficient lineage tracing.
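To make the contrast between tracing with and without known transformation properties concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithms of the paper: the function names (`trace_black_box`, `trace_itemwise`), the sample transformation `positive_dollars`, and the sample rows are all hypothetical. The sketch assumes a transformation is a function from a list of source items to a list of output items, and that "processes each source item independently" is a property someone has declared about it.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, int]
Transformation = Callable[[List[Row]], List[Row]]


def trace_black_box(transform: Transformation, source: List[Row], output_item: Row) -> List[Row]:
    """With no known structure, conservatively report the whole source as lineage."""
    return list(source) if output_item in transform(list(source)) else []


def trace_itemwise(transform: Transformation, source: List[Row], output_item: Row) -> List[Row]:
    """Assumes each source item is processed independently of the others:
    keep only the source items that, on their own, produce the output item."""
    return [item for item in source if output_item in transform([item])]


# Hypothetical transformation: drop non-positive amounts, convert cents to whole dollars.
def positive_dollars(rows: List[Row]) -> List[Row]:
    return [(order_id, cents // 100) for order_id, cents in rows if cents > 0]


source_rows = [("a", 1250), ("b", -300), ("c", 990)]
warehouse_row = ("c", 9)

# Without any declared property, every source row is a possible contributor.
black_box_lineage = trace_black_box(positive_dollars, source_rows, warehouse_row)

# With the item-wise property declared, the lineage narrows to the contributing row.
itemwise_lineage = trace_itemwise(positive_dollars, source_rows, warehouse_row)
```

The item-wise tracer re-runs the transformation once per source item, so it trades extra computation for a much smaller lineage set. It is only valid when the declared property actually holds; for an aggregation, for example, it would return wrong answers.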