DSToolkit: an architecture for flexible dataspace management

Authors:
Cornelia Hedeler;Khalid Belhajjame;Lu Mao;Chenjuan Guo;Ian Arundale;Bernadette Farias Lóscio;Norman W. Paton;Alvaro A. A. Fernandes;Suzanne M. Embury
Affiliations:
School of Computer Science, The University of Manchester, Manchester, UK;School of Computer Science, The University of Manchester, Manchester, UK;School of Computer Science, The University of Manchester, Manchester, UK;School of Computer Science, The University of Manchester, Manchester, UK;School of Computer Science, The University of Manchester, Manchester, UK;Centro de Informática, Cidade Universitária, Universidade Federal de Pernambuco, Recife, PE, Brasil;School of Computer Science, The University of Manchester, Manchester, UK;School of Computer Science, The University of Manchester, Manchester, UK;School of Computer Science, The University of Manchester, Manchester, UK
Venue:
Transactions on Large-Scale Data- and Knowledge-Centered Systems V
Year:
2012

Abstract

The vision of dataspaces is to provide many of the benefits of classical data integration, but with reduced up-front costs. Combining this with opportunities for incremental refinement enables a ‘pay-as-you-go’ approach to data integration, resulting in simplified integrated access to distributed data. It has been speculated that model management could provide the basis for dataspace management; however, this has not been investigated until now. Here, we present DSToolkit, the first dataspace management system that is based on model management. As a result, it benefits from the flexibility that model management provides for managing schemas represented in heterogeneous models. It supports the complete dataspace lifecycle, which includes automatic initialisation, maintenance and improvement of a dataspace, and it allows the user to provide feedback by annotating the result tuples returned by queries the user has posed. The feedback gathered is used for improvement by annotating, selecting and refining mappings. When a new data source is added, these techniques can also be applied, without requiring additional feedback on that source, to determine its perceived quality with respect to the feedback already gathered and to identify the best mappings over all sources, including the new one.