Rationality of cross-system data duplication: a case study

Authors:
Wiebe Hordijk; Roel Wieringa
Affiliations:
University of Twente, The Netherlands; University of Twente, The Netherlands
Venue:
CAiSE'10 Proceedings of the 22nd international conference on Advanced information systems engineering
Year:
2010

Abstract

Duplication of data across systems in an organization is a problem because it wastes effort and leads to inconsistencies. Researchers have proposed several technical solutions, but duplication still occurs in practice. In this paper we report on a case study of how and why duplication occurs in a large organization, and discuss generalizable lessons learned from it. Our case study research questions are why data gets duplicated, how large the negative effects of duplication are, and why existing solutions are not used. We frame our findings in terms of design rationale and explain them by providing a causal model. Our findings suggest that, in addition to technological factors, organizational and project factors have a large effect on duplication. We discuss the implications of our findings for technical solutions in general.