Duplication of data across systems in an organization is a problem because it wastes effort and leads to inconsistencies. Researchers have proposed several technical solutions, but duplication still occurs in practice. In this paper we report on a case study of how and why duplication occurs in a large organization, and we discuss generalizable lessons learned from it. Our case study research questions are why data gets duplicated, how large the negative effects of duplication are, and why existing solutions are not used. We frame our findings in terms of design rationale and explain them with a causal model. Our findings suggest that, in addition to technological factors, organizational and project factors have a large effect on duplication. We discuss the implications of our findings for technical solutions in general.