Toward automated large-scale information integration and discovery

Authors:
Paul Brown;Peter Haas;Jussi Myllymaki;Hamid Pirahesh;Berthold Reinwald;Yannis Sismanis
Affiliations:
IBM Almaden Research Center;IBM Almaden Research Center;IBM Almaden Research Center;IBM Almaden Research Center;IBM Almaden Research Center;IBM Almaden Research Center
Venue:
Data Management in a Connected World
Year:
2005

Citing 9
Cited 7

Approximating the number of unique values of an attribute without sorting

Information Systems
Improved histograms for selectivity estimation of range predicates

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
New sampling-based summary statistics for improving approximate query answers

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Learning Stochastic Regular Grammars by Means of a State Merging Method

ICGI '94 Proceedings of the Second International Colloquium on Grammatical Inference and Applications
Inductive Inference, DFAs, and Computational Complexity

AII '89 Proceedings of the International Workshop on Analogical and Inductive Inference
CORDS: automatic discovery of correlations and soft functional dependencies

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Maintaining Implicated Statistics in Constrained Environments

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
BHUNT: automatic discovery of Fuzzy algebraic constraints in relational data

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
The polynomial complexity of fully materialized coalesced cubes

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

XML Mapping technology: making connections in an XML-centric world

IBM Systems Journal
A dip in the reservoir: maintaining sample synopses of evolving datasets

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
GORDIAN: efficient and scalable discovery of composite keys

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
On synopses for distinct-value estimation under multiset operations

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Discovering topical structures of databases

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Top-k generation of integrated schemas based on directed and weighted correspondences

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
An optimal algorithm for the distinct elements problem

Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The high cost of data consolidation is the key market inhibitor to the adoption of traditional information integration and data warehousing solutions. In this paper, we outline a next-generation integrated database management system that takes traditional information integration, content management, and data warehouse techniques to the next level: the system will be able to integrate a very large number of information sources and automatically construct a global business view in terms of “Universal Business Objects”. We describe techniques for discovering, unifying, and aggregating data from a large number of disparate data sources. Enabling technologies for our solution are XML, web services, caching, messaging, and portals for real-time dashboarding and reporting.