Complex databases are challenging for users unfamiliar with their schemas to explore and query. Enterprise databases often have hundreds of interlinked tables, so even when extensive documentation is available, new users must spend considerable time understanding the schema before they can retrieve any information from the database. The problem is aggravated if the documentation is missing or outdated, which may happen with legacy databases. In this paper, we identify limitations of previous approaches to this problem and propose a principled approach to summarizing the contents of a relational database, so that a user can determine at a glance the type of information it contains and the main tables in which that information resides. Our approach has three components. First, we define the importance of each table in the database as its stable-state value in a random walk over the schema graph, where the transition probabilities depend on the entropies of table attributes. This ensures that the importance of a table depends both on its information content and on how that content relates to the content of other tables in the database. Second, we define a metric space over the tables in a database, such that the distance function is consistent with an intuitive notion of table similarity. Finally, we use a Weighted k-Center algorithm under this distance function to cluster all tables in the database around the most relevant tables, and return the result as our summary. We conduct an extensive experimental study on a benchmark database, comparing our approach with previous methods as well as with several hybrid models. We show that our approach not only achieves significantly higher accuracy than the previous state of the art, but is also faster and scales linearly with the size of the schema graph.
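The first and third components can be illustrated with a minimal sketch. The abstract does not give the exact transition probabilities or distance function, so everything specific here is an assumption: each table gets a self-loop weighted by its information content (for example, a sum of column entropies), join edges carry caller-supplied positive weights, and the clustering step uses a common greedy heuristic for weighted k-center. The names `column_entropy`, `table_importance`, and `weighted_k_center` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def column_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a column."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def table_importance(info, edges, iterations=200):
    """Approximate the stationary distribution of a walk over the schema graph.

    info  : {table: information content}, assumed positive (e.g. summed entropies)
    edges : {(src, dst): weight} for join edges, assumed positive
    """
    tables = list(info)
    # Assumption: unnormalized weights are a self-loop equal to the table's
    # own information content plus the weights of its outgoing join edges.
    out = {t: {t: info[t]} for t in tables}
    for (src, dst), w in edges.items():
        out[src][dst] = out[src].get(dst, 0.0) + w
    # Row-normalize the weights into transition probabilities.
    prob = {}
    for t, row in out.items():
        total = sum(row.values())
        prob[t] = {u: w / total for u, w in row.items()}
    # Power iteration starting from the uniform distribution.
    rank = {t: 1.0 / len(tables) for t in tables}
    for _ in range(iterations):
        nxt = {t: 0.0 for t in tables}
        for t in tables:
            for u, p in prob[t].items():
                nxt[u] += rank[t] * p
        rank = nxt
    return rank


def weighted_k_center(importance, dist, k):
    """Greedy heuristic: start from the most important table, then repeatedly
    add the table maximizing importance * distance to its nearest center."""
    tables = list(importance)
    centers = [max(tables, key=importance.get)]
    while len(centers) < min(k, len(tables)):
        def score(t):
            return importance[t] * min(dist(t, c) for c in centers)
        centers.append(max((t for t in tables if t not in centers), key=score))
    # Assign every table to its nearest center to form the summary clusters.
    clusters = {c: [] for c in centers}
    for t in tables:
        clusters[min(centers, key=lambda c: dist(t, c))].append(t)
    return clusters
```

Under these assumptions, a table scores highly either because it carries a lot of information itself or because the walk reaches it often from other tables. The `dist` argument stands in for the metric of the second component, which this sketch leaves to the caller.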