Data profiling comprises a broad range of methods to efficiently analyze a given data set. In a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of a relational database are scanned to derive metadata, such as data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and occasionally functional dependencies and association rules. Individual research projects have proposed several additional profiling tasks, such as the discovery of inclusion dependencies or conditional functional dependencies. Data profiling deserves a fresh look for two reasons: First, the area itself is neither established nor defined in any principled way, despite significant research activity on individual parts in the past. Second, more and more data beyond the traditional relational databases are being created and beg to be profiled. The article proposes new research directions and challenges, including interactive and incremental profiling and profiling heterogeneous and non-relational data.
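As a rough illustration of the single-column metadata mentioned above (data types, completeness, uniqueness, and key candidates), the following minimal sketch profiles a small in-memory table. It is not taken from the article: the function names `profile_column` and `profile_table`, the representation of a table as a list of dictionaries, the use of `None` for missing values, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter


def profile_column(values):
    """Derive basic metadata for one column; None marks a missing value (assumption)."""
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        # Share of rows that carry a value.
        "completeness": len(non_null) / len(values) if values else 0.0,
        # Share of non-missing values that are distinct.
        "uniqueness": len(distinct) / len(non_null) if non_null else 0.0,
        # A column is a key candidate if it has no missing and no repeated values.
        "is_key_candidate": bool(values) and len(distinct) == len(values),
        # Observed Python types stand in for inferred data types.
        "types": dict(Counter(type(v).__name__ for v in non_null)),
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts (hypothetical layout)."""
    columns = list(rows[0].keys()) if rows else []
    return {c: profile_column([r.get(c) for r in rows]) for c in columns}


# Hypothetical sample data, not from the article.
rows = [
    {"id": 1, "city": "Berlin", "zip": "10115"},
    {"id": 2, "city": "Potsdam", "zip": None},
    {"id": 3, "city": "Berlin", "zip": "10117"},
]
profile = profile_table(rows)
```

In this sample, `id` is the only key candidate, `city` contains a repeated value, and `zip` is incomplete. Multi-column tasks such as discovering functional or inclusion dependencies need dedicated algorithms and are beyond this sketch.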