Accuracy is one of the most important data quality dimensions, and its assessment is a key issue in data management. Most current studies analyze the accuracy dimension qualitatively, and the analysis depends heavily on expert knowledge. Little work addresses how to quantify accuracy automatically. Based on the Jensen-Shannon divergence (JSD) measure, we propose that the accuracy of data can be quantified automatically by comparing the data with the closest approximation of its entity in the available context. To identify the closest approximation quickly in large-scale data sources, locality-sensitive hashing (LSH) is employed to extract it at multiple levels, namely the column, record, and field levels. Our approach not only gives each data source an objective accuracy score very quickly, as long as a context member is available, but also avoids laborious human interaction. As an automatic accuracy assessment solution for multiple-source environments, our approach stands out especially for large-scale data sources. Theoretical analysis and experiments show that our approach performs well in obtaining metadata on the accuracy dimension.
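
To make the idea of a JSD-based accuracy score concrete, the following is a minimal sketch at the field level, not the authors' implementation. It assumes each value is represented as a distribution over character bigrams, and it finds the closest approximation by a plain linear scan over the context rather than by LSH, which the abstract names as the technique used for speed. The function names `ngram_distribution`, `jsd`, and `accuracy_score`, and the choice of `1 - JSD` as the score, are all illustrative assumptions.

```python
import math
from collections import Counter


def ngram_distribution(text, n=2):
    """Relative frequencies of character n-grams in text (assumed representation)."""
    s = text.lower()
    # Fall back to the whole string when it is shorter than n.
    grams = [s[i:i + n] for i in range(len(s) - n + 1)] or [s]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logarithms, so the result lies in [0, 1]."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # Kullback-Leibler divergence KL(a || M); zero-probability terms contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def accuracy_score(value, context_values, n=2):
    """Hypothetical score: 1 minus the JSD to the closest value in the context.

    The linear scan below stands in for the LSH lookup described in the abstract.
    """
    p = ngram_distribution(value, n)
    closest = min(jsd(p, ngram_distribution(c, n)) for c in context_values)
    return 1.0 - closest
```

Under these assumptions, a value that matches a context member exactly scores 1, a value that shares no bigrams with any context member scores 0, and a misspelling such as `"Bostan"` checked against a context containing `"Boston"` falls in between.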