Data quality issues arise in the Semantic Web because data is created by diverse people and automated tools. In particular, erroneous triples may result from factual errors in the original data source, faults in the acquisition tools employed, misuse of ontologies, or errors in ontology alignment. We propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. Inspired by functional dependencies, which have shown promise in database data quality research, we introduce the value-clustered graph functional dependency to detect abnormal data in RDF graphs. To better handle Semantic Web data, it extends the concept of functional dependency in several respects. First, there is the issue of scale, since we must consider the whole data schema instead of being restricted to one database relation. Second, it handles multi-valued properties, whose values are not explicitly correlated with one another the way attribute values are within a database tuple. Third, it uses clustering to consider classes of values rather than only individual values. Focusing on these characteristics, we propose a number of heuristics and algorithms to efficiently discover the extended dependencies and use them to detect abnormal data. Experiments show that the system is efficient on multiple data sets and detects many quality problems in real-world data.
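The core heuristic, flagging a triple that deviates from what similar triples suggest, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function name `find_deviations`, the `cluster` and `min_support` parameters, and the toy triples are all assumptions made for illustration. It checks a single candidate dependency between two properties, pairs every value of a multi-valued property with every value of the other (since RDF gives no tuple-like pairing), and compares values by cluster label.

```python
from collections import Counter, defaultdict


def find_deviations(triples, lhs, rhs, cluster=lambda v: v, min_support=0.8):
    """Flag (subject, rhs, value) triples whose rhs value falls outside the
    dominant rhs cluster observed for the subject's lhs value.

    Hypothetical helper for illustration only; not the paper's method.
    """
    # Collect the set of values of each property for each subject.
    values = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        values[s][p].add(o)

    # Count co-occurring (lhs value, rhs cluster) pairs across subjects.
    # Multi-valued properties contribute every combination of values.
    counts = defaultdict(Counter)
    for props in values.values():
        for a in props.get(lhs, ()):
            for b in props.get(rhs, ()):
                counts[a][cluster(b)] += 1

    suspects = []
    for s in sorted(values):
        props = values[s]
        for a in sorted(props.get(lhs, ())):
            total = sum(counts[a].values())
            if total == 0:
                continue  # this lhs value never co-occurs with rhs
            dominant, n = counts[a].most_common(1)[0]
            if n / total < min_support:
                continue  # no dependable mapping for this lhs value
            for b in sorted(props.get(rhs, ())):
                if cluster(b) != dominant:
                    suspects.append((s, rhs, b))
    return suspects


# Toy data (invented): four subjects agree, one deviates.
triples = [
    ("a", "city", "Paris"), ("a", "country", "France"),
    ("b", "city", "Paris"), ("b", "country", "France"),
    ("c", "city", "Paris"), ("c", "country", "France"),
    ("d", "city", "Paris"), ("d", "country", "France"),
    ("e", "city", "Paris"), ("e", "country", "Texas"),
]

suspects = find_deviations(triples, "city", "country", min_support=0.75)
print(suspects)  # [('e', 'country', 'Texas')]
```

Passing a `cluster` function that maps raw values to class labels (for example, bucketing numbers into ranges) would let the same check operate on classes of values instead of exact values. Discovering which property pairs to test across a whole schema, which the abstract identifies as the scaling challenge, is outside the scope of this sketch.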