Data quality is a critical problem for the Semantic Web. We propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. Inspired by data dependency, which has shown promise in database data quality research, we introduce Semantic Dependency to assess the quality of Semantic Web data. The system first builds a summary graph to find candidate semantic dependencies. Each semantic dependency is assigned a probability according to its instantiations, and this probability is subsequently adjusted based on the inconsistencies among them. Each triple then receives a posterior probability of normality based on which semantic dependencies support it. By iterating these steps, the proposed approach detects abnormal Semantic Web data. Experiments show that the system is efficient on a data set with 10M triples and achieves an F-score more than ten percent higher than our previous system.
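To make the core heuristic concrete, the following is a minimal sketch of scoring one candidate dependency from its instantiations and flagging the triples that deviate from it. It is not the system described above: there is no summary graph, no probability adjustment, and no iteration. The function names (dependency_confidence, flag_deviations), the min_conf threshold, and the toy city/country data are all illustrative assumptions, and each property is assumed to be single-valued per subject.

```python
from collections import Counter, defaultdict


def dependency_confidence(triples, p1, p2):
    """Estimate how often the value of p1 determines the value of p2.

    Hypothetical simplification: assumes p1 and p2 are single-valued
    per subject, and uses a plain majority vote per p1 value.
    """
    v1 = {s: o for s, p, o in triples if p == p1}
    v2 = {s: o for s, p, o in triples if p == p2}
    # Group the observed p2 values by the p1 value of the same subject.
    groups = defaultdict(Counter)
    for s in v1.keys() & v2.keys():
        groups[v1[s]][v2[s]] += 1
    total = sum(sum(c.values()) for c in groups.values())
    agree = sum(c.most_common(1)[0][1] for c in groups.values())
    majority = {a: c.most_common(1)[0][0] for a, c in groups.items()}
    return (agree / total if total else 0.0), majority


def flag_deviations(triples, p1, p2, min_conf=0.7):
    """Return p2 triples that disagree with a sufficiently confident dependency."""
    conf, majority = dependency_confidence(triples, p1, p2)
    if conf < min_conf:  # dependency too weak to support any judgment
        return []
    v1 = {s: o for s, p, o in triples if p == p1}
    return [(s, p, o) for s, p, o in triples
            if p == p2 and s in v1 and majority[v1[s]] != o]


# Toy data (invented for illustration): subject "c" breaks the
# city -> country pattern followed by the other subjects.
triples = [
    ("a", "city", "Paris"), ("a", "country", "France"),
    ("b", "city", "Paris"), ("b", "country", "France"),
    ("c", "city", "Paris"), ("c", "country", "Spain"),
    ("d", "city", "Rome"), ("d", "country", "Italy"),
    ("e", "city", "Paris"), ("e", "country", "France"),
]
suspicious = flag_deviations(triples, "city", "country")
```

In this toy data, four of the five instantiations agree with the majority value for their city, so the dependency passes the assumed threshold and only the triple for subject "c" is flagged. A fuller treatment along the lines of the abstract would re-estimate the dependency probabilities after discounting flagged triples and repeat.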