Existing approaches assess the quality of web articles mainly on the basis of syntax; little work has addressed how to quantify quality based on semantics. In this paper, we propose a novel Semantic Quality Assessment (SQA) approach that automatically determines data quality in terms of two of the most important quality dimensions, namely accuracy and completeness. First, an alternative context for the source article is built by collecting alternative web articles. Second, each alternative article is transformed into a semantic corpus, and dimension baselines are synthesized from these semantic corpora. Finally, each quality dimension of the source article is determined by comparing its semantic corpus with the corresponding dimension baseline. Our approach is a promising way to assess web article quality by exploiting available collective knowledge. Experiments show that our approach performs well.
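The three steps above can be illustrated with a minimal sketch. It is not the SQA method itself, whose details the abstract does not give. As simplifying assumptions, a "semantic corpus" is reduced to a set of lowercase word tokens, the baseline is the set of terms shared by at least a fraction `min_support` of the alternative articles, accuracy is the share of source terms supported by any alternative article, and completeness is the share of baseline terms covered by the source. The names `semantic_corpus`, `build_baseline`, `assess`, and `min_support` are hypothetical.

```python
import re
from collections import Counter


def semantic_corpus(text):
    """Assumed stand-in for a semantic corpus: the set of lowercase word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def build_baseline(alternative_corpora, min_support=0.5):
    """Assumed baseline: terms found in at least min_support of the alternatives."""
    counts = Counter(term for corpus in alternative_corpora for term in corpus)
    threshold = min_support * len(alternative_corpora)
    return {term for term, n in counts.items() if n >= threshold}


def assess(source_text, alternative_texts, min_support=0.5):
    """Score a source article against its alternative context."""
    source = semantic_corpus(source_text)
    alternatives = [semantic_corpus(t) for t in alternative_texts]
    baseline = build_baseline(alternatives, min_support)
    # Every term that appears in at least one alternative article.
    supported = set().union(*alternatives)
    # Accuracy (assumed definition): share of source terms backed by the context.
    accuracy = len(source & supported) / len(source) if source else 0.0
    # Completeness (assumed definition): share of baseline terms the source covers.
    completeness = len(source & baseline) / len(baseline) if baseline else 0.0
    return {"accuracy": accuracy, "completeness": completeness}


if __name__ == "__main__":
    alternatives = [
        "Paris is the capital of France and its largest city",
        "The capital of France is Paris",
        "Paris is a city in France",
    ]
    print(assess("Paris is the capital of France", alternatives))
```

Token overlap is only a placeholder for the comparison step. A topic model or another semantic representation could replace `semantic_corpus` without changing the overall structure of collecting alternatives, deriving a baseline, and comparing the source against it.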