Assessing web article quality by harnessing collective intelligence

Authors:
Jingyu Han;Xueping Chen;Kejia Chen;Dawei Jiang
Affiliations:
School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R.China;School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R.China;School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R.China;School of Computing, National University of Singapore, Singapore
Venue:
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Year:
2012

Abstract

Existing approaches assess the quality of web articles mainly on the basis of syntax, and little work has addressed how to quantify quality on the basis of semantics. In this paper we propose a novel Semantic Quality Assessment (SQA) approach that automatically determines data quality in terms of the two most important quality dimensions, namely accuracy and completeness. First, an alternative context for the source article is built by collecting alternative web articles. Second, each alternative article is transformed into and represented by a semantic corpus, and dimension baselines are synthesized from these semantic corpora. Finally, each quality dimension of the source article is determined by comparing its semantic corpus with the corresponding dimension baseline. Our approach is a promising way to assess web article quality by exploiting available collective knowledge. Experiments show that our approach performs well.
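The abstract describes the three-step pipeline only at a high level. What follows is a minimal sketch under our own assumptions, not the authors' SQA implementation. A "semantic corpus" is approximated as a set of lowercased content words (`semantic_corpus`). The completeness baseline is taken to be the units shared by at least a `min_support` fraction of the alternative articles. The accuracy baseline is taken to be every unit that any alternative mentions. All function names, the stopword list, and the `min_support` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical tiny stopword list; a real system would use a proper NLP pipeline.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def semantic_corpus(text: str) -> set[str]:
    """Assumed stand-in for a semantic corpus: the set of lowercased content words."""
    units = {w.strip(".,;:!?").lower() for w in text.split()}
    return units - STOPWORDS - {""}


def completeness_baseline(corpora: list[set[str]], min_support: float = 0.5) -> set[str]:
    """Hypothetical consensus baseline: units found in >= min_support of the alternatives."""
    counts = Counter(unit for corpus in corpora for unit in corpus)
    threshold = min_support * len(corpora)
    return {unit for unit, n in counts.items() if n >= threshold}


def accuracy_baseline(corpora: list[set[str]]) -> set[str]:
    """Hypothetical corroboration baseline: any unit mentioned by at least one alternative."""
    return set().union(*corpora)


def assess(source: str, alternatives: list[str], min_support: float = 0.5) -> dict[str, float]:
    """Compare the source's corpus against each (assumed) dimension baseline."""
    src = semantic_corpus(source)
    corpora = [semantic_corpus(a) for a in alternatives]
    comp_base = completeness_baseline(corpora, min_support)
    acc_base = accuracy_baseline(corpora)
    # Completeness: share of consensus units that the source article covers.
    completeness = len(src & comp_base) / len(comp_base) if comp_base else 0.0
    # Accuracy: share of the source's units that some alternative corroborates.
    accuracy = len(src & acc_base) / len(src) if src else 0.0
    return {"completeness": completeness, "accuracy": accuracy}


if __name__ == "__main__":
    alts = [
        "Paris is the capital of France.",
        "The capital of France is Paris, on the Seine.",
        "France has Paris as capital.",
    ]
    print(assess("Paris is the capital of Germany.", alts))
```

In this toy run the consensus baseline is {paris, capital, france}. The source misses "france" and adds the uncorroborated "germany", so both scores come out at 2/3. Treating completeness as consensus coverage and accuracy as corroboration is one plausible reading of "dimension baselines", chosen here only for illustration.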