Web article quality assessment in multi-dimensional space

Authors:
Jingyu Han;Xiong Fu;Kejia Chen;Chuandong Wang
Affiliations:
School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, China;School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, China;School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, China;School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, China
Venue:
WAIM'11 Proceedings of the 12th international conference on Web-age information management
Year:
2011

Abstract

User-generated content (UGC) such as Wikipedia is growing on the web at an explosive rate, but its quality varies dramatically. How to rate article quality effectively is a focus of both the research and industry communities. Because each quality class shows distinct characteristics along different quality dimensions, we propose to learn the web quality corpus by taking these dimensions into account. Each article is regarded as an aggregation of sections, and each section's quality is modelled using a Dynamic Bayesian Network (DBN) with reference to accuracy, completeness and consistency. Each quality class is represented by three dimension corpora: an accuracy corpus, a completeness corpus and a consistency corpus. Finally, we propose two schemes to compute the quality ranking. Experiments show that our approach performs well.
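
The abstract does not give the DBN structure or the two ranking schemes, so the following is only an illustrative sketch of the multi-dimensional idea, not the authors' method. It assumes that each section already carries accuracy, completeness and consistency scores in [0, 1], that an article is summarised by the mean of its section scores, and that each quality class is summarised by a centroid in the same three-dimensional space. The names `SectionScore`, `article_vector`, `nearest_class` and the centroid values are all hypothetical.

```python
from dataclasses import dataclass
from math import dist
from statistics import fmean

# The three quality dimensions named in the abstract.
DIMENSIONS = ("accuracy", "completeness", "consistency")


@dataclass(frozen=True)
class SectionScore:
    """Hypothetical per-section scores, each assumed to lie in [0, 1]."""
    accuracy: float
    completeness: float
    consistency: float


def article_vector(sections):
    """Average the section scores into one point in the 3-D quality space."""
    if not sections:
        raise ValueError("an article needs at least one section")
    return tuple(fmean(getattr(s, d) for s in sections) for d in DIMENSIONS)


def nearest_class(vector, class_centroids):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(class_centroids, key=lambda name: dist(vector, class_centroids[name]))


# Assumed centroids, made up for illustration only.
CENTROIDS = {
    "high": (0.9, 0.9, 0.9),
    "medium": (0.6, 0.6, 0.6),
    "low": (0.3, 0.3, 0.3),
}

sections = [SectionScore(0.9, 0.8, 0.85), SectionScore(0.8, 0.9, 0.95)]
vec = article_vector(sections)
label = nearest_class(vec, CENTROIDS)
```

Averaging and nearest-centroid assignment are stand-ins chosen for brevity; the paper models section quality with a DBN and learns a corpus per dimension, which this sketch does not attempt to reproduce.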