Probabilistic quality assessment based on article's revision history

Authors:
Jingyu Han; Chuandong Wang; Dawei Jiang
Affiliations:
School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R. China; School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R. China; School of Computing, National University of Singapore, Singapore
Venue:
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part II
Year:
2011

Abstract

The collaborative efforts of users in social media services such as Wikipedia have led to an explosion of user-generated content, and automatically tagging the quality of that content is now a pressing concern. Each article typically undergoes a series of revision phases, and articles of different quality classes exhibit distinct revision cycle patterns. We propose to Assess Quality based on Revision History (AQRH) for a specific domain as follows. First, we use a Hidden Markov Model (HMM) to turn each article's revision history into a revision state sequence. Then, for each quality class, its revision cycle patterns are extracted and clustered into quality corpora. Finally, an article's quality is gauged by comparing its state sequence, in a probabilistic sense, with the patterns of pre-classified documents. We conduct experiments on a set of Wikipedia articles, and the results demonstrate that our method can accurately and objectively capture a web article's quality.
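
The first step of the abstract, turning a revision history into a state sequence with an HMM, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' AQRH implementation: the state names, observation symbols, and all probabilities below are hypothetical placeholders chosen only to show the mechanics, and the abstract does not say how the paper defines or estimates them.

```python
import math

# Hypothetical revision states and edit-type observations (not from the paper).
STATES = ["building", "polishing", "stable"]

# Hypothetical model parameters; a real system would estimate these from data.
START = {"building": 0.8, "polishing": 0.15, "stable": 0.05}
TRANS = {
    "building":  {"building": 0.7,  "polishing": 0.25, "stable": 0.05},
    "polishing": {"building": 0.1,  "polishing": 0.7,  "stable": 0.2},
    "stable":    {"building": 0.05, "polishing": 0.15, "stable": 0.8},
}
EMIT = {
    "building":  {"large_add": 0.7,  "small_edit": 0.25, "revert": 0.05},
    "polishing": {"large_add": 0.15, "small_edit": 0.75, "revert": 0.1},
    "stable":    {"large_add": 0.05, "small_edit": 0.35, "revert": 0.6},
}


def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    if not observations:
        return []
    # score[s]: best log-probability of any state path ending in state s.
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor of s given the scores so far.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    history = ["large_add", "large_add", "small_edit", "small_edit", "revert"]
    print(viterbi(history))
```

The decoded sequence has one state per revision, which is the kind of input the later steps (pattern extraction, clustering, and probabilistic comparison) would consume; those steps are not sketched here because the abstract does not give enough detail to do so faithfully.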