Probabilistically ranking web article quality based on evolution patterns

  • Authors:
  • Jingyu Han; Kejia Chen; Dawei Jiang

  • Affiliations:
  • School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R. China; School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R. China; School of Computing, National University of Singapore, Singapore

  • Venue:
  • Transactions on Large-Scale Data- and Knowledge-Centered Systems VI
  • Year:
  • 2012

Abstract

User-generated content (UGC) is created, updated, and maintained by many different web users, so its quality is a major concern for everyone who relies on it. We observe that a Wikipedia page usually goes through a series of revision stages, gradually approaching a relatively steady quality state, and that articles of different quality classes exhibit distinct evolution patterns. We propose to assess the quality of web articles by Learning Evolution Patterns (LEP). First, each article's revision history is mapped into a state sequence using a Hidden Markov Model (HMM). Second, evolution patterns are mined for each quality class, so that each class is characterized by a set of quality corpora. Finally, an article's quality is determined probabilistically by comparing the article with the quality corpora. Our experimental results demonstrate that the LEP approach can capture a web article's quality precisely.
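
The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the HMM states, the revision features, the pattern definition, or the scoring rule. The state names (building, polishing, stable), the observation symbols (big_add, rewrite, minor_fix), all probabilities, the use of state bigrams as "evolution patterns", and the add-one-smoothed likelihood with a uniform prior are assumptions chosen only to make the pipeline concrete.

```python
import math
from collections import Counter

# Hypothetical toy HMM; every name and number below is an assumption.
STATES = ["building", "polishing", "stable"]
START_P = {"building": 0.8, "polishing": 0.1, "stable": 0.1}
TRANS_P = {
    "building":  {"building": 0.6,  "polishing": 0.3,  "stable": 0.1},
    "polishing": {"building": 0.1,  "polishing": 0.6,  "stable": 0.3},
    "stable":    {"building": 0.05, "polishing": 0.15, "stable": 0.8},
}
EMIT_P = {
    "building":  {"big_add": 0.7,  "rewrite": 0.2,  "minor_fix": 0.1},
    "polishing": {"big_add": 0.2,  "rewrite": 0.6,  "minor_fix": 0.2},
    "stable":    {"big_add": 0.05, "rewrite": 0.15, "minor_fix": 0.8},
}

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Step 1: map a revision history to its most likely state sequence."""
    # Each row maps state -> (best log-probability, predecessor state).
    rows = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None)
             for s in states}]
    for o in obs[1:]:
        prev = rows[-1]
        row = {}
        for s in states:
            score, arg = max((prev[r][0] + math.log(trans_p[r][s]), r)
                             for r in states)
            row[s] = (score + math.log(emit_p[s][o]), arg)
        rows.append(row)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: rows[-1][s][0])
    path = [state]
    for row in reversed(rows[1:]):
        state = row[state][1]
        path.append(state)
    return path[::-1]

def bigrams(seq):
    """Adjacent state pairs, used here as a stand-in for evolution patterns."""
    return list(zip(seq, seq[1:]))

def mine_patterns(sequences):
    """Step 2: count patterns over all state sequences of one quality class."""
    counts = Counter()
    for seq in sequences:
        counts.update(bigrams(seq))
    return counts

def class_posteriors(seq, corpora, n_patterns):
    """Step 3: posterior over quality classes under a uniform prior."""
    log_scores = {}
    for label, counts in corpora.items():
        total = sum(counts.values())
        # Add-one smoothing keeps unseen patterns from zeroing the score.
        log_scores[label] = sum(
            math.log((counts[b] + 1) / (total + n_patterns))
            for b in bigrams(seq))
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}
```

A usage sketch with made-up training sequences: build one pattern corpus per class with `mine_patterns`, decode a new article's revision history with `viterbi`, and pass the decoded sequence to `class_posteriors` with `n_patterns=len(STATES) ** 2`. The returned dictionary gives a probability per quality class, which can then be used to rank articles.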