Probabilistic quality assessment based on article's revision history

  • Authors:
  • Jingyu Han;Chuandong Wang;Dawei Jiang

  • Affiliations:
  • School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R. China;School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R. China;School of Computing, National University of Singapore, Singapore

  • Venue:
  • DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part II
  • Year:
  • 2011

Abstract

The collaborative efforts of users in social media services such as Wikipedia have led to an explosion of user-generated content, and how to automatically tag the quality of that content is now a pressing concern. Each article typically undergoes a series of revision phases, and articles of different quality classes exhibit distinct revision cycle patterns. We propose to Assess Quality based on Revision History (AQRH) for a specific domain as follows. First, we use a Hidden Markov Model (HMM) to turn each article's revision history into a revision state sequence. Then, for each quality class, its revision cycle patterns are extracted and clustered into quality corpora. Finally, an article's quality is gauged by comparing its state sequence, in a probabilistic sense, with the patterns of pre-classified documents. We conduct experiments on a set of Wikipedia articles, and the results demonstrate that our method can accurately and objectively capture a web article's quality.
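
The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the HMM's states, observations, or parameters, nor the way revision cycle patterns are extracted and clustered. The sketch assumes two hypothetical revision states ("building", "polishing"), two hypothetical observation symbols ("large_add", "small_edit"), and made-up probabilities. It decodes a revision history into a state sequence with the standard Viterbi algorithm, then, as a simple stand-in for the pattern comparison step, scores that sequence under one first-order Markov chain per quality class and picks the most likely class.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in s at time t,
    #               predecessor state on that path)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None)
             for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        layer = {}
        for s in states:
            score, arg = max((prev[r][0] + math.log(trans_p[r][s]), r)
                             for r in states)
            layer[s] = (score + math.log(emit_p[s][o]), arg)
        best.append(layer)
    # Backtrack from the best final state to recover the full path.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for layer in reversed(best[1:]):
        path.append(layer[path[-1]][1])
    return path[::-1]

def sequence_log_likelihood(seq, trans_p):
    """Log-likelihood of the transitions in seq under a Markov chain."""
    return sum(math.log(trans_p[a][b]) for a, b in zip(seq, seq[1:]))

def assess(seq, class_models):
    """Pick the quality class whose transition model best explains seq."""
    scores = {c: sequence_log_likelihood(seq, m)
              for c, m in class_models.items()}
    return max(scores, key=scores.get)

# Hypothetical toy parameters (assumptions, not taken from the paper).
states = ["building", "polishing"]
start_p = {"building": 0.8, "polishing": 0.2}
trans_p = {"building": {"building": 0.7, "polishing": 0.3},
           "polishing": {"building": 0.2, "polishing": 0.8}}
emit_p = {"building": {"large_add": 0.8, "small_edit": 0.2},
          "polishing": {"large_add": 0.1, "small_edit": 0.9}}

# Hypothetical per-class transition models standing in for quality corpora.
class_models = {
    "featured": {"building": {"building": 0.5, "polishing": 0.5},
                 "polishing": {"building": 0.1, "polishing": 0.9}},
    "stub": {"building": {"building": 0.9, "polishing": 0.1},
             "polishing": {"building": 0.6, "polishing": 0.4}},
}

history = ["large_add", "large_add", "small_edit", "small_edit"]
path = viterbi(history, states, start_p, trans_p, emit_p)
label = assess(path, class_models)
print(path, label)
```

With these toy numbers, the decoded sequence moves from "building" to "polishing" and is better explained by the "featured" model than by the "stub" model. In a real setting the HMM parameters and the class patterns would be learned from pre-classified articles rather than written by hand.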