Probabilistically ranking web article quality based on evolution patterns

Authors:
Jingyu Han; Kejia Chen; Dawei Jiang
Affiliations:
School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R. China; School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, P.R. China; School of Computing, National University of Singapore, Singapore
Venue:
Transactions on Large-Scale Data- and Knowledge-Centered Systems VI
Year:
2012

Abstract

User-generated content (UGC) is created, updated, and maintained by various web users, and its data quality is a major concern to all users. We observe that each Wikipedia page usually goes through a series of revision stages, gradually approaching a relatively steady quality state, and that articles of different quality classes exhibit specific evolution patterns. We propose to assess the quality of a number of web articles using Learning Evolution Patterns (LEP). First, each article's revision history is mapped into a state sequence using a Hidden Markov Model (HMM). Second, evolution patterns are mined for each quality class, and each quality class is characterized by a set of quality corpora. Finally, an article's quality is determined probabilistically by comparing the article with the quality corpora. Our experimental results demonstrate that the LEP approach can capture a web article's quality precisely.
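
The three steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives no model parameters, pattern definition, or scoring formula, so everything concrete here is an assumption. The two hidden states (`building`, `stable`), the two observation symbols (`major_edit`, `minor_edit`), and all probabilities are hypothetical toy values; contiguous state bigrams stand in for the mined evolution patterns; and a naive-Bayes-style score with add-one smoothing and a uniform prior stands in for the probabilistic comparison against the quality corpora.

```python
import math
from collections import Counter

# Hypothetical toy HMM; none of these values come from the paper.
STATES = ["building", "stable"]
START_P = {"building": 0.8, "stable": 0.2}
TRANS_P = {
    "building": {"building": 0.7, "stable": 0.3},
    "stable": {"building": 0.1, "stable": 0.9},
}
EMIT_P = {
    "building": {"major_edit": 0.8, "minor_edit": 0.2},
    "stable": {"major_edit": 0.1, "minor_edit": 0.9},
}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Step 1: map a revision history to its most likely state sequence."""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
               for s in states}]
    back_pointers = []
    for o in obs[1:]:
        current, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            prev = max(states,
                       key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            current[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][o]))
            pointers[s] = prev
        scores.append(current)
        back_pointers.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def bigrams(seq):
    """Contiguous state pairs, used here as a stand-in for evolution patterns."""
    return list(zip(seq, seq[1:]))


def build_corpus(sequences):
    """Step 2: count the patterns seen in one quality class."""
    corpus = Counter()
    for seq in sequences:
        corpus.update(bigrams(seq))
    return corpus


def class_posteriors(seq, corpora, n_patterns):
    """Step 3: normalised class scores (add-one smoothing, uniform prior)."""
    log_like = {}
    for label, corpus in corpora.items():
        total = sum(corpus.values())
        log_like[label] = sum(
            math.log((corpus[bg] + 1) / (total + n_patterns))
            for bg in bigrams(seq)
        )
    top = max(log_like.values())
    weights = {label: math.exp(v - top) for label, v in log_like.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


# Made-up training sequences for two hypothetical quality classes.
corpora = {
    "high": build_corpus([
        ["building", "building", "stable", "stable"],
        ["building", "stable", "stable", "stable"],
    ]),
    "low": build_corpus([
        ["building", "building", "building", "building"],
        ["stable", "building", "building", "building"],
    ]),
}

history = ["major_edit", "major_edit", "minor_edit", "minor_edit"]
state_seq = viterbi(history, STATES, START_P, TRANS_P, EMIT_P)
posteriors = class_posteriors(state_seq, corpora, n_patterns=len(STATES) ** 2)
print(state_seq, posteriors)
```

With these toy inputs, the decoded sequence moves from `building` to `stable`, and the `high` class receives the larger score because its corpus contains more of the same transitions. A faithful reproduction would need the observation features, pattern-mining procedure, and scoring rule given in the full paper.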