User-generated content (UGC) is created, updated, and maintained by many web users, so its data quality is a major concern. We observe that each Wikipedia page usually goes through a series of revision stages, gradually approaching a relatively steady quality state, and that articles of different quality classes exhibit distinct evolution patterns. We propose to assess the quality of web articles by Learning Evolution Patterns (LEP). First, each article's revision history is mapped into a state sequence using a Hidden Markov Model (HMM). Second, evolution patterns are mined for each quality class, so that each class is characterized by a set of quality corpora. Finally, an article's quality is determined probabilistically by comparing the article's state sequence with the quality corpora. Our experimental results demonstrate that the LEP approach can capture a web article's quality precisely.
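The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the state names (`building`, `stable`), the observation symbols (`major`, `minor`), all probabilities, the pattern weights, and the helper names `viterbi`, `ngrams`, and `class_probabilities` are hypothetical. Contiguous state bigrams stand in for the mined evolution patterns, and normalized weight sums stand in for the probabilistic comparison, since the abstract does not specify either.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # Log space avoids underflow on long revision histories.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]

def ngrams(seq, n):
    """Set of contiguous length-n patterns in a state sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def class_probabilities(state_seq, corpora, n=2):
    """Score each quality class by the weights of its patterns found in the sequence."""
    found = ngrams(state_seq, n)
    raw = {cls: sum(w for pat, w in pats.items() if pat in found) + 1e-9
           for cls, pats in corpora.items()}
    total = sum(raw.values())
    return {cls: v / total for cls, v in raw.items()}

# Hypothetical HMM: hidden revision stages emit observed edit sizes.
states = ["building", "stable"]
start_p = {"building": 0.8, "stable": 0.2}
trans_p = {"building": {"building": 0.7, "stable": 0.3},
           "stable": {"building": 0.1, "stable": 0.9}}
emit_p = {"building": {"major": 0.8, "minor": 0.2},
          "stable": {"major": 0.1, "minor": 0.9}}

# Step 1: map a revision history to a state sequence.
revisions = ["major", "major", "minor", "minor", "minor"]
path = viterbi(revisions, states, start_p, trans_p, emit_p)

# Step 2 (assumed already done): per-class pattern corpora with made-up weights.
corpora = {
    "high": {("building", "stable"): 3.0, ("stable", "stable"): 2.0},
    "low": {("building", "building"): 1.0, ("stable", "building"): 4.0},
}

# Step 3: compare the article's sequence with each class corpus.
probs = class_probabilities(path, corpora)
print(path)
print(probs)
```

In this toy setting the article that settles into the `stable` state scores higher against the `high` corpus. A fuller treatment would learn the HMM parameters from data and mine patterns from labeled articles rather than hard-coding them.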