The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content treats the task as classifying content as either high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, thereby indicating the specific respects in which low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to mark content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. We develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since acquiring significant test data is difficult in the Wikipedia setting, we analyze the effects of a biased sample selection. In this regard, we report the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown real-world, flaw-specific class imbalances. The flaw prediction performance is evaluated on 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided that the test data contain little noise, four flaws can be detected with a precision close to 1.
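To illustrate why effectiveness has to be reported as a function of the flaw distribution, the following minimal sketch computes the precision implied by a fixed true-positive rate and false-positive rate at different flaw ratios, using Bayes' rule. It is an illustration only, not the evaluation procedure of the paper: the function name `precision_at_prior` and the rates 0.9 and 0.1 are hypothetical values chosen for the example.

```python
def precision_at_prior(tpr: float, fpr: float, prior: float) -> float:
    """Precision implied by a classifier's TPR and FPR at a given flaw ratio.

    tpr   -- fraction of flawed articles that are flagged (hypothetical value)
    fpr   -- fraction of flawless articles that are flagged (hypothetical value)
    prior -- assumed fraction of flawed articles in the test set
    """
    for name, value in (("tpr", tpr), ("fpr", fpr), ("prior", prior)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    true_positives = tpr * prior            # flagged and actually flawed
    false_positives = fpr * (1.0 - prior)   # flagged but not flawed
    flagged = true_positives + false_positives
    if flagged == 0.0:
        return 0.0  # nothing is flagged; precision is reported as 0 by convention
    return true_positives / flagged


if __name__ == "__main__":
    # Sweep a few assumed flaw ratios, from balanced to strongly imbalanced.
    for ratio in (0.5, 0.2, 0.1, 0.05, 0.01):
        print(f"flaw ratio {ratio:>5}: precision {precision_at_prior(0.9, 0.1, ratio):.3f}")
```

With the classifier held fixed, precision drops as flawed articles become rarer, so a result measured on a balanced sample says little about an imbalanced real-world distribution unless the flaw ratio is varied explicitly.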