Automatic quality assessment of content created collaboratively by web communities: a case study of Wikipedia

Authors:
Daniel Hasan Dalip;Marcos André Gonçalves;Marco Cristo;Pável Calado
Affiliations:
Federal University of Minas Gerais, Belo Horizonte, Brazil;Federal University of Minas Gerais, Belo Horizonte, Brazil;FUCAPI - Analysis, Research and Tech. Innovation Center, Manaus, Brazil;Instituto Superior Técnico/INESC-ID, Porto Salvo, Portugal
Venue:
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Year:
2009

Abstract

The old dream of a universal repository containing all human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct, collaborative participation of people. Wikipedia is a great example. It is an enormous repository of information, free to access and edit, created collaboratively by its community. However, this large amount of information, made available democratically and virtually without any control, raises questions about its relative quality. In this work we explore a significant number of quality indicators, some of them proposed by us and used here for the first time, and study their capability to assess the quality of Wikipedia articles. Furthermore, we explore machine learning techniques to combine these quality indicators into a single assessment judgment. Through experiments, we show that the most important quality indicators are the easiest ones to extract, namely, textual features related to length, structure, and style. We were also able to determine which indicators did not contribute significantly to the quality assessment. These were, coincidentally, the most complex features, such as those based on link analysis. Finally, we compare our combination method with a state-of-the-art solution and show significant improvements in terms of effective quality prediction.
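
To make the idea of easy-to-extract textual indicators concrete, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not define the individual features or name the learner, so the indicator names (`word_count`, `section_count`, `link_count`, `avg_sentence_length`), the regular expressions for wiki markup, and the function names are all assumptions made for illustration. The `combine` function uses a plain weighted sum with caller-supplied weights as a stand-in for a learned model; in a real setting the weights would be fitted to articles with known quality labels.

```python
import re

# Wiki-markup section headings are assumed to look like "== Title ==".
HEADING = re.compile(r"^=+[^=\n]+=+[ \t]*$", flags=re.MULTILINE)
# Internal links are assumed to look like "[[Target]]" or "[[Target|label]]".
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]")


def extract_indicators(wikitext: str) -> dict:
    """Compute a few illustrative length, structure, and style indicators."""
    section_count = len(HEADING.findall(wikitext))
    link_count = len(LINK.findall(wikitext))

    # Drop heading lines and unwrap links so only readable text is counted.
    body = HEADING.sub("", wikitext)
    body = LINK.sub(r"\1", body)

    words = re.findall(r"[A-Za-z']+", body)
    sentences = [s for s in re.split(r"[.!?]+", body) if s.strip()]

    return {
        "word_count": len(words),                     # length
        "section_count": section_count,               # structure
        "link_count": link_count,                     # structure
        "avg_sentence_length": (                      # style
            len(words) / len(sentences) if sentences else 0.0
        ),
    }


def combine(indicators: dict, weights: dict, bias: float = 0.0) -> float:
    """Combine indicators into one score with a weighted sum.

    The weights are placeholders supplied by the caller; a learning
    algorithm would normally estimate them from labeled articles.
    """
    return bias + sum(w * indicators[name] for name, w in weights.items())
```

A caller would run `extract_indicators` on an article's markup and pass the result to `combine` together with a weight per indicator. Indicators left out of the weight dictionary are simply ignored, which makes it easy to test how much a given group of features contributes.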