Automatic Assessment of Document Quality in Web Collaborative Digital Libraries

Authors:
Daniel Hasan Dalip; Marcos André Gonçalves; Marco Cristo; Pável Calado
Affiliations:
Universidade Federal de Minas Gerais; Universidade Federal de Minas Gerais; Federal University of Amazonas; Instituto Superior Técnico/INESC-ID
Venue:
Journal of Data and Information Quality (JDIQ)
Year:
2011

Abstract

The old dream of a universal repository containing all of human knowledge and culture is becoming possible through the Internet and the Web, and it is happening with the direct, collaborative participation of people. Wikipedia is a prominent example: an enormous repository of information with free access and open editing, created collaboratively by its community. However, this large amount of information, made available democratically and virtually without any control, raises questions about its quality. In this work, we explore a significant number of quality indicators and study their capability to assess the quality of articles from three Web collaborative digital libraries. Furthermore, we explore machine learning techniques to combine these quality indicators into a single assessment. Through experiments, we show that the most important quality indicators are also the easiest to extract, namely, the textual features related to the structure of the article. Moreover, to the best of our knowledge, this work is the first to present an empirical comparison between Web collaborative digital libraries on the task of assessing article quality.