Automatic Assessment of Document Quality in Web Collaborative Digital Libraries

  • Authors: Daniel Hasan Dalip; Marcos André Gonçalves; Marco Cristo; Pável Calado

  • Affiliations: Universidade Federal de Minas Gerais; Universidade Federal de Minas Gerais; Federal University of Amazonas; Instituto Superior Técnico/INESC-ID

  • Venue: Journal of Data and Information Quality (JDIQ)
  • Year: 2011

Abstract

The old dream of a universal repository containing all of human knowledge and culture is becoming possible through the Internet and the Web. Moreover, this is happening with the direct collaborative participation of people. Wikipedia is a great example: an enormous repository of information with free access and open editing, created collaboratively by its community. However, this large amount of information, made available democratically and virtually without any control, raises questions about its quality. In this work, we explore a significant number of quality indicators and study their capability to assess the quality of articles from three Web collaborative digital libraries. Furthermore, we explore machine learning techniques to combine these quality indicators into a single assessment. Through experiments, we show that the most important quality indicators are also the easiest to extract, namely, the textual features related to the structure of the article. Moreover, to the best of our knowledge, this work is the first to present an empirical comparison between Web collaborative digital libraries on the task of assessing article quality.
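
As a rough illustration of the idea in the abstract (extracting structural text features and combining them into one quality estimate), the following is a minimal sketch only. It is not the authors' method: the four features, the wiki-style markup patterns, the function names `structure_features` and `predict_quality`, and the use of a k-nearest-neighbour average as the learner are all assumptions chosen to keep the example short.

```python
import re
from math import sqrt

def structure_features(text):
    """Return a few structural indicators of a wiki-style article.

    The feature set is a hypothetical, illustrative choice:
    word count, section headings, internal links, paragraphs.
    """
    words = re.findall(r"\w+", text)
    # Headings written as "== Title ==" on their own line (assumed markup).
    sections = re.findall(r"^==+[^=].*?==+\s*$", text, flags=re.MULTILINE)
    # Internal links written as "[[Target]]" (assumed markup).
    links = re.findall(r"\[\[[^\]]+\]\]", text)
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [len(words), len(sections), len(links), len(paragraphs)]

def predict_quality(train, query, k=1):
    """Average the labels of the k training items nearest to `query`.

    `train` is a list of (feature_vector, quality_label) pairs.
    Distances are Euclidean on raw, unnormalized features, which is
    a simplification; real features would normally be scaled first.
    """
    def distance(item):
        return sqrt(sum((a - b) ** 2 for a, b in zip(item[0], query)))
    nearest = sorted(train, key=distance)[:k]
    return sum(label for _, label in nearest) / len(nearest)

# Toy, made-up training data: a stub and a more structured article.
stub = "A short note."
full = ("== History ==\n\nLong text about [[Topic]] here.\n\n"
        "== Usage ==\n\nMore text with [[Link]] and detail.")
train = [(structure_features(stub), 1.0), (structure_features(full), 4.0)]

new_article = "== Intro ==\n\nSome [[Link]] text here today."
estimate = predict_quality(train, structure_features(new_article), k=2)
print(estimate)
```

The quality labels (1.0 and 4.0) are invented for the example; in practice they would come from the quality classes assigned by each library's community, and a stronger regression or classification model would replace the nearest-neighbour average.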