Evidence of quality of textual features on the web 2.0

  • Authors:
  • Flavio Figueiredo;Fabiano Belém;Henrique Pinto;Jussara Almeida;Marcos Gonçalves;David Fernandes;Edleno Moura;Marco Cristo

  • Affiliations:
  • Federal University of Minas Gerais, Belo Horizonte - Minas Gerais, Brazil;Federal University of Minas Gerais, Belo Horizonte - Minas Gerais, Brazil;Federal University of Minas Gerais, Belo Horizonte - Minas Gerais, Brazil;Federal University of Minas Gerais, Belo Horizonte - Minas Gerais, Brazil;Federal University of Minas Gerais, Belo Horizonte - Minas Gerais, Brazil;Federal University of Minas Gerais, Belo Horizonte - Minas Gerais, Brazil;Federal University of Amazonas, Manaus - Amazonas, Brazil;FUCAPI, Manaus - Amazonas, Brazil

  • Venue:
  • Proceedings of the 18th ACM conference on Information and knowledge management
  • Year:
  • 2009

Abstract

The growing popularity of Web 2.0 applications has greatly increased the amount of social media content available on the Internet. However, the unsupervised, user-oriented nature of this source of information, and thus its potential lack of quality, poses a challenge to information retrieval (IR) services. Previous work has focused mostly on tags, although no consensus has yet been reached about their effectiveness as supporting information for IR services. Moreover, other textual features of the Web 2.0 have generally been overlooked by previous research. In this context, this work aims to assess the relative quality of distinct textual features available on the Web 2.0. Towards this goal, we analyzed four features (title, tags, description and comments) in four popular applications (CiteULike, Last.FM, Yahoo! Video, and YouTube). First, we characterized data from these applications in order to extract evidence of the quality of each feature with respect to usage, amount of content, descriptive and discriminative power, and content diversity across features. Afterwards, a series of classification experiments was conducted as a case study for quality evaluation. Characterization and classification results indicate that: 1) when considered separately, tags are the most promising feature, achieving the best classification results, although their absence in a non-negligible fraction of objects may limit their potential use; and 2) each feature may bring different pieces of information, and combining their contents can improve classification.
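
The idea of content diversity across features, and of combining feature contents, can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify its metrics, so the Jaccard similarity, the naive whitespace tokenizer, and the sample object below are all assumptions chosen only for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two term sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical object carrying the four textual features named in the abstract.
obj = {
    "title": "Live jazz concert",
    "tags": "jazz live music",
    "description": "Recording of a live jazz concert in Manaus",
    "comments": "great sax solo",
}

terms = {name: tokenize(text) for name, text in obj.items()}

# Assumed diversity measure: low similarity between two features suggests
# that they contribute different terms about the same object.
similarity_title_tags = jaccard(terms["title"], terms["tags"])

# Assumed combination strategy: the union of all features' term sets,
# usable as a single bag of terms for a downstream classifier.
combined = set().union(*terms.values())
```

In this toy object the comments contribute terms that appear in no other feature, which is the kind of complementary information the abstract's second finding refers to.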