Detecting and exploiting stability in evolving heterogeneous information spaces

  • Authors:
  • George Papadakis; George Giannakopoulos; Claudia Niederée; Themis Palpanas; Wolfgang Nejdl

  • Affiliations:
  • National Technical University of Athens, Greece & L3S Research Center, Germany, Athens, Greece; SKEL - NCSR Demokritos, Athens, Greece; L3S Research Center, Germany, Hannover, Germany; University of Trento, Italy, Trento, Italy; L3S Research Center, Germany, Hannover, Germany

  • Venue:
  • Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
  • Year:
  • 2011

Abstract

Individuals contribute content on the Web at an unprecedented rate, accumulating immense quantities of (semi-)structured data. Wisdom of the Crowds theory suggests that such information (or parts of it) is constantly overwritten, updated, or even deleted by other users, with the goal of rendering it more accurate or up-to-date. This is particularly true for the collaboratively edited, semi-structured data of entity repositories, whose entity profiles are consistently kept fresh. Therefore, the core information that remains stable over time, despite being reviewed by numerous users, is particularly useful for describing an entity. Based on this hypothesis, we introduce a classification scheme that predicts, on the basis of statistical and content patterns, whether an attribute (i.e., a name-value pair) is going to be modified in the future. We apply our scheme to a large, real-world, versioned dataset and verify its effectiveness. Our thorough experimental study also suggests that reducing entity profiles to their stable parts conveys significant benefits to two common tasks in computer science: information retrieval and information integration.