The notion of diversity in graphical entity summarisation on semantic knowledge graphs

  • Authors:
  • Marcin Sydow; Mariusz Pikuła; Ralf Schenkel

  • Affiliations:
  • Web Mining Lab, Polish-Japanese Institute of Information Technology, Warsaw, Poland and Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland; Web Mining Lab, Polish-Japanese Institute of Information Technology, Warsaw, Poland; Saarland University and MPI for Informatics, Saarbrücken, Germany

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2013

Abstract

Given an entity represented by a single node q in a semantic knowledge graph D, the Graphical Entity Summarisation (GES) problem consists in selecting from D a very small surrounding graph S that constitutes a generic summary of the information concerning the entity q, subject to a given limit on the size of S. This article concerns the role of diversity in this relatively novel problem. It gives an overview of the diversity concept in information retrieval and proposes how to adapt it to GES. A measure of diversity for GES, called ALC, is defined, and two algorithms are presented: the baseline, diversity-oblivious PRECIS and the diversity-aware DIVERSUM. A reported experiment shows that DIVERSUM achieves higher values of the ALC diversity measure than PRECIS. Next, an objective evaluation experiment demonstrates that the diversity-aware algorithm is superior to the diversity-oblivious one in terms of fact selection. More precisely, DIVERSUM clearly achieves higher recall than PRECIS on ground-truth reference entity summaries extracted from Wikipedia. We also report another intrinsic experiment, in which the output of the diversity-aware algorithm is significantly preferred by human expert evaluators. Importantly, the user feedback clearly indicates that the notion of diversity is the key reason for this preference. In addition, the experiment is repeated twice on an anonymous sample of a broad population of Internet users by means of a crowd-sourcing platform, which further confirms the results mentioned above.
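
The contrast between diversity-oblivious and diversity-aware fact selection, and the recall-based evaluation mentioned in the abstract, can be illustrated with a minimal sketch. It is not the PRECIS or DIVERSUM algorithm from the article, and it does not compute ALC, since the abstract defines none of them. The `Fact` type, the `weight` score, the two selection functions, and the toy data are all hypothetical and serve only to show the general idea on facts attached directly to q.

```python
from typing import List, NamedTuple, Set


class Fact(NamedTuple):
    """One edge around the entity q (hypothetical representation)."""
    subject: str
    predicate: str
    obj: str
    weight: float  # assumed importance score, not defined in the abstract


def top_k_by_weight(facts: List[Fact], k: int) -> List[Fact]:
    """Diversity-oblivious toy baseline: keep the k heaviest facts."""
    return sorted(facts, key=lambda f: f.weight, reverse=True)[:k]


def top_k_label_diverse(facts: List[Fact], k: int) -> List[Fact]:
    """Diversity-aware toy variant: take the heaviest fact per unseen
    predicate first, then fill any remaining slots by weight."""
    ranked = sorted(facts, key=lambda f: f.weight, reverse=True)
    chosen: List[Fact] = []
    used: Set[str] = set()
    for f in ranked:
        if len(chosen) == k:
            break
        if f.predicate not in used:
            chosen.append(f)
            used.add(f.predicate)
    for f in ranked:
        if len(chosen) == k:
            break
        if f not in chosen:
            chosen.append(f)
    return chosen


def recall(selected: List[Fact], reference: List[Fact]) -> float:
    """Fraction of reference facts that appear in the selected summary."""
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(selected)) / len(ref)


# Toy data, invented purely for illustration.
FACTS = [
    Fact("q", "actedIn", "FilmA", 0.9),
    Fact("q", "actedIn", "FilmB", 0.8),
    Fact("q", "actedIn", "FilmC", 0.7),
    Fact("q", "bornIn", "City", 0.6),
    Fact("q", "directed", "FilmD", 0.5),
]
REFERENCE = [FACTS[0], FACTS[3]]  # hypothetical reference summary

if __name__ == "__main__":
    for select in (top_k_by_weight, top_k_label_diverse):
        summary = select(FACTS, 3)
        print(select.__name__, [f.obj for f in summary],
              recall(summary, REFERENCE))
```

On this invented input the weight-only baseline spends its whole budget on one predicate, while the label-diverse variant covers three predicates. That is only a property of the toy data and says nothing about the results reported in the article.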