Automatically suggesting topics for augmenting text documents

  • Authors:
  • Robert West;Doina Precup;Joelle Pineau

  • Affiliations:
  • McGill University, Montréal, PQ, Canada;McGill University, Montréal, PQ, Canada;McGill University, Montréal, PQ, Canada

  • Venue:
  • CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
  • Year:
  • 2010

Quantified Score

Hi-index 0.02

Visualization

Abstract

We present a method for automated topic suggestion. Given a plain-text input document, our algorithm produces a ranking of novel topics that could enrich the input document in a meaningful way. It can thus be used to assist human authors, who often fail to identify important topics relevant to the context of the documents they are writing. Our approach marries two algorithms originally designed for linking documents to Wikipedia articles, proposed by Milne and Witten [15] and West et al. [22]. While neither of them can suggest novel topics by itself, their combination does have this capability. The key step towards finding missing topics consists in generalizing from a large background corpus using principal component analysis. In a quantitative evaluation we conclude that our method achieves the precision of human editors when input documents are Wikipedia articles, and we complement this result with a qualitative analysis showing that the approach also works well on other types of input documents.