GrawlTCQ: terminology and corpora building by ranking simultaneously terms, queries and documents using graph random walks

  • Authors:
  • Clément de Groc;Xavier Tannier;Javier Couto

  • Affiliations:
  • Univ. Paris Sud, LIMSI-CNRS;Univ. Paris Sud, LIMSI-CNRS;Syllabs, MoDyCo

  • Venue:
  • Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing
  • Year:
  • 2011

Abstract

In this paper, we present GrawlTCQ, a new bootstrapping algorithm for building specialized terminology, corpora and queries, based on a graph model. We model links between documents, terms and queries, and use a random walk with restart algorithm to compute relevance propagation. We have evaluated GrawlTCQ on an AFP English corpus of 57,441 news articles across 10 categories. For corpora building, GrawlTCQ outperforms the BootCaT tool, which is widely used in the domain. For 1,000 documents retrieved, we improve mean precision by 25%. GrawlTCQ has also been shown to be faster and more robust than BootCaT over iterations.
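
To make the relevance-propagation idea concrete, here is a minimal sketch of a random walk with restart over a small graph whose nodes are terms, queries and documents. It is not the authors' implementation: the abstract does not specify edge weights, the restart probability or the convergence test, so the sketch assumes an unweighted, undirected graph, a restart probability of 0.15, and a uniform restart distribution over the seed nodes. The function name `random_walk_with_restart` and all node labels are hypothetical.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.15, iterations=100, tol=1e-10):
    """Score every node by its proximity to the seed nodes.

    Assumptions (not taken from the paper): edges are undirected and
    unweighted, and the walker restarts uniformly over `seeds`.
    """
    # Build a symmetric adjacency list from the edge list.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    missing = [s for s in seeds if s not in neighbors]
    if not seeds or missing:
        raise ValueError("seeds must be a non-empty set of nodes in the graph")

    # Restart distribution: all probability mass sits on the seed nodes.
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart_vec)

    for _ in range(iterations):
        # With probability `restart`, jump back to a seed ...
        nxt = {n: restart * restart_vec[n] for n in nodes}
        # ... otherwise move to a uniformly chosen neighbor.
        for n in nodes:
            share = (1.0 - restart) * scores[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Toy graph: a query links to the terms it contains and the documents it
# retrieved; a document links to the terms it contains.
edges = [
    ("query:football goal", "term:football"),
    ("query:football goal", "term:goal"),
    ("query:football goal", "doc:match_report"),
    ("term:football", "doc:match_report"),
    ("term:goal", "doc:match_report"),
    ("term:goal", "doc:poll_story"),
    ("term:election", "doc:poll_story"),
]
scores = random_walk_with_restart(edges, seeds={"term:football"})

# Rank documents by their score relative to the seed term.
ranked_docs = sorted(
    (n for n in scores if n.startswith("doc:")), key=scores.get, reverse=True
)
print(ranked_docs)
```

Because terms, queries and documents live in the same graph, a single pass yields a score for every node type, so the same `scores` dictionary can be filtered by prefix to rank terms or queries as well as documents.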