Proceedings of the 11th international conference on World Wide Web
Identifying and Filtering Near-Duplicate Documents
COM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
Building Minority Language Corpora by Learning to Generate Web Search Queries
Knowledge and Information Systems
Object-level ranking: bringing order to Web objects
WWW '05 Proceedings of the 14th international conference on World Wide Web
Language-Independent Set Expansion of Named Entities Using the Web
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Personalizing PageRank for word sense disambiguation
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Experiments on pseudo relevance feedback using graph random walks
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Hi-index | 0.00 |
In this paper, we present GrawlTCQ, a new bootstrapping algorithm for building specialized terminology, corpora and queries, based on a graph model. We model links between documents, terms and queries, and use a random walk with restart algorithm to compute relevance propagation. We have evaluated GrawlTCQ on an AFP English corpus of 57,441 news over 10 categories. For corpora building, GrawlTCQ outperforms the BootCaT tool, which is vastly used in the domain. For 1,000 documents retrieved, we improve mean precision by 25%. GrawlTCQ has also shown to be faster and more robust than BootCaT over iterations.