Entity-centric document filtering: boosting feature mapping through meta-features

Authors:
Mianwei Zhou;Kevin Chen-Chuan Chang
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Year:
2013

Citing 13
Cited 0

Stochastic gradient boosting

Computational Statistics & Data Analysis - Nonlinear methods and data mining
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Learning and evaluating classifiers under sample selection bias

ICML '04 Proceedings of the twenty-first international conference on Machine learning
Adapting ranking SVM to document retrieval

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Training linear SVMs in linear time

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Co-clustering based classification for out-of-domain documents

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Query by document

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Reducing long queries using query quality predictors

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Transfer learning via dimensionality reduction

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
LETOR: A benchmark collection for research on learning to rank for information retrieval

Information Retrieval
A Survey on Transfer Learning

IEEE Transactions on Knowledge and Data Engineering
Query by document via a decomposition-based two-level retrieval approach

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper studies the entity-centric document filtering task -- given an entity represented by its identification page (e.g., an Wikpedia page), how to correctly identify its relevant documents. In particular, we are interested in learning an entity-centric document filter based on a small number of training entities, and the filter can predict document relevance for a large set of unseen entities at query time. Towards characterizing the relevance of a document, the problem boils down to learning keyword importance for the query entities. Since the same keyword will have very different importance for different entities, we abstract the entity-centric document filtering problem as a transfer learning problem, and the challenge becomes how to appropriately transfer the keyword importance learned from training entities to query entities. Based on the insight that keywords sharing some similar "properties" should have similar importance for their respective entities, we propose a novel concept of meta-feature to map keywords from different entities. To realize the idea of meta-feature-based feature mapping, we develop and contrast two different models, LinearMapping and BoostMapping. Experiments on three different datasets confirm the effectiveness of our proposed models, which show significant improvement compared with four state-of-the-art baseline methods.