Efficient feature weighting methods for ranking

Authors:
Hwanjo Yu;Jinoh Oh;Wook-Shin Han
Affiliations:
POSTECH, Pohang, South Korea;POSTECH, Pohang, South Korea;Kyungbuk National University, Daegu, South Korea
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Abstract

Feature weighting or selection is a crucial process for identifying an important subset of features in a data set. Removing irrelevant or redundant features can improve the generalization performance of ranking functions in information retrieval. Because classification and ranking differ fundamentally, feature weighting methods developed for classification cannot be readily applied to ranking. A state-of-the-art feature selection method for ranking, called GAS, has recently been proposed; it exploits the importance of each feature and the similarity between every pair of features. However, GAS must compute similarity scores for all pairs of features, so it does not scale to high-dimensional data, and its performance degrades on nonlinear ranking functions. This paper proposes two novel algorithms, RankWrapper and RankFilter, which scale to high-dimensional data and also perform reasonably well on nonlinear ranking functions. RankWrapper and RankFilter are based on the key idea of the Relief algorithm. Relief is a feature selection algorithm for classification that exploits the notions of hits (data points within the same class) and misses (data points from different classes). However, ranking has no such notion of hits or misses. The proposed algorithms instead use the ranking distances of nearest data points to identify the key features for ranking. Our extensive experiments show that RankWrapper and RankFilter achieve higher accuracy overall than GAS and traditional Relief algorithms adapted for ranking, and run substantially faster than GAS on high-dimensional data.
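
As a rough illustration of the hit/miss idea behind Relief mentioned in the abstract, the sketch below implements one common formulation of the classic two-class Relief weight update. It is not RankWrapper or RankFilter, whose details are not given here. The function name `relief_weights`, the Manhattan distance, the assumption that features are scaled to [0, 1], and the toy data are all illustrative choices rather than anything taken from the paper.

```python
import random


def relief_weights(X, y, n_samples=None, seed=0):
    """Classic Relief feature weights for two-class data (illustrative sketch).

    X: list of feature vectors, values assumed to be scaled to [0, 1].
    y: list of class labels, one per row of X.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = n if n_samples is None else n_samples
    weights = [0.0] * d

    def distance(a, b):
        # Manhattan distance over all features (an assumed choice).
        return sum(abs(u - v) for u, v in zip(a, b))

    for _ in range(m):
        i = rng.randrange(n)
        xi = X[i]
        # Candidate hits: other points with the same label.
        hits = [X[j] for j in range(n) if j != i and y[j] == y[i]]
        # Candidate misses: points with a different label.
        misses = [X[j] for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda p: distance(xi, p))
        miss = min(misses, key=lambda p: distance(xi, p))
        for f in range(d):
            # Reward features that differ on the nearest miss and
            # penalize features that differ on the nearest hit.
            weights[f] += (abs(xi[f] - miss[f]) - abs(xi[f] - hit[f])) / m
    return weights


# Toy data: feature 0 determines the class, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.8, 0.4], [0.9, 0.8], [1.0, 0.2]]
y = [0, 0, 0, 1, 1, 1]
w = relief_weights(X, y)
```

On this toy data the weight of the class-determining feature comes out larger than that of the noise feature. The update depends on class labels to define hits and misses, which is exactly what ranking data lacks and what the abstract says the proposed algorithms replace with ranking distances of nearest data points.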