Feature weighting, or feature selection, identifies an important subset of features in a data set. Removing irrelevant or redundant features can improve the generalization performance of ranking functions in information retrieval. Because classification and ranking differ fundamentally, feature weighting methods developed for classification cannot be readily applied to ranking. A state-of-the-art feature selection method for ranking, called GAS, was recently proposed; it exploits the importance of each feature and the similarity between every pair of features. However, GAS must compute similarity scores for all pairs of features, so it does not scale to high-dimensional data, and its performance degrades on nonlinear ranking functions. This paper proposes two novel algorithms, RankWrapper and RankFilter, which scale to high-dimensional data and also perform reasonably well on nonlinear ranking functions. Both are based on the key idea of the Relief algorithm. Relief is a feature selection algorithm for classification that exploits the notions of hits (data points within the same class) and misses (data points from different classes). Ranking, however, has no such notion of hits or misses. The proposed algorithms instead use the ranking distances of nearest data points to identify the key features for ranking. Our extensive experiments show that RankWrapper and RankFilter achieve higher overall accuracy than GAS and than traditional Relief algorithms adapted for ranking, and run substantially faster than GAS on high-dimensional data.
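To make the hit/miss idea concrete, the following is a minimal sketch of the classic two-class Relief weight update for classification. It is not the RankWrapper or RankFilter algorithm, whose details the abstract does not give. The function name `relief_weights`, the range-normalized Manhattan distance, and the toy data are assumptions chosen for illustration.

```python
import random


def relief_weights(X, y, n_iterations=100, seed=0):
    """Classic two-class Relief (illustrative sketch): one weight per feature."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])

    # Normalize each feature difference by the feature's range
    # so that weights are comparable across features.
    ranges = []
    for j in range(n_features):
        column = [row[j] for row in X]
        ranges.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def distance(a, b):
        return sum(diff(j, a, b) for j in range(n_features))

    def nearest(i, same_class):
        # Nearest other point that shares (hit) or does not share (miss) the class of i.
        candidates = [k for k in range(n_samples)
                      if k != i and (y[k] == y[i]) == same_class]
        return min(candidates, key=lambda k: distance(X[i], X[k]))

    weights = [0.0] * n_features
    for _ in range(n_iterations):
        i = rng.randrange(n_samples)
        hit = nearest(i, same_class=True)
        miss = nearest(i, same_class=False)
        for j in range(n_features):
            # Reward features that separate the miss, penalize those that separate the hit.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n_iterations
    return weights


# Toy data (hypothetical): feature 0 tracks the class, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print(relief_weights(X, y, n_iterations=50))
```

The update depends on class labels to define the nearest hit and nearest miss. Ranking data has no such labels, which is why the proposed algorithms replace them with ranking distances between nearest data points.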