Identifying navigational queries in Web search is important yet hard, because Web queries are typically very short and carry little information. In this paper we study several machine learning methods for navigational query identification in Web search: the naive Bayes model, the maximum entropy model, the support vector machine (SVM), and the stochastic gradient boosting tree (SGBT). To boost the performance of these machine learning techniques, we exploit several feature selection methods and propose coupling feature selection with classification approaches to achieve the best performance. Unlike most prior work, which uses a small number of features, we study the problem of identifying navigational queries with thousands of available features, extracted from major commercial search engine results, Web search user click data, query logs, and the whole Web's relational content. A multi-level feature extraction system is constructed. Our results on real search data show that (1) among all the features we tested, user click distribution features are the most important set of features for identifying navigational queries; (2) to achieve good performance, machine learning approaches have to be coupled with good feature selection methods, and we find that the gradient boosting tree coupled with linear SVM feature selection is the most effective combination; (3) with carefully coupled feature selection and classification approaches, navigational queries can be accurately identified with an 88.1% F1 score, which is a 33% error rate reduction compared to the best uncoupled system and a 40% error rate reduction compared to a well-tuned system without feature selection.
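To make the reported error rate reductions concrete, the short sketch below back-computes the F1 scores they would imply for the two comparison systems. It rests on an assumption the abstract does not state: that "error rate" means 1 − F1. The function name `implied_baseline_f1` and the variable names are illustrative only, and the resulting figures are estimates under that assumption, not numbers reported by the authors.

```python
def implied_baseline_f1(final_f1: float, relative_reduction: float) -> float:
    """Estimate a baseline F1 from a final F1 and a relative error reduction.

    ASSUMPTION (not stated in the abstract): error rate = 1 - F1.
    If final_error = baseline_error * (1 - relative_reduction), then
    baseline_error = final_error / (1 - relative_reduction).
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    final_error = 1.0 - final_f1
    baseline_error = final_error / (1.0 - relative_reduction)
    return 1.0 - baseline_error


# F1 of the coupled system, as reported in the abstract.
COUPLED_F1 = 0.881

# Implied F1 of the best uncoupled system (33% error rate reduction).
uncoupled_f1 = implied_baseline_f1(COUPLED_F1, 0.33)

# Implied F1 of the well-tuned system without feature selection (40% reduction).
no_selection_f1 = implied_baseline_f1(COUPLED_F1, 0.40)

print(f"implied best uncoupled F1:      {uncoupled_f1:.3f}")
print(f"implied no-feature-selection F1: {no_selection_f1:.3f}")
```

Under this reading, the comparison systems would sit at roughly 82% and 80% F1 respectively. If the authors measured error differently (for example, as classification error rather than 1 − F1), these estimates would not apply.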