The role of context and adaptation in user interfaces
International Journal of Man-Machine Studies
Characterizing browsing strategies in the World-Wide Web
Proceedings of the Third International World-Wide Web conference on Technology, tools and applications
Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Analysis of a very large web search engine query log
ACM SIGIR Forum
The production of ‘context’ in information seeking research: a metatheoretical view
Information Processing and Management: an International Journal - Special issue on Information Seeking In Context (ISIC)
Probabilistic query expansion using query logs
Proceedings of the 11th international conference on World Wide Web
A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Improving Short-Text Classification using Unlabeled Data for Classification Problems
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Unsupervised word sense disambiguation rivaling supervised methods
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Support vector machine learning for interdependent and structured output spaces
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Shallow parsing with conditional random fields
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Improving Automatic Query Classification via Semi-Supervised Learning
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Semi-supervised conditional random fields for improved sequence segmentation and labeling
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Defining a session on Web search engines: Research Articles
Journal of the American Society for Information Science and Technology
Simple, robust, scalable semi-supervised learning via expectation regularization
Proceedings of the 24th international conference on Machine learning
Learning query intent from regularized click graphs
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs
Proceedings of the 17th ACM conference on Information and knowledge management
Context-aware query classification
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
ICTAI '09 Proceedings of the 2009 21st IEEE International Conference on Tools with Artificial Intelligence
Semi-Supervised Learning
Softmax-margin CRFs: training log-linear models with cost functions
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Joint question clustering and relevance prediction for open domain non-factoid question answering
Proceedings of the 23rd international conference on World wide web
Hi-index | 0.00 |
Topical classification of user queries is critical for general-purpose web search systems. It is also a challenging task, due to the sparsity of query terms and the lack of labeled queries. On the other hand, search contexts embedded in query sessions and unlabeled queries free on the web have not been fully utilized in most query classification systems. In this work, we leverage these information to improve query classification accuracy. We first incorporate search contexts into our framework using a Conditional Random Field (CRF) model. Discriminative training of CRFs is favored over the traditional maximum likelihood training because of its robustness to noise. We then adapt self-training with our model to exploit the information in unlabeled queries. By investigating different confidence measurements and model selection strategies, we effectively avoid the error-reinforcing nature of self-training. In extensive experiments on real search logs, we have averaged around 20% improvement in classification accuracy over other state-of-the-art baselines.