Supervised semantic indexing

Authors:
Bing Bai;Jason Weston;David Grangier;Ronan Collobert;Kunihiko Sadamasa;Yanjun Qi;Olivier Chapelle;Kilian Weinberger
Affiliations:
NEC Labs America, INC, Princeton, NJ, USA;NEC Labs America, INC, Princeton, NJ, USA;NEC Labs America, INC, Princeton, NJ, USA;NEC Labs America, INC, Princeton, NJ, USA;NEC Labs America, INC, Princeton, NJ, USA;NEC Labs America, INC, Princeton, NJ, USA;Yahoo! Research, Santa Clara, CA, USA;Yahoo! Research, Santa Clara, CA, USA
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Abstract

In this article we propose Supervised Semantic Indexing (SSI), an algorithm that is trained on (query, document) pairs of text documents to predict the quality of their match. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained with a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as online advertising placement. Dealing with models on all pairs of word features is computationally challenging. We propose several improvements to our basic model to address this issue, including low-rank (but diagonal-preserving) representations and correlated feature hashing (CFH). We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.
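
The low-rank, diagonal-preserving idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the score is taken to be the bilinear form f(q, d) = q^T (U^T V + I) d over bag-of-words vectors, and training is taken to be a pairwise margin ranking loss optimized by stochastic gradient descent. The function names, toy vocabulary size, rank, and learning rate are all hypothetical.

```python
import numpy as np

def ssi_score(q, d, U, V):
    """Assumed score q^T (U^T V + I) d: a low-rank word-pair term plus the
    identity term, which keeps the plain word-overlap score q . d."""
    return float((U @ q) @ (V @ d) + q @ d)

def sgd_step(q, d_pos, d_neg, U, V, lr=0.1, margin=1.0):
    """One update on the hinge loss max(0, margin - f(q, d_pos) + f(q, d_neg)).
    U and V are modified in place; returns the loss before the update."""
    loss = margin - ssi_score(q, d_pos, U, V) + ssi_score(q, d_neg, U, V)
    if loss <= 0:
        return 0.0
    diff = d_pos - d_neg
    Uq = U @ q          # computed before either matrix is changed
    Vdiff = V @ diff
    U += lr * np.outer(Vdiff, q)   # gradient of f(q,d_pos) - f(q,d_neg) w.r.t. U
    V += lr * np.outer(Uq, diff)   # gradient of f(q,d_pos) - f(q,d_neg) w.r.t. V
    return float(loss)

# Toy setup (hypothetical): 5-word vocabulary, rank-2 factors.
rng = np.random.default_rng(0)
n_words, rank = 5, 2
U = 0.1 * rng.standard_normal((rank, n_words))
V = 0.1 * rng.standard_normal((rank, n_words))

q = np.array([1.0, 1.0, 0.0, 0.0, 0.0])      # query uses words 0 and 1
d_pos = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # relevant document, no word overlap
d_neg = np.array([0.0, 0.0, 0.0, 1.0, 0.0])  # irrelevant document, no word overlap

for _ in range(200):
    sgd_step(q, d_pos, d_neg, U, V)

gap = ssi_score(q, d_pos, U, V) - ssi_score(q, d_neg, U, V)
print("score gap after training:", gap)
```

In this toy case neither document shares a word with the query, so the identity term contributes nothing and any preference for the relevant document has to come from the learned low-rank word-pair correlations. Storing the two rank-by-vocabulary factors instead of a full vocabulary-by-vocabulary matrix is what keeps the memory cost manageable under these assumptions.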