Improving similarity measures for short segments of text

Authors:
Wen-Tau Yih; Christopher Meek
Affiliations:
Microsoft Research, Redmond, WA; Microsoft Research, Redmond, WA
Venue:
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Year:
2007

References (12)

Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics.
Automatic feedback using past queries: social searching? Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval.
Foundations of statistical natural language processing.
Learning Algorithms for Keyphrase Extraction. Information Retrieval.
Domain-Specific Keyphrase Extraction. IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence.
Sequential conditional Generalized Iterative Scaling. ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.
Learning to rank using gradient descent. ICML '05 Proceedings of the 22nd international conference on Machine learning.
Finding advertising keywords on web pages. Proceedings of the 15th international conference on World Wide Web.
A web-based kernel function for measuring the similarity of short text snippets. Proceedings of the 15th international conference on World Wide Web.
Generating query substitutions. Proceedings of the 15th international conference on World Wide Web.
Coherent keyphrase extraction via web mining. IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence.
Similarity measures for short segments of text. ECIR'07 Proceedings of the 29th European conference on IR research.

Cited by (26)

Learning to classify short and sparse text & web with hidden topics from large-scale data collections. Proceedings of the 17th international conference on World Wide Web.
Consistent phrase relevance measures. Proceedings of the 2nd International Workshop on Data Mining and Audience Intelligence for Advertising.
Towards a Novel Association Measure via Web Search Results Mining. PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining.
Web Search Clustering and Labeling with Hidden Topics. ACM Transactions on Asian Language Information Processing (TALIP).
Large-scale computation of distributional similarities for queries. NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers.
Clustering queries for better document ranking. Proceedings of the 18th ACM conference on Information and knowledge management.
Learning term-weighting functions for similarity measures. EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2.
Precomputing search features for fast and accurate query classification. Proceedings of the third ACM international conference on Web search and data mining.
Growing related words from seed via user behaviors: a re-ranking based approach. ACLstudent '10 Proceedings of the ACL 2010 Student Research Workshop.
Organizing query completions for web search. CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management.
German encyclopedia alignment based on information retrieval techniques. ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries.
Empirical study of topic modeling in Twitter. Proceedings of the First Workshop on Social Media Analytics.
User Behaviors in Related Word Retrieval and New Word Detection: A Collaborative Perspective. ACM Transactions on Asian Language Information Processing (TALIP).
Transferring topical knowledge from auxiliary long texts for short text clustering. Proceedings of the 20th ACM international conference on Information and knowledge management.
Summarizing and extracting online public opinion from blog search results. DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I.
Short text classification improved by learning multi-granularity topics. IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three.
CluChunk: clustering large scale user-generated content incorporating chunklet information. Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications.
Measuring semantic relatedness using multilingual representations. SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation.
TCSST: transfer classification of short & sparse text using external data. Proceedings of the 21st ACM international conference on Information and knowledge management.
Extended information inference model for unsupervised categorization of web short texts. Journal of Information Science.
Multimodal alignment of scholarly documents and their presentations. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries.
Enhancing short text clustering with small external repositories. AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121.
Short text classification by detecting information path. Proceedings of the 22nd ACM international conference on Information & knowledge management.
Exploiting topic tracking in real-time tweet streams. Proceedings of the 2013 international workshop on Mining unstructured big data using natural language processing.
Improving short text classification using public search engines. IUKM'13 Proceedings of the 2013 international conference on Integrated Uncertainty in Knowledge Modelling and Decision Making.
An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences: an International Journal.

Abstract

In this paper, we improve on previous work on measuring the similarity of short segments of text in two ways. First, we introduce a Web-relevance similarity measure and demonstrate its effectiveness. This measure extends the Web-kernel similarity function introduced by Sahami and Heilman (2006) by using a relevance-weighted inner product of term occurrences rather than TF×IDF. Second, we show that one can further improve the accuracy of similarity measures by using a machine learning approach. Our methods outperform other state-of-the-art methods in a general query suggestion task across multiple evaluation metrics.
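
As a rough illustration of the inner-product idea mentioned in the abstract, the sketch below scores two short segments by the inner product of their L2-normalized, weighted term vectors. It is only a minimal sketch under stated assumptions: the abstract does not say how the relevance weights are obtained from the Web, so the weights here are made-up inputs, and the function names `l2_normalize` and `weighted_similarity` are hypothetical, not taken from the paper.

```python
import math
from typing import Dict


def l2_normalize(vec: Dict[str, float]) -> Dict[str, float]:
    """Scale a sparse term-weight vector to unit L2 length (empty if all zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in vec.items()}


def weighted_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Inner product of two normalized term-weight vectors.

    The weights are assumed to come from some term-scoring scheme
    (for example relevance scores or TF×IDF); that choice is outside
    this sketch.
    """
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    # Iterate over the smaller vector; terms missing from the other contribute 0.
    if len(a_n) > len(b_n):
        a_n, b_n = b_n, a_n
    return sum(w * b_n.get(term, 0.0) for term, w in a_n.items())


# Hypothetical term weights for two short text segments.
segment_a = {"jaguar": 0.9, "car": 0.6}
segment_b = {"jaguar": 0.8, "cat": 0.5}
print(weighted_similarity(segment_a, segment_b))
```

Only the source of the weights changes between a TF×IDF-style and a relevance-weighted variant of this computation; the inner product itself stays the same.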