A short text modeling method combining semantic and statistical information

Authors:
Liu Wenyin; Xiaojun Quan; Min Feng; Bite Qiu
Affiliations:
Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong (all authors)
Venue:
Information Sciences: an International Journal
Year:
2010

Abstract

This paper presents a novel method for modeling a collection of short text snippets in order to measure the similarity between pairs of snippets. The method takes into account both the semantic and the statistical information within the snippets and consists of three steps. Given a set of raw short text snippets, it first establishes the initial similarity between words using a lexical database. It then iteratively calculates both word similarity and short text similarity. Finally, it constructs a proximity matrix from the word similarity and uses that matrix to convert the raw text snippets into vectors. Word similarity and text clustering experiments show that the proposed short text modeling method improves the performance of existing text-related information retrieval (IR) techniques.
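The three steps in the abstract can be illustrated with a small sketch. The abstract gives no update formulas, so everything below is an assumption made for illustration and not the authors' actual method: the hand-written initial similarity matrix stands in for scores from a lexical database, the update rule (text similarity from `X S X^T`, word similarity from `X^T T X`, blended with the initial matrix by a hypothetical `mix` weight) is one plausible choice, and the final word similarity matrix is used directly as the proximity matrix. The function names are likewise hypothetical.

```python
import numpy as np


def normalize_rows(m):
    """Scale each row to unit Euclidean length (all-zero rows are left as zeros)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return m / norms


def cosine_normalize(k):
    """Rescale a symmetric similarity matrix so that non-zero diagonal entries become 1."""
    d = np.sqrt(np.clip(np.diag(k), 1e-12, None))
    return k / np.outer(d, d)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def model_short_texts(counts, initial_word_sim, iterations=3, mix=0.5):
    """Assumed sketch of the three steps: initial word similarity,
    iterative word/text similarity updates, proximity-matrix mapping."""
    x = normalize_rows(np.asarray(counts, dtype=float))   # snippets x words
    s0 = np.asarray(initial_word_sim, dtype=float)        # step 1 (given)
    word_sim = s0.copy()
    for _ in range(iterations):                           # step 2
        # Snippets are similar when they contain similar words.
        text_sim = cosine_normalize(x @ word_sim @ x.T)
        # Words are similar when they occur in similar snippets.
        co_sim = cosine_normalize(x.T @ text_sim @ x)
        # Blend statistical evidence with the lexical prior (assumed rule).
        word_sim = mix * s0 + (1.0 - mix) * co_sim
    text_sim = cosine_normalize(x @ word_sim @ x.T)
    vectors = x @ word_sim                                # step 3
    return word_sim, text_sim, vectors


# Toy data: vocabulary and three two-word snippets.
vocab = ["car", "automobile", "engine", "banana", "fruit"]
counts = np.array([
    [1, 0, 1, 0, 0],   # "car engine"
    [0, 1, 1, 0, 0],   # "automobile engine"
    [0, 0, 0, 1, 1],   # "banana fruit"
])
# Hand-written stand-in for lexical-database similarities (assumed values).
initial = np.eye(len(vocab))
initial[0, 1] = initial[1, 0] = 0.9   # car ~ automobile
initial[3, 4] = initial[4, 3] = 0.6   # banana ~ fruit

word_sim, text_sim, vectors = model_short_texts(counts, initial)
print("raw cosine(s0, s1):    ", cosine(counts[0].astype(float), counts[1].astype(float)))
print("modeled cosine(s0, s1):", cosine(vectors[0], vectors[1]))
print("modeled cosine(s0, s2):", cosine(vectors[0], vectors[2]))
```

In this toy setting the first two snippets share only one literal word, yet their modeled vectors end up closer than their raw term vectors, while the unrelated third snippet stays orthogonal. This is the kind of effect the abstract attributes to combining semantic and statistical information.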