Corpus-based and knowledge-based measures of text semantic similarity

Authors:
Rada Mihalcea; Courtney Corley; Carlo Strapparava
Affiliations:
Department of Computer Science, University of North Texas; Department of Computer Science, University of North Texas; Istituto per la Ricerca Scientifica e Tecnologica, ITC-irst
Venue:
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Year:
2006

Abstract

This paper presents a method for measuring the semantic similarity of texts, using corpus-based and knowledge-based measures of similarity. Previous work on this problem has focused mainly on either large documents (e.g., text classification, information retrieval) or individual words (e.g., synonymy tests). Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g., abstracts of scientific documents, image captions, product descriptions), in this paper we focus on measuring the semantic similarity of short texts. Through experiments performed on a paraphrase data set, we show that the semantic similarity method outperforms methods based on simple lexical matching, resulting in up to a 13% error rate reduction with respect to the traditional vector-based similarity metric.
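
To make the comparison point concrete, here is a minimal sketch of what a vector-based lexical matching baseline might look like. It is not the authors' implementation: the function names, the tokenizer, and the choice of raw term counts as weights are all assumptions made for illustration. It shows why pure lexical matching struggles with short paraphrases that share no words.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    # Represent each text as a bag-of-words vector of raw term counts
    # (an assumption; a real baseline might use tf-idf weights instead).
    vec_a = Counter(tokenize(text_a))
    vec_b = Counter(tokenize(text_b))

    # Dot product over the terms of the first text; Counter returns 0
    # for terms that are missing from the second text.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())

    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty text has no direction to compare.
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two short texts with related meaning but no shared words.
    print(cosine_similarity("A doctor checked him",
                            "The physician examined the patient"))
```

Because the two example sentences have no tokens in common, this baseline scores them as entirely dissimilar. That is the kind of gap a measure drawing on corpus-based or knowledge-based word similarity is meant to close.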