Semantic text similarity using corpus-based word similarity and string similarity

Authors:
Aminul Islam; Diana Inkpen
Affiliations:
University of Ottawa, ON, Canada; University of Ottawa, ON, Canada
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2008

Abstract

We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. We focus on computing the similarity between two sentences or two short paragraphs. The proposed method can be exploited in a variety of applications involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
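To make the string-matching component concrete, the following is a minimal sketch of a normalized LCS score between two words. The abstract does not spell out the normalization or the modifications the authors apply, so the formula used here (squared LCS length divided by the product of the two string lengths) is an assumption chosen for illustration, and the function names `lcs_length` and `normalized_lcs` are hypothetical. The corpus-based word similarity component is not shown.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic dynamic programming, keeping only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping one character.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """LCS length squared over the product of the lengths (assumed normalization).

    Returns a value in [0, 1]; 1.0 means the strings are identical.
    """
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return (n * n) / (len(a) * len(b))


if __name__ == "__main__":
    print(round(normalized_lcs("albastru", "alabaster"), 4))
```

Squaring the LCS length before dividing keeps the score symmetric in its two arguments and penalizes pairs where a short common subsequence is spread across much longer strings. A full text-similarity measure along the lines described in the abstract would combine a string score of this kind with a corpus-based semantic score for each word pair.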