A bit-string longest-common-subsequence algorithm
Information Processing Letters
YAP3: improved detection of similarities in computer program and other texts
SIGCSE '96 Proceedings of the twenty-seventh SIGCSE technical symposium on Computer science education
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
An Information-Theoretic Definition of Similarity
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
On the Resemblance and Containment of Documents
SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Discovery of inference rules for question-answering
Natural Language Engineering
Automatic retrieval and clustering of similar words
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Corpus-based and knowledge-based measures of text semantic similarity
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Computing semantic relatedness using Wikipedia-based explicit semantic analysis
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Using information content to evaluate semantic similarity in a taxonomy
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
The WEKA data mining software: an update
ACM SIGKDD Explorations Newsletter
Plagiarism detection across distant language pairs
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
The PASCAL recognising textual entailment challenge
MLCW'05 Proceedings of the First international conference on Machine Learning Challenges: evaluating Predictive Uncertainty Visual Object Classification, and Recognizing Textual Entailment
Plagiarism detection using stopword n-grams
Journal of the American Society for Information Science and Technology
SemEval-2012 task 6: a pilot on semantic textual similarity
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Building structures from classifiers for passage reranking
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
A new benchmark dataset with production methodology for short text semantic similarity algorithms
ACM Transactions on Speech and Language Processing (TSLP)
Knowledge-based graph document modeling
Proceedings of the 7th ACM international conference on Web search and data mining
Hi-index | 0.00 |
We present the UKP system which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add additional lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features, out of the possible 300+ features implemented.