Soft cardinality + ML: learning adaptive similarity functions for cross-lingual textual entailment

Authors:
Sergio Jimenez;Claudia Becerra;Alexander Gelbukh
Affiliations:
Universidad Nacional de Colombia, Bogota, Ciudad Universitaria, edificio, oficina;Universidad Nacional de Colombia, Bogota;CIC-IPN Av. Juan de Dios Bátiz, Av. Mendizábal, Col. Nueva Industrial Vallejo, DF, México
Venue:
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Year:
2012

Abstract

This paper presents a novel approach for building adaptive, cardinality-based similarity functions using machine learning. Unlike current approaches, which build feature sets from similarity scores, we build the feature sets from the cardinalities of the commonalities and differences between the pair of objects being compared. This allows the machine-learning algorithm to learn an asymmetric similarity function suitable for directional judgments. In addition to classic set cardinality, we used soft cardinality to allow flexibility in the comparison between words. To address the cross-lingual textual entailment task (Task 8) at SemEval 2012, our approach used only surface information from the text, a stop-word remover, and a stemmer. Our system obtained the third-best result among the 29 systems submitted by 10 teams. Additionally, this paper reports results that improve on the best official score.
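
As a rough illustration of the idea in the abstract, the sketch below computes a soft cardinality for a bag of words and derives cardinality features for a text pair. It assumes the usual soft-cardinality form, in which each element contributes the reciprocal of its summed similarity to every element of the set. The word similarity (`difflib.SequenceMatcher`), the exponent `p`, the function names, and the choice of five features are illustrative assumptions, not the authors' exact configuration.

```python
from difflib import SequenceMatcher


def word_sim(a: str, b: str) -> float:
    """Illustrative word similarity in [0, 1] (an assumption, not the paper's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def soft_cardinality(words, sim=word_sim, p=1.0):
    """Each distinct word contributes 1 / (sum of its similarities to all distinct words)."""
    items = list(dict.fromkeys(words))  # drop exact duplicates, keep order
    return sum(1.0 / sum(sim(a, b) ** p for b in items) for a in items)


def cardinality_features(a_words, b_words, card=soft_cardinality):
    """Cardinalities of the commonalities and differences of two word collections."""
    a, b = list(a_words), list(b_words)
    card_a = card(a)
    card_b = card(b)
    card_union = card(a + b)                    # cardinality of the pooled words
    card_inter = card_a + card_b - card_union   # inclusion-exclusion
    return {
        "a": card_a,
        "b": card_b,
        "a_and_b": card_inter,
        "a_minus_b": card_a - card_inter,  # asymmetric: material only in A
        "b_minus_a": card_b - card_inter,  # asymmetric: material only in B
    }


if __name__ == "__main__":
    text = "the houses were sold".split()
    hypothesis = "a house was sold yesterday".split()
    print(cardinality_features(text, hypothesis))
```

Because `a_minus_b` and `b_minus_a` generally differ, a classifier trained on such features can weight the two directions differently, which is what makes a learned similarity function asymmetric. Passing a crisp similarity (1 for identical words, 0 otherwise) reduces `soft_cardinality` to the classic count of distinct words.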