Soft cardinality: a parameterized similarity function for text comparison

  • Authors:
  • Sergio Jimenez; Claudia Becerra; Alexander Gelbukh

  • Affiliations:
  • Universidad Nacional de Colombia, Bogota, Ciudad Universitaria, edificio, oficina; Universidad Nacional de Colombia, Bogota; CIC-IPN Av. Juan Dios Bátiz, Av. Mendizábal, Col. Nueva Industrial Vallejo, DF, México

  • Venue:
  • SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
  • Year:
  • 2012

Abstract

We present an approach for constructing text similarity functions that combines a parameterized resemblance coefficient with a softened cardinality function called soft cardinality. The approach yields a consistent, recursive model that applies at varying levels of granularity, from sentences down to characters. Accordingly, we used the model to compare sentences divided into words and, in turn, words divided into character q-grams. Experimentally, we observed that system performance (correlation), viewed as a function over the space defined by all parameters, was relatively smooth and had a single maximum reachable by hill climbing. Our approach used only surface text information, a stop-word remover, and a stemmer to tackle the semantic text similarity task (Task 6) at SemEval 2012. The proposed method ranked 3rd (average), 5th (normalized correlation), and 15th (aggregated correlation) among 89 systems submitted by 31 teams.
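
The abstract gives no formulas, so the following is only a minimal sketch of the recursive idea (sentences compared as bags of words, words compared as sets of character q-grams). It assumes one common formulation of soft cardinality, in which each element contributes the reciprocal of its summed similarity to all elements of the collection, with a softness exponent p. The function names, the padding character "#", the Dice coefficient at both levels, and the default values of p and q are illustrative assumptions, not the authors' tuned configuration; the paper's parameterized resemblance coefficient, stop-word removal, and stemming are not reproduced here.

```python
def qgrams(word, q=2):
    """Set of character q-grams of a word, padded with '#' (assumed padding)."""
    padded = "#" * (q - 1) + word + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def word_sim(a, b, q=2):
    """Dice coefficient over q-gram sets (an assumed word-level similarity)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def soft_cardinality(words, p=1.0, q=2):
    """Assumed formulation: sum over elements of 1 / sum_j sim(a_i, a_j)**p.

    Identical or near-identical words share their weight, so a list of
    duplicates counts as roughly one element.
    """
    total = 0.0
    for a in words:
        # The inner sum is at least 1 because sim(a, a) == 1.
        total += 1.0 / sum(word_sim(a, b, q) ** p for b in words)
    return total


def soft_similarity(a_words, b_words, p=1.0, q=2):
    """Dice-style coefficient built from soft cardinalities (illustrative)."""
    card_a = soft_cardinality(a_words, p, q)
    card_b = soft_cardinality(b_words, p, q)
    if card_a + card_b == 0:
        return 0.0
    # Soft union is taken over the concatenation; the intersection follows
    # from inclusion-exclusion.
    card_union = soft_cardinality(a_words + b_words, p, q)
    card_inter = card_a + card_b - card_union
    return 2 * card_inter / (card_a + card_b)


if __name__ == "__main__":
    s1 = "the dog is running".split()
    s2 = "the dog is runing".split()
    s3 = "a cat was walking".split()
    print(soft_similarity(s1, s2), soft_similarity(s1, s3))
```

In this sketch a misspelled word such as "runing" still overlaps heavily with "running" at the q-gram level, so the sentence-level score degrades gradually instead of dropping as it would under exact word matching. Parameters such as p and q are the kind of values the abstract describes tuning by hill climbing.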