Use of elliptic curves in term discrimination

  • Authors:
  • Darnes Vilariño;David Pinto;Carlos Balderas;Mireya Tovar;Beatriz Beltrán;Sofia Paniagua

  • Affiliations:
  • Benemérita Universidad Autónoma de Puebla, Mexico;Benemérita Universidad Autónoma de Puebla, Mexico;Benemérita Universidad Autónoma de Puebla, Mexico;Benemérita Universidad Autónoma de Puebla, Mexico;Benemérita Universidad Autónoma de Puebla, Mexico;Benemérita Universidad Autónoma de Puebla, Mexico

  • Venue:
  • MCPR'11 Proceedings of the Third Mexican conference on Pattern recognition
  • Year:
  • 2011

Abstract

Detecting discriminant terms allows us to improve the performance of natural language processing systems. The goal is to estimate the contribution of each term in a given corpus and then to represent the corpus using the high-contribution terms. In this paper we present several experiments that use elliptic curves to discover the discriminant terms of a given textual corpus. These experiments led us to use the mean and variance of the corpus terms to determine the parameters of a reduced Weierstrass equation (an elliptic curve). We use the elliptic curves to visualize graphically the behavior of the corpus vocabulary. We then use the elliptic curve parameters to cluster terms that share characteristics. These clusters are then used as discriminant terms to represent the original document collection. Finally, we evaluate all of these corpus representations to determine the terms that best discriminate each document.
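The abstract does not state how the mean and variance map onto the curve parameters, so the following is only a minimal sketch under stated assumptions. It takes the reduced Weierstrass form y^2 = x^3 + ax + b and assumes, purely for illustration, that a is the mean and b is the population variance of a term's frequency across documents. The function names, the toy corpus, and the rounding-based grouping are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from statistics import mean, pvariance


def term_curve_params(term, docs):
    """Return (a, b) for y^2 = x^3 + a*x + b.

    ASSUMPTION: a = mean and b = population variance of the term's
    frequency over the documents. The paper's actual mapping may differ.
    """
    counts = [Counter(doc.split())[term] for doc in docs]
    return mean(counts), pvariance(counts)


def discriminant(a, b):
    """Discriminant of the reduced Weierstrass form.

    The curve is non-singular (a genuine elliptic curve) only when
    this value is non-zero.
    """
    return -16 * (4 * a**3 + 27 * b**2)


def group_by_params(params, ndigits=2):
    """Group terms whose rounded (a, b) pairs coincide.

    ASSUMPTION: a crude stand-in for the clustering step, which the
    abstract does not specify.
    """
    groups = defaultdict(list)
    for term, (a, b) in params.items():
        groups[(round(a, ndigits), round(b, ndigits))].append(term)
    return dict(groups)


# Hypothetical toy corpus: one string per document.
docs = ["the cat sat", "the dog sat", "the cat ran the race"]
vocab = sorted({word for doc in docs for word in doc.split()})
params = {term: term_curve_params(term, docs) for term in vocab}
groups = group_by_params(params)
```

In this toy corpus, terms with the same frequency profile across documents (for example "cat" and "sat", each occurring once in two of the three documents) receive the same (a, b) pair and land in the same group, which is the kind of shared behavior the clustering step is meant to expose.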