Mining numbers in text using suffix arrays and clustering based on Dirichlet process mixture models

  • Authors:
  • Minoru Yoshida; Issei Sato; Hiroshi Nakagawa; Akira Terada

  • Affiliations:
  • University of Tokyo, Tokyo; University of Tokyo, Tokyo; University of Tokyo, Tokyo; Japan Airlines, Tokyo, Japan

  • Venue:
  • PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
  • Year:
  • 2010

Abstract

We propose a system that enables search over ranges of numbers. Both queries and results can combine strings and numbers (e.g., “200–800 dollars”). The system is based on suffix arrays augmented with number information, which makes it possible to search for numbers by words and vice versa. In addition, the system clusters the extracted collections of numbers with a Dirichlet Process Mixture of Gaussians so that they can be handled appropriately.
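The idea of a suffix array "augmented with number information" can be illustrated with a small sketch. It is not the authors' implementation. It assumes one simple augmentation: every number in the text is replaced by a single placeholder symbol (`NUM`), and the numeric values are kept in a side table keyed by position. With that assumption, one suffix-array lookup finds all numbers next to a word (words to numbers), and filtering the placeholder hits by value answers range queries (numbers to words). The names `normalize`, `build_suffix_array`, `find`, `numbers_by_word`, and `words_by_range` are hypothetical. The construction is naive, and the Dirichlet process clustering step is not shown.

```python
import re

# Hypothetical placeholder symbol: every number is replaced by it, so all numbers
# share one suffix-array "token" and their values live in a side table.
NUM = "\x00"
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def normalize(text):
    """Replace each number with NUM; map each NUM position to the number's value."""
    parts, values, length, last = [], {}, 0, 0
    for m in NUMBER_RE.finditer(text):
        chunk = text[last:m.start()]
        parts.append(chunk)
        length += len(chunk)
        values[length] = float(m.group())
        parts.append(NUM)
        length += 1
        last = m.end()
    parts.append(text[last:])
    return "".join(parts), values


def build_suffix_array(s):
    """Naive O(n^2 log n) construction: fine for a sketch, not for large corpora."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def find(s, sa, pattern):
    """Text positions whose suffix starts with `pattern` (two binary searches)."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def numbers_by_word(s, sa, values, word):
    """Words -> numbers: values of numbers written directly before `word`."""
    return [values[p] for p in find(s, sa, NUM + " " + word)]


def words_by_range(s, sa, values, low, high):
    """Numbers -> words: the word after each number whose value is in [low, high]."""
    result = []
    for p in find(s, sa, NUM):
        if low <= values[p] <= high:
            m = re.match(r"\s*(\w+)", s[p + 1:])
            if m:
                result.append(m.group(1))
    return result


if __name__ == "__main__":
    text = "Fares range from 200 dollars to 800 dollars. The 5 hour flight costs 350 dollars."
    s, values = normalize(text)
    sa = build_suffix_array(s)
    print(numbers_by_word(s, sa, values, "dollars"))  # [200.0, 800.0, 350.0]
    print(words_by_range(s, sa, values, 1, 10))       # ['hour']
```

The placeholder lets numbers with different digit strings fall into one contiguous block of the suffix array. That is why a single binary search covers every number. Here, range filtering is a linear scan over that block. A real system might index values differently, and the paper's actual augmentation may differ from this assumption.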