Mining numbers in text using suffix arrays and clustering based on Dirichlet process mixture models

Authors:
Minoru Yoshida;Issei Sato;Hiroshi Nakagawa;Akira Terada
Affiliations:
University of Tokyo, Tokyo;University of Tokyo, Tokyo;University of Tokyo, Tokyo;Japan Airlines, Tokyo, Japan
Venue:
PAKDD'10 Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II
Year:
2010

Abstract

We propose a system that enables us to search with ranges of numbers. Both queries and resulting strings can contain both strings and numbers (e.g., “200–800 dollars”). The system is based on suffix arrays augmented with treatment of number information to provide search for numbers by words, and vice versa. Further, the system performs clustering based on a Dirichlet Process Mixture of Gaussians to treat the extracted collection of numbers appropriately.
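
The idea of searching numbers by words, and words by number ranges, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how number information is attached to the suffix array, so the sketch assumes a token-level suffix array for phrase lookup plus a separate list of number tokens sorted by value. All function names (tokenize, build_suffix_array, find_phrase, numbers_before_phrase, build_number_index, words_after_numbers) are hypothetical, and the suffix array is built naively for brevity.

```python
import bisect
import re


def tokenize(text):
    # Lowercase the text and split it into number tokens and word tokens.
    return re.findall(r"\d+(?:\.\d+)?|[a-z]+", text.lower())


def build_suffix_array(tokens):
    # Naive construction: sort suffix start positions by the token
    # sequence that begins there. Fine for a demo, slow for large text.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens, sa, phrase):
    # Binary search for the block of suffixes that start with `phrase`;
    # returns the matching start positions in text order.
    m = len(phrase)

    def prefix(k):
        return tokens[sa[k]:sa[k] + m]

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix is >= phrase
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose prefix is > phrase
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def numbers_before_phrase(tokens, sa, phrase, low, high):
    # Word -> number search: numbers in [low, high] that directly
    # precede an occurrence of `phrase`.
    hits = []
    for pos in find_phrase(tokens, sa, phrase):
        if pos > 0 and tokens[pos - 1][0].isdigit():
            value = float(tokens[pos - 1])
            if low <= value <= high:
                hits.append(value)
    return hits


def build_number_index(tokens):
    # Assumed auxiliary structure: (value, position) pairs sorted by value.
    return sorted((float(t), i) for i, t in enumerate(tokens) if t[0].isdigit())


def words_after_numbers(tokens, number_index, low, high):
    # Number -> word search: words that directly follow a number in [low, high].
    start = bisect.bisect_left(number_index, (low, -1))
    end = bisect.bisect_right(number_index, (high, len(tokens)))
    return sorted({
        tokens[i + 1]
        for _, i in number_index[start:end]
        if i + 1 < len(tokens) and not tokens[i + 1][0].isdigit()
    })


if __name__ == "__main__":
    text = ("The ticket costs 500 dollars. A meal costs 30 dollars. "
            "The flight takes 12 hours and costs 900 dollars.")
    toks = tokenize(text)
    sa = build_suffix_array(toks)
    print(numbers_before_phrase(toks, sa, ["dollars"], 200, 800))
    print(words_after_numbers(toks, build_number_index(toks), 10, 50))
```

The first query mirrors the abstract's "200–800 dollars" example: it looks up the word through the suffix array and then filters the adjacent numbers by range. The second goes the other way, starting from a numeric range and returning the words found next to matching numbers. The clustering step with a Dirichlet Process Mixture of Gaussians would then operate on the collected numeric values.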