Mining reference tables for automatic text segmentation

  • Authors:
  • Eugene Agichtein;Venkatesh Ganti

  • Affiliations:
  • Columbia University;Microsoft Research

  • Venue:
  • Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data warehouse for subsequent querying, analysis, mining and integration. In this paper, we mine tables present in data warehouses and relational databases to develop an automatic segmentation system. Thus, we overcome limitations of existing supervised text segmentation approaches, which require comprehensive manually labeled training data. Our segmentation system is robust, accurate, and efficient, and requires no additional manual effort. Thorough evaluation on real datasets demonstrates the robustness and accuracy of our system, with segmentation accuracy exceeding state of the art supervised approaches.