A semi-supervised algorithm for pattern discovery in information extraction from textual data

  • Authors:
  • Tianhao Wu; William M. Pottenger

  • Affiliations:
  • Computer Science and Engineering, Lehigh University; Computer Science and Engineering, Lehigh University

  • Venue:
  • PAKDD'03: Proceedings of the 7th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2003

Abstract

In this article we present a semi-supervised algorithm for pattern discovery in information extraction from textual data. The patterns that are discovered take the form of regular expressions that generate regular languages. We term our approach 'semi-supervised' because it requires significantly less effort to develop a training set than other approaches. From the training data our algorithm automatically generates regular expressions that can be used on previously unseen data for information extraction. Our experiments show that the algorithm has good testing performance on many features that are important in the fight against terrorism.
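
As a minimal sketch of how a discovered pattern might be applied to previously unseen text, the Python snippet below uses a hand-written regular expression as a stand-in for a learned one. The pattern `AGE_PATTERN`, the helper `extract_feature`, the choice of an age feature, and the sample sentences are all illustrative assumptions; none of them come from the paper, and the pattern discovery step itself is not shown.

```python
import re

# Hypothetical pattern standing in for one the algorithm might generate.
# It is hand-written here, not learned, and not taken from the paper.
AGE_PATTERN = re.compile(r"\b(\d{1,3})[- ]years?[- ]old\b", re.IGNORECASE)


def extract_feature(pattern, text):
    """Return every captured value of the feature found in `text`."""
    return [match.group(1) for match in pattern.finditer(text)]


# Made-up, previously unseen sentences used only for illustration.
unseen = [
    "The suspect is a 34-year-old male.",
    "Witnesses described a man about 50 years old.",
    "No age was reported.",
]

ages = [extract_feature(AGE_PATTERN, sentence) for sentence in unseen]
print(ages)
```

Each sentence yields a (possibly empty) list of extracted values, so a sentence that does not match the pattern contributes no extraction rather than raising an error.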