Learning Information Extraction Rules for Semi-Structured and Free Text

  • Authors:
  • Stephen Soderland

  • Affiliations:
  • Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2350. soderlan@cs.washington.edu

  • Venue:
  • Machine Learning - Special issue on natural language learning
  • Year:
  • 1999

Abstract

A wealth of on-line text information can be made available to automatic processing by information extraction (IE) systems. Each IE application needs a separate set of rules tuned to the domain and writing style. WHISK helps to overcome this knowledge-engineering bottleneck by learning text extraction rules automatically. WHISK is designed to handle text styles ranging from highly structured to free text, including text that is neither rigidly formatted nor composed of grammatical sentences. Such semi-structured text has largely been beyond the scope of previous systems. When used in conjunction with a syntactic analyzer and semantic tagging, WHISK can also handle extraction from free text such as news stories.