Pattern based approaches to pre-processing structured text: a newsfeed example

  • Authors: Paul Bogg
  • Affiliations: University of Technology Sydney
  • Venue: ICCS'03 Proceedings of the 2003 international conference on Computational science
  • Year: 2003

Abstract

Text documents with a structured format let readers run their eye quickly over the page and pick out the information relevant to them. By presenting information in this way, the author makes it easy for the reader to extract information. If the structure used throughout the document follows a pattern or set of patterns, then text pre-processing methods that can identify those patterns can also extract the same text that a reader would pick out by eye. The extracted text can then be used in further text mining applications. This paper describes a text pre-processing program that identifies text patterns and extracts the appropriate text.
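
As a rough illustration of the idea in the abstract, the sketch below extracts fields from a small newsfeed using a single regular expression. It is not the program described in the paper: the feed layout (`HEADLINE:`, `DATE:`, `BODY:` lines), the sample text, and the names `ITEM_PATTERN` and `extract_items` are all assumptions made for this example.

```python
import re

# Hypothetical newsfeed layout (an assumption, not taken from the paper):
# each item is a "HEADLINE:" line, a "DATE:" line, and a "BODY:" line.
SAMPLE_FEED = """\
HEADLINE: Markets rally
DATE: 2003-06-02
BODY: Shares rose across the board.

HEADLINE: Storm warning issued
DATE: 2003-06-03
BODY: Heavy rain is expected overnight.
"""

# One pattern describing the repeated structure of an item.
# re.MULTILINE lets ^ and $ match at the start and end of each line.
ITEM_PATTERN = re.compile(
    r"^HEADLINE: (?P<headline>.+)\n"
    r"DATE: (?P<date>\d{4}-\d{2}-\d{2})\n"
    r"BODY: (?P<body>.+)$",
    re.MULTILINE,
)


def extract_items(text):
    """Return one dict of named fields per item that matches the pattern."""
    return [match.groupdict() for match in ITEM_PATTERN.finditer(text)]


items = extract_items(SAMPLE_FEED)
for item in items:
    print(item["date"], item["headline"])
```

Text that does not follow the assumed layout simply yields no matches, so the extracted dictionaries can be passed to a later text mining step without further filtering.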