Unstructured data bases or very efficient text searching

  • Authors: Gaston H. Gonnet
  • Affiliations: University of Waterloo
  • Venue: PODS '83 Proceedings of the 2nd ACM SIGACT-SIGMOD symposium on Principles of database systems
  • Year: 1983

Quantified Score

Hi-index 0.02

Abstract

We present several algorithms to search data bases that consist of text. The algorithms apply mostly to very large data bases that are difficult to structure. We describe algorithms which search the original data base without transformation and hence could be used as general text searching algorithms. We also describe algorithms requiring pre-processing, the best of them achieving a logarithmic behaviour. These efficient algorithms solve the "plagiarism" problem among n papers. The problem of misspellings, ambiguous spellings, simple errors, endings, positional information, etc., is nicely treated using signature functions.
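
The abstract does not say which data structures or signature functions the paper uses, so the following is only an illustrative sketch under stated assumptions. It shows one familiar way that pre-processing can give a logarithmic number of comparisons per query (a sorted list of suffix positions searched by binary search), and one crude signature function that maps some misspellings to the same key. The names `build_suffix_index`, `find_occurrences`, and `signature` are hypothetical and are not taken from the paper.

```python
def build_suffix_index(text):
    """Pre-processing step (assumed technique): sort all suffix start positions.

    This naive build is slow for large texts; it is kept short for clarity.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, index, pattern):
    """Return sorted start positions of pattern, using two binary searches.

    Each search makes O(log n) comparisons of at most len(pattern) characters.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[index[mid]:index[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[index[mid]:index[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(index[start:lo])


def signature(word):
    """Hypothetical signature: keep the first letter, then drop vowels and
    any letter equal to the last letter kept."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    kept = [letters[0]]
    for c in letters[1:]:
        if c in "aeiou":
            continue
        if c != kept[-1]:
            kept.append(c)
    return "".join(kept)


if __name__ == "__main__":
    text = "the cat sat on the mat"
    index = build_suffix_index(text)
    print(find_occurrences(text, index, "at"))
    print(signature("searching") == signature("serching"))
```

Indexing words by such a signature rather than by their exact spelling lets a lookup tolerate some spelling variation, at the cost of occasionally grouping unrelated words together.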