On Helmholtz's principle for documents processing

  • Authors:
  • Alexander A. Balinsky; Helen Y. Balinsky; Steven J. Simske

  • Affiliations:
  • Cardiff School of Mathematics, Cardiff, United Kingdom; Hewlett-Packard, Bristol, United Kingdom; Hewlett-Packard, Fort Collins, CO, USA

  • Venue:
  • Proceedings of the 10th ACM symposium on Document engineering
  • Year:
  • 2010

Abstract

Keyword extraction is a fundamental problem in text data mining and document processing. A large number of document processing applications depend directly on the quality and speed of keyword extraction algorithms. In this article, a novel approach to rapid change detection in data streams and documents is developed. It is based on ideas from image processing and especially on the Helmholtz Principle from the Gestalt Theory of human perception. Applied to the problem of keyword extraction, it delivers fast and effective tools to identify meaningful keywords using parameter-free methods. We also define a level of meaningfulness of the keywords, which can be used to modify the set of keywords depending on application needs.
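
The abstract does not give formulas, so the following is only a minimal sketch of how a Helmholtz-style meaningfulness score might be computed, not the authors' implementation. It assumes a "number of false alarms" (NFA) of the form C(K, m) / N^(m-1), where a word occurs K times across N documents of roughly equal length and m times in the document under test; a word is treated as meaningful when NFA < epsilon, and its level of meaningfulness is taken as -log(NFA) / m. The function name `meaningful_keywords`, the `epsilon` threshold, and the equal-length assumption are all illustrative choices.

```python
import math
from collections import Counter


def meaningful_keywords(documents, epsilon=1.0):
    """Score words per document with an assumed Helmholtz-style NFA.

    documents: list of token lists, assumed to be of comparable length.
    Returns one dict per document mapping word -> meaningfulness level,
    keeping only words whose NFA falls below epsilon.
    """
    n_docs = len(documents)
    # K for every word: total occurrences across the whole collection.
    total_counts = Counter(tok for doc in documents for tok in doc)

    results = []
    for doc in documents:
        local_counts = Counter(doc)
        scores = {}
        for word, m in local_counts.items():
            k = total_counts[word]
            # Expected number of times m of the k occurrences would land in
            # the same document if occurrences were placed uniformly at random.
            nfa = math.comb(k, m) / n_docs ** (m - 1)
            if nfa < epsilon:
                # Larger values mean a less likely (more meaningful) burst.
                scores[word] = -math.log(nfa) / m
        results.append(scores)
    return results


if __name__ == "__main__":
    docs = [
        ["helmholtz", "helmholtz", "helmholtz", "the", "the"],
        ["the", "the", "cat"],
        ["the", "the", "dog"],
        ["the", "the", "sun"],
    ]
    for index, scores in enumerate(meaningful_keywords(docs)):
        print(index, scores)
```

Under these assumptions a word that occurs only once in a document can never be meaningful (its NFA equals K, which is at least 1), and a word spread evenly over all documents, such as "the" in the toy data, is rejected without any stop-word list. Raising or lowering `epsilon` grows or shrinks the keyword set, which is one way to read the abstract's remark about adapting the set to application needs.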