Automatically generating high quality metadata by analyzing the document code of common file types

  • Authors:
  • Lars Fredrik Høimyr Edvardsen;Ingeborg Torvik Sølvberg;Trond Aalberg;Hallvard Trætteberg

  • Affiliations:
  • Intelligent Communication AS/The Norwegian University of Science and Technology, Oslo, Norway;The Norwegian University of Science and Technology, Trondheim, Norway;The Norwegian University of Science and Technology, Trondheim, Norway;The Norwegian University of Science and Technology, Trondheim, Norway

  • Venue:
  • Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2009

Abstract

A major challenge for content management in intranets and other large-scale document storage and retrieval services is the generation of high-quality metadata. Manual generation of metadata is resource demanding and is often viewed by collection managers and document authors as an inefficient use of their time, so there is a desire for other ways to create the needed metadata. Automatic Metadata Generation (AMG) comprises methods for generating metadata without manual interaction, using computer programs to interpret the document and possibly the document context. Current AMG research has been limited to collections of similarly formatted documents. The research presented in this paper expands the field of AMG by presenting an approach that is independent of a common visualization scheme: AMG based on document code analysis. The paper shows AMG possibilities for LaTeX, Word and PowerPoint documents and how this approach can significantly increase the quality of the generated metadata. It does so by giving AMG algorithms direct access to the user-specified intellectual content and the file formatting, thereby avoiding common quality-reducing factors such as incompleteness, low accuracy, and problems with logical consistency, coherence and timeliness. The research also shows how this AMG approach can be combined with other AMG approaches, drawing on their strengths in order to achieve the desired high-quality metadata entities.
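
As a rough illustration of what document code analysis could look like for one of the formats named in the abstract, the Python sketch below reads the title and author directly from LaTeX source rather than from the rendered page. It is not the authors' implementation: the function names `read_braced_argument` and `extract_latex_metadata` are hypothetical, the mapping to Dublin Core style element names (`title`, `creator`) is an assumption, and the parser ignores LaTeX comments, escaped braces and optional arguments.

```python
import re


def read_braced_argument(source, command):
    """Return the {...} argument of the first \\command in source, or None.

    Hypothetical helper: counts braces so nested groups such as
    \\emph{...} inside the argument are kept intact. Escaped braces
    (\\{ and \\}) and LaTeX comments are not handled in this sketch.
    """
    match = re.search(r"\\" + re.escape(command) + r"\s*\{", source)
    if match is None:
        return None
    depth = 1
    start = match.end()
    for index in range(start, len(source)):
        char = source[index]
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return source[start:index].strip()
    return None  # braces never balanced


def extract_latex_metadata(source):
    """Map author-specified LaTeX commands to metadata elements.

    The element names are an assumed Dublin Core style mapping,
    chosen only for illustration.
    """
    command_to_element = {"title": "title", "author": "creator"}
    metadata = {}
    for command, element in command_to_element.items():
        value = read_braced_argument(source, command)
        if value:
            # Collapse line breaks and repeated spaces inside the value.
            metadata[element] = " ".join(value.split())
    return metadata


sample = r"""\documentclass{article}
\title{Automatic \emph{Metadata}
  Generation}
\author{A. Author}
\begin{document}\maketitle\end{document}
"""
print(extract_latex_metadata(sample))
```

Because the values come from commands the author wrote explicitly, this kind of extraction does not depend on how the document happens to be laid out when rendered, which is the property the abstract attributes to document code analysis.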