Content independent metadata production as a machine learning problem

  • Authors:
  • Sahar Changuel;Nicolas Labroche

  • Affiliations:
  • CNRS, UMR7606, LIP6, Université Pierre et Marie Curie, Paris 6, France;CNRS, UMR7606, LIP6, Université Pierre et Marie Curie, Paris 6, France

  • Venue:
  • MLDM'12 Proceedings of the 8th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Metadata provide a high-level description of digital library resources and represent the key to enable the discovery and selection of suitable resources. However the growth in size and diversity of digital collections makes manual metadata extraction an expensive task. This paper proposes a new content independent method to automatically generate metadata in order to characterize resources in a given learning objects repository. The key idea is to rely on few existing metadata to learn predictive models of metadata values. The proposed method is content independent and handles resources in different formats: text, image, video, Java applet, etc. Two classical machine learning approaches are studied in this paper: in the first approach a supervised machine learning technique classify each value of a metadata field to be predicted according to the other a-priori filled metadata fields. The second approach used the FP-Growth algorithm to discover relationships between the different metadata fields as association rules. Experiments on two well-known educational data repositories show that both approaches can enhance metadata extraction and can even fill subjective metadata fields that are difficult to extract from the content of a resource, such as the difficulty of a resource.