Automatic tag recommendation for metadata annotation using probabilistic topic modeling

  • Authors: Suppawong Tuarob; Line C. Pouchard; C. Lee Giles

  • Affiliations: The Pennsylvania State University, University Park, PA, USA; Oak Ridge National Laboratory, Oak Ridge, TN, USA; The Pennsylvania State University, University Park, PA, USA

  • Venue: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
  • Year: 2013

Abstract

The growing complexity and advancement of ecological and environmental sciences encourage scientists across the world to collect data from multiple places, times, and thematic scales to verify their hypotheses. Accumulated over time, such data increase not only in amount but also in the diversity of data sources spread around the world. This poses a major challenge for scientists, who have to search for information manually. To alleviate this problem, ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple repositories and makes it searchable. However, harvested metadata records are sometimes poorly annotated or lack meaningful keywords, which can hinder effective retrieval. Here, we develop algorithms for automatic annotation of metadata. We transform the problem into a tag recommendation problem with a controlled tag library, and propose two variants of an algorithm for recommending tags. Our experiments on four datasets of environmental science metadata records not only show promising performance for our method, but also shed light on the differing natures of the datasets.