Using a web-based categorization approach to generate thematic metadata from texts

  • Authors:
  • Chien-Chung Huang; Shui-Lung Chuang; Lee-Feng Chien

  • Affiliations:
  • Academia Sinica, Taiwan; Academia Sinica, Taiwan; Academia Sinica, Taiwan

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2004

Abstract

Conventional tools for automatic metadata creation mostly extract named entities or text segments from texts and annotate them with information about persons, locations, dates, and so on. However, this kind of entity type information is often insufficient for machines to understand the facts contained in the texts, thus precluding the possibility of implementing more advanced, intelligent applications, such as concept-based search. In this work, we try to create more refined thematic metadata inherent in texts. Based on Web resource mining, our approach acquires training corpora necessary to describe both the thematic categories and the metadata extracted from the texts. The approach then finds the corresponding relationships among them by means of categorization and thus generates thematic metadata for the textual data. Experimental results confirm the potential and wide adaptability of our approach.
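
The abstract does not give the categorization algorithm, so the following is only a toy sketch of the general idea: represent each thematic category and each extracted term by text snippets gathered from the Web, then assign the term to the category whose snippets it most resembles. Cosine similarity over word counts is an assumed stand-in for whatever classifier the authors used, and the names `bag_of_words`, `cosine`, `categorize`, and the hard-coded snippets (hypothetical stand-ins for search-engine results) are illustrative, not taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(texts):
    """Pool lowercased word counts over a list of text snippets."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(term_snippets, category_snippets):
    """Return the best-matching category and the score for every category."""
    term_vec = bag_of_words(term_snippets)
    scores = {
        category: cosine(term_vec, bag_of_words(snippets))
        for category, snippets in category_snippets.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical data: in a real system these snippets would come from
# Web search results for each category name and each extracted term.
CATEGORY_SNIPPETS = {
    "Sports": [
        "basketball league season playoffs team",
        "football match score team coach",
    ],
    "Finance": [
        "stock market shares investors bank",
        "interest rates bank loans investors",
    ],
}
TERM_SNIPPETS = [
    "the team won the playoffs after a long season",
    "coach praises team after the match",
]

best, scores = categorize(TERM_SNIPPETS, CATEGORY_SNIPPETS)
print(best, scores)
```

With these hand-written snippets the term shares words only with the "Sports" corpus, so it is assigned to that category; a realistic setup would need term weighting and far larger corpora.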