Automatic feature thesaurus enrichment: extracting generic terms from digital gazetteer

  • Authors: Jun Wang; Ning Ge

  • Affiliations: Peking University, Beijing, China; Peking University, Beijing, China

  • Venue: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
  • Year: 2006

Abstract

The ADL Gazetteer (ADLG) is a digitized worldwide gazetteer developed in the Alexandria Digital Library (ADL) Project. It contains millions of geographic names (placenames), which are indexed with type terms from the ADL Feature Type Thesaurus (FTT), a hierarchical category scheme. The paper proposes a two-step method to enrich this category scheme automatically: first, discover frequent generic terms by detecting phrase boundaries with a mutual information-based method; second, correlate the generic terms with the relevant type terms by hierarchical clustering. The correlation pairs thus established can then be used to supplement the FTT with the generic terms found. Extensive experiments on millions of ADLG placenames demonstrate the effectiveness of the proposed method. Beyond thesaurus enrichment, potential applications of this research include suggesting likely type terms when categorizing new placenames and helping users choose likely search terms.
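
To make the first step more concrete, the following is a minimal Python sketch of one way a mutual information-based boundary detector could work. It is not the authors' implementation. The abstract does not give the scoring formula, so the sketch assumes pointwise mutual information (PMI) between adjacent words, a hypothetical fixed `threshold`, and a toy list of placenames invented for illustration. The function names `adjacent_pmi` and `trailing_generic_term` are likewise hypothetical. The second step (hierarchical clustering against FTT type terms) is not covered.

```python
import math
from collections import Counter


def adjacent_pmi(names):
    """Compute PMI for every adjacent word pair observed in `names`."""
    unigrams = Counter()
    bigrams = Counter()
    for name in names:
        tokens = name.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


def trailing_generic_term(name, pmi, threshold):
    """Return the words after the last weak link, or None if there is none.

    A "weak link" is an adjacent pair whose PMI is below `threshold`
    (an assumed, hand-picked cutoff). Unseen pairs count as weak links.
    """
    tokens = name.lower().split()
    cut = None
    for i in range(len(tokens) - 1):
        score = pmi.get((tokens[i], tokens[i + 1]), float("-inf"))
        if score < threshold:
            cut = i + 1  # boundary falls between token i and token i + 1
    if cut is None:
        return None
    return " ".join(tokens[cut:])


# Toy data, invented for illustration only (not taken from the ADL Gazetteer).
sample_names = [
    "Bear Creek State Park",
    "Pine Ridge State Park",
    "Eagle Rock State Park",
    "Bear Ridge",
    "Pine Rock",
    "Eagle Creek",
]

pmi = adjacent_pmi(sample_names)
candidates = (trailing_generic_term(n, pmi, threshold=2.5) for n in sample_names)
generic_counts = Counter(term for term in candidates if term is not None)
print(generic_counts.most_common())
```

On this toy data the weakest link in each four-word name falls between the specific part and "state park", so "state park" emerges as the only frequent candidate. A fixed threshold is the simplest choice for a sketch; with real gazetteer data the cutoff would need tuning, and two-word names such as "Bear Ridge" would need separate handling because they offer only one possible split point.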