Conducting term alignment of a dataset without data provider identification

  • Authors:
  • Tetsuya Yoshida

  • Affiliations:
  • Hokkaido University, Graduate School of Information Science and Technology, Sapporo, Hokkaido, Japan

  • Venue:
  • ICCOMP'06 Proceedings of the 10th WSEAS international conference on Computers
  • Year:
  • 2006

Abstract

This paper extends our previous approach to term alignment of datasets so that alignment can be conducted even when the data provider of each data item is unknown. Our previous method aligns terms with respect to their usage in data items, provided that the provider of each data item is identified. However, this assumption does not always hold, since data can be collected and uploaded by anonymous providers. To address this problem, we propose a new method that partitions a dataset into subsets of data items. To search for a better partition, decision trees are constructed for the subsets. By defining a distance between decision trees, the quality of a partition is measured in terms of the stability of the structure of the constructed trees with respect to that distance. Our previous term alignment method is then applied to the partitioned subsets. The proposed method has been implemented, and its effectiveness is evaluated through experiments using new evaluation measures defined for our term alignment setting.
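
The idea of comparing the decision trees built on the subsets of a partition can be illustrated with a minimal sketch. The abstract does not state how the distance between trees or the stability criterion is defined, so everything below is an assumption made for illustration only: the tree encoding, the names tree_paths, tree_distance and mean_pairwise_distance, the Jaccard distance over root-to-leaf paths, and the use of a mean pairwise distance as a summary of a partition are all hypothetical and are not taken from the paper.

```python
from itertools import combinations

# Assumed encoding (hypothetical): a tree is either a leaf label (a string)
# or a tuple (attribute, {attribute_value: subtree}).


def tree_paths(tree, prefix=()):
    """Return the set of root-to-leaf paths, each a tuple of (attribute, value) tests
    ending with a ("leaf", label) marker."""
    if not isinstance(tree, tuple):  # leaf node
        return {prefix + (("leaf", tree),)}
    attribute, children = tree
    paths = set()
    for value, subtree in children.items():
        paths |= tree_paths(subtree, prefix + ((attribute, value),))
    return paths


def tree_distance(a, b):
    """Illustrative distance: Jaccard distance between the two path sets.
    This stands in for the paper's distance, which the abstract does not define."""
    pa, pb = tree_paths(a), tree_paths(b)
    union = pa | pb
    if not union:
        return 0.0
    return 1.0 - len(pa & pb) / len(union)


def mean_pairwise_distance(trees):
    """Summarize a candidate partition by the average distance between the
    trees built on its subsets (an assumed stand-in for 'stability')."""
    pairs = list(combinations(trees, 2))
    if not pairs:
        return 0.0
    return sum(tree_distance(a, b) for a, b in pairs) / len(pairs)


# Hand-written example trees, one per subset of a hypothetical partition.
t1 = ("color", {"red": "yes", "blue": "no"})
t2 = ("color", {"red": "yes", "blue": "yes"})
t3 = ("size", {"big": "yes", "small": "no"})

print(tree_distance(t1, t2))
print(mean_pairwise_distance([t1, t2, t3]))
```

In this sketch, t1 and t2 share one of three distinct paths, so their distance is 2/3, while t3 tests a different attribute and is at distance 1 from both. How such distances would be turned into a preference between candidate partitions is left open here, because the abstract does not specify it.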