Distance-function design and fusion for sequence data

  • Authors:
  • Yi Wu;Edward Y. Chang

  • Affiliations:
  • University of California, Santa Barbara, CA;University of California, Santa Barbara, CA

  • Venue:
  • Proceedings of the thirteenth ACM international conference on Information and knowledge management
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Sequence-data mining plays a key role in many scientific studies and real-world applications such as bioinformatics, data stream, and sensor networks, where sequence data are processed and their semantics interpreted. In this paper we address two relevant issues: sequence-data representation, and representation-to-semantics mapping. For representation, since the best one is dependent upon the application being used and even the type of query, we propose representing sequence data in multiple views. For each representation, we propose methods to construct a valid kernel as the distance function to measure similarity between sequences. For mapping, we then find the best combination of the individual distance functions, which measure similarity of different views, to depict the target semantics. We propose a super-kernel function-fusion scheme to achieve the optimal mapping. Through theoretical analysis and empirical studies on UCI and real world datasets, we show our approach of multi-view representation and fusion to be mathematically valid and very effective for practical purposes.