Distance-function design and fusion for sequence data

Authors: Yi Wu; Edward Y. Chang
Affiliations: University of California, Santa Barbara, CA (both authors)
Venue: Proceedings of the thirteenth ACM international conference on Information and knowledge management
Year: 2004

Abstract

Sequence-data mining plays a key role in many scientific studies and real-world applications, such as bioinformatics, data streams, and sensor networks, where sequence data must be processed and their semantics interpreted. In this paper we address two related issues: sequence-data representation and representation-to-semantics mapping. For representation, because the best choice depends on the application and even on the type of query, we propose representing sequence data in multiple views. For each representation, we propose methods to construct a valid kernel that serves as the distance function measuring similarity between sequences. For mapping, we find the best combination of the individual distance functions, each of which measures similarity in a different view, to depict the target semantics. We propose a super-kernel function-fusion scheme to achieve the optimal mapping. Through theoretical analysis and empirical studies on UCI and real-world datasets, we show that our approach of multi-view representation and fusion is mathematically valid and very effective for practical purposes.
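The idea of building one valid kernel per view and then combining them can be illustrated with a minimal sketch. It is not the authors' method: the two views (raw values and first differences), the Gaussian RBF kernel, the width `sigma`, and the fixed fusion weights are all illustrative assumptions, and the non-negative weighted sum stands in for the paper's super-kernel fusion, whose details the abstract does not give. The sketch relies on three standard facts: a Gaussian RBF kernel is positive semidefinite, a non-negative weighted sum of valid kernels is again a valid kernel, and any valid kernel k induces a distance d(x, y) = sqrt(k(x, x) + k(y, y) - 2 k(x, y)). The helper names (`raw_view`, `slope_view`, `fused_kernel`, `kernel_distance`, `is_psd`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
View = Callable[[Sequence[float]], Vector]
Kernel = Callable[[Sequence[float], Sequence[float]], float]


def raw_view(seq: Sequence[float]) -> Vector:
    """Hypothetical view 1: the sequence values themselves."""
    return [float(v) for v in seq]


def slope_view(seq: Sequence[float]) -> Vector:
    """Hypothetical view 2: first differences, which capture local trend."""
    return [float(b - a) for a, b in zip(seq, seq[1:])]


def rbf_kernel(u: Vector, v: Vector, sigma: float) -> float:
    """Gaussian RBF kernel exp(-||u - v||^2 / (2 sigma^2)); PSD for sigma > 0."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes equal-length views")
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def fused_kernel(x: Sequence[float], y: Sequence[float], views: List[View],
                 weights: List[float], sigma: float = 1.0) -> float:
    """Non-negative weighted sum of per-view RBF kernels.

    Assumption: a simple linear combination used as a stand-in for the
    paper's super-kernel fusion, not a reproduction of it.
    """
    if len(views) != len(weights) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per view")
    return sum(w * rbf_kernel(view(x), view(y), sigma)
               for view, w in zip(views, weights))


def kernel_distance(k: Kernel, x: Sequence[float], y: Sequence[float]) -> float:
    """Distance induced by a valid kernel: sqrt(k(x,x) + k(y,y) - 2 k(x,y))."""
    # max() guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, k(x, x) + k(y, y) - 2.0 * k(x, y)))


def is_psd(gram: List[List[float]], jitter: float = 1e-9) -> bool:
    """Return True if Cholesky of gram + jitter*I succeeds (numerically PSD).

    Reads only the lower triangle, so it assumes gram is symmetric.
    """
    n = len(gram)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = gram[i][j] - sum(chol[i][m] * chol[j][m] for m in range(j))
            if i == j:
                s += jitter
                if s <= 0.0:
                    return False  # a non-positive pivot means not PSD
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True


if __name__ == "__main__":
    seqs = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.1, 2.1, 2.9], [3.0, 2.0, 1.0, 0.0]]

    def k(x, y):
        # Equal weights on the two hypothetical views.
        return fused_kernel(x, y, [raw_view, slope_view], [0.5, 0.5], sigma=1.0)

    gram = [[k(a, b) for b in seqs] for a in seqs]
    print("fused Gram matrix is PSD:", is_psd(gram))
    print("d(s0, s1) =", round(kernel_distance(k, seqs[0], seqs[1]), 4))
    print("d(s0, s2) =", round(kernel_distance(k, seqs[0], seqs[2]), 4))
```

In this sketch the nearly identical rising sequences s0 and s1 end up much closer than s0 and the falling sequence s2, because both views agree that s0 and s2 differ. The sketch assumes equal-length sequences; in practice `sigma` and the per-view weights would need to be tuned rather than fixed as they are here.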