Matching samples of multiple views

Authors:
Abhishek Tripathi;Arto Klami;Matej Orešič;Samuel Kaski
Affiliations:
Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland;Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Helsinki, Finland;Quantitative Biology and Bioinformatics, VTT Technical Research Centre of Finland, Espoo, Finland;Helsinki Institute for Information Technology HIIT, Aalto University and University of Helsinki, Helsinki, Finland
Venue:
Data Mining and Knowledge Discovery
Year:
2011

Abstract

Multi-view learning studies how several views, that is, different feature representations of the same objects, can best be utilized in learning. In other words, multi-view learning is the analysis of co-occurrence data, where the observations are co-occurrences of samples in the views. Standard multi-view learning, such as joint density modeling, cannot be done in the absence of co-occurrence, when the views are observed separately and the identities of the objects are not known. As a practical example, joint analysis of mRNA and protein concentrations requires a mapping between genes and proteins. We introduce a data-driven approach for learning the correspondence of the observations in the different views, in order to enable joint analysis even in the absence of known co-occurrence. The method finds a matching that maximizes statistical dependency between the views, which is particularly suitable for multi-view methods such as canonical correlation analysis, which have the same objective. We apply the method to translational metabolomics, to identify differences and commonalities in metabolic processes in different species or tissues. The metabolite identities and roles in the different species are not generally known, so it is necessary to search for a matching. In this paper we show, using different metabolomics measurement batches as the views so that the ground truth is known, that the metabolite identities can be reliably matched by a consensus of several matching solutions.
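
The matching objective described in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes each view is a single scalar feature, uses Pearson correlation as the dependency measure, and searches all permutations exhaustively, which is only feasible for a handful of samples. The data and the names `pearson`, `best_matching`, and `hidden_order` are hypothetical and exist only for this illustration.

```python
from itertools import permutations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def best_matching(x, y):
    """Try every permutation p and keep the one maximizing corr(x[i], y[p[i]])."""
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(len(y))):
        score = pearson(x, [y[j] for j in perm])
        if score > best_score:
            best_perm, best_score = perm, score
    return list(best_perm), best_score


# Toy data (assumed for illustration): view y is a noisy linear function of view x.
x = [0.5, 2.0, 3.1, 4.7, 6.0, 7.2]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0]
y_paired = [2.0 * xi + 1.0 + e for xi, e in zip(x, noise)]

# Hide the pairing: y_shuffled[k] holds y_paired[hidden_order[k]].
hidden_order = [3, 0, 5, 1, 4, 2]
y_shuffled = [y_paired[i] for i in hidden_order]

# matching[i] is the index in y_shuffled assigned to sample i of x.
matching, score = best_matching(x, y_shuffled)

# Map the matched indices back to original sample identities.
recovered = [hidden_order[j] for j in matching]
print(matching, recovered, round(score, 4))
```

If the matching is correct, `recovered` lists the samples in their original order. With multivariate views and realistic sample sizes, exhaustive search is no longer possible, and the dependency would have to be measured on learned projections (for example from canonical correlation analysis) and optimized with an assignment solver.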