A scalable supervised algorithm for dimensionality reduction on streaming data

Authors:
Jun Yan; Benyu Zhang; Shuicheng Yan; Ning Liu; Qiang Yang; Qiansheng Cheng; Hua Li; Zheng Chen; Wei-Ying Ma
Affiliations:
LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing 100871, PR China; Microsoft Research Asia, 49, Zhichun Road, Beijing 100080, PR China; LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing 100871, PR China; Department of Mathematics, Tsinghua University, Beijing 100084, PR China; Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong; LMAM, Department of Information Science, School of Mathematical Science, Peking University, Beijing 100871, PR China; Department of Mathematics, School of Mathematical Science, Peking University, Beijing 100871, PR China; Microsoft Research Asia, 49, Zhichun Road, Beijing 100080, PR China; Microsoft Research Asia, 49, Zhichun Road, Beijing 100080, PR China
Venue:
Information Sciences: an International Journal
Year:
2006

Abstract

Algorithms for streaming data have attracted increasing attention over the past decade. Among them, dimensionality reduction algorithms are of particular interest because many real-world tasks require them. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two of the most widely used dimensionality reduction approaches. However, PCA is not optimal for general classification problems because it is unsupervised and ignores the label information that is valuable for classification. LDA, on the other hand, performs poorly when the number of available low-dimensional spaces is limited or when the singularity problem arises. Recently, the Maximum Margin Criterion (MMC) was proposed to overcome the shortcomings of PCA and LDA. Nevertheless, the original MMC algorithm does not fit the streaming data model and so cannot handle large-scale, high-dimensional data sets. An effective, efficient, and scalable approach is therefore needed. In this paper, we propose a supervised incremental dimensionality reduction algorithm, together with an extension, that infers adaptive low-dimensional spaces by optimizing the maximum margin criterion. Experimental results on a synthetic dataset and on real datasets demonstrate the superior performance of the proposed algorithm on streaming data.
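
For orientation, the criterion named in the abstract can be illustrated with a minimal batch sketch. This is not the incremental algorithm the paper proposes; it is the plain, non-streaming form of MMC, under the assumption (taken from the general MMC literature rather than from this abstract) that the objective is tr(W^T (S_b - S_w) W) with orthonormal columns in W, where S_b and S_w are the between-class and within-class scatter matrices. The function name mmc_projection and the toy data are hypothetical.

```python
import numpy as np

def mmc_projection(X, y, k):
    """Batch MMC sketch: top-k eigenvectors of S_b - S_w (assumed objective)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    global_mean = X.mean(axis=0)
    S_b = np.zeros((d, d))  # between-class scatter
    S_w = np.zeros((d, d))  # within-class scatter
    for label in np.unique(y):
        Xc = X[y == label]
        prior = Xc.shape[0] / n
        class_mean = Xc.mean(axis=0)
        diff = (class_mean - global_mean)[:, None]
        S_b += prior * (diff @ diff.T)
        centered = Xc - class_mean
        # prior * class covariance simplifies to centered^T centered / n
        S_w += centered.T @ centered / n
    # S_b - S_w is symmetric, so eigh applies; no matrix inverse is needed,
    # which is why MMC sidesteps the singularity problem of LDA.
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]

# Toy data (hypothetical): two classes separated along the first axis.
X_demo = np.array([[-5, -1], [-5, 1], [-4, 0], [-6, 0],
                   [5, -1], [5, 1], [4, 0], [6, 0]], dtype=float)
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
W = mmc_projection(X_demo, y_demo, k=1)
print(W.shape)  # (2, 1); the column is, up to sign, the first axis
```

The sketch builds a full d-by-d matrix and decomposes it in one batch, which is the cost that makes the original formulation unsuitable for large, high-dimensional streams and motivates an incremental variant.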