Algorithms for clustering data
Introduction to statistical pattern recognition (2nd ed.)
Concept decompositions for large sparse text data using clustering
Machine Learning
SIAM Journal on Matrix Analysis and Applications
Pattern Classification (2nd Edition)
Efficient Nonlinear Dimension Reduction for Clustered Data Using Kernel Functions
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
A new optimization criterion for discriminant analysis is presented. The new criterion extends the optimization criteria of the classical linear discriminant analysis (LDA) by introducing the pseudo-inverse when the scatter matrices are singular. It is applicable regardless of the relative sizes of the data dimension and sample size, overcoming a limitation of the classical LDA. Recently, a new algorithm called LDA/GSVD for structure-preserving dimension reduction has been introduced, which extends the classical LDA to very high-dimensional undersampled problems by using the generalized singular value decomposition (GSVD). The solution from the LDA/GSVD algorithm is a special case of the solution for our generalized criterion in this paper, which is also based on the GSVD. We also present an approximation of our GSVD-based solution, which reduces computational complexity by finding sub-clusters of each cluster and using their centroids to capture the structure of each cluster. This reduced problem yields much smaller matrices to which the GSVD can be applied efficiently. Experiments on text data, with up to 7000 dimensions, show that the approximation algorithm produces results that are close to those produced by the exact algorithm.
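The pseudo-inverse idea in the abstract can be sketched in a few lines: replace the inverse of the within-class scatter matrix with its pseudo-inverse so the criterion stays defined when that matrix is singular. The sketch below is an illustrative NumPy implementation of this generalized eigenvalue formulation, not the paper's LDA/GSVD algorithm; the function name `lda_pinv` and its interface are assumptions.

```python
import numpy as np

def lda_pinv(X, y, k):
    """Top-k discriminant directions via a pseudo-inverse variant of LDA.

    Uses pinv(Sw) in place of inv(Sw), so the criterion remains defined
    when the within-class scatter Sw is singular (e.g. when the data
    dimension exceeds the number of samples). Illustrative sketch only.
    """
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Eigenvectors of pinv(Sw) @ Sb give the discriminant directions;
    # sort by eigenvalue magnitude and keep the leading k.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)
    return evecs[:, order[:k]].real

# Usage: two well-separated classes in 5 dimensions, reduced to 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
G = lda_pinv(X, y, 1)      # (5, 1) projection matrix
proj = X @ G               # 1-D projection separating the classes
```

The abstract's approximation would then apply the same construction to a much smaller problem, with each cluster summarized by the centroids of its sub-clusters rather than by all of its points.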