Random projection in dimensionality reduction: applications to image and text data

Authors: Ella Bingham; Heikki Mannila
Affiliations: Helsinki University of Technology, FIN-02015 HUT, Finland (both authors)
Venue: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year: 2001


Abstract

Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases where the high dimensionality of the data would otherwise lead to burdensome computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.
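
The idea in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It projects two d-dimensional vectors to k dimensions with a dense Gaussian matrix and with a sparse matrix, then compares the projected distance to the original one. The function names, the 1/sqrt(k) scaling, the default sizes, and the particular sparse distribution (entries sqrt(3) times +1, 0, -1 with probabilities 1/6, 2/3, 1/6, as proposed by Achlioptas) are assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def gaussian_matrix(d, k, rng):
    """d x k matrix with i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]


def sparse_matrix(d, k, rng):
    """d x k matrix with entries sqrt(3) * (+1, 0, -1), probabilities 1/6, 2/3, 1/6.

    Assumed distribution (zero mean, unit variance); about two thirds of
    the entries are zero, so the projection needs fewer multiplications.
    """
    s = math.sqrt(3.0)

    def entry():
        u = rng.random()
        if u < 1.0 / 6.0:
            return s
        if u < 2.0 / 6.0:
            return -s
        return 0.0

    return [[entry() for _ in range(k)] for _ in range(d)]


def project(x, R):
    """Map a d-dimensional vector x to k dimensions as (1 / sqrt(k)) * x R."""
    k = len(R[0])
    y = [0.0] * k
    for xi, row in zip(x, R):
        if xi == 0.0:
            continue  # zero coordinates of x contribute nothing
        for j, rij in enumerate(row):
            if rij != 0.0:  # skip zero entries of a sparse matrix
                y[j] += xi * rij
    scale = 1.0 / math.sqrt(k)
    return [scale * v for v in y]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distortion_ratios(d=500, k=100, seed=0):
    """Return projected / original distance for both matrix types."""
    rng = random.Random(seed)
    x1 = [rng.random() for _ in range(d)]
    x2 = [rng.random() for _ in range(d)]
    original = euclidean(x1, x2)
    ratios = {}
    for name, make in (("gaussian", gaussian_matrix), ("sparse", sparse_matrix)):
        R = make(d, k, rng)
        ratios[name] = euclidean(project(x1, R), project(x2, R)) / original
    return ratios


if __name__ == "__main__":
    for name, ratio in distortion_ratios().items():
        print(f"{name}: projected/original distance = {ratio:.3f}")
```

With unit-variance entries and the 1/sqrt(k) scaling, the squared distance is preserved in expectation, so both ratios should come out close to 1; how close depends on k. Unlike principal component analysis, the matrix is drawn without looking at the data, which is the source of the computational saving the abstract mentions.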