Classification and outlier detection based on topic based pattern synthesis

Authors:
Samrat Kokkula;Narasimha Murty Musti
Affiliations:
Citrix R&D India Pvt. Ltd, India;Indian Institute of Science, Bangalore, India
Venue:
MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2013

Citing 28
Cited 0

Bagging predictors

Machine Learning
Machine Learning for the Detection of Oil Spills in Satellite Radar Images

Machine Learning - Special issue on applications of machine learning and the knowledge discovery process
MetaCost: a general method for making classifiers cost-sensitive

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Random Forests

Machine Learning
A Tutorial on Support Vector Machines for Pattern Recognition

Data Mining and Knowledge Discovery
A Bootstrap Technique for Nearest Neighbor Classifier Design

IEEE Transactions on Pattern Analysis and Machine Intelligence
On the Surprising Behavior of Distance Metrics in High Dimensional Spaces

ICDT '01 Proceedings of the 8th International Conference on Database Theory
Enhancing Effectiveness of Outlier Detections for Low Density Patterns

PAKDD '02 Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Latent dirichlet allocation

The Journal of Machine Learning Research
Pattern Classification (2nd Edition)

Pattern Classification (2nd Edition)
Editorial: special issue on learning from imbalanced data sets

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Extreme re-balancing for SVMs: a case study

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
On Local Spatial Outliers

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Using AUC and Accuracy in Evaluating Learning Algorithms

IEEE Transactions on Knowledge and Data Engineering
Feature bagging for outlier detection

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)

Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Top 10 algorithms in data mining

Knowledge and Information Systems
Local peculiarity factor and its application in outlier detection

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Outlier Detection with Kernel Density Functions

MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Learning Decision Trees for Unbalanced Data

ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Imbalanced text classification: A term weighting approach

Expert Systems with Applications: An International Journal
On the Design and Applicability of Distance Functions in High-Dimensional Data Space

IEEE Transactions on Knowledge and Data Engineering
Robust support vector machine training via convex outlier ablation

AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
SMOTE: synthetic minority over-sampling technique

Journal of Artificial Intelligence Research
Evolutionary sampling and software quality modeling of high-assurance systems

IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
A rule-based scheme for filtering examples from majority class in an imbalanced training set

MLDM'03 Proceedings of the 3rd international conference on Machine learning and data mining in pattern recognition
Hellinger distance decision trees are robust and skew-insensitive

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

In several pattern classification problems, we encounter training datasets with an imbalanced class distribution and the presence of outliers, which can hinder the performance of classifiers. In this paper, we propose classification schemes based on the pre-processing of data using Novel Pattern Synthesis (NPS), with the aim to improve performance on such datasets. We provide a formal framework for characterizing the class imbalance and outlier elimination. Specifically, we look into the role of NPS in: Outlier elimination and handling class imbalance problem. In NPS, for every pattern its k-nearest neighbours are found and a weighted average of the neighbours is taken to form a synthesized pattern. It is found that the classification accuracy of minority class increases in the presence of synthesized patterns. However, finding nearest neighbours in high-dimensional datasets is challenging. Hence, we make use of Latent Dirichlet Allocation to reduce the dimensionality of the dataset. An extensive experimental evaluation carried out on 25 real-world imbalanced datasets shows that pre-processing of data using NPS is effective and has a greater impact on the classification accuracy over minority class for imbalanced learning. We also observed that NPS outperforms the state-of-the-art methods for imbalanced classification. Experiments on 9 real-world datasets with outliers, demonstrate that NPS approach not only substantially increases the detection performance, but is also relatively scalable in large datasets in comparison to the state-of-the-art outlier detection methods.