As the need to analyze big data sets grows dramatically, classification algorithms play an increasingly important role in data mining. Big data analysis must account for more characteristics of a data set, such as its data structure, variety of sources, and update frequency. In this paper, we evaluate scenarios to determine which data set characteristics most affect the performance of classification algorithms. Determining how strong or weak a given algorithm is on a given data set remains a complex problem. Our research therefore examines experimentally how data set characteristics affect algorithm performance, in terms of both accuracy and elapsed time. To do so, we use multiple regression to evaluate the causal relationship between data set characteristics, as independent variables, and performance metrics, as dependent variables. We also examine the role that the classification algorithm plays as a moderator of this relationship. We use all benchmark data sets in the UCI database that are suitable for running the classification algorithms. Based on the experimental results, we discuss what legacy classification algorithms require in order to address big data analysis in a new era of business intelligence.
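The moderated regression described above can be illustrated with a minimal sketch. It assumes a single data set characteristic, a binary algorithm indicator as the moderator, and ordinary least squares with an interaction term. The function name, variable names, and synthetic data are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np


def fit_moderated_regression(x, m, y):
    """Fit y = b0 + b1*x + b2*m + b3*(x*m) by ordinary least squares.

    x: data set characteristic (independent variable)
    m: moderator, here a 0/1 indicator for the algorithm used
    y: performance metric (dependent variable)
    Returns the coefficient vector [b0, b1, b2, b3].
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, main effects, and the interaction term.
    design = np.column_stack([np.ones_like(x), x, m, x * m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
n_instances = rng.uniform(1.0, 10.0, size=n)           # hypothetical characteristic
algorithm = rng.integers(0, 2, size=n).astype(float)   # 0 = algorithm A, 1 = algorithm B
elapsed = (
    0.5
    + 0.2 * n_instances
    + 0.1 * algorithm
    + 0.3 * n_instances * algorithm                     # moderation effect
    + rng.normal(0.0, 0.01, size=n)
)

b0, b1, b2, b3 = fit_moderated_regression(n_instances, algorithm, elapsed)
print(f"intercept={b0:.3f} characteristic={b1:.3f} "
      f"algorithm={b2:.3f} interaction={b3:.3f}")
```

In this sketch, a nonzero interaction coefficient `b3` means that the effect of the characteristic on the metric depends on which algorithm is run, which is what a moderating role amounts to. With more than two algorithms, one indicator column and one interaction column per algorithm (minus a reference algorithm) would be needed.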