With the development of clinical technologies, different tumor features have been collected for breast cancer diagnosis. Filtering all the pertinent feature information to support clinical disease diagnosis is a challenging and time-consuming task. The objective of this research is to diagnose breast cancer based on the extracted tumor features. Feature extraction and selection are critical to the quality of classifiers built with data mining methods. To extract useful information and diagnose the tumor, a hybrid of K-means and support vector machine (K-SVM) algorithms is developed. The K-means algorithm is used to recognize the hidden patterns of the benign and malignant tumors separately. The membership of each tumor in these patterns is calculated and treated as a new feature in the training model. Then, a support vector machine (SVM) is used to obtain the new classifier that differentiates incoming tumors. Based on 10-fold cross validation, the proposed methodology improves the accuracy to 97.38% when tested on the Wisconsin Diagnostic Breast Cancer (WDBC) data set from the University of California, Irvine machine learning repository. Six abstract tumor features are extracted from the 32 original features for the training phase. The results not only illustrate the capability of the proposed approach for breast cancer diagnosis, but also show time savings during the training phase. Physicians can also benefit from the mined abstract tumor features by better understanding the properties of different types of tumors.
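The feature-extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes three clusters per class (the abstract states only that six abstract features result), uses Euclidean distance to each centroid as a stand-in for the "membership" measure, and runs on synthetic two-class data rather than WDBC. The helper names `kmeans`, `fit_patterns`, and `to_abstract_features` are hypothetical. The SVM stage is omitted; in practice the six-column output would be passed to an SVM trainer from a library.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def fit_patterns(X, y, k_per_class=3):
    """Cluster benign (label 0) and malignant (label 1) samples separately.

    k_per_class=3 is an assumption chosen to yield six patterns in total.
    """
    centroids = []
    for label in (0, 1):
        members = [x for x, t in zip(X, y) if t == label]
        centroids.extend(kmeans(members, k_per_class, seed=label))
    return centroids


def to_abstract_features(X, centroids):
    """Replace each sample by its distances to all centroids.

    Distance is an assumed proxy for the membership measure.
    """
    return [[math.dist(x, c) for c in centroids] for x in X]


# Synthetic, well-separated stand-in data (not WDBC): 4 raw features per sample.
rng = random.Random(42)
benign = [[rng.uniform(0, 2) for _ in range(4)] for _ in range(30)]
malignant = [[rng.uniform(10, 12) for _ in range(4)] for _ in range(30)]
X = benign + malignant
y = [0] * 30 + [1] * 30

centroids = fit_patterns(X, y)
features = to_abstract_features(X, centroids)
print(len(features), "samples,", len(features[0]), "abstract features each")
```

In this sketch the first three columns are distances to the benign patterns and the last three are distances to the malignant patterns, so the classifier trains on six inputs regardless of how many raw features each sample has. When evaluating with cross validation, the patterns should be fitted on the training folds only, so that the held-out fold does not influence the centroids.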