Document-Base Extraction for Single-Label Text Classification
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
A Hybrid Statistical Data Pre-processing Approach for Language-Independent Text Classification
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Hybrid DIAAF/RS: statistical textual feature selection for language-independent text classification
ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
Hi-index | 0.00 |
Since 1990's, the exponential growth of theseWeb documents has led to a great deal of interestin developing efficient tools and software toassist users in finding relevant information. Textclassification has been proved to be useful inhelping organize and search text information onthe Web. Although there have been existed anumber of text classification algorithms, most ofthem are either inefficient or too complex. In thispaper we present two Odds-Radio-Based textclassification algorithms, which are called ORand TF*OR respectively. We have evaluated ouralgorithm on two text collections and compared itagainst k-NN and SVM. Experimental resultsshow that OR and TF*OR are competitive withk-NN and SVM. Furthermore, OR and TF*OR ismuch simpler and faster than them. The resultsalso indicate that it is not TF but relevancefactors derived from Odds Radio that play thedecisive role in document categorization.