Neural Computing and Applications
We propose an unsupervised feature generation algorithm that uses repositories of human knowledge for effective text categorization. The conventional bag-of-words (BOW) representation relies on the presence or absence of keywords to classify documents. To capture the context behind these keywords, we draw knowledge concepts and hyperlinks from external knowledge sources through content and structure mining of Wikipedia. The features of these knowledge concepts are then clustered to generate knowledge cluster vectors, with which the input text documents are mapped into a high-dimensional feature space where classification is performed. Simulation results show that the proposed approach identifies associated features in the text collection and improves classification accuracy.
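The pipeline described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy keyword-to-concept table and hyperlink sets stand in for real Wikipedia mining, Jaccard similarity with a greedy single-pass grouping stands in for the clustering step, and a nearest-centroid rule stands in for the classifier. The function names (`cluster_concepts`, `featurize`, `train_centroids`, `predict`) are hypothetical.

```python
from collections import Counter

# Hypothetical toy stand-in for Wikipedia content/structure mining.
# Keyword -> concept, and concept -> set of outgoing hyperlinks.
KEYWORD_TO_CONCEPT = {
    "jaguar": "Jaguar_Cars",
    "engine": "Internal_combustion_engine",
    "striker": "Forward_(association_football)",
    "goal": "Goal_(sports)",
}
CONCEPT_LINKS = {
    "Jaguar_Cars": {"Automobile", "Engine"},
    "Internal_combustion_engine": {"Engine", "Fuel"},
    "Forward_(association_football)": {"Football", "Team"},
    "Goal_(sports)": {"Football", "Score"},
}

def jaccard(a, b):
    """Overlap of two hyperlink sets, in [0, 1]."""
    return len(a & b) / len(a | b)

def cluster_concepts(links, threshold=0.2):
    """Greedy single-pass clustering (an assumed, simplified method):
    a concept joins the first cluster whose seed concept is similar enough."""
    clusters = []
    for concept in sorted(links):
        for cluster in clusters:
            if jaccard(links[concept], links[cluster[0]]) >= threshold:
                cluster.append(concept)
                break
        else:
            clusters.append([concept])
    return clusters

def featurize(text, vocab, clusters):
    """Concatenate BOW counts with counts of concept clusters hit by the text."""
    tokens = text.lower().split()
    bow = Counter(tokens)
    concept_to_cluster = {c: i for i, cl in enumerate(clusters) for c in cl}
    cluster_vec = [0] * len(clusters)
    for tok in tokens:
        concept = KEYWORD_TO_CONCEPT.get(tok)
        if concept is not None:
            cluster_vec[concept_to_cluster[concept]] += 1
    return [bow[w] for w in vocab] + cluster_vec

def train_centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, Counter(y)
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(vec, centroids):
    """Label of the closest centroid by squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

train_docs = ["jaguar engine roars", "striker shoots wide"]
train_labels = ["autos", "sports"]
vocab = sorted({tok for doc in train_docs for tok in doc.split()})
clusters = cluster_concepts(CONCEPT_LINKS)
centroids = train_centroids(
    [featurize(d, vocab, clusters) for d in train_docs], train_labels
)
# "late goal" shares no keyword with the training documents, so its BOW
# part is all zeros; only the cluster features separate the two classes.
prediction = predict(featurize("late goal", vocab, clusters), centroids)
print(prediction)
```

In this toy setup the test document has no word overlap with either training document, so a BOW-only representation would be equidistant from both classes. The added cluster dimensions are what link "goal" to the same concept group as "striker", which is the kind of associated feature the abstract refers to.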