Conceptual representing of documents and query expansion based on ontology
WISM'12 Proceedings of the 2012 international conference on Web Information Systems and Mining
Automatic Item Weight Generation for Pattern Mining and its Application
International Journal of Data Warehousing and Mining
Improved categorical distribution difference feature selection for Chinese document categorization
Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
TFIDF is a common method for selecting text features, but it has several shortcomings. First, it underrates a term that appears frequently in the documents of one class but rarely in the documents of other classes, even though such a term represents the characteristics of that class well. Second, TFIDF ignores the relationship between a feature and a class. This paper proposes an improved TFIDF strategy and combines it with the simple distance vector text classification method to compare it with traditional TFIDF. The improved strategy achieves very good classification results, and the experiments demonstrate its feasibility.
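The abstract does not give the improved weighting formula, so the following is only a minimal sketch under stated assumptions, not the authors' method. It shows the first weakness with plain TF-IDF on a toy corpus, then adds a *hypothetical* class-aware factor (`class_factor`: the share of in-class documents containing a term divided by the smoothed share of out-of-class documents containing it) that ties a feature to a class. The "simple distance vector" classifier is assumed here to be a per-class centroid compared by cosine similarity. All function names, the corpus, and the labels are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy training corpus: each document is a list of tokens.
TRAIN = [
    ["goal", "match", "team"],
    ["goal", "team", "coach"],
    ["stock", "market", "price"],
    ["stock", "price", "bank"],
]
LABELS = ["sport", "sport", "finance", "finance"]


def idf_table(docs):
    """Traditional IDF on the training set: log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf_vector(doc, idf):
    """Traditional TF-IDF: relative term frequency times IDF.
    Terms never seen in training are dropped."""
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}


def class_factor(term, label, docs, labels):
    """HYPOTHETICAL class-aware factor, not the paper's formula:
    share of in-class documents containing the term, divided by the
    add-one-smoothed share of out-of-class documents containing it."""
    in_docs = [d for d, y in zip(docs, labels) if y == label]
    out_docs = [d for d, y in zip(docs, labels) if y != label]
    a = sum(term in d for d in in_docs) / len(in_docs)
    b = (sum(term in d for d in out_docs) + 1) / (len(out_docs) + 1)
    return a / b


def train_centroids(docs, labels, idf):
    """One mean vector per class; each term's TF-IDF weight is scaled by
    its class factor (an assumed reading of 'simple distance vector')."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [d for d, y in zip(docs, labels) if y == label]
        acc = Counter()
        for d in members:
            for t, w in tfidf_vector(d, idf).items():
                acc[t] += w * class_factor(t, label, docs, labels)
        centroids[label] = {t: w / len(members) for t, w in acc.items()}
    return centroids


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(doc, centroids, idf):
    """Assign the class whose centroid is most similar to the document."""
    vec = tfidf_vector(doc, idf)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


IDF = idf_table(TRAIN)
CENTROIDS = train_centroids(TRAIN, LABELS, IDF)

# Weakness 1 in action: "goal" occurs in every sport document and in no
# finance document, yet plain TF-IDF weights it below the rarer "match".
w = tfidf_vector(TRAIN[0], IDF)
print(w["goal"] < w["match"])                      # True
print(classify(["goal", "team"], CENTROIDS, IDF))  # sport
```

In this sketch the class factor is applied only when building the training centroids, because a test document's class is unknown at classification time; the test document keeps its plain TF-IDF vector. With the factor, "goal" ends up weighted above "match" in the sport centroid, which is the behavior the abstract's first criticism asks for.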