Clustering and classification are two important techniques for mining Web information. This paper proposes a new adaptive method for mining Chinese documents from the Internet. First, we give a document clustering algorithm that combines a Genetic Algorithm (GA) with Simulated Annealing (SA) on top of the Boolean Model. This algorithm avoids the main weakness of clustering with a pure GA, which converges prematurely and becomes trapped in a local optimum, so its results are inaccurate. Second, because classification with the traditional Vector Space Model (VSM) is unsatisfactory, as it ignores how important each word is, we add position factors for keywords to the VSM and build a new classifier model for Chinese Web documents. Experimental results indicate that this adaptive method makes clustering and classification more accurate and reasonable than methods that do not take word positions into account.
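The idea of adding position factors to the VSM can be illustrated with a minimal sketch. The abstract does not give the actual factors or the classifier, so everything here is an assumption: the position names ("title", "body"), the weight values, and the helper names position_weighted_vector and cosine are hypothetical, and the documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter

# Hypothetical position factors; the source does not state actual values.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}
# Uniform factors reproduce a plain term-frequency VSM for comparison.
UNIFORM_WEIGHTS = {"title": 1.0, "body": 1.0}


def position_weighted_vector(fields, weights=POSITION_WEIGHTS):
    """Build a term vector where each occurrence counts by its position factor.

    fields maps a position name (e.g. "title") to the list of tokens
    found at that position. Unknown positions get a factor of 1.0.
    """
    vector = Counter()
    for position, tokens in fields.items():
        factor = weights.get(position, 1.0)
        for token in tokens:
            vector[token] += factor
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage: a term in the title pulls the document closer to a matching query.
doc = {"title": ["mining"], "body": ["web", "mining"]}
query = Counter({"mining": 1.0})
weighted_score = cosine(position_weighted_vector(doc), query)
plain_score = cosine(position_weighted_vector(doc, UNIFORM_WEIGHTS), query)
```

With these assumed factors, weighted_score is higher than plain_score because the title occurrence of "mining" counts three times as much as a body occurrence. A real classifier would tune the factors and likely combine them with a weighting scheme such as TF-IDF.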