Knowledge acquisition from many-attribute data by genetic programming with clustered terminal symbols

Authors:
Akira Hara;Haruko Tanaka;Takumi Ichimura;Tetsuyuki Takahama
Affiliations:
Graduate School of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-higashi, Asaminami-ku, Hiroshima 731-3194, Japan.;Faculty of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-higashi, Asaminami-ku, Hiroshima 731-3194, Japan.;Faculty of Management and Information Systems, Prefectural University of Hiroshima, 1-1-71, Ujina-higashi, Minami-ku, Hiroshima 731-8558, Japan.;Graduate School of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-higashi, Asaminami-ku, Hiroshima 731-3194, Japan
Venue:
International Journal of Knowledge and Web Intelligence
Year:
2012

Abstract

Rule extraction from databases by soft computing methods is important for knowledge acquisition. For example, knowledge extracted from web pages can be useful for information retrieval. When genetic programming (GP) is applied to rule extraction from a database, the attributes of the data are often used as terminal symbols. However, real databases have a large number of attributes, so the terminal set grows and the search space becomes vast. To improve search performance, we propose new methods for dealing with a large-scale terminal set. In these methods, the terminal symbols are clustered based on the similarities of the attributes. In the early stage of the search, clusters are used as terminals instead of the original attributes, which reduces the number of terminal symbols and therefore the search space. In the later stage of the search, the original attributes are used as terminal symbols to perform a local search. We applied the proposed methods to two many-attribute datasets: the classification of molecules as a benchmark problem, and page rank learning for information retrieval. Compared with conventional GP, the proposed methods evolved faster and extracted more accurate rules.
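
The two-stage use of terminals described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, the clustering procedure, how a cluster terminal is evaluated, or when the switch to original attributes happens. The sketch assumes Pearson correlation as the similarity, a simple greedy clustering, the mean of member attributes as the value of a cluster terminal, and a fixed switch generation. All names (`cluster_attributes`, `terminal_set`, `terminal_value`, `switch_generation`) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_attributes(rows, threshold=0.9):
    """Greedy clustering (an assumption, not the paper's procedure): an attribute
    joins the first cluster whose first member it correlates with at or above
    the threshold; otherwise it starts a new cluster."""
    n_attrs = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_attrs)]
    clusters = []
    for j in range(n_attrs):
        for members in clusters:
            if pearson(columns[members[0]], columns[j]) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def terminal_set(clusters, n_attrs, generation, switch_generation):
    """Early generations get one terminal per cluster (small search space);
    later generations get one terminal per original attribute (local search)."""
    if generation < switch_generation:
        return [f"C{k}" for k in range(len(clusters))]
    return [f"x{j}" for j in range(n_attrs)]


def terminal_value(symbol, row, clusters):
    """Value of a terminal on one record. A cluster terminal is evaluated as
    the mean of its member attributes (an assumption made for this sketch)."""
    index = int(symbol[1:])
    if symbol[0] == "C":
        members = clusters[index]
        return sum(row[j] for j in members) / len(members)
    return row[index]


# Toy records with four attributes; attributes 0 and 1 are nearly proportional.
rows = [
    [1.0, 2.0, 5.0, 0.0],
    [2.0, 4.1, 3.0, 1.0],
    [3.0, 6.0, 4.0, 0.0],
    [4.0, 8.2, 1.0, 1.0],
]
clusters = cluster_attributes(rows)
print(clusters)
print(terminal_set(clusters, 4, generation=0, switch_generation=50))
print(terminal_set(clusters, 4, generation=50, switch_generation=50))
```

On this toy data the first two attributes fall into one cluster, so the early-stage terminal set has three symbols instead of four. With hundreds of correlated attributes the reduction would be correspondingly larger, which is the effect the abstract attributes to clustering.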