This paper shows how citation-based information and structural content (e.g., title, abstract) can be combined to improve the classification of text documents into predefined categories. We evaluate eight similarity measures (five derived from the collection's citation information and three from structural content) and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments with the ACM Computing Classification Scheme, using documents from the ACM Digital Library, indicate that GP can discover similarity functions superior to those based on any single type of evidence. The discovered similarity functions, applied through simple majority voting, are more effective than both content-based and combination-based Support Vector Machine classifiers. We also ran experiments comparing GP with other fusion techniques, such as Genetic Algorithms (GA) and linear fusion. The empirical results show that GP discovered better similarity functions than GA or the other fusion techniques.
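To make the idea of fusing evidence concrete, the following is a minimal sketch of how a GP-style expression tree and a linear fusion could each combine several similarity scores, and how the fused score might drive a majority vote. It is an illustration under stated assumptions, not the method used in the paper: the measure names (`cocitation`, `title_cosine`, `abstract_cosine`), the operator set, and the choice to vote over the labels of the top-k most similar training documents are all hypothetical, since the abstract specifies none of them. The evolutionary search that would produce the tree is omitted.

```python
import operator
from collections import Counter

# Hypothetical function set for the trees; the abstract does not list operators.
OPS = {"+": operator.add, "*": operator.mul, "max": max}


def evaluate(tree, scores):
    """Evaluate a GP-style expression tree on one document pair's scores.

    A tree is a measure name (leaf), a numeric constant,
    or a tuple (op, left, right).
    """
    if isinstance(tree, str):
        return scores[tree]
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, scores), evaluate(right, scores))


def linear_fusion(weights, scores):
    """Baseline: weighted sum of the individual similarity measures."""
    return sum(w * scores[name] for name, w in weights.items())


def majority_vote(labels):
    """Return the most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def classify(tree, neighbours, k=3):
    """Vote over the k training documents most similar to the query.

    neighbours: list of (label, scores) pairs, where scores holds the
    similarity measures between the query and that training document.
    """
    ranked = sorted(neighbours, key=lambda n: evaluate(tree, n[1]), reverse=True)
    return majority_vote([label for label, _ in ranked[:k]])


# Hypothetical individual: cocitation * title_cosine + abstract_cosine
tree = ("+", ("*", "cocitation", "title_cosine"), "abstract_cosine")

# Made-up scores for illustration only.
neighbours = [
    ("H.3", {"cocitation": 0.9, "title_cosine": 0.8, "abstract_cosine": 0.7}),
    ("H.3", {"cocitation": 0.6, "title_cosine": 0.5, "abstract_cosine": 0.9}),
    ("I.2", {"cocitation": 0.1, "title_cosine": 0.9, "abstract_cosine": 0.2}),
    ("I.2", {"cocitation": 0.0, "title_cosine": 0.1, "abstract_cosine": 0.1}),
]
predicted = classify(tree, neighbours, k=3)
```

The tree form can express nonlinear interactions between evidence types (here, a product of a citation measure and a content measure), which a weighted sum cannot. That is one plausible reason a GP search could find functions that linear fusion misses.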