Intelligent fusion of structural and citation-based evidence for text classification

  • Authors:
  • Baoping Zhang;Yuxin Chen;Weiguo Fan;Edward A. Fox;Marcos Andrép Gonçalves;Marco Cristo;Pàvel Calado

  • Affiliations:
  • Virginia Tech, Blacksburg, VA;Virginia Tech, Blacksburg, VA;Virginia Tech, Blacksburg, VA;Virginia Tech, Blacksburg, VA;Federal University of Minas Gerais, Belo Horizonte, MG, Brazil;Federal University of Minas Gerais, Belo Horizonte, MG, Brazil;IST/INESC-ID, Lisbon, Portugal

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper shows how different measures of similarity derived from the citation information and the structural content (e.g., title, abstract) of the collection can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments with the ACM Computing Classification Scheme, using documents from the ACM Digital Library, indicate that GP can discover similarity functions superior to those based solely on a single type of evidence. Effectiveness of the similarity functions discovered through simple majority voting is better than that of content-based as well as combination-based Support Vector Machine classifiers. Experiments also were conducted to compare the performance between GP techniques and other fusion techniques such as Genetic Algorithms (GA) and linear fusion. Empirical results show that GP was able to discover better similarity functions than other fusion techniques.