Hierarchical training of multiple SVMs for personalized web filtering

  • Authors: Maike Erdmann; Duc Dung Nguyen; Tomoya Takeyoshi; Gen Hattori; Kazunori Matsumoto; Chihiro Ono

  • Affiliations (in author order): KDDI R&D Laboratories, Saitama, Japan; Vietnam Academy of Science and Technology, Hanoi, Vietnam; KDDI R&D Laboratories, Saitama, Japan; KDDI R&D Laboratories, Saitama, Japan; KDDI R&D Laboratories, Saitama, Japan; KDDI R&D Laboratories, Saitama, Japan

  • Venue: PRICAI'12: Proceedings of the 12th Pacific Rim International Conference on Trends in Artificial Intelligence
  • Year: 2012

Abstract

The abundance of information published on the Internet makes filtering hazardous Web pages a difficult yet important task. Supervised learning methods such as Support Vector Machines (SVMs) can be used to identify hazardous Web content. However, scalability is a major challenge, especially when multiple classifiers must be trained because different policies exist on what kind of information is hazardous. We therefore propose a transfer learning approach called Hierarchical Training for Multiple SVMs (HTMSVM). HTMSVM identifies the data common to similar training sets and trains on this common data first to obtain initial solutions. These initial solutions then reduce the time needed to train on the individual training sets without affecting classification accuracy. In an experiment in which we trained five Web content filters on training sets with 80% common and 20% inconsistently labeled examples, HTMSVM required only 26% to 41% of the training time of LibSVM while achieving the same classification accuracy (more than 91%).
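
The two-stage idea in the abstract (train on the commonly labeled data first, then reuse that solution as the starting point for each policy-specific classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the solver or how the initial solution is passed on, so a simple primal subgradient solver for a linear SVM stands in here, and the warm start is done by reusing the weight vector. All names (`split_common`, `train_linear_svm`, `hierarchical_train`) and the parameters `epochs`, `lam`, and `lr` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def split_common(label_sets: Sequence[Dict[str, int]]) -> Tuple[List[str], List[str]]:
    """Split example ids into those every policy labels identically and the rest.

    Assumption: every policy labels the same set of example ids.
    """
    common: List[str] = []
    inconsistent: List[str] = []
    for ex_id in sorted(label_sets[0]):
        distinct = {labels[ex_id] for labels in label_sets}
        (common if len(distinct) == 1 else inconsistent).append(ex_id)
    return common, inconsistent


def train_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    w0: Optional[Sequence[float]] = None,
    epochs: int = 50,
    lam: float = 0.01,
    lr: float = 0.1,
) -> List[float]:
    """Subgradient descent on the L2-regularized hinge loss (labels are +1/-1).

    Stand-in solver for illustration only; w0 is the optional warm start.
    """
    w = list(w0) if w0 is not None else [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for j in range(len(w)):
                # The hinge term contributes only when the margin is violated.
                grad = lam * w[j] - (y * x[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
    return w


def hierarchical_train(
    features: Dict[str, Sequence[float]],
    label_sets: Sequence[Dict[str, int]],
) -> List[List[float]]:
    """Train one model per policy, warm-started from the common-data solution."""
    common, _ = split_common(label_sets)
    w_common = None
    if common:
        # Stage 1: one shared training run on the consistently labeled examples.
        w_common = train_linear_svm(
            [features[i] for i in common],
            [label_sets[0][i] for i in common],
        )
    models = []
    for labels in label_sets:
        ids = sorted(labels)
        # Stage 2: short per-policy run on the full set, starting from w_common.
        models.append(
            train_linear_svm(
                [features[i] for i in ids],
                [labels[i] for i in ids],
                w0=w_common,
                epochs=10,
            )
        )
    return models


if __name__ == "__main__":
    # Toy data: one feature plus a constant bias term; two policies disagree on "c".
    feats = {"a": [2.0, 1.0], "b": [-2.0, 1.0], "c": [0.5, 1.0]}
    policies = [{"a": 1, "b": -1, "c": 1}, {"a": 1, "b": -1, "c": -1}]
    for w in hierarchical_train(feats, policies):
        print(w)
```

In this sketch the saving comes from running the expensive first stage once and giving each policy only a short second stage. Whether the reported 26% to 41% training time carries over depends on the actual solver and data, which the abstract does not detail.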