Domain-specific website recognition using hybrid vector space model

  • Authors:
  • Baoli Dong; Guoning Qi; Xinjian Gu

  • Affiliations:
  • College of Mechanical and Energy Engineering, Zhejiang University, Hangzhou, P.R. China (all three authors)

  • Venue:
  • WAIM'05: Proceedings of the 6th International Conference on Advances in Web-Age Information Management
  • Year:
  • 2005

Abstract

Domain-specific website recognition is a key step in making domain-specific web resources available. Websites on the same topic are similar in both their content structure and their textual content. Building on the vector space model, a hybrid vector space model of website topics was proposed. Instead of representing the website link structure with trees or graphs, the model uses text features. Its vector elements integrate textual information about website content with structural characteristics extracted from relevant web pages. The topic of a website was then identified with a centroid-based classification algorithm. Experiments on recognizing manufacturing-topic websites were carried out to evaluate the method. The results indicate that the model is well suited to describing the features of topic-specific websites, and that it is readily applicable to website classification on the Web.
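The abstract does not give the exact feature extraction, weighting, or similarity measure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that (a) the link structure has already been turned into text tokens (for example anchor text or URL path words), (b) content and structure terms share one vector space but are kept as separate dimensions via hypothetical `c:`/`s:` prefixes and balanced by a hypothetical weight `ALPHA`, (c) terms are TF-IDF weighted, and (d) centroid-based classification uses cosine similarity, which is the usual choice for centroid classifiers. All function names and the toy data are illustrative, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical balance between content and structure features; the
# abstract does not state how the two parts are weighted.
ALPHA = 0.5


def hybrid_terms(content_tokens, structure_tokens, alpha=ALPHA):
    """Term-frequency vector mixing content and structure text features.

    Structure tokens stand in for text features of the link structure
    (e.g. anchor text or URL path words); the "c:"/"s:" prefixes keep the
    two feature kinds as separate dimensions of one vector space.
    """
    vec = Counter()
    for t in content_tokens:
        vec["c:" + t] += alpha
    for t in structure_tokens:
        vec["s:" + t] += 1.0 - alpha
    return dict(vec)


def idf_table(vectors):
    """Smoothed inverse document frequency over the training websites."""
    n = len(vectors)
    df = Counter(t for v in vectors for t in v)
    return {t: math.log(n / d) + 1.0 for t, d in df.items()}


def normalise(vec):
    """Scale a sparse vector to unit L2 length (zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {t: x / norm for t, x in vec.items()} if norm else dict(vec)


def tfidf(vec, idf):
    # Unseen terms get idf 1.0; this only rescales the query vector, so
    # the ranking of class centroids is unaffected.
    return normalise({t: tf * idf.get(t, 1.0) for t, tf in vec.items()})


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(t, 0.0) for t, x in a.items())


def train(sites, alpha=ALPHA):
    """Return (idf, centroids) from (content, structure, label) triples."""
    raw = [hybrid_terms(c, s, alpha) for c, s, _ in sites]
    idf = idf_table(raw)
    members = {}
    for vec, (_, _, label) in zip(raw, sites):
        members.setdefault(label, []).append(tfidf(vec, idf))
    centroids = {}
    for label, vecs in members.items():
        total = Counter()
        for v in vecs:
            total.update(v)  # element-wise sum of the member vectors
        # Normalising the sum gives the same direction as the mean,
        # i.e. the unit-length class centroid.
        centroids[label] = normalise(dict(total))
    return idf, centroids


def classify(content, structure, idf, centroids, alpha=ALPHA):
    """Assign the topic whose centroid is most cosine-similar."""
    v = tfidf(hybrid_terms(content, structure, alpha), idf)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))


# Toy, made-up training data for illustration only.
sites = [
    ("cnc machining milling lathe".split(), "products machining quote".split(), "manufacturing"),
    ("injection molding plastic parts".split(), "products molding factory".split(), "manufacturing"),
    ("recipes cooking dinner pasta".split(), "blog recipes archive".split(), "other"),
    ("travel hotel beach flights".split(), "blog destinations booking".split(), "other"),
]
idf, centroids = train(sites)
print(classify("milling machining steel parts".split(), "products factory".split(), idf, centroids))
# prints: manufacturing
```

Keeping content and structure terms in one sparse vector means a single centroid per topic captures both kinds of evidence, and classification reduces to one dot product per topic. How the structure tokens are actually extracted, and how the two parts are weighted, would need to follow the paper itself.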