Hierarchical user interest modeling for Chinese web pages

  • Authors: Shiying Li; Gongqing Wu; Xuegang Hu
  • Affiliations: Hefei University of Technology, Hefei, China (all authors)
  • Venue: Proceedings of the Third International Conference on Internet Multimedia Computing and Service
  • Year: 2011

Abstract

User interest modeling is the core of personalized services. It is applied in information retrieval, data mining, e-commerce, and personalized recommendation to improve the quality of information services. Most traditional user interest models are built on the vector space model (VSM), using keywords to represent user interests. However, these models ignore both the hierarchical granularity relations between keywords and the domain knowledge hidden in users' specific concepts or topics of interest. As a result, they struggle to express user interests accurately and reasonably. Motivated by this, we propose a Graph-based Chinese Phrases Hierarchical Clustering algorithm called GCPHC. It organizes user interests in a hierarchical tree structure, uses a HowNet-based Maximum Matching Mapping method called HNM3 to map the user interests to topics of the Open Directory Project (ODP), and builds a hierarchical user interest model in which each cluster is labeled with a topic. To find the best-performing configuration, we evaluate five correlation functions (AEMI, AEMI3, IT, PS, and Support) within GCPHC while varying the data scale and the part of speech (POS) of the terms. Extensive experiments show that GCPHC with the AEMI correlation function performs as well as it does with AEMI3, and outperforms the other functions when the data scale ranges from 20 to 30 documents and nouns are used as terms. In these cases, the average RGC (Rate of Good Clusters) of GCPHC with AEMI reaches 74.7%, which is higher than that obtained with the other correlation functions.
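
The abstract does not define GCPHC or its correlation functions, so the following is only a minimal sketch of the general idea of graph-based term clustering, not the authors' algorithm. It assumes that Support means the fraction of documents in which two terms co-occur, and it replaces the paper's clustering step with simple connected components over a thresholded term graph. The names `support`, `cluster_terms`, and the toy `docs` data are hypothetical.

```python
from itertools import combinations


def support(docs, a, b):
    """Assumed definition: fraction of documents containing both terms."""
    return sum(1 for d in docs if a in d and b in d) / len(docs)


def cluster_terms(docs, threshold):
    """Cluster terms as connected components of a graph whose edges
    join term pairs with support >= threshold (union-find)."""
    terms = sorted(set().union(*docs))
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the root, compressing the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if support(docs, a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in terms:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())


# Toy data: each document is the set of nouns extracted from one page.
docs = [
    {"football", "league", "goal"},
    {"football", "goal"},
    {"stock", "market"},
    {"stock", "market", "fund"},
]

coarse = cluster_terms(docs, threshold=0.25)  # upper level of a hierarchy
fine = cluster_terms(docs, threshold=0.5)     # lower, more specific level
print(coarse)
print(fine)
```

Running the clustering at increasing thresholds yields progressively finer groups, which is one simple way to obtain a tree of interests: here the coarse level has two broad clusters, and the finer level splits the weakly connected terms into their own nodes.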