Automatic modeling of user's real world activities from the web for semantic IR

  • Authors:
  • Yusuke Fukazawa; Jun Ota

  • Affiliations:
  • NTT DOCOMO, Inc., Hikari-no-oka, Yokosuka, Kanagawa, Japan; The University of Tokyo, Kashiwanoha, Kashiwa, Chiba, Japan

  • Venue:
  • Proceedings of the 3rd International Semantic Search Workshop
  • Year:
  • 2010

Abstract

We have been developing a task-based service navigation system that offers the user services relevant to the task he/she wants to perform. The system allows the user to concretize his/her request within a task model developed by human experts. In this study, to reduce the cost of collecting a wide variety of activities, we investigate the automatic modeling of users' real world activities from the web. To extract the widest possible variety of activities with high precision and recall, we investigate how much content must be examined and which resources it should be extracted from. Our results show that we do not need to examine the entire web, which would be too time-consuming; a limited number of search results from blog content (e.g., 900 out of 21,000,000 search results) is sufficient. In addition, to estimate the hierarchical relationships present in the activity model with the lowest possible error rate, we propose a method that divides the representation of each activity into a noun part and a verb part and calculates the mutual information between them. The results show that almost 80% of the hierarchical relationships can be captured by the proposed method.
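The abstract does not give the exact formulation, so the following is only a minimal sketch of one common way to score the association between a noun part and a verb part: pointwise mutual information computed from co-occurrence counts, PMI(n, v) = log2(p(n, v) / (p(n) p(v))). The input format (a list of `(noun, verb)` tuples, one per extracted activity occurrence), the function name `pmi_table`, and the sample activities are hypothetical illustrations, not the authors' data or code. How such scores are turned into hierarchical relationships is likewise not specified here.

```python
import math
from collections import Counter


def pmi_table(activities):
    """Pointwise mutual information (base 2) for each observed (noun, verb) pair.

    Hypothetical input: a list of (noun, verb) tuples, one per extracted
    activity occurrence. Probabilities are plain relative frequencies.
    """
    pair_counts = Counter(activities)
    noun_counts = Counter(noun for noun, _ in activities)
    verb_counts = Counter(verb for _, verb in activities)
    total = len(activities)

    table = {}
    for (noun, verb), count in pair_counts.items():
        p_nv = count / total               # joint probability p(n, v)
        p_n = noun_counts[noun] / total    # marginal probability p(n)
        p_v = verb_counts[verb] / total    # marginal probability p(v)
        table[(noun, verb)] = math.log2(p_nv / (p_n * p_v))
    return table


# Hypothetical sample of activities split into noun and verb parts.
activities = [
    ("ticket", "buy"), ("ticket", "buy"), ("ticket", "book"),
    ("hotel", "book"), ("hotel", "book"), ("dinner", "eat"),
]
scores = pmi_table(activities)

for (noun, verb), score in sorted(scores.items()):
    print(f"{verb} {noun}: {score:.3f}")
```

A positive score means the noun and verb co-occur more often than independence would predict (e.g. "buy ticket" scores 1.0 here), while a negative score ("book ticket") means they co-occur less often than expected.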