Incremental mining of information interest for personalized web scanning

  • Authors:
  • Rey-Long Liu; Wan-Jung Lin

  • Affiliations:
  • Department of Information Management, Chung Hua University, No. 707, Sec. 2, Wufu Road, HsinChu, Taiwan 300, Republic of China (both authors)

  • Venue:
  • Information Systems
  • Year:
  • 2005

Abstract

Businesses and individuals often organize their information of interest (IOI) into a hierarchy of folders (or categories). Such a personalized folder hierarchy gives each user a natural way to manage and use his/her IOI, with each folder corresponding to one interest type. Because these interests are relatively long-term, continuous web scanning is essential, and it should be directed by precise and comprehensible specifications of the interest. A precise specification can direct the scanner to those spaces that deserve scanning; a specification comprehensible to the user can facilitate manual refinement; and a specification comprehensible to information providers (e.g. Internet search engines) can facilitate the identification of proper seed sites from which to start scanning. However, expressing such specifications is quite difficult, and perhaps even impractical, for the user, since each interest type is often defined only implicitly and collectively by the content (i.e. the documents) of the corresponding folder, which may itself evolve over time. In this paper, we present an incremental text mining technique that efficiently identifies the user's current interest by mining the user's information folders. The specification mined for each interest type expresses the context of that interest type in conjunctive normal form, which is comprehensible to general users and information providers. The specification is also shown to be more precise in directing the scanner to those sites that are more likely to provide IOI. The user may thus simply maintain his/her folders and continually receive IOI, without paying much attention to the difficult tasks of interest specification and seed identification.
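
To make the idea of a specification in conjunctive normal form concrete, the following is a minimal sketch, not the mining technique of the paper. It assumes a specification is a conjunction of clauses, each clause a disjunction of plain terms, and that a document satisfies the specification when every clause has at least one term occurring in it. The names `tokenize`, `matches`, and `spec`, and the example terms, are hypothetical and chosen only for illustration.

```python
import re
from typing import FrozenSet, List, Set

# Assumption: a clause is a set of alternative terms (OR),
# and a specification is a list of clauses that must all hold (AND).
Clause = FrozenSet[str]
Specification = List[Clause]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and collect its alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(spec: Specification, text: str) -> bool:
    """Return True if every clause has at least one term present in the text."""
    terms = tokenize(text)
    return all(any(term in terms for term in clause) for clause in spec)


# Hypothetical interest type: (database OR sql) AND (index OR indexing)
spec: Specification = [
    frozenset({"database", "sql"}),
    frozenset({"index", "indexing"}),
]

print(matches(spec, "Tuning SQL index performance"))
print(matches(spec, "SQL tutorial for beginners"))
```

A form like this stays readable to a user, who can add or drop a term by hand, and each clause maps naturally onto an OR-group in a search engine query, which is one plausible reading of why such a specification could help with seed identification.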