Accommodating Individual Preferences in the Categorization of Documents: A Personalized Clustering Approach

  • Authors:
  • Chih-Ping Wei; Roger Chiang; Chia-Chen Wu

  • Affiliations:
  • Institute of Technology Management, National Tsing Hua University, Taiwan; Information Systems, College of Business, University of Cincinnati; Manufacturing Planning Division, United Microelectronics Corporation, Taiwan

  • Venue:
  • Journal of Management Information Systems
  • Year:
  • 2006

Abstract

As electronic commerce and knowledge economy environments proliferate, both individuals and organizations increasingly generate and consume large amounts of online information, typically available as textual documents. To manage this ever-increasing volume of documents, individuals and organizations frequently organize their documents into categories that facilitate document management and subsequent access and browsing. Document clustering is an intentional act that should reflect individual preferences with regard to the semantic coherency and relevant categorization of documents. Hence, effective document clustering must take individual preferences and needs into account in order to support personalized document categorization. In this paper, we present an automatic document-clustering approach that incorporates an individual's partial clustering as preferential information. Combining two document representation methods, feature refinement and feature weighting, with two clustering methods, precluster-based hierarchical agglomerative clustering (HAC) and atomic-based HAC, we establish four personalized document-clustering techniques. Using a traditional content-based document-clustering technique as a performance benchmark, we find that the proposed personalized document-clustering techniques improve clustering effectiveness, as measured by cluster precision and cluster recall.
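The abstract does not define the algorithms, so what follows is only a minimal Python sketch under our own assumptions, not the authors' method. It reads "precluster-based HAC" as average-link agglomerative clustering whose starting clusters are the user's partial clusters (preclusters) plus one singleton per uncategorized document. Documents are represented as plain term-frequency vectors compared by cosine similarity; the paper's feature refinement and feature weighting, and its atomic-based HAC variant, are not modeled. It also assumes a pairwise definition of cluster precision and cluster recall: precision is the share of document pairs grouped together by the algorithm that are also grouped together in a reference categorization, and recall is the share of reference pairs that the algorithm also groups together. The paper's exact definitions may differ. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Raw term frequencies over lowercase whitespace tokens (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def precluster_hac(docs, preclusters, k):
    """Hypothetical precluster-seeded, average-link HAC that stops at k clusters.

    preclusters: the user's partial clustering, as lists of document indices.
    Documents not in any precluster start as singleton clusters.
    """
    vecs = [tf_vector(d) for d in docs]
    seeded = {i for group in preclusters for i in group}
    clusters = [list(g) for g in preclusters]
    clusters += [[i] for i in range(len(docs)) if i not in seeded]

    def avg_link(c1, c2):
        # Mean pairwise cosine similarity between members of two clusters.
        return sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair; combinations yields a < b, so deleting b keeps index a valid.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


def co_clustered_pairs(clusters):
    """All unordered document pairs that share a cluster."""
    return {pair for c in clusters for pair in combinations(sorted(c), 2)}


def cluster_precision_recall(predicted, reference):
    """Pairwise cluster precision and recall (assumed definitions)."""
    p, r = co_clustered_pairs(predicted), co_clustered_pairs(reference)
    both = len(p & r)
    return (both / len(p) if p else 0.0, both / len(r) if r else 0.0)


if __name__ == "__main__":
    docs = [
        "stock market price index",
        "market price trading stock",
        "soccer goal match team",
        "team match league soccer",
        "stock trading shares price",
        "league goal team season",
    ]
    # The user has already grouped documents 0,1 and 2,3; documents 4 and 5 are uncategorized.
    result = precluster_hac(docs, preclusters=[[0, 1], [2, 3]], k=2)
    print(result)  # [[0, 1, 4], [2, 3, 5]]
    print(cluster_precision_recall(result, [[0, 1, 4], [2, 3, 5]]))  # (1.0, 1.0)
```

Seeding HAC with the preclusters means the user's existing groupings are never split, and the uncategorized documents are attached to whichever group they resemble most. This is one plausible way partial clustering could act as preferential information.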