Preserving User Preferences in Automated Document-Category Management: An Evolution-Based Approach

Authors:
Chih-Ping Wei; Paul Hu; Yen-Hsien Lee
Affiliations:
Institute of Service Science, National Tsing Hua University, Taiwan; David Eccles School of Business, University of Utah; National Chiayi University, Taiwan
Venue:
Journal of Management Information Systems
Year:
2009

Abstract

Prevalent document management practices rely heavily on categories (e.g., folders) to organize documents for subsequent search and retrieval. The coherence and distinctiveness of existing document categories can diminish considerably as new documents arrive over time. Because managing document categories is complex and effort-intensive, an automated approach supported by appropriate document-clustering techniques is attractive. The extant literature on automated document-category management focuses predominantly on document content analysis, which cannot preserve a user's document-grouping preferences. This research develops two evolution-based techniques that preserve user preferences in the management of document categories. The first technique (CE2), which supports the automated evolution of a set of flat (i.e., nonhierarchical) document categories, extends an earlier evolution-based technique, category evolution (CE), by addressing fundamental limitations inherent in its use of holistic measures. The second technique, category hierarchy evolution (CHE), builds on CE2 to support scenarios in which document categories are organized hierarchically. Empirical evaluations in various category evolution scenarios, created from two document corpora (news documents from Reuters and research articles from the ACM digital library), show that CE2 and CHE outperform their respective benchmark techniques. Both techniques are reasonably robust and appear more effective when the quality (coherence) of the previously created categories has not deteriorated excessively. Our results suggest that the evolution-based approach is viable, appealing, and capable of preserving user preferences when document categories are reorganized automatically.