Supporting user-subjective categorization with self-organizing maps and learning vector quantization

Authors:
Dina Goren-Bar; Tsvi Kuflik
Affiliations:
Department of Information Systems Engineering, Ben-Gurion University of the Negev, P.O. Box 653, Beer-Sheva, 84104, Israel (both authors)
Venue:
Journal of the American Society for Information Science and Technology
Year:
2005

Abstract

Today, most document categorization in organizations is done manually: at work, we save hundreds of files and e-mail messages in folders every day. While automatic document categorization has been widely studied, much challenging research remains to be done on supporting user-subjective categorization. This study evaluates and compares the application of self-organizing maps (SOMs) and learning vector quantization (LVQ) to automatic document classification, using a set of documents from an organization, in a specific domain, that were manually classified by a domain expert. After running the SOM and LVQ, we asked the user to reclassify the documents that the system had misclassified. Results show that, despite the subjective nature of human categorization, automatic document categorization methods correlate well with subjective, personal categorization, and that the LVQ method outperforms the SOM. The reclassification process revealed an interesting pattern: about 40% of the documents were classified according to their original categorization, about 35% according to the system's categorization (the users changed the original categorization), and the remainder received a different (new) categorization. Based on these results, we conclude that automatic support for subjective categorization is feasible; however, an exact match is probably impossible because users' categorization behavior changes over time.
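
The abstract does not describe how the classifiers were implemented. For readers unfamiliar with LVQ, the following is a minimal sketch of the textbook LVQ1 update rule applied to document vectors, not the authors' configuration. All function names, the toy term-count vectors, the folder labels, and the hyperparameters (`epochs`, `rate`) are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(prototypes, vector):
    """Index of the prototype whose weights are closest to `vector`."""
    return min(
        range(len(prototypes)),
        key=lambda i: squared_distance(prototypes[i][0], vector),
    )


def train_lvq1(samples, labels, epochs=30, rate=0.3, seed=0):
    """Train one prototype per class with the LVQ1 rule (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: initialise each class prototype from the first sample of that class.
    prototypes, seen = [], set()
    for vec, lab in zip(samples, labels):
        if lab not in seen:
            seen.add(lab)
            prototypes.append((list(vec), lab))
    order = list(range(len(samples)))
    for epoch in range(epochs):
        alpha = rate * (1 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            vec, lab = samples[i], labels[i]
            weights, proto_label = prototypes[nearest(prototypes, vec)]
            # Pull the winner toward the sample if the labels agree, push it away otherwise.
            sign = 1.0 if proto_label == lab else -1.0
            for d in range(len(weights)):
                weights[d] += sign * alpha * (vec[d] - weights[d])
    return prototypes


def classify(prototypes, vector):
    """Label of the nearest prototype."""
    return prototypes[nearest(prototypes, vector)][1]


def agreement(prototypes, samples, user_labels):
    """Fraction of documents whose predicted label matches the user's label."""
    hits = sum(classify(prototypes, s) == u for s, u in zip(samples, user_labels))
    return hits / len(samples)


# Hypothetical toy data: term counts for three terms, filed under two folders.
docs = [[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 2]]
folders = ["budget", "budget", "hiring", "hiring"]
model = train_lvq1(docs, folders)
print(classify(model, [3, 0, 0]))
print(agreement(model, docs, folders))
```

The `agreement` helper mirrors the kind of comparison the abstract reports, namely how often a system label matches the label a user assigned. Unlike LVQ, a SOM is trained without labels, so comparing the two requires mapping SOM units to categories after training; that step is omitted here.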