Research of fast SOM clustering for text information

Authors:
Yuan-Chao Liu;Chong Wu;Ming Liu
Affiliations:
Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China;Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China;Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2011


Abstract

State-of-the-art text clustering methods suffer from the large volume of documents and their high-dimensional features. In this paper, we study fast SOM clustering for text information. Our focus is on improving the efficiency of a text clustering system while maintaining high clustering quality. To achieve this goal, we separate the system into two stages: offline and online. To make the system more efficient, feature extraction and semantic quantization are done offline. Unlike many related works, neurons are represented as numerical vectors in a high-dimensional space, whereas documents are represented as collections of important keywords; this reduces the time and space requirements of the offline stage. On this basis, we propose fast clustering techniques for the online stage, including how to project documents onto the output layer of the SOM, a fast similarity computation method, and an incremental clustering scheme for real-time processing. We tested the system on different datasets. The results show that our approach is much more efficient than traditional methods, while its clustering quality is comparable.
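
The abstract does not give the algorithmic details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a document is a dictionary from keyword index to weight, neurons are dense lists of floats, similarity is cosine, and the update is the standard SOM rule with a Gaussian neighbourhood. All function names (`sparse_cosine`, `project`, `incremental_update`) and parameters (`lr`, `sigma`) are hypothetical.

```python
import math


def vector_norm(vec):
    """Euclidean norm of a dense vector."""
    return math.sqrt(sum(x * x for x in vec))


def sparse_cosine(doc, neuron, neuron_norm):
    """Cosine similarity between a sparse document and a dense neuron.

    doc is an assumed format: {keyword_index: weight}. The dot product
    visits only the document's keywords, not the whole vocabulary.
    """
    dot = sum(w * neuron[k] for k, w in doc.items())
    doc_norm = math.sqrt(sum(w * w for w in doc.values()))
    if doc_norm == 0.0 or neuron_norm == 0.0:
        return 0.0
    return dot / (doc_norm * neuron_norm)


def project(doc, neurons, norms):
    """Project a document onto the output layer: index of the most similar neuron."""
    sims = [sparse_cosine(doc, n, nn) for n, nn in zip(neurons, norms)]
    return max(range(len(neurons)), key=sims.__getitem__)


def incremental_update(doc, neurons, norms, grid, lr=0.5, sigma=1.0):
    """Assign one newly arrived document and adapt the map in place.

    Uses the standard SOM update (an assumption, not taken from the paper):
    each neuron moves toward the document, scaled by a Gaussian of its
    grid distance to the best-matching neuron.
    """
    bmu = project(doc, neurons, norms)
    for i, neuron in enumerate(neurons):
        d2 = sum((a - b) ** 2 for a, b in zip(grid[i], grid[bmu]))
        h = lr * math.exp(-d2 / (2 * sigma ** 2))
        for k in range(len(neuron)):
            neuron[k] += h * (doc.get(k, 0.0) - neuron[k])
        norms[i] = vector_norm(neuron)  # keep cached norms in sync
    return bmu


# Usage: two neurons on a 1x2 grid, a vocabulary of four keywords.
neurons = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
norms = [vector_norm(n) for n in neurons]
grid = [(0, 0), (0, 1)]
doc = {2: 1.0, 3: 0.5}  # document containing keywords 2 and 3
winner = incremental_update(doc, neurons, norms, grid)
```

Caching the neuron norms means that projecting a document costs time proportional to the number of its keywords per neuron. The update step in this sketch still touches every dimension, which a real system would likely optimise.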