Information storage and retrieval
Self-organizing maps
Foundations of statistical natural language processing
Modern Information Retrieval
Self organization of a massive document collection
IEEE Transactions on Neural Networks
This paper proposes a variant of the self-organizing map (SOM) algorithm for document organization and retrieval. Documents are encoded as bigrams, and each bigram is assigned a signed rank according to its frequency. A novel metric based on the Wilcoxon signed-rank test uses these ranks to assess the contextual similarity between documents, and it replaces the Euclidean distance that the standard SOM algorithm uses to identify the winner neuron. Experiments with both algorithms demonstrate that the proposed variant outperforms the standard SOM algorithm in terms of average recall-precision curves.
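
The abstract does not spell out the metric, so the following Python sketch is only one plausible reading of the idea, not the authors' formulation. It assumes documents and neuron weights are already frequency vectors over a shared bigram vocabulary, and it scores a pair by the normalized gap between the Wilcoxon rank sums W+ and W-. The function names (`bigram_counts`, `average_ranks`, `signed_rank_dissimilarity`, `winner`) and the normalization are assumptions made for illustration.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs (bigrams) in a whitespace-tokenized text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def average_ranks(values):
    """Rank values ascending from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_dissimilarity(x, w):
    """Hypothetical dissimilarity: |W+ - W-| scaled to the range [0, 1].

    This is an illustrative assumption, not the metric defined in the paper.
    """
    # As in the Wilcoxon signed-rank test, zero differences are dropped.
    diffs = [a - b for a, b in zip(x, w) if a != b]
    if not diffs:
        return 0.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    n = len(diffs)
    return abs(w_plus - w_minus) / (n * (n + 1) / 2)


def winner(x, weights):
    """Index of the neuron whose weight vector is least dissimilar to x."""
    return min(
        range(len(weights)),
        key=lambda k: signed_rank_dissimilarity(x, weights[k]),
    )
```

In a SOM training loop, `winner` would stand in for the usual Euclidean nearest-neuron search. Note that this particular score is zero whenever positive and negative differences balance out, so it should be read as a demonstration of the rank-based idea rather than a full distance function.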