In information retrieval and document categorization, both Euclidean-distance-based and cosine-based similarity models assume that term vectors are orthogonal. This assumption does not hold in practice, and as a result such models ignore associations between terms. This paper analyzes the properties of the term-document space, the term-category space, and the category-document space. Then, without assuming term independence, we propose a new mathematical model that estimates the association between terms, and we define a 驴-similarity model of documents. The model makes as much use as possible of the category membership information already present in the corpus, with the objective of improving categorization performance. Empirical results obtained with a k-NN classifier on the Reuters-21578 corpus show that using term association can improve the effectiveness of a categorization system, and that the 驴-similarity model outperforms similarity models that ignore term association.
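To illustrate the orthogonality assumption discussed in the abstract, the following is a minimal sketch and not the model proposed in the paper. It compares plain cosine similarity, which is equivalent to using an identity term-association matrix, with a generalized cosine that uses a term-association matrix. The vocabulary, the association values, and the function names `bilinear` and `similarity` are all hypothetical, chosen only for illustration.

```python
import math


def bilinear(x, y, assoc):
    """Return x^T A y, where assoc[i][j] is the association of terms i and j."""
    return sum(
        x[i] * assoc[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def similarity(x, y, assoc):
    """Cosine-style similarity under a term-association matrix.

    With the identity matrix this reduces to ordinary cosine similarity,
    i.e. the case where terms are assumed to be orthogonal.
    """
    denom = math.sqrt(bilinear(x, x, assoc)) * math.sqrt(bilinear(y, y, assoc))
    return bilinear(x, y, assoc) / denom if denom else 0.0


# Hypothetical toy vocabulary: ["car", "automobile", "bank"]
doc_a = [1.0, 0.0, 0.0]  # mentions only "car"
doc_b = [0.0, 1.0, 0.0]  # mentions only "automobile"

# Orthogonal terms: no association between distinct terms.
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Assumed association values (illustrative, not estimated from any corpus).
assoc = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

plain_sim = similarity(doc_a, doc_b, identity)
assoc_sim = similarity(doc_a, doc_b, assoc)
print(plain_sim, assoc_sim)
```

Under the orthogonality assumption the two documents share no terms and get similarity 0, even though "car" and "automobile" are closely related. Once an association between the two terms is taken into account, the documents become similar. How the associations are estimated from category membership is the subject of the paper and is not reproduced here.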