Representing documents as points in a two-dimensional Cartesian plane has proved to be a valid visualization tool for Automated Text Categorization (ATC): it helps users understand the relationships between categories of textual documents, visually audit the classifier, and identify suspicious training data. This paper analyzes a specific use of this visualization approach for the Naive Bayes (NB) model for text classification and the Binary Independence Model (BIM) for text retrieval. For text categorization, the classification decision rule is reformulated so that each coordinate of a document is the sum of two addends: a variable component, P(d|c_i), and a constant component, P(c_i), the prior of the category. When the documents are plotted in the Cartesian plane according to this formulation, they can be seen to be shifted by a constant amount along the x-axis and the y-axis. How evident this shift is depends on which NB model, Bernoulli or multinomial, is chosen. For text retrieval, the same reformulation can be applied to the BIM. The visualization helps to explain the decisions taken when ranking documents, in particular in the case of relevance feedback.
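The reformulation described above can be illustrated with a minimal sketch. It is not the paper's implementation; it rests on several assumptions: the two addends are taken in log space (so that the product P(d|c_i)P(c_i) becomes a sum), the multinomial NB model is used with add-one smoothing, the x-axis is the category and the y-axis its complement, and the toy documents and all function names (`train_multinomial`, `coordinates`) are hypothetical.

```python
import math
from collections import Counter

def train_multinomial(docs, labels, vocab, alpha=1.0):
    """Estimate smoothed log P(t|c) and log P(c) for c in {True, False}."""
    model = {}
    for c in (True, False):
        counts = Counter()
        for doc, label in zip(docs, labels):
            if label == c:
                counts.update(t for t in doc if t in vocab)
        total = sum(counts.values()) + alpha * len(vocab)
        n_c = sum(1 for label in labels if label == c)
        model[c] = {
            "log_prior": math.log(n_c / len(labels)),
            "log_lik": {t: math.log((counts[t] + alpha) / total) for t in vocab},
        }
    return model

def coordinates(doc, model):
    """Return (x, y): x for the category, y for its complement.

    Each coordinate is a variable part (log-likelihood of the document)
    plus a constant part (log prior). Taking logs is an assumption here.
    """
    point = []
    for c in (True, False):
        log_lik = model[c]["log_lik"]
        variable = sum(log_lik[t] for t in doc if t in log_lik)
        constant = model[c]["log_prior"]
        point.append(variable + constant)
    return tuple(point)

# Toy training data (hypothetical): True = in category, False = not.
docs = [
    ["wheat", "grain", "export"],
    ["grain", "price"],
    ["stock", "price", "market"],
    ["market", "bank"],
]
labels = [True, True, False, False]
vocab = {t for d in docs for t in d}

model = train_multinomial(docs, labels, vocab)
points = [coordinates(d, model) for d in docs]

# A document is assigned to the category when it lies below the line y = x.
predicted = [x > y for x, y in points]

# Changing only the priors moves every point by the same offset on each axis.
biased = {c: dict(model[c]) for c in model}
biased[True]["log_prior"] = math.log(0.9)
biased[False]["log_prior"] = math.log(0.1)
shifted = [coordinates(d, biased) for d in docs]
```

In this sketch, comparing `points` with `shifted` shows the constant shift: every document moves by the same amount along x and by the same amount along y, because only the prior term changed. Plotting either list as a scatter plot together with the line y = x gives the kind of two-dimensional view the abstract refers to.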