We augment naive Bayes models with statistical n-gram language models to address shortcomings of the standard naive Bayes text classifier. The result is a generalized naive Bayes classifier that allows for a local Markov dependence among observations, a model we refer to as the Chain Augmented Naive Bayes (CAN) classifier. CAN models have two advantages over standard naive Bayes classifiers. First, they relax some of the independence assumptions of naive Bayes by allowing a local Markov chain dependence in the observed variables, while still permitting efficient inference and learning. Second, they permit straightforward application of sophisticated smoothing techniques from statistical language modeling, which yields better parameter estimates than the standard Laplace smoothing used in naive Bayes classification. In this paper, we introduce CAN models and apply them to various text classification problems. To demonstrate the language-independent and task-independent nature of these classifiers, we present experimental results on several text classification problems (authorship attribution, text genre classification, and topic detection) in several languages (Greek, English, Japanese, and Chinese). We then systematically study the key factors in the CAN model that can influence classification performance, and analyze the strengths and weaknesses of the model.
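The idea described above, one n-gram language model per class combined with a class prior, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ChainAugmentedNB`, the fixed bigram order, the `<s>` start marker, and the add-alpha smoothing are all assumptions chosen for brevity. In particular, add-alpha smoothing is a stand-in for the more sophisticated language-model smoothing techniques the abstract refers to.

```python
import math
from collections import Counter, defaultdict


class ChainAugmentedNB:
    """Hypothetical sketch: a bigram language model per class plus a class prior."""

    def __init__(self, alpha=1.0):
        # Add-alpha smoothing constant (an assumption; simpler than the
        # language-modeling smoothers mentioned in the abstract).
        self.alpha = alpha
        self.bigrams = defaultdict(Counter)   # class -> counts of (prev, cur)
        self.contexts = defaultdict(Counter)  # class -> counts of prev
        self.doc_counts = Counter()           # class -> number of training docs
        self.vocab = set()

    def fit(self, docs, labels):
        """docs is a list of token sequences; labels is a parallel list of classes."""
        for tokens, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
            padded = ["<s>"] + list(tokens)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[label][(prev, cur)] += 1
                self.contexts[label][prev] += 1
        return self

    def _log_likelihood(self, tokens, label):
        # Sum of log P(cur | prev, class): each token depends on its
        # predecessor, which is the local Markov chain dependence.
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        padded = ["<s>"] + list(tokens)
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            num = self.bigrams[label][(prev, cur)] + self.alpha
            den = self.contexts[label][prev] + self.alpha * v
            total += math.log(num / den)
        return total

    def predict(self, tokens):
        """Return the class maximizing log prior + chain log-likelihood."""
        n = sum(self.doc_counts.values())
        scores = {
            c: math.log(self.doc_counts[c] / n) + self._log_likelihood(tokens, c)
            for c in self.doc_counts
        }
        return max(scores, key=scores.get)


# Toy data: both classes contain the same tokens with the same frequencies,
# so only token order can separate them.
model = ChainAugmentedNB().fit(
    [["a", "b", "a", "b", "a", "b"], ["a", "a", "a", "b", "b", "b"]],
    ["alternating", "blocked"],
)
print(model.predict(["b", "a", "b", "a"]))
print(model.predict(["a", "a", "b", "b"]))
```

In this toy setup a unigram naive Bayes model would see identical token counts in both classes, so any separation comes from the bigram dependence. The tokens can equally be characters rather than words, which is one way such a model can be applied across languages without word segmentation.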