Automatically classifying the gender of a blog author has important applications in many commercial domains. Existing systems mainly use features such as words, word classes, and part-of-speech (POS) n-grams for classification learning. In this paper, we propose two new techniques to improve on current results. The first introduces a new class of features: variable-length POS sequence patterns mined from the training data using a sequence pattern mining algorithm. The second is a new feature selection method based on an ensemble of several feature selection criteria and approaches. Empirical evaluation on a real-life blog data set shows that these two techniques significantly improve on the classification accuracy of current state-of-the-art methods.
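The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the mining procedure or the selection criteria, so everything here is an assumption made for illustration. The hypothetical `mine_pos_patterns` keeps contiguous POS tag sequences of varying length that appear in at least a given fraction of documents, and the hypothetical `ensemble_select` merges the top-ranked features from several scoring criteria by simple voting. The function names, the `min_support` and `max_len` parameters, and the voting rule are all illustrative choices.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def mine_pos_patterns(
    tagged_docs: Sequence[Sequence[str]],
    min_support: float = 0.5,
    max_len: int = 4,
) -> Dict[Pattern, float]:
    """Return contiguous POS tag sequences (length 1..max_len) whose
    document-level support is at least min_support.

    Assumption: patterns are contiguous; a full sequence pattern miner
    may also allow gaps and use additional pruning criteria.
    """
    doc_counts: Counter = Counter()
    for tags in tagged_docs:
        seen: Set[Pattern] = set()
        for n in range(1, max_len + 1):
            for i in range(len(tags) - n + 1):
                seen.add(tuple(tags[i:i + n]))
        # Count each pattern at most once per document.
        doc_counts.update(seen)
    total = len(tagged_docs)
    return {
        pattern: count / total
        for pattern, count in doc_counts.items()
        if count / total >= min_support
    }


def ensemble_select(
    scores_by_criterion: Dict[str, Dict[str, float]],
    top_k: int,
) -> List[str]:
    """Take the top_k features under each criterion and return their union,
    ordered by how many criteria voted for them (ties broken by name).

    Assumption: simple voting stands in for whatever combination rule
    an actual ensemble method would use.
    """
    votes: Counter = Counter()
    for scores in scores_by_criterion.values():
        ranked = sorted(scores, key=lambda f: (-scores[f], f))
        votes.update(ranked[:top_k])
    return sorted(votes, key=lambda f: (-votes[f], f))


if __name__ == "__main__":
    # Toy POS-tagged documents (tags only); real input would come from a tagger.
    docs = [
        ["PRP", "VBD", "DT", "NN"],
        ["PRP", "VBD", "JJ"],
        ["DT", "NN", "VBD"],
    ]
    print(mine_pos_patterns(docs, min_support=0.6, max_len=2))

    # Toy per-criterion feature scores (values are made up for illustration).
    scores = {
        "criterion_a": {"a": 3.0, "b": 2.0, "c": 1.0},
        "criterion_b": {"a": 0.1, "b": 0.5, "c": 0.4},
    }
    print(ensemble_select(scores, top_k=2))
```

In a fuller pipeline, the mined patterns would become additional binary or frequency features alongside words and POS n-grams, and the selected subset would be passed to a classifier.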