Learning Age and Gender of Blogger from Stylistic Variation
PReMI '09 Proceedings of the 3rd International Conference on Pattern Recognition and Machine Intelligence
This work reports stylistic differences in blogging across gender and age groups, using slang word co-occurrences. We focus on the co-occurrence of non-dictionary (slang) words across bloggers of different genders and age groups, and use slang usage as the feature for studying their stylistic variation. We model the co-occurrences of slang words used by bloggers as a graph in which nodes are slang words and edge weights are the number of co-occurrences, and we study how this model varies in predicting age group and gender. The experiments use the demographically tagged blog corpus from the ICWSM Spinner dataset and a Naive Bayes classifier with 10-fold cross-validation. Preliminary results show that the co-occurrence of slang words could be a better choice for predicting age and gender.
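
The graph model described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that two slang words co-occur when they appear in the same post, that a slang word is any alphabetic token missing from a reference word list, and that whitespace tokenization is adequate. The function name `slang_cooccurrence_graph`, the toy `dictionary`, and the sample `posts` are hypothetical and are not drawn from the ICWSM Spinner dataset.

```python
from collections import Counter
from itertools import combinations


def slang_cooccurrence_graph(posts, dictionary):
    """Build a weighted co-occurrence graph of non-dictionary words.

    Nodes are slang words; the weight of edge (a, b) is the number of
    posts in which both words appear. Assumption: the co-occurrence
    window is a whole post, which the abstract does not specify.
    """
    edges = Counter()
    for post in posts:
        tokens = post.lower().split()
        # Assumption: "slang" means an alphabetic token not in the dictionary.
        slang = sorted({t for t in tokens if t.isalpha() and t not in dictionary})
        # Count each unordered pair once per post.
        for a, b in combinations(slang, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy data for illustration only.
dictionary = {"that", "was", "so", "funny", "see", "you", "later"}
posts = [
    "lol that was so funny omg",
    "omg lol see you later",
    "brb see you later",
]
graph = slang_cooccurrence_graph(posts, dictionary)
for (a, b), weight in graph.items():
    print(a, b, weight)
```

Edge weights built this way, computed per blogger, could serve as input features for a classifier such as Naive Bayes; how the original work maps the graph to classifier features is not stated in the abstract.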