Machine Learning for Author Affiliation within Web Forums -- Using Statistical Techniques on NLP Features for Online Group Identification

Authors:
Jeffrey Ellen;Shibin Parameswaran
Affiliations:
-;-
Venue:
ICMLA '11 Proceedings of the 2011 10th International Conference on Machine Learning and Applications and Workshops - Volume 01
Year:
2011

Citing 0
Cited 1

Implicit group membership detection in online text: analysis and applications

SBP'12 Proceedings of the 5th international conference on Social Computing, Behavioral-Cultural Modeling and Prediction


Abstract

Although previous studies have attributed authorship to specific individuals, we find few efforts to group authors by their affiliations. This paper presents our work on classifying website forum posts by the author's group affiliation. Specifically, we seek to classify translated website forum posts by the (inferred) political affiliation of the author. The two datasets that we attempt to classify consist of real-world data discussing current issues: Israeli/Palestinian dialogue (the Bitter Lemons corpus) and translated Extremist/Moderate forum entries (from internet websites). To achieve reliable classification of author affiliation, we extract term frequency-based features (which are conventional in document classification) along with less commonly used linguistic style-based features. The resulting set of stylometric features is then used with two widely used supervised classification algorithms: k-Nearest Neighbor (k-NN) and Support Vector Machines (SVMs). In particular, we use k-NN with cosine distance and SVMs with two different kernel functions: in addition to the popular RBF kernel, we evaluate the applicability and performance of the recently introduced arc-cosine kernels for group affiliation. The results of our experiments show strong performance across a range of pertinent metrics.
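
The two less familiar ingredients named in the abstract, k-NN under cosine distance and the arc-cosine kernel, can be illustrated with a minimal sketch. This is not the authors' implementation. The whitespace tokenizer, the toy posts, the labels "A" and "B", and all function names are assumptions made for illustration, and only term-frequency features are shown (the stylometric features are omitted). The kernel follows the standard degree-0 and degree-1 arc-cosine formulas, which may differ from the exact variant evaluated in the paper.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Bag-of-words term counts (assumed tokenizer: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, query_text, k=3):
    """Majority vote over the k training posts closest in cosine distance.

    Ranking by highest cosine similarity is equivalent to ranking by
    lowest cosine distance (1 - similarity).
    `train` is a list of (text, label) pairs.
    """
    query = term_frequencies(query_text)
    scored = sorted(
        ((cosine_similarity(term_frequencies(text), query), label)
         for text, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def arc_cosine_kernel(x, y, degree=1):
    """Arc-cosine kernel between two dense vectors, for degree 0 or 1.

    degree 0: k(x, y) = 1 - theta / pi
    degree 1: k(x, y) = (|x| |y| / pi) * (sin(theta) + (pi - theta) * cos(theta))
    where theta is the angle between x and y.
    """
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: treat a zero vector as having zero similarity
    cos_theta = max(-1.0, min(1.0, dot / (norm_x * norm_y)))  # guard rounding
    theta = math.acos(cos_theta)
    if degree == 0:
        return 1.0 - theta / math.pi
    if degree == 1:
        return (norm_x * norm_y / math.pi) * (
            math.sin(theta) + (math.pi - theta) * cos_theta
        )
    raise ValueError("this sketch only implements degree 0 and 1")


# Hypothetical toy data; the labels stand in for group affiliations.
toy_train = [
    ("alpha beta beta", "A"),
    ("alpha alpha gamma", "A"),
    ("delta epsilon", "B"),
    ("delta delta zeta", "B"),
]
```

With the toy data, `knn_predict(toy_train, "delta zeta")` votes among the three nearest posts. To use the kernel in an SVM, one option is a library that accepts a user-supplied kernel; scikit-learn's `SVC`, for example, takes a callable that returns the Gram matrix for two sample matrices.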