Artificial immune system for illicit content identification in social media

Authors:
Ming Yang;Melody Kiang;Hsinchun Chen;Yijun Li
Affiliations:
Department of Management Science and Engineering, Harbin Institute of Technology, Harbin, China 150001;Department of Information Systems, California State University, Long Beach, CA 90840;Artificial Intelligence Lab, Department of Management Information Systems, University of Arizona, Tucson, AZ 85721;Department of Management Science and Engineering, Harbin Institute of Technology, Harbin, China 150001
Venue:
Journal of the American Society for Information Science and Technology
Year:
2012

Abstract

Social media is frequently used as a platform for exchanging information and opinions, as well as for disseminating propaganda. However, online content can be misused to distribute illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. Labeling a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems is costly and time-consuming. By contrast, large volumes of unlabeled content are relatively easy to obtain from social media. In this article, we present an artificial immune system-based technique that addresses the difficulties of identifying illicit content in social media. Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. Empirical evaluation results from two large hate group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance. © 2012 Wiley Periodicals, Inc.
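
The abstract does not give the details of the labeling heuristic, so the following is only a minimal sketch of the general idea behind partially supervised labeling, not the authors' algorithm. It assumes a small seed set of known positive documents, scores each unlabeled document by cosine similarity to the seed centroid, and keeps only the confident cases as positive or negative examples. The function names and the thresholds `hi` and `lo` are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_unlabeled(seed_positives, unlabeled, hi=0.5, lo=0.1):
    """Split unlabeled docs into likely positives, likely negatives, undecided.

    Assumption: hi and lo are illustrative thresholds, not values from the paper.
    """
    # Centroid (summed term frequencies) of the known positive seed documents.
    centroid = Counter()
    for doc in seed_positives:
        centroid.update(tf_vector(doc))

    positives, negatives, undecided = [], [], []
    for doc in unlabeled:
        score = cosine(tf_vector(doc), centroid)
        if score >= hi:
            positives.append(doc)   # close to the seeds: treat as positive
        elif score <= lo:
            negatives.append(doc)   # far from the seeds: treat as negative
        else:
            undecided.append(doc)   # ambiguous: leave unlabeled
    return positives, negatives, undecided


# Toy usage with placeholder tokens instead of real forum text.
seeds = ["alpha beta gamma", "alpha beta delta"]
pool = ["alpha beta", "omega sigma", "gamma omega sigma tau"]
pos, neg, und = label_unlabeled(seeds, pool)
print(pos, neg, und)
```

Documents that land in the undecided group are left out of training in this sketch; the extracted positive and negative examples could then be used to train any standard text classifier.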