Leveraging one-class SVM and semantic analysis to detect anomalous content

Authors:
Ozgur Yilmazel; Svetlana Symonenko; Niranjan Balasubramanian; Elizabeth D. Liddy
Affiliations:
Center for Natural Language Processing, School of Information Studies, Syracuse University, Syracuse, NY (all authors)
Venue:
ISI'05 Proceedings of the 2005 IEEE international conference on Intelligence and Security Informatics
Year:
2005

Abstract

Experiments were conducted to test several hypotheses about methods for improving document classification for the malicious insider threat problem within the Intelligence Community. Bag-of-words (BOW) representations of documents were compared with Natural Language Processing (NLP)-based representations in both standard and one-class classification settings, using the Support Vector Machine (SVM) algorithm. Results show that the NLP features significantly improved classifier performance over the BOW approach in terms of both precision and recall, while using far fewer features. The one-class classifier using NLP features also proved robust when tested on new domains.
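The abstract contrasts bag-of-words features with NLP-based ones inside a one-class SVM. This is a minimal sketch of the BOW baseline only. It is not the authors' implementation and does not reproduce their NLP features. The Python below trains a linear one-class SVM in its ν-formulation by plain subgradient descent on the primal objective, using L2-normalized term-frequency vectors. The toy documents, the tokenizer, the helper names (`bow_vector`, `train_linear_ocsvm`, `predict`) and the hyperparameters `nu`, `lr` and `epochs` are all hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative sketch only: BOW features + a linear one-class SVM.
# The tokenizer, helper names, toy data and hyperparameters are hypothetical.

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(docs):
    words = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {word: i for i, word in enumerate(words)}

def bow_vector(text, vocab):
    """L2-normalized term-frequency vector; out-of-vocabulary words are dropped."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_linear_ocsvm(X, nu=0.1, lr=0.05, epochs=300):
    """Subgradient descent on the primal nu-one-class SVM objective (linear kernel):
        0.5*||w||^2 + 1/(nu*n) * sum_i max(0, rho - w.x_i) - rho
    """
    n, d = len(X), len(X[0])
    w, rho = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = list(w)          # gradient of 0.5*||w||^2
        grad_rho = -1.0           # gradient of -rho
        for x in X:
            if dot(w, x) < rho:   # x falls on the outlier side of the hyperplane
                for j in range(d):
                    grad_w[j] -= x[j] / (nu * n)
                grad_rho += 1.0 / (nu * n)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        rho -= lr * grad_rho
    # For a fixed w the objective is minimized by taking rho as the
    # ceil(nu*n)-th smallest training score, so set it exactly.
    scores = sorted(dot(w, x) for x in X)
    k = min(n, max(1, math.ceil(nu * n)))
    return w, scores[k - 1]

def predict(w, rho, x):
    """+1 = resembles the training (in-domain) documents, -1 = anomalous."""
    return 1 if dot(w, x) >= rho else -1

# Toy usage: train only on documents from one hypothetical topic.
train_docs = [
    "quarterly budget report for the finance office",
    "finance office budget review and spending report",
    "annual budget forecast prepared by finance staff",
    "spending report for the budget committee",
]
vocab = build_vocab(train_docs)
X = [bow_vector(doc, vocab) for doc in train_docs]
w, rho = train_linear_ocsvm(X)
for doc in ["budget report from the finance office",
            "satellite imagery of coastal radar installations"]:
    x = bow_vector(doc, vocab)
    print(f"{predict(w, rho, x):+d}  score={dot(w, x):.3f}  {doc}")
```

Here the off-topic document shares no vocabulary with the training set, so it maps to the zero vector, scores below `rho` and is flagged as anomalous. Fixed-step subgradient descent only approximates the optimum; a practical system would more likely use a dedicated solver such as LIBSVM or scikit-learn's `OneClassSVM`, which also supports non-linear kernels.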