Classifying web documents is considered one of the most important tasks in revealing terrorism-related documents. The Internet provides a great deal of valuable information to users, and the amount of web content is steadily increasing, which makes it very difficult to identify potentially dangerous documents. Simply extracting keywords from documents is not enough to classify their contents. Many techniques have been studied for building automated document classification systems, but most are statistical or knowledge-based approaches. These methods do not yield satisfactory results because of the complexity of natural language. To overcome this deficiency, we propose a method that uses word similarity based on the WordNet hierarchy together with n-gram frequency data. The method was tested on sampled New York Times articles by querying four distinct words drawn from four different areas. Experimental results show that the proposed method effectively extracts context words from the text and identifies terrorism-related documents.
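The abstract does not spell out how the two signals are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a path-based similarity over an is-a hierarchy (1 divided by 1 plus the number of edges on the shortest path) and weights each bigram neighbour of a query-related word by bigram count times similarity. The `HYPERNYM` table is a hypothetical toy stand-in for the WordNet hierarchy, and `path_similarity`, `context_words`, and the `threshold` parameter are illustrative names.

```python
from collections import Counter

# Hypothetical toy is-a links standing in for the WordNet hierarchy.
HYPERNYM = {
    "bomb": "weapon",
    "grenade": "weapon",
    "weapon": "artifact",
    "artifact": "entity",
}


def ancestors(word):
    """Map the word and each of its ancestors to its distance from the word."""
    chain = {}
    depth = 0
    while word is not None:
        chain[word] = depth
        word = HYPERNYM.get(word)
        depth += 1
    return chain


def path_similarity(a, b):
    """Assumed measure: 1 / (1 + edges on the shortest is-a path), else 0.0."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    if not common:
        return 0.0
    distance = min(up_a[node] + up_b[node] for node in common)
    return 1.0 / (1.0 + distance)


def context_words(tokens, query, threshold=0.3):
    """Score words that share a bigram with a word similar to the query.

    Assumption: each neighbour is weighted by bigram count * similarity.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = Counter()
    for (left, right), count in bigrams.items():
        for anchor, other in ((left, right), (right, left)):
            sim = path_similarity(anchor, query)
            if sim >= threshold:
                scores[other] += count * sim
    return scores


if __name__ == "__main__":
    text = "the bomb exploded near the grenade depot and the bomb exploded again"
    for word, score in context_words(text.split(), "weapon").most_common():
        print(word, score)
```

In this sketch, function words such as "the" still collect weight, so a practical version would likely filter stopwords and draw its n-gram counts from a large corpus rather than from the document itself.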