Crime profiling for the Arabic language using computational linguistic techniques

Authors:
Meshrif Alruily; Aladdin Ayesh; Hussein Zedan
Venue:
Information Processing and Management: an International Journal
Year:
2014

Abstract

Arabic is a widely spoken language, but few text mining tools have been developed to process Arabic text. This paper applies text mining techniques to unstructured Arabic text in the crime domain and presents the development and application of a Crime Profiling System (CPS). The system extracts meaningful information, namely the type of crime, the location and the nationality, from Arabic-language crime news reports. It has two unique attributes: first, its information extraction relies on local grammar; second, its dictionaries can be generated automatically. It is shown that the CPS improves data quality by reducing the data so that only meaningful information is retained. In addition, the Self Organising Map (SOM) approach is adopted to cluster the crime reports by crime type. This clustering is improved because only refined data, containing the meaningful keywords produced by the information extraction process, are fed into it; that is, the data are cleansed of noise beforehand. The proposed system is validated through experiments on a corpus collated from different sources that was not used during system development. Precision, recall and F-measure are used to evaluate the performance of the information extraction approach, and the system is compared with other systems. Clustering performance is evaluated using three parameters: data size, loading time and quantization error.
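The abstract names two kinds of evaluation measure without defining them: precision, recall and F-measure for the extraction step, and quantization error for the SOM. The sketch below shows one common way to compute both, assuming set-based matching of extracted (slot, value) pairs, the balanced F-measure F = 2PR/(P + R), and quantization error defined as the mean Euclidean distance between each input vector and the weight vector of its best-matching unit. All function names, the map size, the learning-rate and neighbourhood schedule, and the keyword vectors are hypothetical illustrations, not details taken from the CPS.

```python
import math
import random


# Precision, recall and F-measure, comparing extracted items to a gold
# standard as sets (a simplification: each item is one (slot, value) pair).
def precision_recall_f(extracted, gold):
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    p = correct / len(extracted) if extracted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(weights, vector):
    # Index of the map unit whose weight vector is closest to the input.
    return min(range(len(weights)), key=lambda i: euclidean(weights[i], vector))


def train_som(data, rows=2, cols=2, epochs=100, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood.

    Grid size, epochs, learning rate and decay schedule are assumed values.
    """
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in data[0]] for _ in grid]
    sigma0 = max(rows, cols) / 2
    for t in range(epochs):
        decay = 1 - t / epochs  # linear decay of learning rate and radius
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.5)
        for vector in rng.sample(data, len(data)):  # shuffled pass over data
            br, bc = grid[best_matching_unit(weights, vector)]
            for i, (r, c) in enumerate(grid):
                # Neighbourhood strength falls off with grid distance to the BMU.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                weights[i] = [w + lr * h * (x - w) for w, x in zip(weights[i], vector)]
    return weights


def quantization_error(weights, data):
    # Mean distance between each input and its best-matching unit's weights.
    total = sum(euclidean(weights[best_matching_unit(weights, v)], v) for v in data)
    return total / len(data)


if __name__ == "__main__":
    # Hypothetical extraction output for one report vs. a hand-labelled gold set.
    extracted = {("crime_type", "theft"), ("location", "Riyadh"), ("nationality", "Egyptian")}
    gold = {("crime_type", "theft"), ("location", "Jeddah"), ("nationality", "Egyptian")}
    print(precision_recall_f(extracted, gold))

    # Hypothetical keyword-indicator vectors: [murder, theft, drugs] terms per report.
    reports = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
    som = train_som(reports)
    print([best_matching_unit(som, v) for v in reports])
    print(round(quantization_error(som, reports), 3))
```

Because the SOM in this sketch is fed only short keyword-indicator vectors, in the spirit of the abstract's point that extraction removes noise before clustering, each report is represented by a handful of dimensions rather than by its full vocabulary.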