Mining for Norms in Clouds: Complying to Ethical Communication through Cloud Text Data Mining

Authors:
Ahsan Nabi Khan;Aslam Muhammad;A. M. Martinez Enriquez
Venue:
UCC '12 Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing
Year:
2012


Abstract

As the world realizes the power and efficiency of cloud computing, communication needs enhanced security and intelligence to filter out unethical data that violates norms in clouds. No categorization scheme for such filtering has yet been proposed. Numerous lists of banned, unethical, and objectionable words have been developed, with limited user satisfaction. These lists are usually generated manually, with some programmable extensibility for online forums and public newsgroups. We define a tool and methodology to categorize the censored data. We statistically grow the set of words in the categorized data and tag hidden neutral words with their meaning in context. Using Computational Linguistics tools, modified to suit our purposes, we analyze sample text from gigabytes of an email newsgroup dataset hosted on cloud servers. The results present, in broad categories, a sample dataset of the words most frequently used to break norms in recent cloud communication. The categories separate cloud-server data found in newsgroups related to internet crimes, fraud, theft, anti-state elements, and other material of legal importance. This study thus demonstrates a tag cloud of the most frequent critical words in communications, from a legal and ethical point of view, in the current scenario of cloud databases.
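The counting and list-growing steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' tool: the category names, the seed words in `SEED_WORDS`, and the functions `category_frequencies` and `grow_candidates` are all hypothetical, and plain per-message co-occurrence stands in as an assumed substitute for whatever statistical method the paper actually uses.

```python
import re
from collections import Counter

# Hypothetical seed lists; the paper's real categories and words are not given here.
SEED_WORDS = {
    "fraud": {"scam", "phishing"},
    "theft": {"stolen", "steal"},
}

TOKEN = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN.findall(text.lower())


def category_frequencies(messages, seeds=SEED_WORDS):
    """Count, per category, how often each seed word occurs in the messages."""
    counts = {cat: Counter() for cat in seeds}
    for msg in messages:
        for tok in tokenize(msg):
            for cat, words in seeds.items():
                if tok in words:
                    counts[cat][tok] += 1
    return counts


def grow_candidates(messages, seeds=SEED_WORDS, top_n=3):
    """Suggest new words per category (assumed heuristic): the non-seed words
    that appear in the most messages containing a seed word of that category."""
    all_seeds = set().union(*seeds.values())
    cands = {cat: Counter() for cat in seeds}
    for msg in messages:
        toks = set(tokenize(msg))
        for cat, words in seeds.items():
            if toks & words:
                cands[cat].update(toks - all_seeds)
    return {cat: [w for w, _ in c.most_common(top_n)] for cat, c in cands.items()}
```

The per-category counts returned by `category_frequencies` are the kind of data a tag cloud would be drawn from. A realistic pipeline would at least filter stop words before proposing candidates, since this sketch would otherwise suggest common function words.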