Web 2.0 technologies have made it easy for Web users to create and spread objectionable text content, which has been shown to be harmful to Web users, especially young children. Although detection methods based on keyword lists offer faster detection and lower memory consumption, they fail to detect text content that is objectionable in its semantic description. A framework that integrates a semantic model with a detection method is proposed to perform probabilistic inference for detecting this kind of Web text content. Based on the observation that an objectionable scene can be described by a set of sentences, a topic model learnt from that set is employed as a semantic model of the objectionable scene. For a given sentence, the framework calculates a probability value that expresses the likelihood of the sentence with respect to the model. A mapping function then transforms the probability value into a new indicator that is convenient for making the final decision. Extensive comparison experiments on two real-world text sets show that the framework can effectively recognize semantically objectionable text, and that both its detection rate and its false alarm rate are superior to those of traditional methods.
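The pipeline described in the abstract (learn a model from a set of scene sentences, score a new sentence by its likelihood under that model, then map the score to a decision indicator) can be illustrated with a minimal sketch. Everything concrete here is an assumption and not the paper's method: a smoothed unigram model stands in for the topic model, a logistic function of a log-likelihood ratio against a background model stands in for the unspecified mapping function, and the sentences are neutral placeholders. The names `train_unigram`, `avg_log_likelihood`, and `indicator` are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over vocab.

    Assumption: a simple stand-in for the topic model named in the abstract.
    """
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def avg_log_likelihood(sentence, model):
    """Mean per-word log-probability; out-of-vocabulary words are skipped."""
    words = [w for w in sentence.lower().split() if w in model]
    if not words:
        return None
    return sum(math.log(model[w]) for w in words) / len(words)


def indicator(sentence, scene_model, background_model, scale=1.0):
    """Map a log-likelihood ratio into (0, 1) with a logistic function.

    Assumption: the abstract does not specify its mapping function.
    """
    ll_scene = avg_log_likelihood(sentence, scene_model)
    ll_background = avg_log_likelihood(sentence, background_model)
    if ll_scene is None or ll_background is None:
        return 0.5  # no usable evidence either way
    return 1.0 / (1.0 + math.exp(-scale * (ll_scene - ll_background)))


# Neutral placeholder data: the "scene" here is flooding, not real training text.
scene_sentences = [
    "the storm flooded the coastal road",
    "heavy rain flooded the town",
]
background_sentences = [
    "the team won the match",
    "the market opened higher today",
]
vocab = {w for s in scene_sentences + background_sentences for w in s.lower().split()}

scene_model = train_unigram(scene_sentences, vocab)
background_model = train_unigram(background_sentences, vocab)

for text in ["rain flooded the road", "the team won today"]:
    score = indicator(text, scene_model, background_model)
    label = "scene" if score > 0.5 else "background"
    print(f"{text!r}: indicator={score:.3f} -> {label}")
```

Squashing the score into the interval (0, 1) is one way to obtain an indicator that can be compared against a fixed threshold. The threshold of 0.5 and the `scale` parameter are illustrative choices that would need tuning on real data.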