Detecting Fake Medical Web Sites Using Recursive Trust Labeling

Authors:
Ahmed Abbasi; Fatemeh “Mariam” Zahedi; Siddharth Kaza
Affiliations:
University of Virginia; University of Wisconsin-Milwaukee; Towson University
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2012

Abstract

Fake medical Web sites have become increasingly prevalent. Consequently, much of the health-related information and advice available online is inaccurate or misleading. Scores of medical institution Web sites are for organizations that do not exist, and more than 90% of online pharmacy Web sites are fraudulent. In addition to the monetary losses inflicted on unsuspecting users, these fake medical Web sites have severe public safety ramifications. According to a World Health Organization report, approximately half the drugs sold on the Web are counterfeit, resulting in thousands of deaths. In this study, we propose an adaptive learning algorithm called recursive trust labeling (RTL). RTL uses underlying content-based and graph-based classifiers, coupled with a recursive labeling mechanism, for enhanced detection of fake medical Web sites. The proposed method was evaluated on a test bed encompassing nearly 100 million links between 930,000 Web sites, including 1,000 known legitimate and fake medical sites. The experimental results revealed that RTL significantly improved fake medical Web site detection performance over 19 comparison content-based and graph-based methods, various meta-learning techniques, and existing adaptive learning approaches, with an overall accuracy of over 94%. Moreover, RTL attained high performance levels even when the training dataset comprised as few as 30 Web sites. With the increased popularity of eHealth and Health 2.0, the results have important implications for online trust, security, and public safety.
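The abstract does not spell out how RTL works internally, so the following is only a minimal sketch of the general idea it names: combining a content-based score with a graph-based score and recursively promoting confident predictions to labels. It is not the authors' algorithm. The function name `recursive_labeling`, the equal-weight averaging of the two scores, the `min_confidence` threshold, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def recursive_labeling(
    seeds: Dict[str, bool],             # known labels: True = fake, False = legitimate
    content_scores: Dict[str, float],   # assumed P(fake) from some content classifier
    edges: Iterable[Tuple[str, str]],   # hyperlinks, treated as undirected here
    min_confidence: float = 0.15,       # hypothetical threshold on |score - 0.5|
) -> Dict[str, bool]:
    # Build an undirected adjacency list from the link pairs.
    neighbors: Dict[str, List[str]] = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    labels = dict(seeds)
    unlabeled = set(content_scores) - set(labels)

    def combined(site: str) -> float:
        # Graph score: share of already-labeled neighbors that are fake
        # (0.5 when no neighbor has a label yet).
        known = [labels[n] for n in neighbors.get(site, []) if n in labels]
        graph = sum(known) / len(known) if known else 0.5
        # Assumption: content and graph evidence are weighted equally.
        return (content_scores[site] + graph) / 2

    while unlabeled:
        # Score every unlabeled site against the current label set.
        scores = {s: combined(s) for s in unlabeled}
        confident = {s: p for s, p in scores.items()
                     if abs(p - 0.5) >= min_confidence}
        if not confident:
            break
        # Promote confident predictions to labels, then repeat so they can
        # inform the graph score of the remaining sites.
        for s, p in confident.items():
            labels[s] = p >= 0.5
        unlabeled -= set(confident)

    # Anything still uncertain gets a final thresholded guess.
    final = {s: combined(s) >= 0.5 for s in unlabeled}
    labels.update(final)
    return labels


# Toy example: two seed sites and three unlabeled ones.
result = recursive_labeling(
    seeds={"a": False, "b": True},
    content_scores={"c": 0.9, "d": 0.4, "e": 0.2},
    edges=[("c", "b"), ("d", "c"), ("e", "a")],
)
print(result)
```

In this toy run, site `d` has no labeled neighbor in the first round and its combined score (0.45) is too close to 0.5 to be labeled. Once its neighbor `c` is labeled fake, a second round raises the score for `d` to 0.7 and it is labeled fake as well, which is the kind of effect a recursive labeling step is meant to capture.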