Induction of one-level decision trees
ML92 Proceedings of the ninth international workshop on Machine learning
C4.5: programs for machine learning
C4.5: programs for machine learning
Machine Learning
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Learning to remove Internet advertisements
Proceedings of the third annual conference on Autonomous Agents
An introduction to support Vector Machines: and other kernel-based learning methods
An introduction to support Vector Machines: and other kernel-based learning methods
A statistical learning learning model of text classification for support vector machines
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Machine Learning
Machine Learning
ECML '95 Proceedings of the 8th European Conference on Machine Learning
A decision-theoretic generalization of on-line learning and an application to boosting
EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory
Adaptive filtering of advertisements on web pages
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Detecting spam web pages through content analysis
Proceedings of the 15th international conference on World Wide Web
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Spam double-funnel: connecting web spammers with advertisers
Proceedings of the 16th international conference on World Wide Web
Strider typo-patrol: discovery and analysis of systematic typo-squatting
SRUTI'06 Proceedings of the 2nd conference on Steps to Reducing Unwanted Traffic on the Internet - Volume 2
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1
Support vector machines for spam categorization
IEEE Transactions on Neural Networks
SUT: Quantifying and mitigating URL typosquatting
Computer Networks: The International Journal of Computer and Telecommunications Networking
Moving to the mobile internet: analyzing Sedo's domain parking services
Proceedings of the 13th International Conference on Electronic Commerce
Hi-index | 0.00 |
An ads-portal domain refers to a Web domain that shows only advertisements, served by a third-party advertisement syndication service, in the form of ads listing. We develop a machine-learning-based classifier to identify ads-portal domains, which has 96% accuracy. We use this classifier to measure the prevalence of ads-portal domains on the Internet. Surprisingly, 28.3/25% of the (two-level) *.com/*.net web domains are ads-portal domains. Also, 41/39.8% of *.com/*.net ads-portal domains are typos of well-known domains, also known as typo-squatting domains. In addition, we use the classifier along with DNS trace files to estimate how often Internet users visit ads-portal domains. It turns out that ∼5% of the two-level *.com, *.net, *.org, *.biz and *.info web domains on the traces are ads-portal domains and ∼50% of these accessed ads-portal domains are typos. These numbers show that ads-portal domains and typo-squatting ads-portal domains are prevalent on the Internet and successful in attracting many visits. Our classifier represents a step towards better categorizing the web documents. It can also be helpful to search engines ranking algorithms, helpful in identifying web spams that redirects to ads-portal domains, and used to discourage access to typo-squatting ads-portal domains.