Analysing features of Japanese splogs and characteristics of keywords

Authors:
Yuuki Sato; Takehito Utsuro; Yoshiaki Murakami; Tomohiro Fukuhara; Hiroshi Nakagawa; Yasuhide Kawada; Noriko Kando
Affiliations:
University of Tsukuba, Tsukuba, Japan; University of Tsukuba, Tsukuba, Japan; Navix Co., Ltd., Tokyo, Japan; University of Tokyo, Kashiwa, Japan; University of Tokyo, Tokyo, Japan; Navix Co., Ltd., Tokyo, Japan; National Institute of Informatics, Tokyo, Japan
Venue:
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Year:
2008

Citing 3

Automatically collecting, monitoring, and mining Japanese weblogs
Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters

Spam double-funnel: connecting web spammers with advertisers
Proceedings of the 16th international conference on World Wide Web

Splog detection using self-similarity analysis on blog temporal dynamics
AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web

Cited 5

An empirical study on selective sampling in active learning for splog detection
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web

Detecting spam blogs from blog search results
Information Processing and Management: an International Journal

Adversarial Web Search
Foundations and Trends in Information Retrieval

Comparing similarity of HTML structures and affiliate IDs in splog analysis
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications

Detecting splogs using similarities of splog HTML structures
Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication

Abstract

This paper analyzes Japanese splogs (spam blogs) based on various characteristics of the keywords they contain. By analyzing these characteristics, we estimate how spammers behave when they create splogs from other sources. Since splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that splogs can be collected efficiently by hand by sampling blog homepages that contain keywords of a certain type on the date when each keyword occurs most frequently. We manually examine various features of the collected blog homepages: whether their text content is excerpted from other sources, and whether they display affiliate advertisements or outgoing links to affiliated sites. Among the informative results, it is important to note that more than half of the collected splogs were created by a very small number of spammers.
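
The sampling step described in the abstract (pick the date on which a keyword occurs most often, then collect the blog homepages that contain it on that date) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `peak_date_homepages`, the `(post_date, homepage_url, text)` record layout, and the example URLs are all hypothetical, and counting occurrences by plain substring match is a simplifying assumption.

```python
from collections import Counter, defaultdict
from datetime import date


def peak_date_homepages(posts, keyword):
    """Return (peak_date, homepages) for `keyword`.

    `posts` is an iterable of (post_date, homepage_url, text) tuples.
    This record layout is hypothetical, not taken from the paper.
    """
    counts = Counter()                # date -> keyword occurrences that day
    pages_by_date = defaultdict(set)  # date -> homepages containing the keyword
    for post_date, homepage, text in posts:
        n = text.count(keyword)       # assumption: plain substring counting
        if n:
            counts[post_date] += n
            pages_by_date[post_date].add(homepage)
    if not counts:
        return None, []
    # Highest count wins; ties go to the earliest date.
    peak = min(counts, key=lambda d: (-counts[d], d))
    return peak, sorted(pages_by_date[peak])


# Made-up example data, for illustration only.
posts = [
    (date(2008, 1, 5), "http://blog-a.example/", "cheap loan offers, loan now"),
    (date(2008, 1, 5), "http://blog-b.example/", "loan loan loan"),
    (date(2008, 1, 6), "http://blog-c.example/", "thinking about a loan"),
    (date(2008, 1, 6), "http://blog-d.example/", "weekend hiking trip"),
]
peak, homepages = peak_date_homepages(posts, "loan")
```

The homepages returned for the peak date are only candidates; in the study they are then examined manually to decide whether each one is a splog.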