Among Internet applications, microblogging is especially popular with users for its concise content, flexible form, rapid spread and real-time updates. The total number of microblog accounts registered on China's main platforms, such as Sina, Tencent, Netease and Sohu, exceeds 1 billion. Because of this huge user base, microblogs seem to have become an independent channel of information transmission, one that can steer public opinion on its own. How to regulate microblogs, locate public-opinion hazards, use microblogs to better plan and guide public opinion, and foster an elegant and pleasant cultural life should be our new concern. In this paper, based on the characteristics of microblog content and form, we propose a technique built on the analysis of fingerprint information. We use web crawlers to collect microblog content matching a limited set of public-opinion keywords, then extract the text content and use the simhash algorithm to create a fingerprint for each text. The fingerprints are clustered based on Hamming distance. Within each cluster, we analyze the data to discover sensitive topics and find key information disseminators, and thus control public opinion. Repeated experiments show that this method achieves higher accuracy and credibility.
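The fingerprint-and-cluster step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each token is hashed to 64 bits (here with `hashlib.blake2b`, an assumed choice). Each bit position accumulates +1 or -1 per token, and the sign of each total gives one bit of the simhash fingerprint. Texts are then grouped by Hamming distance. The tokenizer, the 64-bit width, the `threshold=3` value and the greedy "leader" clustering are all hypothetical choices, since the abstract specifies none of them. Real Chinese microblog text would also need a proper word segmenter rather than the regex used here.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width


def tokenize(text):
    # Assumption: lowercase \w+ runs as tokens. Unsegmented Chinese text would
    # come out as long character runs, so a real system needs a segmenter.
    return re.findall(r"\w+", text.lower())


def simhash(text):
    """Simhash fingerprint: per-bit vote over hashed tokens (repeats count again)."""
    votes = [0] * BITS
    for token in tokenize(text):
        # 8-byte digest -> 64-bit integer hash of the token
        h = int.from_bytes(
            hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big"
        )
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i in range(BITS):
        if votes[i] > 0:  # bit i is set when the weighted vote is positive
            fp |= 1 << i
    return fp


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def cluster(fingerprints, threshold=3):
    """Hypothetical greedy clustering: join the first cluster whose leader is
    within `threshold` bits, otherwise start a new cluster. Returns index lists."""
    clusters = []  # list of (leader_fingerprint, member_indices)
    for idx, fp in enumerate(fingerprints):
        for leader, members in clusters:
            if hamming(leader, fp) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append((fp, [idx]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    posts = ["x y z", "x y z", "completely different words here"]
    fps = [simhash(p) for p in posts]
    print(cluster(fps))
```

Identical posts always receive identical fingerprints. Posts that share most of their tokens tend to land within a few bits of each other, so they usually fall into the same group. The greedy pass is quadratic in the worst case. At the scale described above, an indexed lookup over fingerprint blocks would likely be needed instead.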