Web forums have become a major source for information gathering and mining because of their large volume of user-generated content, and crawling them is a prerequisite for collecting that information. However, a generic web crawler cannot crawl web forums efficiently or effectively because they contain many redundant and duplicate pages. In addition, the useful pages are connected by crawling relationships that need to be considered. Efficient crawling therefore requires eliminating redundant and duplicate pages and understanding these crawling relationships. Existing work on forum crawling relies on methods based on visual pattern recognition, which makes it computationally expensive. In this paper, we propose a novel lightweight crawling method that uses the text and link properties of the pages in web forums. Theoretical analysis and experimental results show the effectiveness and efficiency of the proposed method.