Automatic web information extraction based on rules
WISE'11 Proceedings of the 12th international conference on Web information system engineering
This paper proposes an algorithm that automatically extracts all meta-information of threads from a variety of forums. The algorithm has two steps: thread extraction from board pages, and detailed information extraction from thread pages. In the first step, board pages are divided into five types according to their structure, and a corresponding extraction algorithm or model is suggested for each type. In the second step, the content of the original post is identified first; the remaining fields of the original post, which are always located around the content, are then matched by regular patterns, and a model is trained to extract the reply posts. Experiments show that the proposed algorithm is accurate and effective.
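To make the "matched by regular patterns" idea concrete, the following is a minimal Python sketch of pulling fields out of the text that surrounds a post's content. It is not the paper's implementation: the field names, the patterns in `FIELD_PATTERNS`, and the helper `extract_fields` are all hypothetical, and real forums would need patterns tuned to their own layouts.

```python
import re

# Hypothetical patterns for fields that tend to sit near a post's content.
# These are illustrative assumptions, not the patterns used in the paper.
FIELD_PATTERNS = {
    "author": re.compile(
        r"(?:posted by|author)\s*:?\s*(?P<value>[\w.-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"(?P<value>\d{4}-\d{2}-\d{2}(?:\s+\d{2}:\d{2})?)"),
}


def extract_fields(context: str) -> dict:
    """Return every field whose pattern matches the text around a post."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(context)
        if match:
            # Keep only the captured value, not the label that precedes it.
            fields[name] = match.group("value")
    return fields


if __name__ == "__main__":
    header = "Posted by: alice_w on 2011-03-05 14:22"
    print(extract_fields(header))
```

Fields whose pattern finds nothing are simply left out of the result, so a caller can tell which fields still need another extraction method.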