User reviews on forum sites are an important information source for many popular applications (e.g., monitoring and analysis of public opinion), and they are usually represented as structured records. To the best of our knowledge, little existing work in the literature has systematically investigated the problem of extracting user reviews from forum sites. Besides the variety of web page templates, user-generated reviews raise two new challenges. First, the inconsistency of review contents, in terms of both the document object model (DOM) tree and visual appearance, impairs the similarity between review records. Second, the review content in a review record corresponds to a complicated subtree rather than a single node in the DOM tree. To tackle these challenges, we present WeRE, a system that performs automatic user review extraction in two steps. First, the review records are extracted from web pages using the proposed level-weighted tree similarity algorithm; then, the review contents within the records are extracted precisely by measuring node consistency. Our experimental results on 20 forum sites indicate that WeRE achieves high extraction accuracy.
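The abstract does not define the level-weighted tree similarity algorithm, so the following is only an illustrative sketch of the general idea, not WeRE's actual method. It assumes a DOM subtree is represented as a `(tag, children)` tuple, that similarity at each depth is the multiset overlap of tag names, and that shallower levels receive larger weights through a hypothetical `decay` parameter. Under those assumptions, differences deep inside the free-form review content reduce the score less than differences in the record's outer template.

```python
from collections import Counter


def tags_by_level(tree, depth=0, out=None):
    """Collect a Counter of tag names for each depth of a (tag, children) tree."""
    if out is None:
        out = {}
    tag, children = tree
    out.setdefault(depth, Counter())[tag] += 1
    for child in children:
        tags_by_level(child, depth + 1, out)
    return out


def level_weighted_similarity(a, b, decay=0.5):
    """Weighted average of per-level tag overlap; level d has weight decay**d.

    Assumption: this weighting scheme is illustrative and is not taken
    from the paper.
    """
    levels_a, levels_b = tags_by_level(a), tags_by_level(b)
    max_depth = max(max(levels_a), max(levels_b))
    weighted_sum = weight_total = 0.0
    for depth in range(max_depth + 1):
        ca = levels_a.get(depth, Counter())
        cb = levels_b.get(depth, Counter())
        shared = sum((ca & cb).values())  # multiset intersection size
        total = sum((ca | cb).values())   # multiset union size, never zero here
        weight = decay ** depth
        weighted_sum += weight * shared / total
        weight_total += weight
    return weighted_sum / weight_total


# Two records with the same outer template but different inner content,
# and one record whose root tag differs.
record_a = ("div", [("span", []), ("p", [("b", [])])])
record_b = ("div", [("span", []), ("p", [("i", [])])])
record_c = ("table", [("span", []), ("p", [("b", [])])])

same_template = level_weighted_similarity(record_a, record_b)
different_root = level_weighted_similarity(record_a, record_c)
```

With these toy records, `same_template` is larger than `different_root`, because the mismatch in `record_b` sits at the deepest, lowest-weighted level while the mismatch in `record_c` is at the root.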