Complete-Thread extraction from web forums
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
This paper presents a novel approach to extracting data from Web forums. Millions of users contribute rich information to Web forums every day, making them an important resource for many Web applications, such as product opinion retrieval and social network analysis. The novelty of the proposed algorithm is that it not only extracts the pure text but also distinguishes between the original post and its replies. Experimental results on a large number of real Web forums indicate that the proposed algorithm can correctly ex