Until now, approaches to web content extraction have focused on random field models, largely neglecting large margin methods. Structured large margin methods, however, have recently shown great practical success. We compare, for the first time, greedy and structured support vector machines with conditional random fields on a real-world web news content extraction task, showing that large margin approaches are indeed competitive with random field models.