Web information extraction using Markov logic networks

Authors:
Sandeepkumar Satpal;Sahely Bhadra;Sundararajan Sellamanickam;Rajeev Rastogi;Prithviraj Sen
Affiliations:
Microsoft, Hyderabad, India;CSA, Indian Institute of Science, Bangalore, India;Yahoo! Labs, Bangalore, India;Yahoo! Labs, Bangalore, India;Yahoo! Labs, Bangalore, India
Venue:
Proceedings of the 20th international conference companion on World wide web
Year:
2011

Abstract

In this paper, we consider the problem of extracting structured data from web pages, taking into account both the content of individual attributes and the structure of pages and sites. We use Markov Logic Networks (MLNs) to capture content and structural features in a single unified framework, which enables more accurate inference. We show that inference in our information extraction scenario reduces to solving an instance of the maximum weight subgraph problem. We develop specialized procedures for solving the maximum weight subgraph variants that are far more efficient than previously proposed inference methods for MLNs, which solve variants of MAX-SAT. Experiments with real-life datasets demonstrate the effectiveness of our approach.
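
To make the reduction mentioned in the abstract more concrete, the following is a minimal sketch of one maximum weight subgraph formulation, solved by brute force. It is not the paper's procedure: the objective (node weights plus the weights of edges whose endpoints are both selected), the function name `max_weight_subgraph`, and the toy weights are all assumptions made for illustration. Node weights stand in for content features of candidate labelings, and edge weights stand in for structural features that reward or penalize pairs of labelings.

```python
from itertools import combinations


def max_weight_subgraph(node_w, edge_w):
    """Brute-force search for the node subset with the highest total weight.

    Assumed objective (illustrative only): sum of selected node weights plus
    the weights of edges whose two endpoints are both selected.
    """
    nodes = sorted(node_w)
    best_score, best_set = 0.0, frozenset()  # the empty subgraph scores 0
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            chosen = frozenset(subset)
            score = sum(node_w[n] for n in chosen)
            score += sum(
                w for (u, v), w in edge_w.items() if u in chosen and v in chosen
            )
            if score > best_score:
                best_score, best_set = score, chosen
    return best_score, best_set


# Hypothetical toy instance: each node is a candidate labeling "page node : attribute".
node_w = {
    "n1:title": 1.5,
    "n2:price": -0.5,
    "n3:price": 0.75,
    "n3:title": 0.25,
}
edge_w = {
    ("n1:title", "n2:price"): 1.25,  # structural agreement between two labelings
    ("n3:price", "n3:title"): -5.0,  # one page node should not get two labels
    ("n1:title", "n3:title"): -5.0,  # assume at most one title per record
}

best_score, best_set = max_weight_subgraph(node_w, edge_w)
print(best_score, sorted(best_set))
```

In this toy instance the weakly negative labeling `n2:price` is still selected, because the structural edge to `n1:title` outweighs its content score. The exhaustive search is exponential in the number of nodes and is meant only to show the objective; the specialized procedures the abstract refers to are not reproduced here.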