Machine Learning - Special issue on inductive transfer
Automatic segmentation of text into structured records
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Hierarchical Wrapper Induction for Semistructured Information Sources
Autonomous Agents and Multi-Agent Systems
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Proceedings of the 27th International Conference on Very Large Data Bases
Mining reference tables for automatic text segmentation
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Regularized multi--task learning
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Why collective inference improves relational classification
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Kernel conditional random fields: representation and clique selection
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Web data extraction based on partial tree alignment
WWW '05 Proceedings of the 14th international conference on World Wide Web
Learning Multiple Tasks with Kernel Methods
The Journal of Machine Learning Research
Integrating Unstructured Data into Relational Databases
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Simultaneous record detection and attribute labeling in web data extraction
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
A Survey of Web Information Extraction Systems
IEEE Transactions on Knowledge and Data Engineering
A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data
The Journal of Machine Learning Research
Semi-supervised conditional random fields for improved sequence segmentation and labeling
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Context-aware wrapping: synchronized data extraction
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
International Journal on Document Analysis and Recognition
Domain Adaptation of Conditional Probability Models Via Feature Subsetting
PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
Automatic wrapper induction from hidden-web sources with domain knowledge
Proceedings of the 10th ACM workshop on Web information and data management
Foundations and Trends in Databases
Extracting data records from the web using tag path clustering
Proceedings of the 18th international conference on World wide web
Domain adaptation from multiple sources via auxiliary classifiers
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Extracting discriminative concepts for domain adaptation in text mining
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Joint inference in information extraction
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 1
Answering table augmentation queries from unstructured lists on the web
Proceedings of the VLDB Endowment
CETR: content extraction via tag ratios
Proceedings of the 19th international conference on World wide web
Using clustering and edit distance techniques for automatic web data extraction
WISE'07 Proceedings of the 8th international conference on Web information systems engineering
ONDUX: on-demand unsupervised learning for information extraction
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Efficient graph-based semi-supervised learning of structured tagging models
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Hi-index | 0.00 |
Extracting information from web pages is an important problem; it has several applications such as providing improved search results and construction of databases to serve user queries. In this paper we propose a novel structured prediction method to address two important aspects of the extraction problem: (1) labeled data is available only for a small number of sites and (2) a machine learned global model does not generalize adequately well across many websites. For this purpose, we propose a weight space based graph regularization method. This method has several advantages. First, it can use unlabeled data to address the limited labeled data problem and falls in the class of graph regularization based semi-supervised learning approaches. Second, to address the generalization inadequacy of a global model, this method builds a local model for each website. Viewing the problem of building a local model for each website as a task, we learn the models for a collection of sites jointly; thus our method can also be seen as a graph regularization based multi-task learning approach. Learning the models jointly with the proposed method is very useful in two ways: (1) learning a local model for a website can be effectively influenced by labeled and unlabeled data from other websites; and (2) even for a website with only unlabeled examples it is possible to learn a decent local model. We demonstrate the efficacy of our method on several real-life data; experimental results show that significant performance improvement can be obtained by combining semi-supervised and multi-task learning in a single framework.