Semantic annotation of Web objects is a key problem in Web information extraction. The Web contains an abundance of useful semi-structured information about real-world objects, and an empirical study shows that Web information about objects of the same type, across different Web sites, exhibits strong two-dimensional sequence characteristics and correlative characteristics. Conditional Random Fields (CRFs) are the state-of-the-art approach for exploiting sequence characteristics to improve labeling. However, because of the correlative characteristics between Web object elements, previous CRFs are limited for semantic annotation of Web objects and cannot handle long-distance dependencies between Web object elements efficiently. To better incorporate long-distance dependencies, this paper first describes them by correlative edges, which are built from structured information and the characteristics of records in external databases, and second presents two-dimensional Correlative-Chain Conditional Random Fields (2DCC-CRFs) for semantic annotation of Web objects. This approach extends a classic model, two-dimensional Conditional Random Fields (2DCRFs), by adding correlative edges. Experimental results on a large amount of real-world data collected from diverse domains show that the proposed approach significantly improves the semantic annotation accuracy of Web objects.
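To make the idea of extending a two-dimensional lattice with correlative edges concrete, the following is a minimal sketch of the graph structure only, not the paper's actual construction or its inference procedure. The abstract does not specify how correlative edges are derived, so the rule used here (link two non-adjacent elements when both of their texts occur as values of the same external database record) is an assumption made purely for illustration, as are the function names `grid_edges` and `correlative_edges` and the sample data.

```python
from itertools import combinations


def grid_edges(rows, cols):
    """Edges of a 2D lattice: each cell links to its right and lower neighbour."""
    edges = set()
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.add(((r, c), (r, c + 1)))
            if r + 1 < rows:
                edges.add(((r, c), (r + 1, c)))
    return edges


def correlative_edges(grid, records):
    """Hypothetical rule (an assumption, not taken from the paper):
    link two non-adjacent cells when both texts are values of one record."""
    cells = [((r, c), text)
             for r, row in enumerate(grid)
             for c, text in enumerate(row)]
    edges = set()
    for (p, a), (q, b) in combinations(cells, 2):
        # Lattice neighbours are already connected by the 2D grid edges.
        if abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1:
            continue
        if any(a in rec.values() and b in rec.values() for rec in records):
            edges.add((p, q))
    return edges


# Illustrative data: a 2x2 block of Web object elements and one database record.
grid = [
    ["Canon EOS 5D", "12.8 MP"],
    ["Free shipping", "$2999"],
]
records = [{"name": "Canon EOS 5D", "price": "$2999"}]

lattice = grid_edges(2, 2)
extra = correlative_edges(grid, records)
print(sorted(lattice))
print(sorted(extra))
```

In this toy layout the product name and the price sit diagonally, so the lattice alone does not connect them; the added edge is the kind of long-distance link the abstract refers to. A real model would attach feature functions to these edges and run approximate inference over the resulting loopy graph, which this sketch does not attempt.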