This paper presents an automated approach to learning object models from object data extracted from data-intensive semistructured web documents such as product descriptions. Modeling data-intensive web content involves three phases. First, we identify the object region, the part of a web document that covers the descriptions of object data, by excluding irrelevant content. Second, we partition the content of the different object data appearing in the object region and construct the object data as hierarchical XML output. Third, we induce an abstract object model from analogous object data. The resulting model matches the corresponding object data from a Web site more precisely and comprehensively than existing handcrafted ontologies. The main contribution of this study is a fully automated approach to extracting object data and object models from semistructured web documents using kernel-based matching and View Syntax interpretation. Our system, OnModer, automatically constructs object data and induces object models from complicated web documents, such as the technical descriptions of personal computers and digital cameras downloaded from manufacturers' and vendors' sites. A comparison with available handcrafted ontologies and tests on an open corpus demonstrate that our framework is effective in extracting meaningful and comprehensive models.
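As a rough illustration of the third phase only, the sketch below induces a simple model from several analogous object data records. It is not OnModer's method: it swaps kernel-based matching and View Syntax interpretation for plain frequency counting over attribute paths, and it uses nested dictionaries as a stand-in for the hierarchical XML output. The function names, the `min_support` parameter, and the camera records are all hypothetical and made up for the example.

```python
from collections import Counter


def attribute_paths(record, prefix=()):
    """Yield every attribute path in a nested dict, e.g. ('sensor', 'type')."""
    for key, value in record.items():
        path = prefix + (key,)
        yield path
        if isinstance(value, dict):
            # Descend into nested attributes, extending the current path.
            yield from attribute_paths(value, path)


def induce_model(records, min_support=0.5):
    """Keep the attribute paths found in at least min_support of the records.

    Assumption: frequency counting stands in for the paper's kernel-based
    matching, which is not reproduced here.
    """
    counts = Counter()
    for record in records:
        # Count each path once per record, however often it might repeat.
        counts.update(set(attribute_paths(record)))
    threshold = min_support * len(records)
    return sorted(path for path, n in counts.items() if n >= threshold)


# Made-up object data for three digital cameras (illustrative values only).
cameras = [
    {"sensor": {"type": "CCD", "megapixels": "5.0"}, "zoom": "3x"},
    {"sensor": {"type": "CMOS", "megapixels": "8.0"}, "weight": "200 g"},
    {"sensor": {"type": "CCD"}, "zoom": "4x"},
]

model = induce_model(cameras)
for path in model:
    print("/".join(path))
```

In this toy run the `weight` attribute appears in only one of three records, so it falls below the support threshold and is left out of the induced model, while the shared `sensor` hierarchy is kept.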