RDF/XML has been widely recognised as the standard for annotating online web documents and for transforming the HTML web into the so-called Semantic Web. To enable widespread use of the Semantic Web, there is a need to bootstrap large, rich and up-to-date domain ontologies that organise the most relevant concepts, their relationships and instances. In this paper, we present automated techniques for bootstrapping and populating specialised domain ontologies by organising and mining a set of relevant overlapping websites. We develop algorithms that detect and utilise HTML regularities in web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomical relationships. We also extract semi-structured concept instances, annotated with their labels whenever these are available. Finally, we report an experimental evaluation on the news, travel and shopping domains to demonstrate the efficacy of our algorithms.
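
To make the pipeline described above more concrete, the following is a minimal sketch, not the paper's algorithms. It assumes that the only HTML regularity used is heading nesting (h1 to h6), and it substitutes a simple count of parent/child label pairs across pages for real tree mining. The names `HeadingTreeBuilder`, `html_to_tree`, `frequent_edges` and the `min_support` parameter are hypothetical and introduced only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingTreeBuilder(HTMLParser):
    """Nest heading text by heading level (an assumed stand-in for HTML regularities)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("page")
        self._stack = [(0, self.root)]  # (heading level, XML node)
        self._level = None              # level of the heading currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._level = HEADINGS[tag]
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._level == HEADINGS[tag]:
            label = " ".join("".join(self._text).split())
            # Pop until the top of the stack is a strictly shallower heading.
            while self._stack[-1][0] >= self._level:
                self._stack.pop()
            node = ET.SubElement(self._stack[-1][1], "concept", label=label)
            self._stack.append((self._level, node))
            self._level = None


def html_to_tree(html):
    """Turn one HTML page into a hierarchical XML tree of <concept> nodes."""
    builder = HeadingTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def frequent_edges(trees, min_support):
    """Count parent/child label pairs once per page; keep pairs seen on >= min_support pages."""
    counts = Counter()
    for tree in trees:
        edges = set()
        for parent in tree.iter("concept"):
            for child in parent.findall("concept"):
                edges.add((parent.get("label").lower(), child.get("label").lower()))
        counts.update(edges)
    return {edge: n for edge, n in counts.items() if n >= min_support}


# Two made-up pages from overlapping sites in a hypothetical news domain.
pages = [
    "<h1>News</h1><h2>Sports</h2><h3>Football</h3><h2>Politics</h2>",
    "<h1>News</h1><h2>Sports</h2><h2>Weather</h2>",
]
trees = [html_to_tree(p) for p in pages]
print(ET.tostring(trees[0], encoding="unicode"))
print(frequent_edges(trees, min_support=2))  # {('news', 'sports'): 2}
```

In this toy setting, a label pair that recurs across sites (here "news" above "sports") is treated as a candidate taxonomical relationship, while pairs seen on a single page are discarded. A realistic system would need richer structural cues than headings and a proper frequent-subtree miner.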