In this paper, we present automated techniques for bootstrapping and populating specialized domain ontologies by organizing and mining a set of relevant, overlapping Web sites provided by the user. We develop algorithms that detect and utilize HTML regularities in the Web documents to turn them into hierarchical semantic structures encoded as XML. Next, we present tree-mining algorithms that identify key domain concepts and their taxonomic relationships. We also extract semi-structured concept instances, annotated with their labels when labels are available. Experimental evaluation on the News, Travel, and Shopping domains indicates that our algorithms can bootstrap and populate domain-specific ontologies with high precision and recall.
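
To make the idea of turning HTML regularities into a hierarchical XML structure concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes, purely for illustration, that heading tags (h1 to h6) mark concepts and list items mark instances. The names `HeadingTreeBuilder` and `html_to_xml` and the element names `page`, `concept`, and `instance` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumed heuristic: heading depth stands in for concept depth.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingTreeBuilder(HTMLParser):
    """Nest concepts by heading level and attach list items as instances."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("page")
        # Stack of (level, element); the root sits at level 0.
        self.stack = [(0, self.root)]
        self.current = None  # (kind, element) currently receiving text

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            level = HEADINGS[tag]
            # Close every open concept at the same or a deeper level.
            while self.stack[-1][0] >= level:
                self.stack.pop()
            node = ET.SubElement(self.stack[-1][1], "concept", label="")
            self.stack.append((level, node))
            self.current = ("label", node)
        elif tag == "li":
            node = ET.SubElement(self.stack[-1][1], "instance")
            node.text = ""
            self.current = ("text", node)

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "li":
            self.current = None

    def handle_data(self, data):
        if self.current is None:
            return  # text outside headings and list items is ignored
        kind, node = self.current
        if kind == "label":
            node.set("label", node.get("label") + data.strip())
        else:
            node.text += data.strip()


def html_to_xml(html):
    """Return the root element of the hierarchical XML view of the page."""
    builder = HeadingTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


if __name__ == "__main__":
    sample = (
        "<h1>News</h1><h2>Sports</h2><ul><li>Football</li><li>Tennis</li></ul>"
        "<h2>Business</h2><ul><li>Markets</li></ul>"
    )
    print(ET.tostring(html_to_xml(sample), encoding="unicode"))
```

For the sample page, "Sports" and "Business" become child concepts of "News", each holding its list items as instances. A tree-mining step of the kind the abstract describes would then work over many such trees from overlapping sites; this sketch does not attempt that step.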