Managing knowledge on the Web - Extracting ontology from HTML Web

Authors:
Timon C. Du; Feng Li; Irwin King
Affiliations:
Department of Decision Sciences and Managerial Economics, The Chinese University of Hong Kong, Hong Kong; School of Business Administration, South China University of Technology, China; Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
Venue:
Decision Support Systems
Year:
2009

Abstract

In recent years, the Internet has become one of the most important sources of information, and companies must now be able to collect, retrieve, process, and manage information from the Web. However, given the sheer amount of information available, browsing web content through keyword searches is inefficient, largely because unstructured HTML web pages are written for human comprehension rather than for direct machine processing. For the same reason, the degree of web automation is limited. It is recognized that semantics can enhance web automation, but converting the current HTML Web into the Semantic Web will take an indefinite amount of effort. This study proposes a novel ontology extractor, called OntoSpider, for extracting ontology from the HTML Web. The contribution of this work is the design and implementation of a six-phase process (preparation, transformation, clustering, recognition, refinement, and revision) for extracting ontology from unstructured HTML pages. The extracted ontology provides structured and relevant information that can be compared and analyzed more effectively in applications such as e-commerce and knowledge management. We give detailed information on the system and provide a series of experimental results that validate the system design and illustrate the effectiveness of OntoSpider.
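
To make the six-phase idea concrete, the following is a minimal Python sketch of how such a pipeline could be chained. Only the six phase names come from the abstract. What each phase does here (normalising markup, collecting heading and list-item text, grouping items under the nearest heading, and so on) is an illustrative assumption, as are all function and class names; this is not OntoSpider's actual algorithm.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _TextByTag(HTMLParser):
    """Collect (tag, text) pairs for heading and list-item elements."""

    KEEP = {"h1", "h2", "h3", "li"}

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._stack = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self.pairs.append((self._stack[-1], text))


def prepare(html):
    """Phase 1 (assumed behaviour): normalise line endings in the raw page."""
    return html.replace("\r\n", "\n")


def transform(html):
    """Phase 2 (assumed): turn HTML into a flat list of (tag, text) pairs."""
    parser = _TextByTag()
    parser.feed(html)
    parser.close()
    return parser.pairs


def cluster(pairs):
    """Phase 3 (assumed): group list items under the nearest preceding heading."""
    groups = defaultdict(list)
    current = "ROOT"
    for tag, text in pairs:
        if tag.startswith("h"):
            current = text
            groups[current]  # register the heading even if it has no items
        else:
            groups[current].append(text)
    return dict(groups)


def recognize(groups):
    """Phase 4 (assumed): treat headings as concepts and items as instances."""
    return [{"concept": c, "instances": items} for c, items in groups.items()]


def refine(concepts):
    """Phase 5 (assumed): drop empty concepts and de-duplicate instances."""
    return [
        {"concept": c["concept"], "instances": sorted(set(c["instances"]))}
        for c in concepts
        if c["instances"]
    ]


def revise(concepts, renames=None):
    """Phase 6 (assumed): apply manual concept renames supplied by a reviewer."""
    renames = renames or {}
    return [
        {"concept": renames.get(c["concept"], c["concept"]),
         "instances": c["instances"]}
        for c in concepts
    ]


def extract_ontology(html, renames=None):
    """Run the six phases in the order listed in the abstract."""
    return revise(refine(recognize(cluster(transform(prepare(html))))), renames)
```

Calling `extract_ontology(page_html, renames={"Laptops": "Laptop"})` on a page with headings and bulleted lists returns a list of concept dictionaries. The value of the sketch is the shape of the pipeline, where each phase consumes the previous phase's output, rather than the toy heuristics inside each function.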