Rule identification from web pages by the XRML approach

Authors:
Juyoung Kang; Jae Kyu Lee
Affiliations:
School of Business Administration, Ajou University, Wonchon-Dong Yeongtong-Gu, Suwon, Korea; Graduate School of Management, Korea Advanced Institute of Science and Technology, Cheongryangri, Seoul, Korea
Venue:
Decision Support Systems
Year:
2005

Abstract

Web pages contain vast numbers of documents written as natural language text and tables. To extract rules from Web pages and maintain consistency between the pages and the extracted rules, we have developed the framework of XRML (eXtensible Rule Markup Language). XRML allows rules on Web pages to be identified and generates the identified rules automatically. For this purpose, we have designed the Rule Identification Markup Language (RIML), which is similar to the formal Rule Structure Markup Language (RSML); both are parts of XRML. RIML 2.0 is designed to identify rules not only from text but also from tables on Web pages, and to transform them automatically into formal rules in RSML syntax. While designing RIML 2.0, we considered the features of shared variables and values, omitted terms, and synonyms.

We conducted an experiment to evaluate the potential benefit of the XRML approach with real-world Web pages from Amazon.com, BarnesandNoble.com, and Powells.com. We found that 100.0% of the rules and 99.7% of the rule components could be identified and automatically generated if we do not count the statements for linkages, which generally do not exist on the Web pages. Since the linkage components account for 11.2% of all components in the rule base, the upper limit of automatic rule generation is 88.8%. In this setting, 88.5% of all rule components could be generated from the rules identified on the Web pages. The result provides solid proof that XRML can facilitate the extraction and maintenance of rules from Web pages while building expert systems in the Semantic Web environment.
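
The percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes that the overall figure is the product of the share of non-linkage components that were identified (99.7%) and the share of components that are not linkage statements (100% minus 11.2%). The abstract does not state this formula explicitly, so the variable names and the calculation are an illustrative reading, not the authors' procedure.

```python
# Figures quoted in the abstract.
linkage_share = 0.112           # share of rule components that are linkage statements
identified_non_linkage = 0.997  # share of non-linkage components identified and generated

# Linkage statements are not present on the pages, so they cap automatic generation.
upper_bound = 1.0 - linkage_share

# Assumed relationship: overall share = identified share x upper bound.
overall = identified_non_linkage * upper_bound

print(f"upper bound: {upper_bound:.1%}")
print(f"overall:     {overall:.1%}")
```

Under this reading the two results round to the 88.8% and 88.5% reported in the abstract.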