iASA: learning to annotate the semantic web

Authors:
Jie Tang;Juanzi Li;Hongjun Lu;Bangyong Liang;Xiaotong Huang;Kehong Wang
Affiliations:
Department of Computer Science, Tsinghua University, Beijing, P.R. China;Department of Computer Science, Tsinghua University, Beijing, P.R. China;Department of Computer Science, Tsinghua University, Beijing, P.R. China;Department of Computer Science, Tsinghua University, Beijing, P.R. China;Department of Computer Science, Tsinghua University, Beijing, P.R. China;Department of Computer Science, Tsinghua University, Beijing, P.R. China
Venue:
Journal on Data Semantics IV
Year:
2005

Abstract

With the advent of the Semantic Web, there is a great need to upgrade existing web content to semantic web content. This can be accomplished through semantic annotations. Unfortunately, manual annotation is tedious, time-consuming, and error-prone. In this paper, we propose a tool, called iASA, that learns to automatically annotate web documents according to an ontology. iASA is based on the combination of information extraction (specifically, the Similarity-based Rule Learner, SRL) and machine learning techniques. Using linguistic knowledge and optimal dynamic window size, SRL produces annotation rules of better quality than comparable semantic annotation systems. Similarity-based learning efficiently reduces the search space by avoiding pseudo rule generalization. In the annotation phase, iASA exploits ontology knowledge to refine the annotations it proposes. Moreover, our annotation algorithm exploits machine learning methods to correctly select instances and to predict missing instances. Finally, iASA provides an explanation component that explains the nature of the learner and annotator to the user. Explanations can greatly help users understand the rule induction and annotation process, so that they can focus on correcting rules and annotations quickly. Experimental results show that iASA can reach high accuracy quickly.