The XTREEM Methods for Ontology Learning from Web Documents

Authors:
Marko Brunzel
Affiliations:
DFKI GmbH, Kaiserslautern, Germany
Venue:
Proceedings of the 2008 conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge
Year:
2008

Abstract

Ontology Learning has so far been dominated by techniques that use text as input; only a few methods draw on other data sources. Techniques that take highly structured data as input have the disadvantage that such data sources are rare. On the other hand, an enormous amount of Web content is available today. We present the XTREEM (Xhtml TREE Mining) methods, which enable Ontology Learning from Web documents. These methods rely on the semi-structure of Web documents and exploit the added value of their markup. We show methods for the acquisition of terms, synonyms and semantic relations. Because the XTREEM techniques are based on the structure of Web documents, they are domain- and language-independent. They require neither NLP software nor training. They do not rely on domain-specific or collection-specific resources or background knowledge, such as patterns, rules or other heuristics, nor do they require a manually assembled document collection.
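The abstract does not spell out the algorithms, so the following is only a minimal Python sketch, under stated assumptions, of how markup structure alone might yield candidate semantic relations. It assumes a simplified scheme in the spirit of XTREEM's sibling discovery: text units that share the same tag path within one document form a candidate sibling group, and a term pair is kept if it appears together in such a group in at least `min_support` documents. The names `PathGrouper`, `sibling_pairs`, `VOID_TAGS` and the `min_support` threshold are hypothetical and chosen for illustration; this is not the author's implementation.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
from itertools import combinations

# Void elements never get an end tag, so they are not pushed onto the path stack.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "wbr"}


class PathGrouper(HTMLParser):
    """Hypothetical helper: groups the text units of one document by tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # open tags from the root to the current node
        self.groups = defaultdict(list)  # e.g. "html/body/ul/li" -> ["paris", "berlin"]

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed inner tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Normalize only by whitespace stripping and lowercasing (no NLP involved).
        text = data.strip().lower()
        if text:
            self.groups["/".join(self.stack)].append(text)


def sibling_pairs(documents, min_support=2):
    """Return term pairs sharing a tag-path group in >= min_support documents."""
    support = Counter()
    for html in documents:
        parser = PathGrouper()
        parser.feed(html)
        parser.close()
        pairs = set()  # count each pair at most once per document
        for texts in parser.groups.values():
            pairs.update(combinations(sorted(set(texts)), 2))
        support.update(pairs)
    return {pair: n for pair, n in support.items() if n >= min_support}


docs = [
    "<html><body><ul><li>Paris</li><li>Berlin</li><li>Rome</li></ul><p>Capitals</p></body></html>",
    "<html><body><ol><li>Berlin</li><li>Paris</li></ol></body></html>",
    "<div><ul><li>Rome</li><li>Madrid</li></ul></div>",
]
print(sibling_pairs(docs))  # {('berlin', 'paris'): 2}
```

The only signal used here is shared markup context, with no linguistic processing, patterns or training data, which illustrates why a structure-based approach can be independent of domain and language. A practical system would likely also need to deal with noisy markup, navigation boilerplate and term normalization.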