Discovering multi terms and co-hyponymy from XHTML documents with XTREEM

Authors:
Marko Brunzel;Myra Spiliopoulou
Affiliations:
Otto-von-Guericke-University Magdeburg;Otto-von-Guericke-University Magdeburg
Venue:
KDXD'06 Proceedings of the First international conference on Knowledge Discovery from XML Documents
Year:
2006

Abstract

The Semantic Web needs ontologies as an integral component. Current methods for learning and enhancing ontologies need further improvement to overcome the knowledge acquisition bottleneck. Identifying concepts and relations with only minimal user interaction remains a challenging objective. Current approaches to extracting semantics often apply association rules or clustering to regular flat text. In this paper we describe an approach for extracting semantics from Web document collections that takes advantage of the semi-structured content within XHTML Web documents (XHTML is an XML dialect that can be obtained from traditional HTML documents). The XTREEM (Xhtml TREE Mining) method uses structural information, the mark-up in Web content, as an indicator of term boundaries and of co-hyponymy relations.
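
The abstract does not spell out the XTREEM algorithm, so the following is only a rough illustration of the general idea that mark-up can delimit terms and hint at sibling (co-hyponym) candidates. It is not the authors' implementation. The function name `group_by_tag_path`, the grouping criterion (identical tag path from the document root), and the sample document are all assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; well-formed XHTML can be parsed as XML.
SAMPLE = """<html><body>
<h1>Accommodation</h1>
<ul><li>Bed and Breakfast</li><li>Youth Hostel</li><li>Hotel</li></ul>
</body></html>"""


def group_by_tag_path(xhtml):
    """Group text spans by the tag path leading to their element.

    Assumption for this sketch: each element's leading text is treated as
    one term, so the mark-up itself marks the term boundary. Text that
    follows a child element (the 'tail') is ignored for simplicity.
    """
    root = ET.fromstring(xhtml)
    groups = defaultdict(list)

    def walk(element, path):
        path = path + (element.tag,)
        # Normalize whitespace; skip elements with no text of their own.
        text = " ".join((element.text or "").split())
        if text:
            groups[path].append(text)
        for child in element:
            walk(child, path)

    walk(root, ())
    return dict(groups)


def sibling_candidates(xhtml):
    """Keep only tag paths that carry more than one text span."""
    return [terms for terms in group_by_tag_path(xhtml).values() if len(terms) > 1]


if __name__ == "__main__":
    for terms in sibling_candidates(SAMPLE):
        print(terms)
```

In this sketch the three list items end up in one candidate group because they share the tag path `html/body/ul/li`, and the multi-word term "Bed and Breakfast" stays intact because the `li` element encloses it. Whether such a group really consists of co-hyponyms would still need to be checked, for example by a domain expert or against a reference ontology.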