Discovering semantic sibling groups from web documents with XTREEM-SG

Authors:
Marko Brunzel;Myra Spiliopoulou
Affiliations:
Otto-von-Guericke-University, Magdeburg;Otto-von-Guericke-University, Magdeburg
Venue:
EKAW'06 Proceedings of the 15th international conference on Managing Knowledge in a World of Networks
Year:
2006

Abstract

The acquisition of explicit semantics remains a research challenge. Approaches to extracting semantics focus mostly on learning hierarchical hypernym-hyponym relations. The extraction of co-hyponym and co-meronym sibling semantics has received far less attention, although these relations are no less important in ontology engineering. In this paper we describe and evaluate the XTREEM-SG (Xhtml TREE Mining – for Sibling Groups) approach to finding sibling semantics in semi-structured Web documents. XTREEM exploits the mark-up available in Web content to group text siblings, and we show that this grouping is semantically meaningful. The XTREEM-SG approach is domain- and language-independent: it does not rely on background knowledge, NLP software or training. We apply XTREEM-SG and evaluate it against the reference semantics of two gold standard ontologies. We investigate how variations in input, parameters and reference influence the results obtained when structuring a closed vocabulary by sibling relations. Earlier methods that evaluate sibling relations against a gold standard report an F-measure of 14.18%. Our method raises this figure to 21.47%.
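
The idea of using mark-up to group text siblings can be illustrated with a minimal sketch. It is not the XTREEM-SG algorithm as published: it assumes well-formed XHTML input and simply treats text spans as siblings when their elements share the same parent and the same tag. The function name sibling_groups and the sample document EXAMPLE are hypothetical and introduced only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample page; any well-formed XHTML fragment would do.
EXAMPLE = (
    "<html><body>"
    "<h1>Accommodation</h1>"
    "<ul><li>Hotel</li><li>Hostel</li><li>Camping</li></ul>"
    "<p>Contact us</p>"
    "</body></html>"
)


def sibling_groups(xhtml: str) -> list[list[str]]:
    """Return groups of text spans whose elements share a parent and a tag.

    Assumption for this sketch: only the direct text of each child element
    is used, and groups with a single member are discarded.
    """
    root = ET.fromstring(xhtml)
    groups: list[list[str]] = []
    for parent in root.iter():
        # Collect the direct text of each child, keyed by the child's tag.
        by_tag: dict[str, list[str]] = defaultdict(list)
        for child in parent:
            text = (child.text or "").strip()
            if text:
                by_tag[child.tag].append(text)
        # Keep only real groups, i.e. at least two sibling text spans.
        groups.extend(g for g in by_tag.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    for group in sibling_groups(EXAMPLE):
        print(group)
```

In the sample page the three list items form one candidate sibling group, while the heading and the paragraph are left ungrouped because each is the only element of its tag under the same parent. No background knowledge, NLP software or training is involved, which matches the independence claim in the abstract, although the real method may group and filter candidates differently.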