Domain relevance on term weighting

Authors:
Marko Brunzel; Myra Spiliopoulou
Affiliations:
DFKI GmbH, German Research Center for AI, and Otto-von-Guericke-Universität Magdeburg, Germany; Otto-von-Guericke-Universität Magdeburg, Germany
Venue:
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
Year:
2007

Citing 8

Generating and evaluating domain-oriented multi-word terms from texts. Information Processing and Management: an International Journal.
A vector space model for automatic indexing. Communications of the ACM.
Knowledge Acquisition of Predicate Argument Structures from Technical Texts Using Machine Learning: The System ASIUM. EKAW '99 Proceedings of the 11th European Workshop on Knowledge Acquisition, Modeling and Management.
Term Weighting Approaches in Automatic Text Retrieval.
Identification of relevant terms to support the construction of domain ontologies. HLTKM '01 Proceedings of the workshop on Human Language Technology and Knowledge Management - Volume 2001.
RELFIN – topic discovery for ontology enhancement and annotation. ESWC'05 Proceedings of the Second European conference on The Semantic Web: research and Applications.
Discovering semantic sibling groups from web documents with XTREEM-SG. EKAW'06 Proceedings of the 15th international conference on Managing Knowledge in a World of Networks.
Discovering multi terms and co-hyponymy from XHTML documents with XTREEM. KDXD'06 Proceedings of the First international conference on Knowledge Discovery from XML Documents.

Cited 3

Discovering Groups of Sibling Terms from Web Documents with XTREEM-SG. Journal on Data Semantics XI.
The XTREEM Methods for Ontology Learning from Web Documents. Proceedings of the 2008 conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge.
Analyzing document collections via context-aware term extraction. NLDB'09 Proceedings of the 14th international conference on Applications of Natural Language to Information Systems.

Abstract

The TFxIDF term weighting scheme is the standard approach to vectorizing textual data. For a data set in which textual data stemming from web document structure is to be vectorized [2], the need for an enhanced term weighting scheme arose. In this publication we introduce a term weighting scheme that improves on the behavior of the traditional TFxIDF scheme by adding a component based on the linguistically inspired notion of domain relevance. Domain relevance measures the degree to which a term is more relevant within a data set than within a reference data set. This external component overcomes a potential weakness of TFxIDF on data sets with non-standard distributions. The weighting scheme favours domain-relevant terms, which can be regarded as more useful in settings where the clustering results are consumed by a human supervisor, e.g., for semi-automatic ontology learning.
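
The abstract does not state the formula, so the following is only a minimal sketch of how a domain relevance factor might be combined with TFxIDF. It assumes, purely for illustration, that domain relevance is a smoothed ratio of a term's relative frequency in the domain data set to its relative frequency in a reference data set, and that this factor is multiplied onto the TFxIDF weight. The function names, the smoothing constant, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Plain TFxIDF: raw term count times log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def domain_relevance(domain_docs, reference_docs, smoothing=1.0):
    """Assumed definition: smoothed ratio of relative term frequencies
    (domain collection over reference collection)."""
    domain_counts = Counter(t for d in domain_docs for t in d)
    reference_counts = Counter(t for d in reference_docs for t in d)
    vocab = set(domain_counts) | set(reference_counts)
    # Additive smoothing keeps the ratio finite for terms that are
    # absent from the reference collection.
    domain_total = sum(domain_counts.values()) + smoothing * len(vocab)
    reference_total = sum(reference_counts.values()) + smoothing * len(vocab)
    return {
        term: ((domain_counts[term] + smoothing) / domain_total)
        / ((reference_counts[term] + smoothing) / reference_total)
        for term in domain_counts
    }


def weighted_vectors(domain_docs, reference_docs):
    """TFxIDF weights scaled by the assumed domain relevance factor."""
    relevance = domain_relevance(domain_docs, reference_docs)
    return [
        {term: weight * relevance[term] for term, weight in vector.items()}
        for vector in tf_idf(domain_docs)
    ]


# Toy, made-up token lists: a small "domain" set and a generic reference set.
domain_docs = [
    ["ontology", "learning", "page"],
    ["ontology", "cluster", "page"],
    ["term", "cluster", "home"],
]
reference_docs = [
    ["page", "home", "news"],
    ["page", "news", "weather"],
]

plain = tf_idf(domain_docs)
weighted = weighted_vectors(domain_docs, reference_docs)
```

In this toy data, "ontology" and "page" have the same count and document frequency in the first document, so plain TFxIDF cannot separate them. Under the assumed relevance factor, "ontology" is weighted higher because it never occurs in the reference set, which is the kind of preference for domain-relevant terms the abstract describes.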