Generating and evaluating domain-oriented multi-word terms from texts. Information Processing and Management: an International Journal.
A vector space model for automatic indexing. Communications of the ACM.
EKAW '99 Proceedings of the 11th European Workshop on Knowledge Acquisition, Modeling and Management.
Term Weighting Approaches in Automatic Text Retrieval.
Term Weighting Approaches in Automatic Text Retrieval.
Identification of relevant terms to support the construction of domain ontologies. HLTKM '01 Proceedings of the workshop on Human Language Technology and Knowledge Management - Volume 2001.
RELFIN – topic discovery for ontology enhancement and annotation. ESWC'05 Proceedings of the Second European conference on The Semantic Web: research and Applications.
Discovering semantic sibling groups from web documents with XTREEM-SG. EKAW'06 Proceedings of the 15th international conference on Managing Knowledge in a World of Networks.
Discovering multi terms and co-hyponymy from XHTML documents with XTREEM. KDXD'06 Proceedings of the First international conference on Knowledge Discovery from XML Documents.
Discovering Groups of Sibling Terms from Web Documents with XTREEM-SG. Journal on Data Semantics XI.
The XTREEM Methods for Ontology Learning from Web Documents. Proceedings of the 2008 conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge.
Analyzing document collections via context-aware term extraction. NLDB'09 Proceedings of the 14th international conference on Applications of Natural Language to Information Systems.
The TFxIDF term weighting scheme is the standard approach to vectorizing textual data. For a data set in which textual data extracted from the structure of web documents is to be vectorized [2], the need for an enhanced term weighting scheme arose. In this publication we introduce a term weighting scheme that improves on the traditional TFxIDF scheme by adding a component based on the linguistically inspired notion of domain relevance. Domain relevance measures the degree to which a term is more relevant within a data set than within a reference data set. This external component overcomes a potential weakness of TFxIDF on data sets with non-standard term distributions. The resulting scheme favours domain-relevant terms, which can be regarded as more useful in settings where the clustering result is to be consumed by a human supervisor, e.g., for semi-automatic ontology learning.
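The abstract does not state how the domain relevance component is computed or how it is combined with TFxIDF, so what follows is only a minimal Python sketch under explicit assumptions. It uses classic TFxIDF (raw term frequency times log(N/df)). It approximates domain relevance as the ratio of a term's relative frequency in the domain corpus to its add-one-smoothed relative frequency in a reference corpus, and it multiplies the two weights. The function names `tf_idf`, `domain_relevance` and `tf_idf_dr`, the `smoothing` parameter, the ratio formula and the multiplicative combination are all hypothetical choices made for illustration. They are not the publication's definitions.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Classic TFxIDF: raw term frequency times log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: each document counted once
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def domain_relevance(domain_docs, reference_docs, smoothing=1.0):
    """HYPOTHETICAL domain relevance: relative frequency of a term in the domain
    corpus divided by its smoothed relative frequency in the reference corpus.
    Values above 1 mean the term is over-represented in the domain corpus."""
    dom = Counter(t for doc in domain_docs for t in doc)
    ref = Counter(t for doc in reference_docs for t in doc)
    dom_total = sum(dom.values())
    ref_total = sum(ref.values())
    vocab_size = len(set(dom) | set(ref))
    return {
        term: (count / dom_total)
        / ((ref[term] + smoothing) / (ref_total + smoothing * vocab_size))
        for term, count in dom.items()
    }


def tf_idf_dr(domain_docs, reference_docs):
    """TFxIDF scaled by domain relevance (the multiplicative combination is an assumption)."""
    dr = domain_relevance(domain_docs, reference_docs)
    return [{t: w * dr[t] for t, w in weights.items()} for weights in tf_idf(domain_docs)]


# Toy usage: a small domain corpus and a generic reference corpus.
domain = [["ontology", "the", "learning"], ["ontology", "the", "web"], ["the", "web", "page"]]
reference = [["the", "page", "news"], ["the", "web", "sport"], ["the", "news"]]

plain = tf_idf(domain)[1]                  # weights of the second domain document
weighted = tf_idf_dr(domain, reference)[1]
```

In this toy example, plain TFxIDF gives "ontology" and "web" identical weights in the second document. The assumed domain relevance factor breaks the tie in favour of "ontology", which never occurs in the reference corpus. Note that with this idf variant a term occurring in every domain document (here "the") still receives zero weight whatever its domain relevance. A smoothed idf would be one way to change that.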