Evaluation of novelty metrics for sentence-level novelty mining

Authors: Flora S. Tsai, Wenyin Tang, Kap Luk Chan
Affiliation (all authors): School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
Venue: Information Sciences: an International Journal
Year: 2010

Abstract

This work addresses the problem of detecting novel sentences in an incoming stream of text data. It studies the performance of different novelty metrics and proposes a mixed metric that can adapt to different performance requirements. Existing novelty metrics fall into two types, symmetric and asymmetric, depending on whether they take the ordering of sentences into account: a symmetric metric gives the same score when the two sentences being compared are swapped, while an asymmetric metric does not. A comparative study of several novelty metrics reveals complementary behavior between the two types. This finding motivates a new framework for novelty measurement: a mixture of symmetric and asymmetric metrics. The mixed metric achieves superior performance under requirements ranging from high precision to high recall, and on data with different percentages of novel sentences. Because it requires no prior information, it is well suited to real-time knowledge base applications, such as novelty mining systems, where no training data is available beforehand.
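The abstract does not name the individual metrics or the mixing rule, so the following is only a minimal sketch under stated assumptions. It uses cosine distance between term-frequency vectors as the symmetric metric, the share of the current sentence's words absent from an earlier sentence as the asymmetric metric, and a linear combination with a hypothetical weight `alpha` and a hypothetical decision `threshold`. All function names are illustrative, and the paper's actual metrics and combination may differ.

```python
from collections import Counter
import math


def tokens(sentence: str) -> list[str]:
    # Naive lowercase whitespace tokenization (assumption); a real system
    # would typically also remove stop words and apply stemming.
    return sentence.lower().split()


def cosine_novelty(current: str, previous: str) -> float:
    """Symmetric metric (assumed): 1 - cosine similarity of term-frequency
    vectors. Swapping the two arguments gives the same score."""
    a, b = Counter(tokens(current)), Counter(tokens(previous))
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def new_word_novelty(current: str, previous: str) -> float:
    """Asymmetric metric (assumed): fraction of the current sentence's
    distinct words that do not occur in the earlier sentence."""
    cur, prev = set(tokens(current)), set(tokens(previous))
    return 1.0 if not cur else len(cur - prev) / len(cur)


def mixed_novelty(current: str, history: list[str], alpha: float = 0.5) -> float:
    """Hypothetical linear mixture of the two metrics. Each metric is taken
    against its least novel match in the history (minimum score)."""
    if not history:
        return 1.0  # nothing seen yet, so the sentence counts as novel
    sym = min(cosine_novelty(current, h) for h in history)
    asym = min(new_word_novelty(current, h) for h in history)
    return alpha * sym + (1 - alpha) * asym


def detect_novel(stream: list[str], alpha: float = 0.5,
                 threshold: float = 0.4) -> list[str]:
    """Process sentences in arrival order; compare each against all
    previously seen sentences (assumption) and keep those above threshold."""
    history: list[str] = []
    novel: list[str] = []
    for sentence in stream:
        if mixed_novelty(sentence, history, alpha) > threshold:
            novel.append(sentence)
        history.append(sentence)
    return novel


if __name__ == "__main__":
    stream = ["the cat sat on the mat", "the cat sat",
              "stock prices fell sharply today"]
    print(detect_novel(stream))
```

With these inputs, "the cat sat" adds no new words relative to the first sentence, so the asymmetric score is 0 in that direction, whereas the reverse comparison yields 2/5; the cosine score is identical in both directions. In this sketch `alpha` and `threshold` are the knobs a user would tune; which settings favor precision or recall is not specified here.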