Novelty detection aims to remove redundant information from a chronologically ordered list of documents or sentences. Previous studies of novelty detection have been conducted on the English language, but few papers have addressed the problem of multilingual novelty detection. Likewise, research in multilingual information retrieval has rarely been applied to novelty detection. This paper attempts to bridge the two disciplines by first describing the preprocessing steps for English, Malay and Chinese, then applying document-level and sentence-level novelty detection to the three languages on APWSJ and TREC 2004 Novelty Track data. Experiments on sentence-level novelty detection show similar results for all three languages, which indicates that our algorithm is suitable for multilingual novelty detection at the sentence level. However, results for document-level novelty detection show a disparity across the languages, with English and Malay outperforming Chinese. After applying sentence-level novelty detection to detect novel documents, we observe substantial improvements in all three languages. This demonstrates that segmenting documents into sentences improves document-level novelty detection in multiple languages, and has practical benefits for a real-time multilingual novelty detection system.
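The idea of deciding document novelty from sentence-level judgments can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the similarity measure or decision rule, so the bag-of-words cosine similarity, the thresholds `sent_threshold` and `min_novel_ratio`, and all function names below are assumptions made for illustration. The whitespace and punctuation-based splitting is also only adequate for English-like text; Chinese would need word segmentation during preprocessing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase word tokens only. A real system
    # would add language-specific steps (stemming, word segmentation).
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_sentences(doc):
    # Naive splitter on ., ! or ? followed by whitespace (an assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def novel_sentences(sentences, history, threshold=0.6):
    # A sentence is novel if it is not too similar to any sentence seen
    # earlier in the stream. Every sentence is added to the history,
    # including earlier sentences of the same document.
    novel = []
    for sentence in sentences:
        vec = Counter(tokenize(sentence))
        if not vec:
            continue
        if all(cosine(vec, seen) < threshold for seen in history):
            novel.append(sentence)
        history.append(vec)
    return novel


def novel_documents(docs, sent_threshold=0.6, min_novel_ratio=0.5):
    # Process documents in chronological order; a document is flagged as
    # novel when enough of its sentences are novel (hypothetical rule).
    history = []
    flags = []
    for doc in docs:
        sentences = split_sentences(doc)
        novel = novel_sentences(sentences, history, sent_threshold)
        ratio = len(novel) / len(sentences) if sentences else 0.0
        flags.append(ratio >= min_novel_ratio)
    return flags


if __name__ == "__main__":
    stream = [
        "The cat sat on the mat. Dogs bark loudly.",
        "The cat sat on the mat. Dogs bark loudly.",
        "Stock markets fell sharply today.",
    ]
    print(novel_documents(stream))
```

In this sketch the second document repeats the first and is flagged as redundant, while the third shares no terms with the history and is flagged as novel. Both thresholds would need tuning per language and collection.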