A challenge for sentence categorization and novelty mining is to detect not only whether text is relevant to the user's information need, but also whether it contains new information that the user has not seen before. This involves two tasks: identifying relevant sentences (categorization) and identifying new information among those relevant sentences (novelty mining). Many previous studies of relevant sentence retrieval and novelty mining have been conducted on English, but few papers have addressed multilingual sentence categorization and novelty mining. This is an important issue in global business environments, where mining knowledge from text in a single language is not sufficient. In this paper, we perform the first task by categorizing Malay and Chinese sentences and comparing their performance with that of English. We then conduct novelty mining to identify the sentences that contain new information. Experimental results on TREC 2004 Novelty Track data show similar categorization performance on Malay and English sentences, both of which greatly outperform Chinese. In the second task, we achieve similar novelty mining results for all three languages, which indicates that our algorithm is suitable for novelty mining of multilingual sentences. In addition, benchmarking our results against novelty mining without categorization shows that categorization is necessary for successful novelty mining.
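The two-task pipeline described above (categorization first, novelty mining second) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the relevance or novelty measures, so the cosine similarity over term-frequency vectors, the whitespace tokenizer, the function names, and both threshold values below are assumptions chosen purely for illustration. Whitespace tokenization in particular would not work for Chinese, which needs a word segmentation step first.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokenization is an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(sentences, topic, relevance_threshold=0.2):
    """Task 1 (hypothetical): keep sentences similar enough to the topic description."""
    topic_vec = tf_vector(topic)
    return [s for s in sentences
            if cosine(tf_vector(s), topic_vec) >= relevance_threshold]


def mine_novelty(relevant_sentences, novelty_threshold=0.6):
    """Task 2 (hypothetical): a sentence is novel if it is not too similar
    to any relevant sentence seen earlier in the stream."""
    history = []  # vectors of all relevant sentences seen so far
    novel = []
    for s in relevant_sentences:
        vec = tf_vector(s)
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        if max_sim < novelty_threshold:
            novel.append(s)
        history.append(vec)
    return novel


# Illustrative data only; not drawn from the TREC 2004 Novelty Track.
sentences = [
    "oil prices rose sharply today",
    "the weather was sunny",
    "oil prices rose sharply today again",
    "analysts expect oil demand to fall",
]
relevant = categorize(sentences, topic="oil prices demand")
novel = mine_novelty(relevant)
```

In this toy run the off-topic sentence is dropped by categorization, and the near-duplicate third sentence is dropped by novelty mining, which mirrors why the second task is run only on the output of the first.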