We discuss several feature sets for novelty detection at the sentence level, using the data and procedure established in task 2 of the TREC 2004 novelty track. In particular, we investigate feature sets derived from graph representations of sentences and sets of sentences. We show that a highly connected graph, built from sentence-level term distances and pointwise mutual information, can serve as a source of features for novelty detection. We compare several feature sets based on this graph representation. These feature sets allow us to increase the accuracy of an initial novelty classifier based on a bag-of-words representation and KL divergence. The final result ties with the best system at TREC 2004.
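As a rough illustration of the baseline idea mentioned above (a bag-of-words representation compared via KL divergence), the following minimal sketch scores a sentence against previously seen sentences. It is not the authors' classifier: the function names, whitespace tokenization, and add-one smoothing are all assumptions made for this example, and the abstract does not specify how the divergence is turned into a classification decision.

```python
import math
from collections import Counter


def unigram_dist(tokens, vocab, alpha=1.0):
    """Smoothed unigram distribution over vocab (additive smoothing is an assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def novelty_score(sentence, history):
    """Hypothetical novelty score: divergence of a sentence from earlier sentences."""
    sent_tokens = sentence.lower().split()
    hist_tokens = [t for s in history for t in s.lower().split()]
    vocab = set(sent_tokens) | set(hist_tokens)
    p = unigram_dist(sent_tokens, vocab)
    q = unigram_dist(hist_tokens, vocab)
    return kl_divergence(p, q)


history = ["the cat sat on the mat"]
repeated = novelty_score("the cat sat on the mat", history)
unseen = novelty_score("stocks fell sharply today", history)
```

Under these assumptions, a sentence that repeats the history exactly has zero divergence, while a sentence with unseen terms scores higher. A threshold on this score, or the score used as one feature among others, would be one way to build an initial classifier.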