Feature identification for topical relevance assessment in feed search engines

Authors:
Yongwook Shin;Jonghun Park
Affiliations:
Department of Industrial Engineering, Seoul National University, Seoul, Korea;Department of Industrial Engineering, Seoul National University, Seoul, Korea
Venue:
Intelligent Data Analysis
Year:
2013

Abstract

Feeds have become a popular way to distribute and acquire information on the web. The explosive growth of feeds demands a search engine that helps users quickly discover feeds that match their interests. The retrieval effectiveness of a feed search engine depends heavily on the relevance assessment method that determines the candidates for ranking query results. However, existing relevance assessment approaches proposed for web page retrieval may produce unsatisfactory results because feeds differ in character from traditional web pages. Compared to a web page, a feed is a dynamic document, since it continually generates information on specific topics. In addition, it is a structured document that consists of several data elements such as title and description. Accordingly, a relevance assessment method for feed retrieval needs to address these unique characteristics of feeds. This paper considers the problem of identifying significant features, drawn from a feature set created from feed data elements, with the aim of improving the effectiveness of feed retrieval while reducing computational cost. We conducted extensive experiments to investigate the problem using support vector machines on real-world data sets, and identified significant features that can be employed for feed search services.
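
To make the idea of a feature set built from feed data elements concrete, the following is a minimal sketch, not the method used in the paper. It assumes an RSS 2.0 feed and a hypothetical feature definition (the fraction of query terms that appear in each element); the sample feed, the function names, and the whitespace tokenizer are all illustrative assumptions. Vectors of this kind could then be passed to a classifier such as a support vector machine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Python Data Science Weekly</title>
<description>News about machine learning in Python</description>
<item><title>Training a support vector machine</title>
<description>A walkthrough of margin classifiers</description></item>
<item><title>Cooking pasta</title>
<description>An off-topic recipe post</description></item>
</channel></rss>"""


def tokens(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return (text or "").lower().split()


def overlap(query_terms, text):
    """Fraction of query terms that occur in the text."""
    if not query_terms:
        return 0.0
    words = set(tokens(text))
    return sum(1 for term in query_terms if term in words) / len(query_terms)


def feed_features(feed_xml, query):
    """Build one feature per feed data element for a (feed, query) pair."""
    channel = ET.fromstring(feed_xml).find("channel")
    query_terms = set(tokens(query))
    items = channel.findall("item")
    # Item-level elements are concatenated across all items in the feed.
    item_titles = " ".join(i.findtext("title", "") for i in items)
    item_descriptions = " ".join(i.findtext("description", "") for i in items)
    return {
        "feed_title": overlap(query_terms, channel.findtext("title", "")),
        "feed_description": overlap(query_terms, channel.findtext("description", "")),
        "item_titles": overlap(query_terms, item_titles),
        "item_descriptions": overlap(query_terms, item_descriptions),
    }


features = feed_features(SAMPLE_FEED, "python machine learning")
for name, value in features.items():
    print(f"{name}: {value:.2f}")
```

Keeping one feature per element makes it straightforward to train on subsets of elements and compare them, which is the kind of experiment the abstract describes; the specific elements and scoring used here are assumptions.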