With the explosive growth of blogs, information seeking in the blogosphere has become increasingly challenging. One example task is to find the topical blogs most relevant to a given query or to an existing blog. Such a task requires a concise representation of blogs for effective and efficient searching and matching. In this paper, we investigate a new problem: profiling a blog by choosing the m most representative entries from the blog, where m is a predefined, application-dependent number. With the selected representative entries, applications no longer need to handle the hundreds or even thousands of entries (or posts) associated with each blog, which are frequently updated and often noisy. To guide the selection of the most representative entries, we propose three principles: anomaly, representativeness, and diversity. Based on these principles, we propose a greedy yet very efficient entry selection algorithm. To evaluate entry selection algorithms, we adapt an extrinsic evaluation methodology from document summarization research: we measure the blog classification accuracy obtained with the selected entries. Across a number of different classification methods, our empirical results show that using fewer than 20 representative entries per blog achieves classification accuracy comparable to that obtained using all of its entries.
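The abstract names the three selection principles and a greedy algorithm but does not give their formulas. As a rough illustration only, the following sketch assumes each entry is already a vector (for example TF-IDF) and plugs in simple stand-ins: representativeness as an entry's mean cosine similarity to the blog's other entries, anomaly as dropping the least representative fraction of entries, and diversity as an MMR-style penalty on similarity to entries already chosen. The function name `select_representative_entries` and the parameters `anomaly_quantile` and `lam` are hypothetical; this is not the paper's algorithm.

```python
import numpy as np


def select_representative_entries(X, m, anomaly_quantile=0.1, lam=0.5):
    """Greedily pick up to m entries from one blog (illustrative sketch only).

    Hypothetical helper: the anomaly, representativeness and diversity terms
    below are simple stand-ins (an MMR-style trade-off), not the paper's
    exact definitions.

    X: (n_entries, n_features) array, e.g. TF-IDF vectors of the entries.
    Returns the indices of the selected entries, in selection order.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Row-normalize so that dot products are cosine similarities.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    S = Xn @ Xn.T

    # Representativeness (assumed): mean similarity to all other entries.
    rep = (S.sum(axis=1) - np.diag(S)) / max(n - 1, 1)

    # Anomaly (assumed): discard the least representative entries as noise.
    cutoff = np.quantile(rep, anomaly_quantile)
    candidates = [i for i in range(n) if rep[i] >= cutoff]

    selected = []
    while candidates and len(selected) < m:
        # Diversity (assumed): penalize similarity to the closest chosen entry.
        def score(i):
            redundancy = max((S[i, j] for j in selected), default=0.0)
            return lam * rep[i] - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For a toy blog with two topic clusters and one off-topic post, asking for two entries should return one entry from each cluster and skip the outlier. The `lam` weight trades representativeness against diversity: values near 1 favor central entries even if they repeat each other, while values near 0 favor mutually dissimilar entries.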