Among the huge number of blogs on the internet, only some are considered to have great content and to be worth exploring. We call such blogs cool blogs and attempt to identify them. To solve the cool blog identification problem, we make three assumptions about cool blogs: (1) cool blogs tend to have definite topics, (2) cool blogs tend to have a sufficient number of blog entries, and (3) cool blogs tend to have a certain level of topic consistency among their blog entries. Corresponding to these assumptions, we respectively extract a mixture of topic probabilities using a topic model, exploit the number of blog entries of each blog, and calculate the topic consistency among blog entries using distance functions over topic probabilities. We show the benefits of the proposed assumptions through these features. A feature unification model is also presented to achieve the highest effectiveness. Experimental results on Japanese blog data show that applying the proposed assumptions improves the classification results.
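As a rough illustration of how the three assumptions could map to features, the following minimal sketch takes per-entry topic distributions as given (for example, the output of some topic model) and derives a blog-level topic mixture, an entry count, and a consistency score. The abstract does not name a specific distance function, so the Jensen-Shannon divergence used here is an assumption, as are the function names `blog_features`, `js_divergence`, and `kl_divergence` and the toy input data.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) using natural logarithms.

    Terms with p_i == 0 contribute nothing and are skipped.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: a symmetric, bounded distance-like measure.

    Assumption: the source only says "distance functions over topic
    probabilities"; JS divergence is one plausible choice, not necessarily
    the one used by the authors.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def blog_features(entry_topics):
    """Derive three illustrative features for one blog.

    entry_topics: list of per-entry topic distributions, each a list of
    probabilities summing to 1 (hypothetical topic-model output).
    """
    n = len(entry_topics)
    k = len(entry_topics[0])
    # Assumption (1): blog-level topic mixture, here the mean over entries.
    mixture = [sum(entry[t] for entry in entry_topics) / n for t in range(k)]
    # Assumption (3): lower average divergence from the blog-level mixture
    # means the entries are more topically consistent.
    inconsistency = sum(js_divergence(e, mixture) for e in entry_topics) / n
    return {
        "topic_mixture": mixture,
        "num_entries": n,  # Assumption (2): amount of blog entries.
        "topic_inconsistency": inconsistency,
    }


# Toy data: one blog that stays on a single topic, one that jumps around.
focused = blog_features([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])
scattered = blog_features([[1.0, 0.0], [0.0, 1.0]])
```

In this sketch the focused blog gets a near-zero `topic_inconsistency` while the scattered blog gets a clearly larger one. How such features are combined in the feature unification model is not described in the abstract and is not attempted here.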