A system called NewsStand is introduced that automatically extracts images from news articles. The system takes RSS feeds of news articles and applies an online clustering algorithm so that articles belonging to the same news topic are associated with the same cluster. The feature vector associated with a cluster is then used to select images from the news articles that form the cluster. First, the caption text associated with each image embedded in a news article is determined by analyzing the structure of the article's HTML page. If the caption and the cluster's feature vector have keywords in common, the image is added to an image repository. Additional meta-information is then associated with each image, such as its caption, the cluster features, and the names of people mentioned in the news article. A very large repository containing more than 983k images from 12 million news articles was built using this approach. The repository also contains more than 86.8 million keywords associated with the images. The key contribution of this work is that it combines clustering and natural language processing to automatically create a large corpus of news images with good-quality tags and meta-information, on which interesting vision tasks can be performed.
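The caption-matching step described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the tokenizer, the stopword list, the `min_shared` threshold, the `ImageRecord` type, and the example URLs are all assumptions introduced here for illustration, and the cluster feature vector is simplified to a plain set of keywords.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list; the abstract does not say how keywords are chosen.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "and", "to", "for", "with"}


def keywords(text):
    """Return lowercase word tokens minus stopwords (an assumed tokenization)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


@dataclass
class ImageRecord:
    """An accepted image plus the meta-information kept alongside it."""
    url: str
    caption: str
    cluster_features: set
    shared_keywords: set


def filter_images(cluster_features, images, min_shared=1):
    """Keep images whose caption shares at least min_shared keywords with the cluster.

    cluster_features: iterable of keyword strings standing in for the feature vector.
    images: iterable of (url, caption) pairs already extracted from the article HTML.
    """
    features = {f.lower() for f in cluster_features}
    repository = []
    for url, caption in images:
        shared = keywords(caption) & features
        if len(shared) >= min_shared:
            repository.append(ImageRecord(url, caption, features, shared))
    return repository


# Illustrative data only; neither the cluster nor the captions come from the paper.
cluster = {"earthquake", "japan", "tsunami"}
candidates = [
    ("http://example.com/a.jpg", "Rescue workers search rubble after the earthquake in Japan"),
    ("http://example.com/b.jpg", "Advertisement: new phone on sale"),
]
repo = filter_images(cluster, candidates)
```

In this sketch the first image is kept because its caption shares keywords with the cluster, while the advertisement is dropped. Raising `min_shared` would trade recall for stricter topical relevance; the abstract only says the caption and feature vector must have keywords in common, so the threshold of one is a guess.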