Blogs are becoming an important social tool. Through blogs, bloggers share their likes and dislikes, express their opinions, report news, and form groups around particular subjects. The information available in the blogosphere can therefore help in the creation of interesting applications in various domains, such as e-learning, e-commerce, and e-government. However, due to the increasing number of blogs posted every day on the Web and the dynamic nature of the blogosphere, collecting and extracting relevant information from blogs has become hard and time-consuming. In this paper, we use techniques from both the information retrieval and information extraction fields to deal with this problem. Since blogs vary in many respects, applications that process them must be easy to adapt. We present the RetriBlog system, a framework for developing blog crawlers that deal with the variations in blogs. This paper describes RetriBlog in detail and presents an evaluation of the proposed algorithms.