Blogs have become an important social tool. They allow users to share their tastes, express opinions, report news, and form groups around shared subjects, among other uses. Information obtained from the blogosphere can support applications in many fields. However, the growing number of blogs posted every day and the dynamic nature of the blogosphere make extracting relevant information from blogs difficult and time consuming. In this paper, we use information retrieval and extraction techniques to address this problem. Furthermore, because blogs have many variation points, applications built on them must be easy to adapt. To meet this need, this work proposes RetriBlog, an architecture-centered framework for the development of blog crawlers. Finally, it presents an evaluation of the proposed algorithms and three case studies.