An architecture-centered framework for developing blog crawlers

Authors:
Rafael Ferreira;Patrick Brito;Jean Melo;Evandro Costa;Rinaldo Lima;Fred Freitas
Affiliations:
Federal University of Pernambuco, Recife, Brazil;Federal University of Alagoas, Maceió, Brazil;Federal University of Alagoas, Maceió, Brazil;Federal University of Alagoas, Maceió, Brazil;Federal University of Pernambuco, Recife, Brazil;Federal University of Pernambuco, Recife, Brazil
Venue:
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Year:
2012

Abstract

Blogs have become useful tools for generating and sharing knowledge; activity on blogs doubles every two hundred days. Many applications could mine this large daily volume of information for useful insights. However, the dynamic nature of the blogosphere makes manual information extraction impractical, which motivates new automated approaches. In this paper, we propose a component-based framework, grounded in software architecture, for building blog crawlers. The framework provides services for blog analysis, including preprocessing, indexing, content extraction, classification, and tag recommendation. We also report a case study: a blog recommendation system that supports student interactions in educational forums. This work further aims to demonstrate how the proposed framework reduces the effort of building a blog analysis application. Finally, other aspects of the developed application, such as the impact of system evolution, reusability, and instantiation cost, are discussed qualitatively.
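
To make the component-based idea concrete, the following is a minimal Python sketch of how the services named in the abstract (content extraction, preprocessing, indexing, classification, tag recommendation) might be composed as interchangeable components. It is an illustration only, not the authors' implementation: every name here (`BlogPost`, `BlogCrawler`, `InvertedIndex`, and the component functions) is hypothetical, and each component uses a deliberately naive stand-in technique (regex tag stripping, a keyword rule, term frequency) where a real framework would plug in something stronger.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class BlogPost:
    """Hypothetical record passed from one component to the next."""
    url: str
    html: str
    text: str = ""
    tokens: List[str] = field(default_factory=list)
    label: str = ""
    tags: List[str] = field(default_factory=list)


# A component is any callable that takes a post and returns it enriched.
Component = Callable[[BlogPost], BlogPost]


def extract_content(post: BlogPost) -> BlogPost:
    # Naive stand-in: strip HTML tags and collapse whitespace.
    stripped = re.sub(r"<[^>]+>", " ", post.html)
    post.text = " ".join(stripped.split())
    return post


def preprocess(post: BlogPost) -> BlogPost:
    # Lowercase the text and keep alphabetic tokens only.
    post.tokens = re.findall(r"[a-z]+", post.text.lower())
    return post


class InvertedIndex:
    """Indexing component: maps each token to the URLs containing it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def __call__(self, post: BlogPost) -> BlogPost:
        for token in post.tokens:
            self.postings[token].add(post.url)
        return post


def classify(post: BlogPost) -> BlogPost:
    # Keyword rule used as a placeholder for a trained classifier.
    post.label = "education" if "course" in post.tokens else "other"
    return post


def recommend_tags(post: BlogPost) -> BlogPost:
    # Suggest the two most frequent tokens longer than three characters.
    counts = Counter(t for t in post.tokens if len(t) > 3)
    post.tags = [token for token, _ in counts.most_common(2)]
    return post


class BlogCrawler:
    """Runs a configurable sequence of components over each post."""

    def __init__(self, components: List[Component]) -> None:
        self.components = components

    def process(self, post: BlogPost) -> BlogPost:
        for component in self.components:
            post = component(post)
        return post


index = InvertedIndex()
crawler = BlogCrawler(
    [extract_content, preprocess, index, classify, recommend_tags]
)
sample = crawler.process(
    BlogPost(
        url="http://example.org/post-1",
        html="<h1>Python course</h1><p>This course teaches Python basics.</p>",
    )
)
```

Because each service is just a callable with the same signature, a different application could be instantiated by passing a different component list to `BlogCrawler`, which is one way to picture the reuse and low instantiation cost the abstract refers to.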