A model for fast web mining prototyping

Authors:
Álvaro Pereira;Ricardo Baeza-Yates;Nivio Ziviani;Jesús Bisbal
Affiliations:
Federal Univ. of Minas Gerais, Belo Horizonte, Brazil;Yahoo! Research & Barcelona Media, Barcelona, Spain;Federal Univ. of Minas Gerais, Belo Horizonte, Brazil;Universitat Pompeu Fabra, Barcelona, Spain
Venue:
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Year:
2009

Abstract

Web mining is a computation-intensive task, even after the mining tool itself has been developed. Most mining software is developed ad hoc and is usually neither scalable nor reused for other mining tasks. The objective of this paper is to present a model for fast Web mining prototyping, referred to as WIM (Web Information Mining). The underlying conceptual model of WIM provides its users with a level of abstraction appropriate for prototyping and experimentation throughout the Web data mining task. Abstracting away the idiosyncrasies of raw Web data representations facilitates the inherently iterative mining process. We present the WIM conceptual model, its associated algebra, and the software architecture of the WIM tool, which implements the model. We also illustrate how the model can be applied to real Web data mining tasks. Experimentation with WIM in real use cases has shown that it significantly facilitates Web mining prototyping.