Crawling and Extracting Process Data from the Web

Authors:
Yaling Liu; Arvin Agah
Affiliations:
Department of Electrical Engineering & Computer Science, The University of Kansas, Lawrence, KS 66045-7621, USA (both authors)
Venue:
ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
Year:
2009


Abstract

In this paper, we address the design and implementation of a supporting system for process-based search. The system can efficiently crawl the Web and extract processes from the retrieved data, and the extracted processes can then be used in a Process-Based Search Engine (PBSE). In this work, a process is defined as a sequence of activities for achieving a goal. A PBSE uses the extracted processes to transform an original query into multiple sub-queries and then performs a keyword search for each sub-query. Effective process-based search requires a large number of high-quality processes, so this paper focuses on how to build a database of processes efficiently and effectively by exploring the Web.
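
The query transformation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Process` class, the substring-based goal matching, the choice to use each activity verbatim as a sub-query, and the in-memory document index are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """A process as the abstract defines it: a sequence of activities for a goal."""
    goal: str
    activities: tuple[str, ...]


def to_sub_queries(query: str, processes: list[Process]) -> list[str]:
    """Return one sub-query per activity of the first process whose goal matches.

    Assumption: a goal "matches" when it occurs in the query as a
    case-insensitive substring. With no match, the query is returned unchanged.
    """
    normalized = query.strip().lower()
    for process in processes:
        if process.goal.lower() in normalized:
            return list(process.activities)
    return [query]


def keyword_search(sub_query: str, documents: dict[str, str]) -> list[str]:
    """Toy keyword search: ids of documents containing every keyword as a word."""
    keywords = sub_query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if all(keyword in text.lower().split() for keyword in keywords)
    )


def process_based_search(
    query: str, processes: list[Process], documents: dict[str, str]
) -> dict[str, list[str]]:
    """Run a keyword search for each sub-query derived from the original query."""
    return {
        sub_query: keyword_search(sub_query, documents)
        for sub_query in to_sub_queries(query, processes)
    }


if __name__ == "__main__":
    # Hypothetical process database and document collection.
    processes = [Process("buy a car", ("compare car prices", "get car insurance"))]
    documents = {
        "d1": "compare car prices online",
        "d2": "get cheap car insurance",
        "d3": "car wash",
    }
    print(process_based_search("how to buy a car", processes, documents))
```

In this sketch the quality of the results depends entirely on the stored processes, which mirrors the abstract's point that a large database of high-quality processes is a prerequisite for process-based search.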