A novel semantic web browser for user centric information retrieval: PERSON
Expert Systems with Applications: An International Journal
Proceedings of the 8th International Conference on Semantic Systems
Semantic metadata in the news production process: achievements and challenges
Proceeding of the 16th International Academic MindTrek Conference
PARMA: a parallel randomized algorithm for approximate association rules mining in MapReduce
Proceedings of the 21st ACM international conference on Information and knowledge management
A decentralized approach for mining event correlations in distributed system monitoring
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
The Semantic Web has made possible the use of the Internet to extract useful content, a task that could necessitate an infrastructure across the Web. With Hadoop, a free implementation of the MapReduce programming paradigm created by Google, we can treat these data reliably over hundreds of servers. This article describes how the Apriori algorithm was adapted to MapReduce in the search for relations between entities to deal with thousands of Web pages coming from RSS feeds daily. First, every feed is looked up five times per day and each entry is registered in a database with MapReduce. Second, the entries are read and their content sent to the Web service OpenCalais for the detection of named entities. For each Web page, the set of all itemsets found is generated and stored in the database. Third, all generated sets, from first to last, are counted and their support is registered. Finally, various analytical tasks are executed to present the relationships found. Our tests show that the third step, executed over 3,000,000 sets, was 4.5 times faster using five servers than using a single machine. This approach allows us to easily and automatically distribute treatments on as many machines as are available, and be able to process datasets that one server, even a very powerful one, would not be able to manage alone. We believe that this work is a step forward in processing semantic Web data efficiently and effectively.