Apache: Big Data Europe 2016
Tuesday, November 15 • 15:30 - 16:20
Low Latency Web Crawling on Apache Storm - Julien Nioche, DigitalPebble Ltd.

StormCrawler is an open source collection of resources, mostly implemented in Java, for building low-latency, scalable web crawlers on Apache Storm. After a short introduction to Apache Storm and an overview of what StormCrawler provides, we will compare it with similar projects such as Apache Nutch and present several real-life use cases. In particular, we will see how StormCrawler can be used with Elasticsearch and Kibana to crawl and index web pages and to monitor the crawl itself.

Julien Nioche

Director, DigitalPebble Ltd
I run DigitalPebble Ltd, a consultancy based in Bristol, UK and specialising in open source solutions for text engineering. My expertise covers web crawling, natural language processing, machine learning and search. I am a committer on Apache Nutch and am also involved in several...

Tuesday November 15, 2016 15:30 - 16:20 CET
Giralda III/IV