Apache: Big Data Europe 2016
Tuesday, November 15 • 11:00 - 11:50
Crawling the Web for Common Crawl - Sebastian Nagel, Common Crawl


Common Crawl is a non-profit organization that regularly crawls a significant sample of the web and makes the data freely accessible to anyone interested in running machine-scale analysis on web data. The presentation will show how to use the Common Crawl data, covering data formats and tools as well as examples and derived datasets. The monthly crawls are run by Apache Nutch on Apache Hadoop. Sebastian will also share his experience of running a web-scale crawl on a small budget.

Speakers

Sebastian Nagel

Crawl Engineer, commoncrawl.org
Sebastian Nagel works as a crawl engineer at Common Crawl, a non-profit organization that makes web data freely accessible to everyone. Before joining Common Crawl, he implemented search and data quality solutions at Exorbyte. Sebastian is a committer and PMC member of Apache Nutch, a scalable web crawler, and presented the project at ApacheCon 2014.



Tuesday November 15, 2016 11:00 - 11:50
Giralda III/IV
