Apache: Big Data Europe 2016
Create a Hadoop Cluster and Migrate 39PB Data Plus 150000 Jobs/Day - Stuart Pook, Criteo

Criteo had a Hadoop cluster with 39 PB of raw storage, 13,404 CPUs, 105 TB of RAM, 40 TB of data imported per day, and more than 100,000 jobs per day. This cluster was critical for both storage and compute, yet it had no backups. This talk describes:

0/ the options considered for protecting our data and compute capacity
1/ the criteria established for the 800 new computers, and the comparison tests run on suppliers' hardware
2/ the non-blocking network infrastructure with 10 Gb/s endpoints, scalable to 5,000 machines
3/ the installation and configuration, using Chef, of a cluster on the new hardware (sketched below)
4/ the problems encountered in moving our jobs and data from the old CDH4 cluster to the new CDH5 cluster 600 km away (see the DistCp sketch below)
5/ running the two clusters in parallel and feeding data to both
6/ failover plans
7/ operational issues
8/ the performance of the 16,800-core, 200 TB RAM, 60 PB disk CDH5 cluster.
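As a rough illustration of point 3/, a Chef-driven rollout typically bootstraps each new machine against a role that carries the Hadoop configuration. knife bootstrap and chef-client are standard Chef tooling, but the host name and role name below are hypothetical, not Criteo's actual cookbooks:

    # Hypothetical sketch: enrol a new node in the Chef-managed cluster,
    # assigning it a role that carries the Hadoop (CDH5) configuration.
    knife bootstrap datanode-0042.example.com \
        --node-name datanode-0042 \
        --run-list 'role[hadoop-datanode]'

    # Later chef-client runs keep the node in sync with the cookbooks.
    ssh datanode-0042.example.com sudo chef-client

Driving every node from the same run list is what makes a rollout at this scale tractable: a configuration change is made once in a cookbook and converges everywhere on the next chef-client run.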
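For the migration itself (point 4/), the standard tool for bulk copies between HDFS clusters is Hadoop's DistCp. A minimal sketch, with hypothetical NameNode hosts and paths; because CDH4 and CDH5 speak incompatible HDFS RPC versions, one common approach is to run DistCp on the destination cluster and read the source over webhdfs, which is version-independent:

    # Run on the new CDH5 cluster: copy a dataset from the old CDH4 cluster.
    # -update skips files already copied, so the command can be re-run to
    # catch up while both clusters stay live; -p preserves ownership and
    # permissions; -m caps the number of parallel map tasks and -bandwidth
    # limits each map (here to 50 MB/s) so the inter-site link is not
    # saturated.
    hadoop distcp -update -p -m 200 -bandwidth 50 \
        webhdfs://old-nn.example.com:50070/data/events \
        hdfs://new-nn.example.com:8020/data/events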

Speakers
Stuart Pook

Senior DevOps Engineer, Criteo
Stuart loves storage (100 PB at Criteo) and is part of Criteo's Lake team, which runs two huge Hadoop clusters and several small ones. He also loves automation with Chef, because configuring 2000 Hadoop nodes by hand is just too slow. He has spoken at Devoxx 2016 and at Criteo Labs' NABD Conference 2016. Before discovering Hadoop, he developed user interfaces for biotech companies.



Tuesday, November 15, 2016, 12:00 - 12:50
Giralda VI/VII
