Apache: Big Data Europe 2016
Monday, November 14 • 13:00 - 13:50
Hadoop, Hive, Spark and Object Stores - Steve Loughran, Hortonworks

Cloud deployments of Apache Hadoop are becoming more commonplace. Yet Hadoop and its applications don't integrate that well, something which starts right down at the file IO operations.

This talk looks at how to make use of cloud object stores in Hadoop applications, including Hive and Spark. It will go from the foundational "what's an object store?" to the practical "what should I avoid" and the timely "what's new in Hadoop?", the latter covering the improved S3 support in Hadoop 2.8+.

I'll explore the details of benchmarking and improving object store IO in Hive and Spark, showing what developers can do to gain performance improvements in their own code and, equally, what they must avoid.

Finally, I'll look at ongoing work, especially "S3Guard" and what its fast and consistent file metadata operations promise.

Steve Loughran

Member of Technical Staff, Hortonworks
Steve Loughran is a developer at Hortonworks, where he works on leading-edge Hadoop applications, most recently on Apache Slider and on Apache Spark's integration with Hadoop and YARN, and Hadoop's S3A connector to Amazon S3. He's the author of Ant in Action, a member of the Apache...

Monday November 14, 2016 13:00 - 13:50 CET
Nervion/Arenal II/III