Apache: Big Data Europe 2016
Wednesday, November 16 • 13:00 - 13:50
Parquet Format in Practice & Detail - Uwe L. Korn, Blue Yonder

Apache Parquet is among the most commonly used column-oriented data formats in the big data processing space. It leverages various techniques to store data in a CPU- and I/O-efficient way. Furthermore, it can push analytical queries on the data down to the I/O layer to avoid loading irrelevant data chunks. With several Java implementations and a C++ implementation, Parquet is also the perfect choice for exchanging data between different technology stacks.
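
To make the idea of pushing a query down to the I/O layer more concrete, here is a minimal toy sketch in plain Python. It is not Parquet's actual implementation or API: the `Chunk` class and the functions `make_chunks` and `scan_greater_than` are hypothetical names invented for this illustration. The sketch assumes each chunk of a column stores its minimum and maximum value, so a reader can skip chunks that cannot contain any match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored column chunk with min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def make_chunks(values: List[int], chunk_size: int) -> List[Chunk]:
    """Split a column into fixed-size chunks and record min/max per chunk."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def scan_greater_than(chunks: List[Chunk], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of chunks actually read."""
    matches: List[int] = []
    chunks_read = 0
    for chunk in chunks:
        # The statistics alone prove that no value here can match: skip the chunk.
        if chunk.max_value <= threshold:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, chunks_read


# Toy data: 100 sorted integers stored in 10 chunks of 10 values each.
chunks = make_chunks(list(range(100)), chunk_size=10)
matches, chunks_read = scan_greater_than(chunks, threshold=84)
print(f"read {chunks_read} of {len(chunks)} chunks, found {len(matches)} matches")
```

In this toy setup the filter only has to touch the chunks whose maximum exceeds the threshold; how much is skipped in practice depends on how the data is sorted and chunked.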

This talk gives a general introduction to the format and its techniques. Their benefits and some of the inner workings will be explained to give a better understanding of how Parquet achieves its performance. At the end, benchmarks comparing the new C++ and Python implementations with other formats will be shown.

Uwe L. Korn

Data Scientist, Blue Yonder GmbH
Uwe Korn is a Data Scientist at the German RetailTec company Blue Yonder. His expertise is in building architectures for machine learning services that scale across multiple customers, aiming at high service availability as well as rapid prototyping of solutions to evaluate...

Wednesday November 16, 2016 13:00 - 13:50 CET
Nervion/Arenal I