Apache: Big Data Europe 2016
Click here to Register or for more information 
Back To Schedule
Wednesday, November 16 • 12:00 - 12:50
Apache Kudu: A Distributed, Columnar Data Store for Fast Analytics - Mike Percy, Cloudera

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

The Hadoop ecosystem has recently made great strides in its real-time access capabilities, narrowing the gap compared to traditional database technologies. With systems like Apache Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems like Apache HBase, applications can achieve millisecond-scale random access to arbitrarily-sized datasets. However, gaps remain when scans and random access are both required.

This talk will investigate the trade-offs between real-time random access and fast analytic performance from the perspective of storage engine internals. It will also describe Apache Kudu, the new addition to the open source Hadoop ecosystem with out-of-the-box integration with Apache Spark, that fills the gap described above to provide a new option to achieve fast scans and fast random access from a single API.

avatar for Mike Percy

Mike Percy

Software Engineer, Cloudera
Mike Percy is a software engineer at Cloudera and a PMC member on Apache Kudu, an open source distributed column store for the Hadoop ecosystem. He is also a PMC member on Apache Flume. Prior to joining Cloudera, Mike worked at Yahoo! building machine learning infrastructure for Big... Read More →

Wednesday November 16, 2016 12:00 - 12:50 CET
Nervion/Arenal I