Apache: Big Data Europe 2016
Wednesday, November 16 • 12:00 - 12:50
Apache Kudu: A Distributed, Columnar Data Store for Fast Analytics - Mike Percy, Cloudera


The Hadoop ecosystem has recently made great strides in its real-time access capabilities, narrowing the gap with traditional database technologies. With systems like Apache Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems like Apache HBase, applications can achieve millisecond-scale random access to arbitrarily sized datasets. However, gaps remain when scans and random access are both required.



This talk will investigate the trade-offs between real-time random access and fast analytic performance from the perspective of storage engine internals. It will also describe Apache Kudu, a new addition to the open source Hadoop ecosystem that integrates with Apache Spark out of the box. Kudu fills the gap described above, offering a new option for achieving both fast scans and fast random access through a single API.

Speakers

Mike Percy

Software Engineer, Cloudera
Mike Percy is a software engineer at Cloudera and a PMC member of Apache Kudu, an open source distributed column store for the Hadoop ecosystem. He is also a PMC member of Apache Flume. Prior to joining Cloudera, Mike worked at Yahoo! building machine learning infrastructure for Big Data. Mike holds a BSCS from UC Santa Cruz and an MSCS from Stanford.



Wednesday November 16, 2016 12:00 - 12:50
Nervion/Arenal I
