Apache: Big Data Europe 2016
Tuesday, November 15 • 13:00 - 13:50
Massively Parallel Data Warehousing in the Hadoop Stack - Gregory Chase & Roman Shaposhnik, Pivotal

Hadoop has been touted as a replacement for data warehouses. In practice, Hadoop has succeeded at offloading ETL/ELT workloads, but it still has gaps in meeting the requirements of operational analytics.

Apache Bigtop now includes Greenplum Database in its deployment of big data solutions. Greenplum Database is an open source, massively parallel data warehouse based on PostgreSQL, and is an excellent addition to the Hadoop ecosystem.

In this session we'll cover:
  • Introduction to Greenplum 
  • Bigtop Support for Greenplum
  • External tables in Hadoop by Greenplum
  • Parallel reads and writes to Hadoop by Greenplum
  • Running advanced analytics on structured and unstructured data in both Hadoop and Greenplum via Apache MADlib (incubating)
  • Geospatial and Machine Learning in Greenplum based on HDFS data
  • Storing data from a data lake in Greenplum for high throughput analytical queries

Speakers
Gregory Chase

Director Product Marketing, PagerDuty
Greg Chase is Director of Product Marketing for PagerDuty Automation and Rundeck. He's been in marketing and engineering in software companies for too many decades, evangelizing and building automation platforms, developer tools and data engineering frameworks. Before PagerDuty, Greg...
Roman Shaposhnik

Director of Open Source, Linux Foundation
Apache Software Foundation and Data, oh but also unikernels


Tuesday November 15, 2016 13:00 - 13:50 CET
Nervion/Arenal I