Apache: Big Data Europe 2016
Tuesday, November 15 • 13:00 - 13:50
Massively Parallel Data Warehousing in the Hadoop Stack - Gregory Chase & Roman Shaposhnik, Pivotal

Hadoop has been touted as a replacement for data warehouses. In practice, Hadoop has succeeded at offloading ETL/ELT workloads, but it still falls short of the requirements of operational analytics.

Apache Bigtop now includes Greenplum Database for deploying big data solutions. Greenplum Database, an open source massively parallel data warehouse based on PostgreSQL, is an excellent addition to the Hadoop ecosystem.

In this session we'll cover:
  • Introduction to Greenplum
  • Bigtop support for Greenplum
  • Greenplum external tables over data in Hadoop
  • Parallel reads from and writes to Hadoop by Greenplum
  • Running advanced analytics on structured and unstructured data in both Hadoop and Greenplum via Apache MADlib (incubating)
  • Geospatial analysis and machine learning in Greenplum on HDFS data
  • Storing data from a data lake in Greenplum for high throughput analytical queries

Speakers
Gregory Chase

Director of Big Data Communities, Pivotal Software
Greg Chase is an enterprise software marketing executive with more than 20 years of experience in marketing, sales, and engineering at software companies. Most recently, Greg has been passionately advocating for innovation and transformation of business and IT practices through big data...
Roman Shaposhnik

Director of Open Source, Linux Foundation
Apache Software Foundation and Data, oh but also unikernels

Tuesday November 15, 2016 13:00 - 13:50 CET
Nervion/Arenal I