Apache: Big Data Europe 2016
Tuesday, November 15 • 13:00 - 13:50
Large Scale Open Source Data Processing Pipelines at Trivago - Clemens Valiente, Trivago

trivago processes roughly 7 billion events per day with an entirely open source architecture, from data production through to visualization in dashboards and reports. This talk will explain the idea behind the pipeline, highlight a particular business use case, and share experience and engineering challenges from two years in production. Clemens Valiente will also present the tools, frameworks and systems used, focusing on Kafka for data ingestion, Hadoop and Hive for processing, and Impala for querying. The successful implementation of this large-scale data processing pipeline fundamentally transformed the way trivago is able to approach its business.

Clemens Valiente

Lead Data Engineer, trivago GmbH
I'm part of trivago's Data Engineering team, where we run a data processing pipeline built on Kafka, Hadoop, Impala and R that processes roughly 7 billion events per day. Our Hadoop cluster is central for BI dashboards, reports, ad hoc analyses, personalisation, bidding and recommendation...

Tuesday November 15, 2016 13:00 - 13:50 CET
Giralda VI/VII