Apache: Big Data Europe 2016
Tuesday, November 15 • 13:00 - 13:50
Large Scale Open Source Data Processing Pipelines at Trivago - Clemens Valiente, Trivago

trivago processes roughly 7 billion events per day with an architecture that is entirely open source, from producing the data to visualizing it in dashboards and reports. This talk will explain the idea behind the pipeline, highlight a particular business use case, and share the experience and engineering challenges from two years in production. Clemens Valiente will also show the different tools, frameworks and systems used, focusing on Kafka for data ingestion, Hadoop and Hive for processing, and Impala for querying. The successful implementation of this large-scale data processing pipeline fundamentally transformed the way trivago approaches its business.

Speakers
Clemens Valiente

Lead Data Engineer, trivago GmbH
I'm part of trivago's Data Engineering team, where we run a data processing pipeline built on Kafka, Hadoop, Impala and R that processes roughly 7 billion events per day. Our Hadoop cluster is central to BI dashboards, reports, ad hoc analyses, personalisation, bidding and recommendation algorithms, as well as our invoicing.


Tuesday November 15, 2016 13:00 - 13:50
Giralda VI/VII
