Apache: Big Data Europe 2016
Wednesday, November 16 • 13:00 - 13:50
Real Time Aggregation with Kafka, Spark Streaming and ElasticSearch, Scalable Beyond Million RPS - Dibyendu Bhattacharya, InstartLogic

While building a massively scalable real-time pipeline to collect transaction logs from network traffic, one of the major challenges was aggregating streaming data on the fly. This was needed to compute multiple metrics across various dimensions, which help our customers see near-real-time views of application delivery and performance. In this talk, learn how we designed our real-time pipeline for multi-stage aggregation powered by Kafka, Spark Streaming and ElasticSearch. At InstartLogic we use a custom Spark Receiver for Kafka, which is used in the first stage of aggregation. The second stage is Spark Streaming driven aggregation within a given batch window. The final stage uses custom ElasticSearch plugins to aggregate across batches. I will cover this multi-stage aggregation, including optimisations across all stages, which is scalable beyond a million RPS.
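As a rough illustration of the three stages described above, here is a minimal sketch in plain Python. It is not the speaker's implementation and uses no Kafka, Spark or ElasticSearch APIs; the function names, the (key, latency) record shape and the sample values are all assumptions made for the example. It only shows the general idea that each stage emits mergeable partial aggregates (count and sum, not averages), so later stages can combine them without seeing the raw records again.

```python
from collections import defaultdict

# Hypothetical record shape: (dimension_key, latency_ms).

def pre_aggregate(records):
    """Stage 1 (stand-in for the receiver): collapse raw records
    into per-key (count, sum) partials."""
    partial = defaultdict(lambda: [0, 0])
    for key, latency_ms in records:
        partial[key][0] += 1
        partial[key][1] += latency_ms
    return {key: tuple(value) for key, value in partial.items()}

def merge(partials):
    """Stages 2 and 3: combine (count, sum) partials key by key.
    The same merge works within a batch window and across batches
    because count and sum are associative."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (count, total) in partial.items():
            merged[key][0] += count
            merged[key][1] += total
    return {key: tuple(value) for key, value in merged.items()}

# Two batch windows; the first arrives as two partitions (made-up data).
batch1_partitions = [
    [("app1", 10), ("app1", 30), ("app2", 5)],
    [("app1", 20)],
]
batch2_partitions = [
    [("app2", 15)],
]

# Stage 1 then stage 2, per batch window.
batch1 = merge(pre_aggregate(p) for p in batch1_partitions)
batch2 = merge(pre_aggregate(p) for p in batch2_partitions)

# Stage 3: aggregate across batches (stand-in for the store-side step).
overall = merge([batch1, batch2])

# Averages are derived only at the end, from the merged partials.
mean_latency = {key: total / count for key, (count, total) in overall.items()}
```

Carrying count and sum through every stage, rather than a running average, is what lets the partials be merged in any order and at any stage.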

Dibyendu Bhattacharya

Data Platform Engineer, InstartLogic
Dibyendu holds an MS in Software Systems and a B.Tech in Computer Science, with experience in building applications and products leveraging distributed computing and big data technologies. He presently works as a Data Platform Engineer at InstartLogic, the world's first endpoint-aware application...

Wednesday November 16, 2016 13:00 - 13:50 CET
Giralda III/IV