Apache: Big Data Europe 2016

Monday, November 14
 

11:00 CET

Apache Gearpump Next-Gen Streaming Engine - Karol Brejna & Huafeng Wang, Intel
Stream processing has gone mainstream in the Big Data world and is widely adopted in industry. Despite its growing popularity, many hard problems remain to be solved. Apache Gearpump (incubating) is a next-gen streaming engine designed to solve the hard parts of stream processing. It handles infinite, out-of-order data streams while guaranteeing correctness. It helps users easily program streaming applications, get runtime information, and update applications dynamically. In this presentation, we will demystify how Gearpump solves the hard parts of stream processing and achieves high throughput with millisecond-latency message delivery.

Speakers

Karol Brejna

Intel
Father, husband, software enthusiast. After over a dozen years of struggling with system integration, service and event oriented/driven architectures, business process management, enterprise content management, NoSQLs, ESBs, and clouds, joined Intel to work for Analytics and Artificial...

Huafeng Wang

Software engineer, Vipshop
Huafeng is a software engineer from Intel's Big Data engineering group, as well as a committer of Apache Gearpump, an open-source stream processing engine initiated by Intel.



Monday November 14, 2016 11:00 - 11:50 CET
Nervion/Arenal I

12:00 CET

Property-based Testing for Spark Streaming - Adrian Riesco, Universidad Complutense de Madrid
Spark Streaming is currently one of the leading frameworks in the industry for distributed stream processing. However, testing Spark Streaming programs is still a challenge because of the complications of dealing with time. In this presentation, Adrian Riesco gives an introduction to sscheck, a testing library for Spark that extends ScalaCheck with additional temporal logic operators for generators and properties. These operators are used to define tests for Spark Streaming as linear temporal logic formulas, resulting in tests that are high level and easy to understand.

Speakers

Adrian Riesco

PhD Assistant Professor, Facultad de Informatica (UCM)
I currently work as PhD Assistant Professor at Universidad Complutense de Madrid, Spain. I am also a member of the research group FADOSS, and my research interests include formal methods, logic, debugging, and testing.



Monday November 14, 2016 12:00 - 12:50 CET
Nervion/Arenal I

15:30 CET

Streaming Report: Functional Comparison and Performance Evaluation - Huafeng Wang, Intel
Streaming Report (Mao Wei, Intel) - Stream processing technology has developed rapidly in recent times. With Spark Streaming, Flink, Storm, Heron, and Gearpump, many choices are available when people want to pick the right framework to solve their real business problems. In this presentation, Mao Wei will go through all of these frameworks and compare them in detail. From a functional perspective, Wei will discuss the underlying mechanisms of these frameworks and review several features that users generally care about. From a practical perspective, you will see performance test results based on HiBench, a cross-platform micro-benchmark suite for big data open sourced by Intel BDT. The test cases include identity, repartition, state operation, and window operation.

Speakers

Huafeng Wang

Software engineer, Vipshop
Huafeng is a software engineer from Intel's Big Data engineering group, as well as a committer of Apache Gearpump, an open-source stream processing engine initiated by Intel.



Monday November 14, 2016 15:30 - 16:20 CET
Nervion/Arenal I

16:30 CET

Real Time Aggregates in Apache Calcite -- Optimal Use of your Streaming Data - Atri Sharma, Microsoft
The talk focuses on how to develop applications in the real-time analytics space using Apache Calcite's advanced query planning capabilities. It gives a brief overview of Calcite's planner and rules engine, then discusses the capabilities that can be used to develop real-time applications that continuously stream and process data. It also covers the ongoing work in Calcite's framework and the upcoming streaming aggregation features. Finally, the talk looks at Calcite's highly adaptable framework, which allows Calcite to work with many existing projects, and at how your current application can take advantage of Calcite's planning and aggregation capabilities.

Speakers

Atri Sharma

SDE-II, Microsoft
A distributed systems engineer, committer on Apache Apex, PMC Member on Apache MADlib, PPMC Member on Apache HAWQ, and major contributor to the PostgreSQL project, having implemented GROUPING SETS, ROLLUP, CUBE, and Ordered Set Aggregates.


Monday November 14, 2016 16:30 - 17:20 CET
Nervion/Arenal I
 
Tuesday, November 15
 

12:00 CET

Building Streaming Applications with Apache Apex - Thomas Weise & Chinmay Kolhatkar, DataTorrent
Stream processing applications built on Apache Apex run on Hadoop clusters and typically power analytics use cases where availability, flexible scaling, high throughput, low latency, and correctness are essential. These applications consume data from a variety of sources, including streaming sources like Apache Kafka, Kinesis, or JMS, file-based sources, and databases. Processing results often need to be stored in external systems (sinks) for downstream consumers (pub-sub messaging, real-time visualization, Hive and other SQL databases, etc.). Apex has the Malhar library with a wide range of connectors and other operators that are readily available to build applications. We will cover key characteristics like partitioning and processing guarantees, generic building blocks for new operators (write-ahead log, incremental state saving, windowing, etc.), and APIs for application specification.

Speakers

Chinmay Kolhatkar

Chinmay is a Software Engineer at DataTorrent Software, India, and a committer on the Apache Apex project.

Thomas Weise

CTO, Atrato.io
Thomas is Apache Apex PMC Chair and CTO at Atrato. Prior to founding Atrato, he was an Architect at DataTorrent and led the development of Apex from the beginning of the project. Before that, he was a member of the Hadoop Team at Yahoo! and contributed to several of the big data ecosystem...



Tuesday November 15, 2016 12:00 - 12:50 CET
Carmona
 
Wednesday, November 16
 

11:00 CET

SQL and Streaming Systems - Atri Sharma, Microsoft
The talk focuses on how to design and build systems for stream-based data and how to exploit the power of SQL and relational algebra on streaming data using Apache Apex and Apache Calcite.

Speakers

Atri Sharma

SDE-II, Microsoft
A distributed systems engineer, committer on Apache Apex, PMC Member on Apache MADlib, PPMC Member on Apache HAWQ, and major contributor to the PostgreSQL project, having implemented GROUPING SETS, ROLLUP, CUBE, and Ordered Set Aggregates.


Wednesday November 16, 2016 11:00 - 11:50 CET
Giralda VI/VII