Apache Big Data Europe 2016: Full Schedule

Apache: Big Data Europe 2016
11:00 CET

Geospatial Track: Apache SIS for Earth Observation and Beyond - Martin Desruisseaux, Geomatys

Apache SIS is a library that helps developers create their own geospatial applications. SIS closely follows international standards published jointly by the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO). In this talk we will show how SIS provides a unified metadata model, based on the ISO 19115 standard, for summarizing the content of some file formats used for earth observation: GeoTIFF, NetCDF, Landsat 8 and MODIS. We will show how to get the Coordinate Reference System (CRS) from those file formats or from other sources, such as Well Known Text (WKT) 2 or a registry maintained by authorities, and how to use those CRS for coordinate operations. We will present new issues to take into account when applying those tools to extra-terrestrial bodies like Mars or asteroids. Finally, we will present the next developments proposed for Apache SIS.

Speakers

Martin Desruisseaux

Developer, Geomatys

I hold a Ph.D. in oceanography, but have continuously developed tools to support analysis work. I used C/C++ before switching to Java in 1997. I have developed geospatial libraries since then, initially as a personal project and then as a GeoTools contributor until 2008. I'm now...

Monday November 14, 2016 11:00 - 11:50 CET
Carmona

Geo

12:00 CET

Geospatial Track: Geospatial Big Data: Software Architectures and the Role of APIs in Standardized Environments - Ingo Simonis, Open Geospatial Consortium (OGC)

A number of technologies have evolved around big data, in particular products from the Apache community such as Hadoop, Storm, Spark, Hive, and Cassandra. The geospatial community has developed a range of standards to handle geospatial data efficiently. Most of these standards are produced by the Open Geospatial Consortium (OGC) and implemented in the form of domain-agnostic data models and Web services. With the emerging demand for streamlined APIs, new questions arise: how can access to Big Data in the geospatial community be handled most efficiently, and how do existing standards serve these new demands and the implementation realities of distributed Big Data repositories operated, for example, by the various space agencies? This presentation should stimulate the discussion of geospatial Big Data handling in standardized environments and explore the role of products from the Apache community.

Speakers

Ingo Simonis

Director Innovation Programs & Science, OGC

Dr. Ingo Simonis is director of interoperability programs and science at the Open Geospatial Consortium (OGC), an international consortium of more than 525 companies, government agencies, research organizations, and universities participating in a consensus process to develop publicly...

Monday November 14, 2016 12:00 - 12:50 CET
Carmona

Geo

13:00 CET

Geospatial Track: Crowd Learning for Indoor Navigation - Thomas Burgess, indoo.rs GmbH

indoo.rs enables location-based services for indoor applications. With indoo.rs, developers can add new features to their products, including triggering events by location, tracking assets, and showing the closest routes to other places. For this, we use WiFi/beacon radio infrastructure, mobile devices, and our cloud, which together produce large volumes of geospatial time series data. The real-time indoor navigation fuses independent movement estimates from custom 9D sensor fusion with position estimates obtained by comparing current signal readings to a reference map. This talk will discuss how we create and maintain these maps in our big data machine learning system, which leverages crowd data through Kafka and Spark to run SLAM and context-aware algorithms that create high-quality trajectories. In addition to their use in reference maps, these trajectories provide an additional input for our interactive analytics.

Speakers

Thomas Burgess

Director of research, indoo.rs GmbH

Thomas is the CRO of indoo.rs and has led its research efforts since 2012. Earlier, he did his PhD in particle physics at Stockholm University for the AMANDA/IceCube neutrino telescopes, and worked as a postdoctoral researcher at the University of Bergen for the ATLAS experiment at the...

Monday November 14, 2016 13:00 - 13:50 CET
Carmona

Geo

15:30 CET

Processing Planetary Sized Datasets - Tim Park, Microsoft

In my group at Microsoft, we have worked with the United Nations, Guide Dogs for the Blind in the UK, several automotive companies, and Ströer on a number of projects involving high-scale geospatial data.

In this talk, I'll share some of the best practices and patterns that have come out of those experiences: best practices for storing and indexing geospatial data at scale, incremental ingestion and slice processing of the data, and efficiently building and presenting progressive levels of detail.

The audience will walk away with an understanding of how to efficiently summarize data over a geographic area, general methods for ingestion with Apache Kafka (or other event ingestion systems) and for incremental updates to large-scale datasets with Apache Spark, and best practices for visualizing this data on the frontend.
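As a rough illustration of what summarizing data over a geographic area at progressive levels of detail can look like, the sketch below counts points per map tile and rolls the counts up to the next coarser zoom level. It is not taken from the talk: the choice of the common Web Mercator "slippy map" tiling and the names `tile_for`, `summarize`, and `roll_up` are assumptions made for this example.

```python
import math
from collections import Counter


def tile_for(lat, lon, zoom):
    """Return the (zoom, x, y) Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that points on the map edge or near the poles stay in range.
    return (zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1))


def summarize(points, zoom):
    """Count (lat, lon) points per tile at one zoom level."""
    return Counter(tile_for(lat, lon, zoom) for lat, lon in points)


def roll_up(counts):
    """Build the next coarser level by summing each tile into its parent."""
    parents = Counter()
    for (zoom, x, y), count in counts.items():
        parents[(zoom - 1, x // 2, y // 2)] += count
    return parents


# Hypothetical sample points as (latitude, longitude) pairs.
points = [(0.0, 0.0), (10.0, 10.0), (-10.0, -10.0)]
detail = summarize(points, 1)
overview = roll_up(detail)
print(detail)
print(overview)
```

Because the per-tile counts are plain counters, a newly ingested batch can be merged with `detail.update(summarize(new_points, 1))` instead of recomputing the whole level, which is one simple way to think about incremental updates.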

Speakers

Tim Park

Software Engineer, Microsoft

Tim is a Principal Software Engineer at Microsoft and works with customers and partners to help them utilize open source platforms on Microsoft's Azure cloud. He focuses on big data and, in particular, on processing large-scale geospatial data. His project experience...

Monday November 14, 2016 15:30 - 16:20 CET
Carmona

Geo

16:30 CET

Myriad, Spark, Cassandra, and Friends - Big Data Powered by Mesos - Jörg Schad, Mesosphere

Processing Big Data requires large compute clusters, and large clusters, especially when running multiple Big Data systems, require some kind of cluster manager and cluster scheduler.

In this talk, we will give an overview of how Apache Mesos and DC/OS help solve the problems of large-scale clusters and then take a look at the current state of the Big Data ecosystem built on top of this foundation.

We will discuss the differences between Apache YARN and Apache Mesos and why, thanks to Apache Myriad, they are not mutually exclusive choices.

Furthermore, we will look at the growing Big Data ecosystem on top of Apache Mesos and DC/OS including, for example, Apache Spark, Apache Cassandra, and Apache Kafka.

Finally, we will also provide some insights into future developments, both for the foundation (i.e., Apache Mesos and DC/OS) and for the Big Data ecosystem on top.

Speakers

Jörg Schad

CTO, ArangoDB

Jörg Schad is the CTO at ArangoDB. In a previous life, he has worked on or built machine learning pipelines in healthcare, distributed systems, including early Kubernetes code at Mesosphere, and in-memory databases. He received his Ph.D. for research about distributed databases and...

Monday November 14, 2016 16:30 - 17:20 CET
Carmona

Spark

17:30 CET

BoF: Open Source Beyond Software - Alexander Bezzubov, NF Labs

Open source software in general, and the Apache Software Foundation in particular, is a great example of how the principles below have changed the whole industry:

Permissive licensing
Open governance
Distributed networks of collaborators
Work guided by one's own desire

The same principles are beginning to be applied to other aspects of life by different communities around the globe:

Open hardware
Makers
Publishing
DIYbio
Housing

As well as some more traditional cultural phenomena similar in spirit:

Shanzhai (Chinese: 山寨)
Kibbutz (Hebrew: קִבּוּץ / קיבוץ)

Let's explore existing initiatives and see where they can lead us together!

Speakers

Alexander Bezzubov

Software Engineer, NFLabs

Alexander Bezzubov is an Apache Zeppelin contributor, PMC member, and software engineer at NFLabs. Previous speaking experience includes Apache BigData NA 2016 in Vancouver, FOSSASIA 2016 in Singapore, and Apache BigData EU 2015 in Budapest.

Monday November 14, 2016 17:30 - 18:30 CET
Carmona

BoF

11:00 CET

Apache HBase: Overview and Use Cases - Apekshit Sharma, Cloudera

NoSQL databases are critical in building Big Data applications. Apache HBase, one of the most popular NoSQL databases, is used by Facebook, Apple, eBay and hundreds of other enterprises to store, analyze and profit from their petabyte-scale volumes of data. This talk will discuss:

- the motivation behind NoSQL databases

- the basic architecture of a popular NoSQL system, Apache HBase

- some commonly seen big data usage patterns in industry, and when and how to use Apache HBase (or another, better-suited NoSQL database).

Speakers

Apekshit Sharma

Software Engineer, Cloudera Inc

Apekshit Sharma (Appy) is a Software Engineer at Cloudera and a contributor to Apache HBase. Previously, he was at Google building backend infrastructure using MapReduce, Bigtable, and MillWheel. He earned his B.Tech in Computer Science from the Indian Institute of Technology, Bombay. Currently...

Tuesday November 15, 2016 11:00 - 11:50 CET
Carmona

Intro

12:00 CET

Building Streaming Applications with Apache Apex - Thomas Weise & Chinmay Kolhatkar, DataTorrent

Stream processing applications built on Apache Apex run on Hadoop clusters and typically power analytics use cases where availability, flexible scaling, high throughput, low latency and correctness are essential. These applications consume data from a variety of sources, including streaming sources like Apache Kafka, Kinesis or JMS, file based sources or databases. Processing results often need to be stored in external systems (sinks) for downstream consumers (pub-sub messaging, real-time visualization, Hive and other SQL databases etc.). Apex has the Malhar library with a wide range of connectors and other operators that are readily available to build applications. We will cover key characteristics like partitioning and processing guarantees, generic building blocks for new operators (write-ahead-log, incremental state saving, windowing etc.) and APIs for application specification.

Speakers

Chinmay Kolhatkar

Chinmay is a Software Engineer at DataTorrent Software, India, and a committer on the Apache Apex project.

Thomas Weise

CTO, Atrato.io

Thomas is the Apache Apex PMC Chair and CTO at Atrato. Prior to founding Atrato, he was an architect at DataTorrent and led the development of Apex from the beginning of the project. Before that, he was a member of the Hadoop team at Yahoo! and contributed to several of the big data ecosystem...

Tuesday November 15, 2016 12:00 - 12:50 CET
Carmona

Stream

13:00 CET

Scalable Private Information Retrieval: Introducing Apache Pirk (incubating) - Ellison Anne Williams, Creator of Apache Pirk

Querying information over TBs of data where no one can see what you query or the responses obtained? It sounds like science fiction, but it is actually the science of Private Information Retrieval (PIR). This talk will introduce Apache Pirk, a new incubating Apache project designed to provide a framework for scalable, distributed PIR. We will discuss the motivation for Apache Pirk, its distributed implementations on platforms such as Spark and Storm, its current algorithms, and the power of homomorphic encryption, and take a look at the path forward.

Speakers

Ellison Anne Williams

Ellison Anne Williams is a creator and PMC member of Apache Pirk, a pure mathematician by training, and a practical computer scientist in real life. Her passion is doing cool stuff with massive amounts of data.

Tuesday November 15, 2016 13:00 - 13:50 CET
Carmona

Intro

15:30 CET

Your Datascience Journey with Apache Zeppelin - Moon soo Lee, Anthony Corbacho & Jongyoul Lee, NFLabs

Join us on a journey to see how Apache Zeppelin started, how it helps your data science lifecycle, and how it became a popular top-level project (TLP). We'll also see how the community's focus has changed, from basic notebook features and Spark integration to advanced features like multi-tenancy. Moon soo Lee will explain the value of Apache Zeppelin with demos of some key use-case scenarios. We'll also look at the ecosystem around it: how various projects and companies are using Apache Zeppelin in their products and services in many different ways.

Finally, we'll discuss Apache Zeppelin's future roadmap and some of the challenges the community faces.

Speakers

Anthony Corbacho

Jongyoul Lee

Software Development Engineer, ZEPL

I'm a PMC member of Apache Zeppelin and work at ZEPL. In Apache Zeppelin, I focus on stabilizing it for production use, developing enterprise features, and enhancing the Apache Spark/JDBC features. Personally, I'm really interested in distributed and fault-tolerant...

Moon

CTO, NFLabs

Moon soo Lee is a creator of Apache Zeppelin and a co-founder and CTO of NFLabs. For the past few years he has been working on bootstrapping the Zeppelin project and its community. His recent focus is on growing the Zeppelin community and driving adoption.

Tuesday November 15, 2016 15:30 - 16:20 CET
Carmona

Intro

16:30 CET

Avro: Travel Across (r)evolution - Arek Osinski & Darek Eliasz, Allegro Group

These days, we generate enormous amounts of data. The biggest challenge lies in transforming raw data into knowledge. We would like to take you on a short journey and show our approach to moving from the unstructured world of microservices to a world with Avro schemas inside our data pipelines.

Avro is a well-known format for storing and online processing of information of any kind. What are the key features of this format? What are the common problems? Where might you encounter pitfalls? How does this influence our Big Data ecosystem?

The whole story will be illustrated with examples from a real-life implementation.

Speakers

Dariusz Eliasz

Senior Data Platform Engineer, Grupa Allegro Sp. z o.o.

Mainly interested in big data platform architecture and data governance. Enthusiast of scalable distributed solutions, processing large amounts of data, and continuous improvement.

Arek Osinski

Senior Data Platform Engineer, Allegro

Works at Allegro Group as a senior data engineer. From the beginning he has been involved in building and maintaining the Hadoop infrastructure within Allegro Group. Previously he was responsible for maintaining large-scale database systems. Passionate about new technologies and cyclin...

Tuesday November 15, 2016 16:30 - 17:20 CET
Carmona

Intro

11:00 CET

Why is My Hadoop Cluster Slow? - Steve Loughran, Hortonworks

Apache Hadoop is used to run jobs that execute tasks over multiple machines with complex dependencies between tasks. At scale, there can be 10s to 1000s of tasks running over 100s to 1000s of machines, which increases the challenge of making sense of their performance. Pipelines of such jobs that logically run a business workflow add another level of complexity. No wonder that the question of why Hadoop jobs run slower than expected remains a perennial source of grief for developers. In this talk, we will draw on our experience in debugging and analyzing Hadoop jobs to describe some methodical approaches to this and present current and new tracing and tooling ideas that can help semi-automate parts of this difficult problem.

Speakers

Steve Loughran

Member of Technical Staff, Hortonworks

Steve Loughran is a developer at Hortonworks, where he works on leading-edge Hadoop applications, most recently on Apache Slider and on Apache Spark's integration with Hadoop and YARN, and Hadoop's S3A connector to Amazon S3. He's the author of Ant in Action, a member of the Apache...

Wednesday November 16, 2016 11:00 - 11:50 CET
Carmona

12:00 CET

Get in Control of Your Workflows with Apache Airflow - Christian Trebing, Blue Yonder

Whenever you work with data, sooner or later you stumble across the definition of your workflows. At what point should you process your customer's data? What subsequent steps are necessary? And what went wrong with your data processing last Saturday night?

At Blue Yonder we use Apache Airflow to solve these problems. It can be extended with new functionality by developing plugins in Python. With Airflow, we define workflows as directed acyclic graphs and get a shiny UI for free. Airflow comes with some task operators which can be used out of the box to complete certain tasks. For more specific cases, you can also develop new operators in your plugin.

This talk will explain the concepts behind Airflow, demonstrating how to define your own workflows and how to extend the functionality. You'll also get to hear about our experiences using this tool in real-world scenarios.
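To give a feel for what defining a workflow as a directed acyclic graph means, here is a minimal sketch that runs tasks in dependency order using only Python's standard library. It deliberately does not use Airflow's own API, and it is not taken from the talk; `run_workflow`, the task names, and the toy tasks are hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies):
    """Run every task once, after all of the tasks it depends on.

    tasks maps a task name to a callable that receives the results of
    its upstream tasks; dependencies maps a task name to the set of
    task names that must finish first.
    """
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# Hypothetical three-step workflow: extract, then transform, then load.
tasks = {
    "extract": lambda upstream: [1, 2, 3],
    "transform": lambda upstream: [x * 2 for x in upstream["extract"]],
    "load": lambda upstream: sum(upstream["transform"]),
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order, results = run_workflow(tasks, dependencies)
print(order)
print(results["load"])
```

A real scheduler adds much more on top of this ordering step, such as retries, scheduling, and a UI, but the acyclic dependency graph is what makes a valid execution order exist in the first place.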

Speakers

Christian Trebing

Senior Software Engineer

Christian is a software developer from Karlsruhe, Germany. He studied computer science at TU Darmstadt. He currently works on big data applications at Blue Yonder, enjoying the challenges at the intersection between software engineering and data science.

Wednesday November 16, 2016 12:00 - 12:50 CET
Carmona

13:00 CET

Highly Scalable Big Data Analytics with Apache Drill - Tom Barber, Meteorite Consulting

Big Data analytics is becoming more and more popular as the query response times improve. We'll look at building and deploying a fully operational and highly scalable Apache Bigtop based Big Data Analytics platform with no code.

In this talk we'll utilise the power of the open source Juju application modelling platform to deploy our software and configure it for us. We'll also discuss deployment options, scalability and resiliency, allowing users to get the most from their data.

Speakers

Tom Barber

Technical Director, Spicule LTD

Tom Barber is the director of Meteorite BI and Spicule BI. A member of the Apache Software Foundation and regular speaker at ApacheCon, Tom has a passion for simplifying technology. The creator of Saiku Analytics and open source stalwart, when not working for NASA, Tom currently deals...

Wednesday November 16, 2016 13:00 - 13:50 CET
Carmona