Apache: Big Data Europe 2016
Monday, November 14 • 15:30 - 16:20
Moven: Machine/Deep Learning Models Distribution Relying on the Maven Infrastructure - Sergio Fernandez, Redlink GmbH

Modern NLP pipelines use large models that need to be distributed across the entire processing infrastructure. For example, in the SSIX project we manage models of several GB for the financial sector. At that scale you can't assume the models will be transferred at task submission time, nor copied manually. From our research, there doesn't appear to be any well-accepted approach to this problem (e.g., TensorFlow simply uses git).

Moven (models + Maven) is a proof of concept that relies on the Maven infrastructure to publish machine/deep learning models. The current implementation allows them to be used from both Java and Python, although we're targeting the more specific needs of some concrete environments, such as Apache Spark or the Apache Beam Runners API.
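
To give a rough idea of what publishing models through the Maven infrastructure implies for a consumer, the sketch below maps Maven coordinates to a download URL using the standard Maven repository layout (groupId/artifactId/version/file). It is not Moven's actual API: the function names, the coordinates, the "bin" extension and the repository URL are all hypothetical and chosen only for illustration.

```python
from typing import Optional


def maven_artifact_path(group_id: str, artifact_id: str, version: str,
                        extension: str = "jar",
                        classifier: Optional[str] = None) -> str:
    """Relative path of an artifact in the standard Maven repository layout."""
    filename = f"{artifact_id}-{version}"
    if classifier:
        # An optional classifier distinguishes variants of the same version.
        filename += f"-{classifier}"
    filename += f".{extension}"
    # Dots in the groupId become directory separators.
    return "/".join([group_id.replace(".", "/"), artifact_id, version, filename])


def maven_artifact_url(repo_url: str, group_id: str, artifact_id: str,
                       version: str, extension: str = "jar",
                       classifier: Optional[str] = None) -> str:
    """Full URL of an artifact hosted under the given repository base URL."""
    path = maven_artifact_path(group_id, artifact_id, version,
                               extension, classifier)
    return repo_url.rstrip("/") + "/" + path


if __name__ == "__main__":
    # Hypothetical coordinates and repository, for illustration only.
    print(maven_artifact_url("https://repo.example.org/maven2/",
                             "com.example.models", "sentiment-model",
                             "1.0.0", extension="bin"))
```

A worker node could resolve such a URL once and cache the file locally, instead of receiving the model with every submitted task.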

Further details at https://bitbucket.org/ssix-project/moven

Sergio Fernández

Software Engineer, Redlink GmbH
I'm a software engineer specialized in innovation, with a focus on Data Architectures. My interests include Distributed Architectures, Data Integration, Linked Data and System Engineering. I've worked as a software engineer and project manager in different industries, but always somehow close to science, because I strongly believe that innovation can be achieved by combining research and engineering in equal measure. Therefore all my scientific contributions...

Monday November 14, 2016 15:30 - 16:20
Nervion/Arenal II/III

