Apache: Big Data Europe 2016
Monday, November 14 • 15:30 - 16:20
Moven: Machine/Deep Learning Models Distribution Relying on the Maven Infrastructure - Sergio Fernandez, Redlink GmbH

Modern NLP pipelines use large models that need to be distributed across the whole processing infrastructure. For example, in the SSIX project we're managing models of several GBs for the financial sector. At that scale you can't assume the models will be transferred at task submission time, nor copied manually. From our research, there doesn't appear to be any well-accepted approach to solving this issue (e.g., TensorFlow simply uses git).

Moven (models+maven) is a proof-of-concept that relies on the Maven infrastructure to publish machine/deep learning models. The current implementation allows them to be used from both Java and Python, although we're also targeting the more specific needs of some concrete environments, such as Apache Spark or the Apache Beam Runners API.
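
As a rough illustration of why the Maven infrastructure suits this purpose, the sketch below shows how a versioned model file could be located in a Maven repository from its coordinates alone, following the standard Maven 2 repository layout. This is not Moven's actual API: the `ModelCoordinates` class, the `artifact_url` function, the repository URL and the example coordinates are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelCoordinates:
    """Maven-style coordinates for a model artifact (hypothetical helper)."""
    group_id: str
    artifact_id: str
    version: str
    packaging: str = "zip"            # assumed file extension for the model
    classifier: Optional[str] = None  # optional qualifier, e.g. a language


def artifact_url(repo_url: str, coords: ModelCoordinates) -> str:
    """Build the download URL using the standard Maven 2 repository layout:
    <repo>/<groupId with dots as slashes>/<artifactId>/<version>/<file>
    """
    group_path = coords.group_id.replace(".", "/")
    filename = f"{coords.artifact_id}-{coords.version}"
    if coords.classifier:
        filename += f"-{coords.classifier}"
    filename += f".{coords.packaging}"
    return "/".join([
        repo_url.rstrip("/"),
        group_path,
        coords.artifact_id,
        coords.version,
        filename,
    ])


if __name__ == "__main__":
    # Hypothetical repository and coordinates, for illustration only.
    coords = ModelCoordinates("org.example.models", "sentiment", "1.0.0")
    print(artifact_url("https://repo.example.org/maven2/", coords))
```

Because the location is fully determined by the coordinates, each worker node could fetch and cache the exact model version it needs instead of receiving it at task submission time.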

Further details at https://bitbucket.org/ssix-project/moven

Sergio Fernández

Software Engineer, Redlink GmbH
I'm a software engineer specialized in innovation, with a focus on Data Architectures. My interests include Distributed Architectures, Data Integration, Linked Data and System Engineering. I've worked as a software engineer and project manager in different industries, but always somehow...

Monday November 14, 2016 15:30 - 16:20 CET
Nervion/Arenal II/III