Apache: Big Data Europe 2016
Tuesday, November 15 • 12:00 - 12:50
Using Apache Spark for Generating ElasticSearch Indices Offline - Andrej Babolcai, ESET


Making historical data searchable can be a challenge, especially when there is a lot of it. Indexing data into a live cluster can degrade search performance, and maintaining a spare cluster just for indexing can be expensive. In this talk we present the approaches we tried and describe how to create ElasticSearch indices offline using Apache Spark. Once created, these indices are stored as snapshots in HDFS and can be restored to a running ElasticSearch cluster. The snapshots in HDFS also serve as a ready-to-restore backup in case of an error.
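
The restore step mentioned in the abstract can be illustrated with the Elasticsearch snapshot REST API. The following is only a minimal sketch, not the speaker's implementation: it assumes the HDFS repository plugin is installed on the cluster, and the host, repository name, snapshot name, index name, and HDFS path are all hypothetical placeholders. It builds the two requests (register an HDFS snapshot repository, then restore a snapshot from it) without sending them.

```python
import json


def register_hdfs_repository(es_url, repo, hdfs_uri, path):
    """Build the request that registers an HDFS-backed snapshot repository.

    Assumes the repository-hdfs plugin is available on the cluster.
    """
    body = {"type": "hdfs", "settings": {"uri": hdfs_uri, "path": path}}
    return "PUT", f"{es_url}/_snapshot/{repo}", json.dumps(body)


def restore_snapshot(es_url, repo, snapshot, indices):
    """Build the request that restores the given indices from a snapshot."""
    body = {"indices": ",".join(indices)}
    return "POST", f"{es_url}/_snapshot/{repo}/{snapshot}/_restore", json.dumps(body)


if __name__ == "__main__":
    # All names below are hypothetical and chosen only for illustration.
    es = "http://localhost:9200"
    requests_to_send = [
        register_hdfs_repository(es, "offline_indices", "hdfs://namenode:8020", "/es/snapshots"),
        restore_snapshot(es, "offline_indices", "history_2016_11", ["history-2016-11"]),
    ]
    for method, url, body in requests_to_send:
        print(method, url, body)
```

Each tuple can be sent with any HTTP client. Keeping request construction separate from transport makes it easy to inspect exactly what would be sent to a live cluster before running it.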


Andrej Babolcai

Software Engineer, ESET
Software Engineer at ESET, currently working with Big Data technologies. Responsible for collecting and storing data and making it available to end users. Previously worked at Honeywell. Speaking experience: CARO Workshop 2016 (http://2016.caro.org/)

Tuesday November 15, 2016 12:00 - 12:50 CET
Giralda I/II