Apache: Big Data Europe 2016
Monday, November 14
 

12:00 CET

Managing Deeply Nested Documents in Apache Solr - Anshum Gupta, IBM Watson
Apache Solr recently started supporting deeply nested documents. Solr can now be used to search and facet on documents such as nested email threads, comments and replies on social media, and enriched and annotated documents, without having to flatten them before ingestion.

Anshum Gupta will discuss pre-processing data so that it can be indexed in Solr, making it possible to perform complex search and statistical aggregation on top of it. He will also cover query formation for sample use cases of nested data, along with the options and features that Solr provides for faceting or aggregating such documents.

By the end of this talk, Solr users will have a better understanding of both how to use the features Solr provides to answer interesting questions about deeply nested documents and the workarounds for the missing pieces.

Speakers

Anshum Gupta

Sr. Software Engineer, IBM Watson
Anshum Gupta is a Lucene/Solr committer and PMC member with over 10 years of experience with search. He is a part of the search team at IBM Watson, where he works on extending the limits and improving SolrCloud. Prior to this, he was a part of the open source team at Lucidworks and...


Monday November 14, 2016 12:00 - 12:50 CET
Giralda V

13:00 CET

Fast & Scalable Email System with Apache Solr - Strategies, Tradeoffs and Optimizations - Arnon Yogev, IBM Research
Email interaction has unique characteristics that set it apart from traditional web search: for example, users search their own private mailboxes and are often more interested in recent emails than in the archive.

Taking advantage of these characteristics, we were able to optimize our infrastructure in terms of indexing strategy and query optimization and achieve a significant gain in scalability and performance.

Arnon will present the various tradeoffs that were explored, including multi-tiered indexes, sorted indexes, query optimizations and more.

Arnon will then present the benchmark results that stress the importance of correctly designing a Solr infrastructure and tailoring it to one's specific use case.

Speakers

Arnon Yogev

Software Developer & Researcher, IBM Research
Arnon is a software engineer in IBM Research, part of the Social Analytics & Technologies team, Big Data and Cognitive Analytics department. Arnon earned his MBA degree and his B.Sc in Computer Science from the Technion. Being part of the Social Analytics & Technologies team, Arnon's...


Monday November 14, 2016 13:00 - 13:50 CET
Giralda V

15:30 CET

Large Scale SolrCloud Cluster Management via APIs - Anshum Gupta, IBM Watson
Apache Solr is widely used by organizations to power their search platforms and often supports multiple users. Many cluster management APIs were introduced over the last few releases, allowing users to manage operations ranging from replica placement to forcing leader elections via API calls. By the end of this talk, intermediate Solr users will understand what's available and when they can avoid direct interference with the system, leading to more stable clusters and lower chances of nodes going down. Attendees will also be much better equipped to build their own SolrCloud cluster management tools. I will also talk about when not to use these APIs and what's planned in the near future to handle specific operational use cases.

Speakers

Anshum Gupta

Sr. Software Engineer, IBM Watson


Monday November 14, 2016 15:30 - 16:20 CET
Giralda V

16:30 CET

ETL Pipelines with OODT, Solr and Stuff - Tom Barber, Meteorite Consulting
Discover a number of Apache projects you may not have heard of and how they can help you process both clinical and non-clinical data. Apache OODT, developed by NASA, allows users to ingest and store files and metadata along with processing workflows. OODT, together with cTAKES, allows us to extract clinical information from files, process it, and give end users access to the extracted data.



We can then take these sources and manipulate them further, creating a highly flexible ETL pipeline that offers reliability and scalability. Backed by Apache Solr, users can then interrogate the data via web interfaces and instigate further post-processing and investigation.



Of course, you may not have a clinical use case, but the platforms can be repurposed and will allow you to go away and build your own scalable data pipeline for processing and ingestion.

Speakers

Tom Barber

Technical Director, Spicule LTD
Tom Barber is the director of Meteorite BI and Spicule BI. A member of the Apache Software Foundation and regular speaker at ApacheCon, Tom has a passion for simplifying technology. The creator of Saiku Analytics and open source stalwart, when not working for NASA, Tom currently deals...


Monday November 14, 2016 16:30 - 17:20 CET
Giralda V
 
Tuesday, November 15
 

15:30 CET

Building and Running a Solr-as-a-Service for IBM Watson - Shai Erera, IBM
Running a managed Solr service brings fun challenges, both for the users and for the service itself. Users typically do not have access to all components of the Solr system (e.g. the ZK ensemble, the actual nodes that Solr runs on, etc.). The service, on the other hand, must ensure high availability at all times and handle what are often user-driven tasks, such as version upgrades, taking nodes offline for maintenance, and more.



In this talk I will describe how we tackle these challenges to build a managed Solr service on the cloud, which currently hosts a few thousand Solr clusters. I will focus on the infrastructure that we chose to run the Solr clusters on, as well as how we ensure high availability, cluster balancing and version upgrades.

Speakers

Shai Erera

STSM, Social Analytics & Technologies, IBM
Shai Erera is a Researcher at IBM Research, Haifa, Israel. Shai earned his M.Sc in Computer Science from the University of Haifa in 2007. Shai’s work experience includes the development of search-based systems over Lucene and Solr and he is also a Lucene/Solr committer.


Tuesday November 15, 2016 15:30 - 16:20 CET
Giralda VI/VII
 