Apache: Big Data Europe 2016
Wednesday, November 16 • 13:00 - 13:50
What's With the 1s and 0s? Making Sense of Binary Data at Scale with Tika and Friends - Nick Burch, Quanticate

Large amounts of unknown data seeks helpful tools to identify itself and generate content!

With one or two files, you can take time to manually identify them, and get out their contents. With thousands of files, or the internet's worth, this won't scale, even with mechanical turks! Luckily, there are open source tools and programs out there to help.

First we'll look at how we can work out what a given blob of 1s and 0s actually is, be it textual or binary. We'll then see how to extract common metadata from it, along with text, embedded resources, images, and maybe even the kitchen sink! We'll see how Apache Tika can do all of this for you, along with alternative and additional tools. Finally, we'll look at how to roll this all out on a Big Data scale.
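
To give a flavour of the first step, here is a minimal sketch of identifying a blob from its leading "magic" bytes. It is not how Apache Tika is implemented and does not use Tika's API; the signature table, the `detect` function name, and the text-versus-binary fallback are all assumptions made for illustration, and a real detector knows far more formats and uses more than byte prefixes.

```python
# Hypothetical, heavily abbreviated signature table: (offset, magic bytes, MIME type).
# A production detector such as Apache Tika ships a far larger set of rules.
SIGNATURES = [
    (0, b"%PDF-", "application/pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"PK\x03\x04", "application/zip"),
]


def detect(data: bytes) -> str:
    """Guess a MIME type for a blob of bytes (illustrative sketch only)."""
    if not data:
        return "application/octet-stream"

    # 1. Binary formats: compare known magic bytes at their expected offset.
    for offset, magic, mime in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return mime

    # 2. Fallback: treat NUL-free, valid UTF-8 as plain text.
    #    This is an assumed heuristic, not a rule taken from the talk.
    if b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "text/plain"
        except UnicodeDecodeError:
            pass

    # 3. Unknown binary.
    return "application/octet-stream"


if __name__ == "__main__":
    print(detect(b"%PDF-1.7\n"))
    print(detect("plain old text".encode("utf-8")))
```

Note that a ZIP signature alone cannot tell a plain archive from container formats built on ZIP, such as .docx or .jar files; telling those apart means looking inside the container, which is the kind of work a dedicated tool handles for you.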


Nick Burch

CTO, Apache Software Foundation
Nick began contributing to Apache projects in 2003, and hasn't looked back since! He's mostly involved in "Content" projects like Apache POI, Apache Tika and Apache Chemistry, as well as foundation-wide activities like Conferences and Travel Assistance.

Nick is CTO at Quanticate, a Clinical Research Organisation (CRO) with a strong focus on data and statistics.

Nick has spoken at most ApacheCons since 2007, and as well as many...

Wednesday November 16, 2016 13:00 - 13:50
Giralda I/II