Apache: Big Data Europe 2016
Tuesday, November 15 • 13:00 - 13:50
Sparkler - Crawler on Apache Spark - Karanjeet Singh & Thamme Gowda, USC


A web crawler is a bot program that fetches resources from the web to build applications such as search engines and knowledge bases. In this presentation, Karanjeet Singh and Thamme Gowda will describe Sparkler (a contraction of Spark-Crawler), a new crawler that draws on recent advances in distributed computing and information retrieval by combining several Apache projects: Spark, Kafka, Lucene/Solr, Tika, and Felix. Sparkler is an extensible, highly scalable, high-performance web crawler that evolved from Apache Nutch and runs on an Apache Spark cluster. GitHub: https://github.com/USCDataScience/sparkler

Speakers

Thamme Gowda

Graduate Student, University of Southern California
Thamme Gowda is a graduate student at the University of Southern California, Los Angeles, CA, and an intern at NASA Jet Propulsion Laboratory, Pasadena, CA, USA. He is a co-founder of Datoin.com, a software-as-a-service platform built using Hadoop and Spark. He is also a committer and...

Karanjeet Singh

Research Assistant, University of Southern California
Karanjeet Singh is pursuing his Master's degree in Computer Science at the University of Southern California (USC). His projects and research are mostly in the areas of Information Retrieval and Data Science. He is also affiliated with NASA Jet Propulsion Laboratory. Prior to this, he was working...

Tuesday, November 15, 2016 • 13:00 - 13:50 CET
Giralda III/IV