Apache: Big Data Europe 2016
Wednesday, November 16 • 13:00 - 13:50
Distributed Logistic Model Trees - Mateo Alvarez & Antonio Soriano, Stratio Big Data

Classification algorithms play an important role in many business areas, such as fraud detection, cross-selling and customer behavior analysis. In a business context, interpretability is a very desirable property, and sometimes even a hard requirement. However, interpretable algorithms are usually outperformed by less interpretable ones such as Random Forest. In this talk, Antonio Soriano will present a distributed Spark implementation of the Logistic Model Tree (LMT) algorithm (Landwehr et al., 2005, Machine Learning, 59(1-2), 161-205), which consists of a decision tree with logistic regression classifiers in its leaves. While remaining highly interpretable, the LMT consistently performs as well as or better than other popular algorithms on several performance metrics, such as accuracy, precision/recall and area under the ROC curve.
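
To make the model structure concrete, here is a minimal Python sketch of how an already-fitted LMT could produce a prediction: an instance is routed down ordinary threshold splits, and the leaf it reaches applies its own logistic regression. It is an illustration only, assuming a binary target; it is not the speakers' Spark implementation, and it omits the LogitBoost-based fitting of the leaf models described by Landwehr et al. The classes `Leaf` and `Split`, the function `lmt_predict_proba`, the feature names and the weights are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    # Hypothetical leaf: a logistic regression with an intercept
    # and one weight per feature it uses.
    intercept: float
    weights: Dict[str, float]

    def predict_proba(self, x: Dict[str, float]) -> float:
        # Linear score (log-odds), then the logistic (sigmoid) function.
        z = self.intercept + sum(w * x[f] for f, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))


@dataclass
class Split:
    # Hypothetical internal node: go left if x[feature] <= threshold.
    feature: str
    threshold: float
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def lmt_predict_proba(node: Node, x: Dict[str, float]) -> float:
    """Route x down the tree, then apply the logistic model of the leaf reached."""
    while isinstance(node, Split):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.predict_proba(x)


# Hypothetical two-leaf tree for a fraud-detection style problem;
# the weights are made up for illustration, not learned from data.
example_tree = Split(
    feature="amount",
    threshold=100.0,
    left=Leaf(intercept=-3.0, weights={"amount": 0.01}),
    right=Leaf(intercept=-1.0, weights={"amount": 0.002, "foreign": 1.5}),
)
```

Interpretability comes from both parts of the model: the path of splits explains which segment an instance falls into, and the leaf weights show how each feature moves the log-odds within that segment.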


Mateo Alvarez

Big Data Developer / Data Scientist, Stratio
Mateo Álvarez studied aerospace engineering at the Universidad Politécnica de Madrid, holds a master's degree in Propulsion Systems, and studied Data Science at the Universidad Rey Juan Carlos. He is passionate about data analysis with Scala, Python and Big Data technologies in general, and is currently part of the Data Science team at Stratio Big Data, where he works on ML algorithms and profiling analysis built around Spark.

Wednesday November 16, 2016 13:00 - 13:50
Giralda VI/VII

