EN: The presentation will be in English. Info in English below.


The next meetup of the Prague Czech Java User Group takes place on Monday, 27 May, from 19:00 in lecture room S5 at the Faculty of Mathematics and Physics of Charles University, Malostranské náměstí 25, Praha 1. Admission to CZJUG events is free and there is no need to register in advance. If you plan to come, let us know by joining this event on Facebook: https://www.facebook.com/events/898763063796125/


Time: 27 May 2019, 19:00

Title: CZJUG Praha - A look under the hood of H2O - machine learning for developers

Venue: Lecture room S5 at the Faculty of Mathematics and Physics of Charles University, Malostranské náměstí 25, Praha 1


Refreshments for this meetup are provided by H2O.ai, for which we thank them very much.


CZJUG is also regularly supported by:

  • Avast - recording of talks for online viewing

  • JetBrains - developer tool licenses for speakers

  • others: Oracle, the www.java.cz portal


EN: The next meetup takes place on Monday, 27 May, at 19:00 in room S5 on the 2nd floor of the Charles University building at Malostranské náměstí 25, Praha 1.

Title: Productionizing H2O Models with Apache Spark


Spark pipelines are a powerful concept for productionizing machine learning workflows. Their API allows combining data processing with machine learning algorithms and opens opportunities for integration with various machine learning libraries. However, to benefit from the power of pipelines, users need the freedom to choose and experiment with any machine learning algorithm or library. Therefore, we developed Sparkling Water, which embeds the H2O machine learning library of advanced algorithms into the Spark ecosystem and exposes them via the pipeline API. Furthermore, the algorithms benefit from H2O MOJOs (Model Object Optimized), a powerful concept shared across the entire H2O platform to store and exchange models. MOJOs are designed for effective model deployment with a focus on scoring speed, traceability, exchangeability, and backward compatibility. In this talk we will explain the architecture of Sparkling Water with a focus on its integration into Spark pipelines and MOJOs. We’ll demonstrate the creation of pipelines that integrate H2O machine learning models, and their deployment using Scala or Python.
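For readers new to the pipeline idea mentioned in the abstract, here is a minimal sketch in plain Python. It is not the Spark or Sparkling Water API: the class names (Scaler, MeanEstimator, MeanModel, Pipeline, PipelineModel) and the dict-based rows are all hypothetical, chosen only to show how data-processing stages and a learning stage can be chained behind one fit/transform interface.

```python
class Scaler:
    """Transformer stage: multiplies one numeric column by a fixed factor."""

    def __init__(self, column, factor):
        self.column = column
        self.factor = factor

    def transform(self, rows):
        return [{**r, self.column: r[self.column] * self.factor} for r in rows]


class MeanModel:
    """Fitted model: predicts the same learned mean for every row."""

    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [{**r, "prediction": self.mean} for r in rows]


class MeanEstimator:
    """Estimator stage: fit() learns the mean label and returns a MeanModel."""

    def fit(self, rows):
        mean = sum(r["label"] for r in rows) / len(rows)
        return MeanModel(mean)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages; estimators are fitted on the output of earlier stages."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        current = rows
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(current)  # estimator becomes a model
            fitted.append(stage)
            current = stage.transform(current)
        return PipelineModel(fitted)


train = [{"x": 1.0, "label": 2.0}, {"x": 3.0, "label": 4.0}]
model = Pipeline([Scaler("x", 10.0), MeanEstimator()]).fit(train)
scored = model.transform([{"x": 2.0}])
```

The point of the pattern is that any library's algorithm can be plugged in as a stage as long as it offers the same fit/transform shape, which is the kind of integration the talk describes for H2O algorithms.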


Speaker: Jakub Háva


Jakub (or “Kuba” as we call him) completed his Bachelor’s Degree in Computer Science and Master’s Degree in Software Systems at Charles University in Prague. During his master’s studies, he developed a cluster monitoring tool for JVM-based languages which makes debugging and reasoning about the performance of distributed systems easier, using a concept called distributed stack traces. Kuba enjoys tackling problems and learning new programming languages. At H2O.ai, Kuba leads development of the Sparkling Water project and takes care of integrating Sparkling Water with the rest of the H2O.ai ecosystem.


Title: H2O internals from the technical point of view


H2O-3 is an open-source machine learning platform made to be scalable and fast. While providing interfaces familiar to data scientists (Python, R, Scala, Web UI and others), H2O-3 itself is implemented in Java. It contains many of the most popular machine learning algorithms, including Gradient Boosting Machines, XGBoost, Generalized Linear Models, Deep Learning and many more. It is a distributed, scalable platform: users can start by simply running it on their laptops with minimal requirements and then take it to the cloud, running large H2O clusters and processing vast amounts of data. The talk opens with an introduction to H2O's features and mission to demonstrate the challenges faced while implementing such a system. A look under H2O's hood follows, revealing some of the internal mechanisms used to make machine learning algorithms distributed and fast, and the challenges that brings. Finally, H2O is not an isolated island floating in the waters of machine learning. A lot of engineering effort goes into integration with other systems, such as databases, file systems and distributed computing platforms. The resulting models must also be productionized. There will be a guided tour through the engineering of these parts, focusing on challenges introduced by the distributed nature of the system.


Speaker: Pavel Pscheidl


Pavel is a machine learning engineer at H2O. He holds a master's degree in Applied Informatics from the Faculty of Informatics UHK, where his main focus was applied statistics & stochastic methods, agent-based simulations and optimization. He joined a research team as a Ph.D. candidate while working on various problems such as the effectiveness of fraud detection methods in highly distributed systems. Due to his roots in computer science, his commercial focus was on enterprise Java systems and related standards. He is an author of the book Java EE 8 Microservices. In 2017, Pavel joined H2O's awesome team, abandoning all other activities. At H2O, he is proud of being able to leverage his passion for algorithms and optimization while diving deeper into many other fields, including statistics, by learning from the all-star H2O team.