The June meeting of the Prague Czech Java User Group takes place on 26 June at 19:00 in lecture room S5 at the Faculty of Mathematics and Physics of Charles University, Malostranské náměstí 25, Praha 1. The programme features two presentations: Engineering Machine Learning Algorithms at Scale (prof. Jan Vitek) and Real-time stream data processing (Zbyněk Šlajchrt). The meeting is sponsored by AVAST Software. Admission to CZJUG events is free and no advance registration is required. If you plan to attend, please let us know by voting in the poll on the home page of the java.cz portal.

Engineering Machine Learning Algorithms at Scale

The talk will describe how to engineer an implementation of a popular supervised machine learning algorithm, Random Forest, so that it scales to terabyte data sets. To that end, I will show how to leverage the H2O analytics engine to write distributed Fork/Join code in Java that is massively scalable and efficient. H2O has an API for Big Data Math that uses a simple giant-vector programming style and runs in parallel across a cluster. H2O can run on top of infrastructures such as Hadoop or standalone, and has been shown to scale to hundreds of nodes. The H2O project is an open source effort, and so is our implementation of Random Forest.
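As a rough illustration of the Fork/Join style mentioned above, the following minimal sketch splits one large vector into chunks, applies a per-element operation to each chunk in parallel, and combines the partial results. It uses only the standard JDK `java.util.concurrent` classes on a single machine; it is not the analytics engine's own API and it is not distributed. The class name `VectorSum`, the method `sumOfSquares`, and the `THRESHOLD` cutoff are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class VectorSum {
    // Chunks at or below this size are processed sequentially (assumed, arbitrary cutoff).
    private static final int THRESHOLD = 10_000;

    // One task covers the half-open index range [lo, hi) of the vector.
    static final class SumTask extends RecursiveTask<Double> {
        private final double[] data;
        private final int lo;
        private final int hi;

        SumTask(double[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Double compute() {
            if (hi - lo <= THRESHOLD) {
                // "Map" each element to its square and "reduce" by summing.
                double sum = 0.0;
                for (int i = lo; i < hi; i++) {
                    sum += data[i] * data[i];
                }
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();                        // run the left half asynchronously
            double rightSum = right.compute();  // compute the right half in this thread
            return left.join() + rightSum;      // combine the partial results
        }
    }

    static double sumOfSquares(double[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        double[] v = new double[1_000_000];
        Arrays.fill(v, 2.0);
        System.out.println(sumOfSquares(v));
    }
}
```

A distributed engine would apply the same split, compute, and combine pattern across the nodes of a cluster rather than across the threads of one JVM.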

Jan Vitek

Jan Vitek is a Professor of Computer Science at Purdue University, USA. His research career encompasses work on all aspects of programming language design and implementation. He led the development of the first real-time Java virtual machine and has worked on language-based security, concurrency, and transactional memory. On the academic side of his life, he chairs SIGPLAN, the ACM Special Interest Group on Programming Languages, and has chaired conferences such as ECOOP, PLDI, COORDINATION, and TOOLS. He was an academic visitor for several years at IBM and Oracle. He cofounded Fiji Systems to sell real-time technology, and he is currently an advisor at 0xdata, where he works on big data. His most recent research interests include JavaScript and the R programming language.

Real-time stream data processing

This presentation deals with the concept of coroutines and their applicability in the world of stream data processing. Although they are rarely used in today's applications, coroutines have been around since the early days of digital computing. Surprisingly, coroutines can be nicely combined with the map-reduce paradigm that is frequently used in cloud computing and big data processing. In contrast to the traditional map-reduce concept, which is designed for offline job processing, the hybrid of coroutines and map-reduce is primarily targeted at real-time event processing. Clockwork, an open-source library developed at Avast, combines these two concepts and allows a programmer to write a real-time stream analysis as if writing a traditional map-reduce job, for instance for Hadoop. The presentation focuses mainly on coding and samples and will show how to program applications ranging from simple real-time statistics to more advanced tasks.
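To make the contrast with offline map-reduce concrete, here is a minimal sketch of a map-reduce-style computation that is updated one event at a time, so the result is current after every event instead of only after a batch job finishes. It does not use coroutines and it is not Clockwork's API; it is plain JDK code, and the class `StreamingCounter` with its methods `onEvent` and `countFor` are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class StreamingCounter<E> {
    // "Map" step: extracts the grouping key from an incoming event.
    private final Function<E, String> keyOf;
    // Running "reduce" state: the number of events seen so far for each key.
    private final Map<String, Long> counts = new HashMap<>();

    public StreamingCounter(Function<E, String> keyOf) {
        this.keyOf = keyOf;
    }

    // Processes a single event and returns the updated count for its key.
    public long onEvent(E event) {
        return counts.merge(keyOf.apply(event), 1L, Long::sum);
    }

    // Returns the current count for a key, or 0 if the key has not been seen.
    public long countFor(String key) {
        return counts.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        // Assumed event format for illustration: "<HTTP method> <path>".
        StreamingCounter<String> counter =
                new StreamingCounter<>(line -> line.split(" ")[0]);
        for (String event : new String[] {"GET /a", "POST /b", "GET /c"}) {
            counter.onEvent(event);
        }
        System.out.println(counter.countFor("GET"));
    }
}
```

The sketch is single-threaded and keeps all state in memory; a real-time system would also need to handle concurrency, time windows, and state eviction.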

Zbyněk Šlajchrt

After finishing his studies at the Faculty of Mathematics and Physics of Charles University, he began working as a Java EE developer and architect at several Czech and international companies. He now works at Avast a.s., and alongside his main job he lectures on Java EE programming at the University of Economics, Prague. In his current position he is responsible for designing and developing a private cloud platform and applications built on the Java platform at AVAST Software.