Glenna2/Glenna2-Newsletter-Dec-2019




Glenna2 Newsletter, December 2019

Welcome to the newsletter of the Nordic Glenna2 cloud computing project.

________________________________________________________

News:

Kubeflow

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.

One of the goals of the Glenna2 project is to provide cloud-based tools and infrastructure that researchers in the Nordics can use in their work. As machine learning is applied to more and more problems, running these workloads in the cloud is an appealing alternative for large-scale computations.

Kubeflow is a Google-backed project that aims to bundle a curated collection of existing state-of-the-art machine learning tools compatible with Kubernetes environments, together with some services created by the Kubeflow project itself, in a single package. Kubernetes has a large foothold in the cloud computing scene and is widely adopted as the industry standard. With this in mind, Kubeflow looks like a promising option for machine learning work in this scene as well.

The project is open source and under active development, with the source code published under the Kubeflow GitHub organization: https://github.com/kubeflow. It has not yet reached a full release, but stable versions of the current tools and setup are released continuously as development progresses. Given the opportunities Kubeflow offers, the Glenna2 project decided to take a closer look and evaluate it.

The major effort in this evaluation focused on Kubeflow Pipelines (KFP). A workflow was developed with KFP in which a machine learning model is trained and packaged into a container for serving predictions. The source code of this example is available on GitHub (https://github.com/pharmbio/kubeflow-pipelines/tree/master/kensert_CNN), along with more detailed documentation and instructions for running it.

The workflow consists of four stages: data preprocessing, training the model, evaluating the model and finally packaging the model into a Docker container. Because the workflow was adapted from a set of Python scripts that run through the entire process, the original code was mostly kept as-is; each pipeline stage instead runs its scripts via a main shell script. A Docker container was created for each pipeline stage, containing the shell script for that stage, the source code of the Python scripts and any other dependencies. These containers are also available in the GitHub repository.
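The four-stage structure described above can be sketched in plain Python. This is only an illustration of the idea (one container per stage, each running a shell script, executed in dependency order), not the Kubeflow Pipelines SDK and not the code from the repository. The image names, script names and the strictly linear dependency chain are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Stage:
    """One pipeline stage: a container image plus the shell script it runs."""
    name: str
    image: str   # hypothetical image name, not taken from the repository
    script: str  # hypothetical entry-point shell script
    after: List[str] = field(default_factory=list)  # names of upstream stages


def execution_order(stages: List[Stage]) -> List[str]:
    """Return stage names in an order that respects the `after` dependencies."""
    by_name: Dict[str, Stage] = {s.name: s for s in stages}
    ordered: List[str] = []
    visiting: Set[str] = set()

    def visit(name: str) -> None:
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at stage {name!r}")
        visiting.add(name)
        for upstream in by_name[name].after:
            visit(upstream)
        visiting.discard(name)
        ordered.append(name)

    for stage in stages:
        visit(stage.name)
    return ordered


def docker_command(stage: Stage) -> List[str]:
    """Build the command that would run the stage's shell script in its container."""
    return ["docker", "run", "--rm", stage.image, "sh", stage.script]


# Assumed linear chain: preprocess -> train -> evaluate -> package.
PIPELINE = [
    Stage("preprocess", "example/preprocess:latest", "run_preprocess.sh"),
    Stage("train", "example/train:latest", "run_train.sh", after=["preprocess"]),
    Stage("evaluate", "example/evaluate:latest", "run_evaluate.sh", after=["train"]),
    Stage("package", "example/package:latest", "run_package.sh", after=["evaluate"]),
]

if __name__ == "__main__":
    stages_by_name = {s.name: s for s in PIPELINE}
    for name in execution_order(PIPELINE):
        # Print the commands instead of executing them; the images are placeholders.
        print(" ".join(docker_command(stages_by_name[name])))
```

Keeping each stage as an opaque container with its own entry script is what allows the original Python scripts to stay mostly unchanged: the pipeline only needs to know the image, the command and the ordering.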


Lean AI: the Full Stack Data Science Platform


Lean AI: full stack data science is an open source initiative started and maintained by Scaleout Systems and built in collaboration with Uppsala University (Pharmaceutical Biosciences) and NeIC. The purpose of the stack is to provide a highly flexible, cloud native open toolkit for full stack data science projects.

The stack integrates leading open source components covering all stages, from data ingestion and transformation, feature extraction, model definition, training and evaluation to deployment, inference and monitoring, with infrastructure automation on top of Kubernetes.

The aim is for Lean AI full stack data science to continue evolving together with the open source DevOps and ML community and through active development by Scaleout.

GitHub repository: https://github.com/leanaiorg/leanaistack


Kafka, the Stream Processing Framework

Apache Kafka is a publish-subscribe-based durable messaging system that can exchange data between processes, applications, and servers. Kafka was designed for Big Data use cases, which need linear horizontal scalability for both message producers and consumers, and high reliability and durability. Kafka is used for building real-time data pipelines and streaming apps. Real-time data streams may come from sources that include sensors on devices connected to the Internet of Things (IoT), e-commerce transactions, data from financial trading floors or instrumentation in data centers.
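The publish-subscribe model with a durable log can be illustrated with a small in-memory toy. It is a sketch of the concept only and does not use or imitate Kafka's client API: the class name `ToyLog`, its methods and the topic and group names are all hypothetical. The point it shows is that messages stay in the log after being read, and each consumer group keeps its own read position (offset), so several consumers can process the same stream independently.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class ToyLog:
    """In-memory stand-in for a single-partition topic log (not Kafka's API)."""

    def __init__(self) -> None:
        # Messages per topic, kept in append order and never deleted on read.
        self._topics: Dict[str, List[Any]] = defaultdict(list)
        # Next offset to read, tracked per (consumer group, topic).
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: Any) -> int:
        """Append a message to the topic and return its offset."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def poll(self, group: str, topic: str, max_messages: int = 10) -> List[Any]:
        """Return unread messages for this group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = ToyLog()
    # A producer publishes hypothetical sensor readings.
    for value in (3.5, 4.0, 4.5):
        log.publish("sensor-readings", {"sensor": "demo-1", "value": value})

    # Two independent consumer groups each see the full stream.
    print(log.poll("dashboard", "sensor-readings"))
    print(log.poll("archiver", "sensor-readings", max_messages=2))
```

A real deployment adds what the toy leaves out: partitioning for horizontal scaling, replication across brokers and persistence to disk.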


Apurva Nandan at CSC has investigated Kafka and built a pilot implementation that ingests weather data from personal weather stations into Kafka and processes the data using Spark Streaming. The presentation below (given during the Glenna2 F2F meeting in October 2019) outlines the approach.


https://wiki.neic.no/w/ext/img_auth.php/f/f7/GlennaKafka.pdf


About Glenna2:

Glenna2 is a three-year project to continue Nordic collaboration on cloud computing in Denmark, Finland, Iceland, Norway and Sweden. The aim of the project is to build on already existing services and infrastructure at the participating centers. The work is supported and directed by the Nordic e-infrastructure providers.

Background:

Glenna2 aims to provide added value to the national cloud and data intensive computing initiatives by:

1. Supporting national cloud initiatives to sustain affordable IaaS cloud resources through financial support, knowledge exchange and pooling competency on cloud operations.

2. Using such national resources to establish an internationally leading collaboration on data intensive computing in collaboration with user communities.

3. Leveraging the pooled competency to take responsibility for assessing future hybrid cloud technology and communicating that to the national initiatives.

4. Supporting use of resources by pooling national cloud application expert support and creating a Nordic support channel for cloud and big data. The mandate is to sustain a coordinated training and dissemination effort, creating training material and providing application level support to cloud users in all countries.


The project is organised into four Aims. The focus of Aims 1-3 is largely technological, while Aim 4 focuses on disseminating pooled knowledge down to end-users.


More about the four aims and the Glenna2 project: https://wiki.neic.no/wiki/Glenna2


The name Glenna is Icelandic and means "opening in the clouds".

---

The Glenna2 Newsletter is distributed to team members and affiliated parties.

Feel free to contact me for more information!

On behalf of the Glenna2 team,

Dan Still

Glenna2 Project Manager, NeIC
https://wiki.neic.no/wiki/Glenna
email: Dan.Still@csc.fi
tel: +358 50 381 9037