Glenna2/Team-Meeting-2018-05-07
Glenna2 Aim1 IaaS Meeting: May 7th 2018 at 12:00-13:00 CEST
Present: Gurvinder (NO), Nikolay (NO), Apurva (FI), Olli (FI), Risto (FI), Dan (FI)
Absent: Matteo (SE)
Channels: Google Hangouts: https://plus.google.com/hangouts/_/g4g5nyl5glc66wqgbqk4yjvb4ua
1. Review of last meeting and News
Not covered today; focus on the KubeCon review
2. Glenna2 Aim2 Topics:
KubeCon at the Bella Center in Copenhagen
4500 participants; Olli & Risto attended from Glenna2
Risto made notes available at: https://ep.csc.fi/p/!kubecon2018
(also snapshot below in Appendix)
In the 2019 timeframe, VMs will be running on Kubernetes
RL: use cases at Kubecon
The Norwegian tax authority has been running OpenShift since 2014, with six clusters
NY Times main pages run on Kubernetes; also FT, Bloomberg, Zalando, Spotify
CERN has 200 clusters for science applications
GitOps
Istio & service mesh
GS: Selium(?) -> best performance on bare metal
RL: one team that manages Kubernetes clusters centrally; multitenancy in Kubernetes: gVisor & Kata
custom resource definitions as part of the Kubernetes API
Particularly good presentation from Simon Wardley https://www.youtube.com/watch?v=xlNYYy8pzB4
Next KubeCon in Shanghai -> (Seattle in Dec) -> Barcelona
RL: Väestörekisterikeskus (Finnish Population Register Centre) building on OpenShift; Kelsey Hightower
Kubeflow is the new ML thing
PyTorch at CSC Notebooks (Apurva)
Only two talks from Docker
Get access to the app store in Norway for Risto, Apurva and Olli
App store into production in August
Next meeting: 4.6. (June 4th)
Appendix:
Risto & Olli KubeCon 2018 Copenhagen Notes
Tuesday
17:00 Lightning Talk: Chaos Engineering In Practice - Paul Jones, Capgemini UK (Intermediate Skill Level)
- kubemonkey - chaos monkey for spring - chaos can be induced with sidecar containers
17:05 Lightning Talk: Not One Size Fits All, How to Size Kubernetes Clusters - Jeff Sloyer & Dan Berg, IBM (Any Skill Level)
- have multiple node types - use limits, requests and selectors - prometheus is memory hungry, example at IBM has 64GB RAM
17:10 Lightning Talk: Why you Should Really Pay Attention to K8S Security Best Practices - Benjy Portnoy, Aqua Security (Intermediate Skill Level)
Tesla's k8s cluster was hacked, because the dashboard had default credentials
- free tools - CIS benchmark testing - kube-hunter - microscanner - interesting
17:15 Lightning Talk: Schedule the Scaling of Your Kubernetes Resources Using kube-start-stop - Lili Cosic, Weaveworks (Beginner Skill Level)
- uses custom resource definitions
17:20 Lightning Talk: A Desktop GUI for your First Kubernetes Deployment - Alessandro Pilotti, Cloudbase Solutions (Beginner Skill Level)
- github.com/cloudbase/kubinstaller
17:35 Lightning Talk: Kubernetes is Blowing Up - Ron Miller, TechCrunch (Any Skill Level)
- Big names flocked to cncf - $4 billion investments to K8s tech - lots of startups bought out by big names
17:40 Lightning Talk: Scaling Distributed Deep Learning with Service Discovery: How CoreDNS Helps Distributed TensorFlow Tasks - Yong Tang, Infoblox Inc. (Intermediate Skill Level)
- TensorFlow is developing fast, users need to be in control - no separate OPS team for TF - tensorflow needs predefined topology info, abstracted away with CoreDNS - cloud-init based deployment
17:45 Lightning Talk: Tips for Operating Kubernetes with OpenStack Cloud Provider - Yang Yu & Yifeng Xiao, VMware (Beginner Skill Level)
- normal queue pool tuning needed - high api load - if token expires between calls from nova to cinder, no 401 is visible to k8s - token needs to be regenerated - problems with cinder volume detach -> rpc_response_timeout needs to be increased in nova.conf
17:50 Lightning Talk: Extending Kubernetes with gRPC - Vladimir Vivien, VMware (Intermediate Skill Level)
- gRPC can be used locally through unix domain sockets - example: CSI with gRPC externalizes storage provisioning - CSI/gRPC success will define future plugin architecture
17:55 Lightning Talk: TSDB: The Engine behind Prometheus - Goutham Veeramachaneni, IIT Hyderabad (Beginner Skill Level)
- Grafana Labs contributes to prometheus - because of domain knowledge, data can be compressed well
18:00
Red Hat OpenShift Commons Machine Learning Reception
Red Hat/CoreOS Update with Brandon Philips
- tectonic: auto-updating K8s - OpenShift is getting tectonic features - K8s is in control of OS software - all through k8s API - running k8s enables custom APIs over IaaS providers - ticketmaster runs 344 prometheus instances with operators, and it works because of domain knowledge in the prometheus operator - operator SDK makes it easier to develop operators
Panel: Machine Learning on OpenShift with ML Lightning Talks from Red Hat, Google, MSFT and others.
- there is SIG for ML
Carol Willing/Jupyter HUB - zero to jupyterhub with k8s
Clive Cox from Seldon - seldon does ML on k8s - core is open source, to deploy your model (training part is proprietary) - they provide builder images -
Diane Feddema and Zak Hassan/RedHat - radanalytics.io - spark on OpenShift - less than 10% overhead k8s vs bare metal
Daniel Whitenack / Pachyderm - reproducible ML pipelines - Pachyderm: data management, KubeFlow: distributed processing
William Buchwalter / Microsoft - kubernetes is becoming the runtime of the cloud - up to 10k cores for a single training job - AKS now supports GPUs - virtual kubelet
David Aronchick / Google - KubeFlow: ML for everyone - ML pipeline portability is an issue (OS and hardware changes from laptop -> training rig -> cloud) - Kubeflow could be the k8s for ML
QA - ML has a reproducibility problem - ML libraries are very young still (like the 60s before C) - ML models are going to be reverse engineered - ML models are very sensitive, and also can not be applied in different environments - 500 queries is enough for getting a 90% reverse-engineered model - but it does not matter because of the sensitivity to data - sharing models for the benefit of mankind?
- the ML community is very open - but again it is hard to apply models in different environments
- fine training: 5000 data points to get acceptable performance, 10 million data points to get better than human performance
Red Hat Road Map to Kubecon/EU with Diane Mueller
Wednesday
Keynotes
Cloud native trail map
l.cncf.io
Linkerd, Istio, service mesh
Spiffe & Spire
Jaeger, distributed tracing
NATS, messaging for cloud native apps
CERN: Yadage service
GitOps
Keynote: How good is our code - 1500 in Berlin - 4300 in Copenhagen - loads of new companies in CNCF, 80 to 200+ in a year - k8s is double the SLOC of Linux - Node.js has more code than Linux
Keynote: CNCF project updates Liz Rice
- 8 to 20 cncf projects in a year - rook is going to support cockroachdb - NATS messaging joined - vitess: mysql horizontal scaling, online resharding - k8s now in the 'early majority' stage of adoption
Lew Tucker - rethinking networking for microservices
- service mesh ftw
Ricardo Rocha, Clenimar Filemon, CERN, multi-cloud federated kubernetes
- 200 k8s clusters - motivation for federation
- for load spikes, simplification in admin ops, same API onsite and offsite
- htcondor use case
- bare metal desired - containerized htcondor agent for k8s, as daemonset - daemonset runs across federated clusters - tested azure, t-systems, aws, internal clusters - public cloud bingo, up to 7 public clouds in federation - reana/recast, yadage
Dirk Hohndel, VMWare, From Innovation to production - diversity problem
Alexis Richardson, CNCF 20-20 vision - from startup to acceptance to ubiquity - cncf is building a cloud platform - hadoop slow, kubeflow fast - well performing teams deploy often - 'just run my code' is missing from CNCF - convergence: k8s, just run my code - by 2020, we'll get to 'just run my code'
- security,
- cloud native gets gitops
- push code, not containers
- ethics!
- don't be Zuckerberg
Whats Up With All The Different Container Runtimes? - Ricardo Aravena, Branch Metrics (Intermediate Skill Level)
- lwn.net: demystifying container runtimes - 2002 namespaces, 2006 openvz, 2007 cgroups, 2009 mesos uses cgroups, 2011 lxc,
2013 docker lxc, 2014 docker libcontainer, 2014 rkt, 2015 OCI runc, 2016 CRI,
- openvz supports live migration - rkt is stalling - wcow and lcow
- cri-o: runs any OCI runtime
- podman
- crun: fast
- kata: runV + Clear Containers - works with docker too - easy to configure plain docker to use kata - k8s: use any CRI plugin - cons: startup time, bare metal (or nested)
- others: nvidia runtime, railcar, pouch, systemd-nspawn, lmctfy
- unikernels: meh
- mixed mesos and k8s: use mesos with containerd for flink and spark
- future: convergence to OCI, CRI, CNI, CSI - kata - cgroups v2
Evolving Systems Design: From Unreliable rpc to Resilience with Linkerd - Edward Wilde, Form3 (Intermediate Skill Level)
- integration to main payment platforms - book recommendations: Dealers of Lightning (Xerox PARC), Showstopper! (Windows NT) - tail latency matters when you have millions of requests - Linkerd
- retry, timeout - circuit breaker - tracing: zipkin
- tracing graphs reveal real patterns of communication - centralizing comms to a brokering component enhances visibility and reduces exposure - traffic load generator: K6 - presenter used the GoLand IDE - consul-registrator - listen on the docker sock and register everything with an exposed port - article: The Tail at Scale by Google
Building Docker Images without Docker - Matt Rickard, Google (Intermediate Skill Level)
- security, reproducibility, minimal images
- cri-o does not build
- dockerfile-less, dockerfile RUN considered bad
- tools
- projectatomic/buildah - genuinetools/img
- even better: runtime-less
- more portable, can run in containers
- GoogleContainerTools/distroless
- declarative and reproducible - rebaseable - minimal images
- GoogleContainerTools/kaniko
- only works inside containers
- gVisor - programmatic manipulation of images FTW - builds without RUN are HARD - GCP/jib for java - GCP/runtimes-common - language specific images, improve caching Q/C: - using distroless may give bigger image than using alpine - unikernels and k8s do not mix (at least yet) - nix and docker? some effort ongoing
Continuously Deliver your Kubernetes Infrastructure - Mikkel Larsen, Zalando SE (Advanced Skill Level) - 80 K8s clusters managed by a single infra team - all clusters autoscaled - zalando-incubator/cluster-lifecycle-manager
The Route To Rootless Containers - Ed King, Pivotal & Julz Friedman, IBM (Any Skill Level)
(about framework AND user containers being run without root wherever possible)
- cloud foundry is multitenant, heroku (openshift?) like platform - docker: isolation, resource sharing, encapsulation - disk quotas are not there yet - pivot_root syscall explained - user namespaces
- multiple users on host mapped to multiple users in containers with suid binary newuidmap
- cgroups cannot be manipulated by users by default, subtrees can be chowned to users, though - UN and layered filesystems
- aufs needs root - btrfs exploded at scale - overlayfs supports user namespaces without root
- disk quotas
- xfs quotas needs root, a small setuid binary for that
- external network connectivity needs root - not rootless, setuid binary for that
- note: 1st class funniness in the presentation
Improving your Kubernetes Workload Security with Hardware Virtualization - Fabian Deutsch, Red Hat & Samuel Ortiz, Intel (Intermediate Skill Level)
- legacy workloads are a no-go in k8s - hardware virtualization to the rescue - goal is to make hwv transparent in k8s - kata
- applies hwv to pods - one VM per pod - openstack foundation hosts kata containers - kubernetes secure containers: spec per workload - supports virtio, sr-iov, device passthrough
- kubevirt runs a VM with a wrapper pod
- legacy apps managed with k8s - k8s operator (virt-controller) - "kubectl get vms" - can use PVs - can use networks
- both share problems with using virtualization and upstream k8s assuming running processes on Linux host - gVisor?
- virtualizes syscalls
Evening keynotes
17:10
Keynote: Anatomy of a Production Kubernetes Outage - Oliver Beattie, Head of Engineering, Monzo Bank
- Monzo, a crowdfunded bank, on opensource infra, in AWS
17:30 Keynote: Container-Native Dev-and-ops Experience: It's Getting Easier, Fast. - Ralph Squillace, Principal PM – Azure Container Platform, Microsoft
- cool demo of remote debugging from vscode
17:35 Keynote: Cloud Native Observability & Security from Google Cloud - Craig Box, Staff Developer Advocate, Google
17:40 Keynote: CNCF End User Awards - Presented by Chris Aniszczyk, COO, Cloud Native Computing Foundation
17:45 Keynote: Prometheus 2.0 – The Next Scale of Cloud Native Monitoring - Fabian Reinartz, Software Engineer, Google
- 2.0 is lot faster and leaner than 1.0
18:05 Keynote: Serverless, Not So FaaS - Kelsey Hightower, Kubernetes Community Member, Google 18:13 Keynote: Closing Remarks - Liz Rice, Technology Evangelist, Aqua Security
General - lots of young people in the audience - grafana everywhere - the Zalando presentation made me realize that rahti-team is the new IT department
Introducing gRPC / Jayant Kolhe, Google
Room fullness index: some free seats, mostly full
Links:
- grpc.io - github.com/grpc - www.http2demo.io
- Remote procedure call library - Enables development of microservices that can communicate with each other while being implemented in many different languages - Enables strict service contracts - Lots of integrations to other tools - Single line installation, idiomatic APIs, error propagation, reconnect automatically on broken idle connections - Actively developed, production ready, now at v. 1.11 - Next generation version of Stubby RPC used at Google
* Microservices at Google: O(10^10) RPCs per second * Being used at Google alongside Stubby (takes some time to port apps from Stubby to gRPC)
- Define a service in a .proto file, compile it into a server/client stub - Protocol buffers - not required for gRPC, but very handy - Example: RouteGuide - grpc/grpc/examples - Getting started:
1. Start with defining messages you want to send 2. Add service definition 3. Code generator converts .proto idiomatically to your language 4. Write code for your service by creating a derived class that implements the RPC method handlers specified in the .proto file 5. Write code for your client by creating a "Stub" and invoking RPCs as its member functions
- Takes advantage of HTTP/2 feature set: multiplexing, header compression, binary framing - Integrates auth & security easily through a plugin auth mechanism - Developed on GitHub, in CNCF
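A minimal sketch of the getting-started flow above, assuming the stock Greeter example service that ships with grpc-go (not material from the talk); the .proto file is taken as already compiled into the pb package.

```go
// Sketch: implement and serve a gRPC service whose stubs were generated from a .proto file.
package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
	pb "google.golang.org/grpc/examples/helloworld/helloworld"
)

// server is the derived type that implements the RPC handlers from the .proto service definition.
type server struct {
	pb.UnimplementedGreeterServer
}

func (s *server) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
	return &pb.HelloReply{Message: "Hello " + in.GetName()}, nil
}

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}
	s := grpc.NewServer()
	pb.RegisterGreeterServer(s, &server{}) // register the service implementation
	log.Fatal(s.Serve(lis))
}
```

The client side is symmetric: dial the server with grpc.Dial, wrap the connection with pb.NewGreeterClient and invoke SayHello on the stub.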
Envoy project intro / Matt Klein & Jose Niño, Lyft
Room fullness index: packed, people turned away
Links:
- https://www.envoyproxy.io/
- Service mesh tool - allow microservices written in different languages to communicate and expose their APIs via routes - Started at Lyft - Development started in 2015, open sourced in 2016 - Envoy@Lyft:
* > 200 services * > 20 000 hosts * > 5 million RPS
- xDS APIs:
* RDS - Route Discovery Service * CDS - Cluster Discovery Service * ...
Kubernetes and Taxes: Lessons Learned at the Norwegian Tax Administration (NTA) / Bjarte Karlsen, NTA
Room fullness index: large auditorium, packed
Links:
- The Aurora OpenShift platform: https://skatteetaten.github.io/aurora-openshift/ - https://github.com/skatteetaten/openshift-reference-springboot-server - https://skatteetaten.github.io/aurora
- Running OpenShift/Kubernetes for nearly three years - Aurora OpenShift platform - Goal set in 2014: "Faster development and more efficient ops" - 6 OpenShift clusters
* Prod backend: 12 teams, 190 apps, 450 pods
- Separate build nodes because builds run as root - Apps written in Java and packaged as Docker containers - Jenkins for CI, Bitbucket for Git
* "Jenkinsfile is a subset of Groovy that is undocumented"
- Building Docker images for PaaS is hard
* Should teams build Docker images themselves or should they be built centrally? * NTA builds images centrally
- Not using Helm since it's so new - similar functionality has been built inhouse - Separate config repos for app configuration
* Declarative, cascading configuration format in YAML
- Culture:
* embrace existing culture - keep some parts from the old culture * automate those old parts that you keep * integrate new infra with the old
- Wallboard: status display for multiple apps running on Kubernetes - similar to dashboards ops people are used to - General lessons:
* Create simple and easy contracts on how to do things on the platform * Loose couplings * Modular designs - use "Legos" to build larger systems * Beware of the hype * Use what is rock solid - don't use the beta/tech preview stuff for production * You want your platform to be boring
Continuously deliver your Kubernetes infrastructure / Mikkel Larsen, Zalando
Room fullness index: two large auditoriums combined into one, packed
Links:
- github.com/zalando-incubator/kubernetes-on-aws - github.com/mikkeloscar/kubernetes-e2e - github.com/mikkeloscar/pdb-controller
- Zalando:
* > 23 million active users * > 200 million visitors per month * ~2000 employees in tech in over 200 delivery teams * 366 AWS accounts * 84 Kubernetes clusters
- STUPS toolset around AWS - Kubernetes at Zalando:
* Clusters per product * Instances not managed by teams * Hands off approach * A lot of stuff out of the box
- Transitioning to Kubernetes as the base infrastructure - The reason why not everything is in one large cluster is related to how the organization is structured
* Also avoiding putting all eggs in one basket
- Philosophy:
* No pet clusters - no tweaking of custom settings for 80 clusters * Always provide the latest stable K8s * Continuous and non-disruptive cluster updates - no maintenance windows * "Fully" automated operations - operators should only need to manually merge PRs
- Clusters provisioned in AWS via CloudFormation - etcd stack outside Kubernetes - CoreOS and immutable instances - Multi AZ worker nodes - Cluster configuration stored in Git - End-to-end testing in Jenkins - Dev, alpha, beta/stable channels for Kubernetes clusters:
* 3 dev clusters * 1 alpha cluster * 80+ beta/stable clusters
- Tests as part of pipeline for verifying K8s clusters:
* Conformance tests from upstream Kubernetes: 144 * StatefulSet tests: 2 * Custom Zalando tests: 4
- Best to run tests with -flakeAttempts=2 - Node upgrade strategies:
* Naive strategy: scale up AutoScalingGroup by one, drain one existing node and have it replaced * No autoscaling during upgrades * Not clear how long to take to drain a node
Building a Cloud Native Culture in an Enterprise / Deep Kapadia & Tony Li, The New York Times
Room fullness index: full auditorium, a couple empty seats
Links:
- gregdmd.com/blog/2015/07/19/delivery-engineering-team/
- NY Times first foray into public cloud infra in 2010 - 2016: plan to shut down data centers - 2018: www.nytimes.com is cloud native running on Kubernetes, data centers shut down
* 300+ apps migrated to AWS or GCP
- US presidential elections: lots of traffic on election night, need to scale up massively - Lots of apps on the NY times front page, lots of backend apps as well - The IT team transformed into infrastructure and product teams - Organic growth sprouted many product and platform teams - 2010: AWS initially treated like the data center:
* closely guarded access, not a lot of automation -> not possible to scale easily * spaghetti architecture, no 12 factor, secret stores etc.
- 2016: more concerted effort
* focus towards developer experience * a couple teams dabbled in container orchestration in isolation * formation of the Delivery Engineering Team
- Standardizing
* GitOps - GitHub driven workflow * Drone - pipeline as code * Terraform - infrastructure as code * Vault - actually manage secrets the right way, policies as code
Thursday
From PaaS to Kubernetes / Google
Room fullness index: about 70% full
Links:
- cncf.io/certification/software-conformance/ - https://github.com/GoogleContainerTools/skaffold * https://github.com/GoogleContainerTools/skaffold/issues/353
- Kubernetes is more flexible compared to older PaaS platforms - There's no vendor lock-in unlike in many PaaS systems - You can deploy more than just 12-factor applications - "Kubernetes is a workload-level abstraction." - Kubernetes sits at the ideal level of abstraction:
* No need to manage infrastructure, but * Not tightly tied to specific programming languages
- It can be tough to get started with Kubernetes compared to PaaS, but you don't run into limitations as you progress
* "Be wary of the things that make the simple easier, but the complex harder."
- What to bring over to Kubernetes from the PaaS world:
* git push to deploy (there was a demo of doing this on the Google Cloud Platform, this is also the bread and butter of OpenShift) * configuration as code (a.k.a. GitOps) - separate code and config repos * review apps - create an ephemeral version of the application that can be accessed, viewed and reviewed * ensure portability by using a certified K8s provider * automate *everything*
- Things to consider carefully:
* poorly supported dependencies - you clone it, you own it * build packs * pricing models
- Skaffold for rapid development iteration (there was a demo of using skaffold)
* watches code files and deploys on changes to the cloud automatically * can avoid having to run a local development environment * shows you errors in deployment immediately * you can use the exact same K8s manifests that you would use in production, so your development environment is closer to production * question from the audience: can it listen to hooks from artifactory? not sure, but it should. * doesn't work out-of-the-box with OpenShift - tries to list pods that it is not allowed to list
Writing Kube controllers for everyone / Maciej Szulik, Red Hat
Room fullness index: almost full, individual empty seats
Links:
- github.com/kubernetes/sample-controller - https://github.com/kubernetes-sigs/kubebuilder - https://medium.com/@cloudark/kubernetes-custom-controllers-b6c7d0668fdf - github.com/kubernetes/community/blob/master/contributors/devel/controllers.md
- About 400 LOC in the cronjob controller - probably the smallest controller in Kubernetes
* The job controller is not a good example of a controller these days
- Control loop:
* worker function takes items from a queue and processes them in some way * there are rate limiters you can use with the queue
- Where to start when writing a controller?
* Sample controller for an easy example (see links) * kubebuilder (see links)
- Some amount of boiler plate needs to be written still, but the amount of this is being reduced - Important to remember: shared informers - use them!
* podInformer = InformerFactory.Core().V1.Pods() * No documentation available though, even though everyone is using them * Shared informer == shared data cache * Add event handlers to your controller for reacting to new objects, updated objects and removed objects
- Shared informers - listers:
* podStore = podInformer.Lister() * Listing objects is really expensive
- SyncHandler: the main loop of your controller
* Convert into a distinct namespace and name * Get the object from the cache * Most important thing of all after the shared informers: pod := podTmp.DeepCopy() * If you want to modify an object, you have to do a DeepCopy of it * DeepCopy is very expensive - only do it if you need to do it (== you are going to be updating it)
- Controller ground rules are in the docs - read them before writing a controller
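A minimal sketch of the shared-informer pattern described above, written with client-go (illustrative only, not the code shown in the talk): a shared informer factory, an event handler, a lister that reads from the shared cache, and DeepCopy before mutating anything.

```go
// Sketch of a controller skeleton using client-go shared informers (assumed example).
package main

import (
	"log"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods() // shared data cache for pods

	// React to new objects; Update/Delete handlers would be added the same way.
	podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod).DeepCopy() // DeepCopy only because we might mutate it
			log.Printf("pod added: %s/%s", pod.Namespace, pod.Name)
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// The lister serves reads from the shared cache instead of hitting the API server.
	pods, err := podInformer.Lister().Pods("default").List(labels.Everything())
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("%d pods in the default namespace", len(pods))

	select {} // keep the informers running
}
```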
Best Practice for Container Security at Scale / Dawn Chen & Zhengyu He, Google
Room fullness index: almost full, individual empty seats
Links:
- https://github.com/google/gvisor * 6 commands to run to try it out on one's own laptop - http://stopdisablingselinux.com - https://cloudplatform.googleblog.com/2018/05/Open-sourcing-gVisor-a-sandboxed-container-runtime.html
- Container isolation mechanisms previously:
* Run as unprivileged user * Drop capabilities * Apply CGroups * Apply namespaces * SELinux/AppArmor
- Additional isolation provided by gVisor - Work started on gVisor 5 years ago - Goals: a random container image should not exploit my Linux machine, there should be little to no work required to fulfill this requirement and it should still feel like a container (light & fast) - Sandboxing methods:
* SELinux/AppArmor/seccomp-bpf * Syscall filtering * Not easy to define rules for these * Hypervisors (strong isolation, but heavy and inflexible)
- gVisor takes the best features from hypervisor isolation while remaining lightweight and flexible (150 ms startup, 15 MB mem overhead)
* Special kernel written in Go runs on top of the host kernel * Implements Linux system API in user space - 211 syscalls so far * Runs unmodified Linux binaries * No filter configuration, AppArmor or SELinux policies needed
- Runsc: an OCI runtime powered by gVisor - No need to fix resource usage at startup unlike VMs - Good for: small containers, spin up quickly, high density - Not good for: trusted images (no need for additional isolation), syscall heavy workloads
Kubernetes multi-cluster operations without federation / Rob Szumski, CoreOS
Room fullness index: almost full, individual empty seats
Links:
- https://coreos.com/blog/introducing-operators.html - https://github.com/operator-framework - https://github.com/kubernetes/community/tree/master/sig-multicluster
- First thing learned: everybody is running multiple clusters
* Dev/test/prod * Different regions * Average number of clusters: 5-10
- Current federation features: federation API server connects multiple API servers
* Must be root and able to act in all namespaces * Deployment model leads to SPOF for most uses * The federation API has the same requirements as the API server, but e.g. etcd is not designed to run over a WAN -> need to put the federation API server's etcd to a single site -> SPOF
- App owners want: CI/CD, cluster discovery, failover between clusters, credential mgmt - Infra admins want: connect and track clusters, ensure overall security, lock down as much as possible, resource limits - Existing building blocks for working over multiple clusters:
* Cluster registry * Access control
- Tectonic has some nice multi-cluster functionality
* No cluster needs to access all clusters * ServiceAccounts can be audited/revoked * ServiceAccounts only read Clusters and ClusterPolicies * Distributed cluster registry
- Operator pattern: https://coreos.com/blog/introducing-operators.html - Federation v2 coming
* Aggregated API server
- Multi-cluster SIG: https://github.com/kubernetes/community/tree/master/sig-multicluster
Modernizing Traditional Security: The Security + Compliance Benefits of Shifting a Legacy App to Containers / John Morello, Twistlock
Room fullness index: about 70% full
Links:
- Containers + "lift and shift": moving a monolithic application to containers - no re-architecture required - Example case:
* 12+ year old Java application that's been migrated from bare metal through VMware to Azure * Significant security requirements (FISMA, GDPR) * Users from many orgs around the world
- Challenges:
* Governmental compliance/security standards required manual configuration * Complicated disaster recovery * New builds could not be automated * Maintaining consistency between dev/test/prod was difficult and resource intensive
- Security benefits of packaging the legacy app as a container:
* Vulnerability scanning can be done before deployment instead of after * Automatic compliance instead of manually created compliance documentation and rules * Shared, central view into compliance state of all assets in the environment * Can create a model for what the application does and restrict unneeded operations based on this model
- Containers are:
* Minimal - typically single process entities * Declarative * Predictable - you can create rules that only allow those operations that the app needs
YAML is for computers, ksonnet is for humans / Bryan Liles, Heptio
Room fullness index:
Links:
- https://ksonnet.io/ - https://github.com/ksonnet - Side note: https://httpbin.org/
- Goals of ksonnet:
* GitOps * Manage any app * Make declarative app mgmt easy * Democratize mgmt of apps
- Run an app and expose it in K8s, normally you need to write YAML for:
* Deployment * Service * Ingress
- What if you want multiple versions of these? - What if you want to deploy to another cluster with the same YAML? - What if you have a different ingress setup in dev vs. prod? - What if you wanted to apply a node affinity rule? Or pod affinity? - Config management: procedural vs. declarative
* Ansible is not optimal for creating Kubernetes objects because it is procedural - declarative is better
- ksonnet puts shared config between multiple deployments into one place, so it doesn't need to be repeated in multiple YAML files - Can use code to build YAML in flexible ways - Prototypes for common objects so you don't have to look at the spec - Helm charts integration coming up - Will be able to overlay parts of a YAML on another YAML file - Will have VS Code integration - Question from audience: Can I deploy custom resources with ksonnet? Answer: yes.
09:00 Keynote: Kubernetes Project Update - Aparna Sinha, Group Product Manager, Kubernetes and Google Kubernetes Engine, Google
- 54% of the Fortune 500 are using k8s - 100000 nodes under GKE - sandboxed containers - gVisor - k8s has a native scheduler for spark
- demo of spark - spark operator - using kubectl to run spark workloads directly
09:30 Keynote: Accelerating Kubernetes Native Applications - Brandon Philips, CTO of CoreOS, Red Hat - fast forward version from Tuesday evening
09:35
Keynote: Switching Horses Midstream: The Challenges of Migrating 150+ Microservices to Kubernetes - Sarah Wells, Technical Director for Operations and Reliability, Financial Times
- FTs content platform now uses k8s
- 150 microservices
- 2015: in-house docker based system
- 80% savings in AWS (vs. service per VM) - hard to support
- choose boring technology - late 2016 opted for k8s - 2200 releases a year after moving to microservices - 2000 code releases while running in parallel - not cheap - if/else code is a better idea than a branch for each type of environment (merge hell) - systemd service files -> helm chart ('Helm days') - go code base can rot without vendoring (dependencies) - lots of improvements due to tech debt revealed during the migration - happy about the new stack
09:55
Keynote: Shaping the Cloud Native Future - Abby Kearns, Executive Director, Cloud Foundry Foundation
- investing in in-house developers + cloud infra -> success
10:15 Keynote: Skip the Anxiety Attack - Build Secure Apps with Kubernetes - Jason McGee, Fellow, IBM
10:20
Keynote: Software's Community - Dave Zolotusky, Software Engineer, Spotify
- open source support vs enterprise support
- have to rely on "unnatural" chat support - that works - or direct mails to the lead developer of Prometheus
- even some banks are starting to share their infra experiences and open source their software - GDPR means re-architecting some of the infra - software is easy, talking to people is hard
Autoscale your Kubernetes Workload with Prometheus - Frederic Branczyk, CoreOS (Intermediate Skill Level) - HPA (moar podzes) and VPA (bigger podzes) - history
- heapster based HPA in 1.2 - now formally deprecated, retired for 1.13
- 1.8: resource and custom metrics apis - core metrics: cpu, memory - metrics-server
- is the canonical implementation - installed in all clusters (unless replaced by the vendor with something compatible)
- HPAv2 can use custom metrics - k8s-prometheus-adapter -> prometheus -> pod - future
- autoscale CRDs in 1.11 - stable metrics -
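A sketch of what an HPAv2 object scaling on a custom per-pod metric might look like, expressed with the autoscaling/v2beta1 Go types of that era; the metric name http_requests and the target deployment are hypothetical, and k8s-prometheus-adapter is assumed to be serving the metric.

```go
// Assumed example: HPA v2beta1 scaling a Deployment on a custom per-pod metric.
package main

import (
	"encoding/json"
	"fmt"

	autoscalingv2beta1 "k8s.io/api/autoscaling/v2beta1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	min := int32(2)
	hpa := autoscalingv2beta1.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "web", Namespace: "default"},
		Spec: autoscalingv2beta1.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv2beta1.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       "web", // hypothetical target
			},
			MinReplicas: &min,
			MaxReplicas: 10,
			Metrics: []autoscalingv2beta1.MetricSpec{{
				Type: autoscalingv2beta1.PodsMetricSourceType,
				Pods: &autoscalingv2beta1.PodsMetricSource{
					MetricName:         "http_requests",        // served via k8s-prometheus-adapter
					TargetAverageValue: resource.MustParse("100"),
				},
			}},
		},
	}
	out, _ := json.MarshalIndent(hpa, "", "  ")
	fmt.Println(string(out)) // manifest could then be applied to the cluster
}
```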
Writing Kube Controllers for Everyone - Maciej Szulik, Red Hat - kubernetes/sample-controller - kubebuilder - use shared informers - for this reason do not use the cron job controller as an example - use shared informer lister in controllers - https://github.com/kubernetes/sample-controller/issues/13 nice schema from github/devdattakulkarni -
How We Used Jaeger and Prometheus to Deliver Lightning-Fast User Queries - Bryan Boreham, Weaveworks (Intermediate Skill Level) - weaveworks cortex: multi-tenant prometheus https://github.com/weaveworks/cortex - measure the real system and keep on measuring it - prometheus has a function histogram_quantile() - averages mask problems - jaeger collects output from instrumented pieces of sw -
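To make the "averages mask problems" point concrete, here is a small assumed example (not Cortex code): request latency instrumented as a Prometheus histogram with client_golang, so tail latency can be queried with histogram_quantile() instead of an average.

```go
// Assumed example: expose a latency histogram and query its tail with histogram_quantile().
package main

import (
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var reqDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "request_duration_seconds",
	Help:    "Request latency.",
	Buckets: prometheus.DefBuckets,
})

func handler(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond) // stand-in for real work
	reqDuration.Observe(time.Since(start).Seconds())
	w.Write([]byte("ok"))
}

func main() {
	http.HandleFunc("/", handler)
	http.Handle("/metrics", promhttp.Handler())
	// PromQL for the 99th percentile over 5 minutes (an average would hide this tail):
	//   histogram_quantile(0.99, rate(request_duration_seconds_bucket[5m]))
	http.ListenAndServe(":8080", nil)
}
```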
Exploring Container Mechanisms Through the Story of a Syscall - Alban Crequy, Kinvolk (Intermediate Skill Level) - mount propagation in k8s 1.10 (pod can mount volumes on the host)
CNCF Storage groups - discussion about the next steps. - scope on the next techs in cloud native storage - the more decoupled you get, the more cloud native you are (fs -> block -> object storage) - mission: cloud-nativize everybody -
Horizontal autoscaler reloaded
- VPA is being worked on - hpa debugging: look at the .status.conditions for errors - pod metric type automatically polls pods related to rs/rc - object metric type can poll given object - external metric type: can match any label - future:
- pod and object metrics will support labels - api cleanup
General
- native app development is being targeted on multiple fronts
Friday
Keynotes:
- medium.com/wardleymaps
The Serverless and Event-Driven Future / Austen Collins, Serverless
Room fullness index: auditorium mostly full
Links:
- serverless.com - serverless.com/event-gateway - cloudevents.io
- Speaker made a framework in 2015 for using serverless products from Amazon/Microsoft/Google - Serverless qualities:
* functions as the new unit of deployment * never pay for idle
- All big cloud provider have their own serverless product by now - At the same time, open options are emerging: kubeless, apache openwhisk, oracle fn, fission, openfaas - Forecast for serverless: more event-driven systems, IoT, intelligent systems
* Functions everywhere: multi-cloud, on-premise, edge
- CNCF's role: "standards", common specs
* Problem: what to standardize first and how?
- Standardization areas:
* runtimes * APIs * events (what format?) <- work on standardization starting here * plumbing
- Serverless WG introducing cloudevents
* Version 0.1 out a few weeks ago
- Cloudevents use cases
* Normalize events across envs * Facilitate integrations across platforms * Increase portability of FaaS * Normalizing webhooks * Event data evolution * Event tracing * IoT
- People from e.g. these companies contributed to the spec: serverless.com, google, microsoft, ibm, vmware etc. - There was a demo of an event triggering a function on several different serverless products - Future:
* tracing * workflows * correlation * rules * standard events * event directories * webhooks
- Azure already supports CloudEvents
InfluxDB & Prometheus / Paul Dix, Influxdata
Room fullness index: large auditorium about 30% full
- InfluxDB: open source (MIT) time series DB written in Go and with an SQLish query language - Commercial offering: HA clustered InfluxDB - Supports float64, int64, string, bool - First thing for Prometheus support: remote write/read support added to InfluxDB - InfluxDB's goal is to be built like a DB - Prometheus not so much -> InfluxDB good for long term storage of Prometheus data - Can push from multiple Prometheuses to one InfluxDB -> ephemeral Prometheus instances in K8s (avoid state in K8s) and push to InfluxDB - Configure in Prometheus: remote_write, remote_read - Influx 2.0:
* new query language IFQL * new execution engine * decouple storage from compute -> scale IFQL query processing and storage of data separately * better support for Prometheus data formats * new execution engine to support multiple query languages with IFQL being the most important * idea for future: add support for PromQL to the engine so you can query InfluxDB with PromQL
- Influx 1.5 released about a month ago - There's an intention to support the Prometheus ecosystem - Future work:
* Combine Prometheus & Influx results * IFQL against Prometheus? * Input/output plugins - could e.g. push data from InfluxDB to Prometheus * Apache Arrow as data interchange format? To avoid marshalling/unmarshalling of data in workflows.
A Hacker's Guide to Kubernetes and the Cloud / Rory McCune, NCC Group
Room fullness index: about 90% full
Links:
- CIS Security Guide for Kubernetes: https://learn.cisecurity.org/benchmarks - https://github.com/raesene/kubeconeu-presentation
- Starting to see people attack Kubernetes clusters in the wild - Know your threat model - who is attacking you and how?
* Random attackers from the Internet * Targeted attacks specifically against you * Indirect attacks against you via your trusted supplier * Nation states
- Attack surface - what can be targeted in your service?
* Consistent security over your whole attack surface * Attacks can come via your underlying cloud provider * GitHub can provide useful secrets to attack you * Private repos on GitHub can slip into the public side if they are forked
- External attackers - first thing to do: port scan - Kubernetes port scan:
* open etcd? * open cAdvisor? * insecure API port? * etc.
- Attacking the API server - Attacking etcd
* Does it have authentication enabled?
- Shodan: a directory of servers connected to the Internet updated daily - useful for finding targets - Attacking the kubelet
* There didn't use to be authentication in the kubelet * Can run any command in any container on the kubelet's node through the kubelet * Not a good idea to leave your kubelets open
- Malicious containers
* You don't want a malicious container to compromise your whole cluster * Increased attack surface * Container filesystem access * "Internal" network access * Kernel vulns.
- Attacking via service accounts: if you don't have RBAC enabled, every container has an admin token - Leveraging access in the cloud: get access to the underlying cloud via accounts with too wide access rights - Secure defaults are important
* Your provider might not have the same threat model in mind when deciding on defaults as you
- Top 10 key security considerations:
1 Disable insecure port 2 Control access to the kubelet - also turn off read-only port 3 Control access to etcd 4 Restrict service token use - only mount service tokens when needed and make sure RBAC is enabled 5 Restrict privileged containers - beware of network plugins 6 API server authentication 7 API service authorisation - use RBAC 8 PodSecurityPolicy 9 NetworkPolicy - stop containers accessing the control plane 10 Regular upgrades!
- Honourable mention - cloud rights: don't give too many rights to your integration accounts
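As an illustration of item 9 (NetworkPolicy) above, a minimal default-deny ingress policy expressed with the networking/v1 Go types; the namespace name is hypothetical and this is not part of the talk material.

```go
// Assumed example: deny all ingress to pods in a namespace until a later policy allows it.
package main

import (
	"encoding/json"
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	denyAll := networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: "default-deny-ingress", Namespace: "prod"},
		Spec: networkingv1.NetworkPolicySpec{
			PodSelector: metav1.LabelSelector{}, // empty selector = all pods in the namespace
			PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeIngress},
			// No Ingress rules listed, so no incoming traffic is allowed.
		},
	}
	out, _ := json.MarshalIndent(denyAll, "", "  ")
	fmt.Println(string(out))
}
```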
Kubernetes Runtime Security: What Happens if a Container Goes Bad? / Jen Tong & Maya Kaczorowski, Google
Room fullness index: about 90% full
Links:
- Kubernetes is so new that a lot of practitioners don't know what security controls come with it - Tesla hack: unsecured K8s dashboard -> AWS cloud credentials - Infra sec, software supply sec, runtime sec - This presentation concentrates on runtime sec - Containers vs. VMs:
* Minimalist host OS in container reduces attack surface, but there's no strong HV protection * Separation using namespaces and cgroups, but host resources not all well separated * Containers have a shorter lifetime, so less time to exploit, but it's also harder to do forensics because of this
- Better to have wide coverage of the attack surface rather than really good coverage of small parts of the attack surface - NIST cybersecurity framework
* Identify, protect, detect, respond, recover * Know what you are running, use secure defaults, detect deviations from the norm in containers, plan your response to detected threats, complete forensics and fix things so this doesn't happen to your container again * At this framework level not very different from protecting other kinds of objects than containers
- Detecting bad things at runtime
* ptrace, kprobes, tracepoints * Audit logs * eBPF: kernel introspection * XDP: uses eBPF for filtering network packets * User-mode API: for kernel events like inotify * Falco from Sysdig * You could e.g. alert on shell creation on a container * Some of these features only come with newer kernels like 4.8-4.11
- Detection/capture tools deployed as privileged pod alongside end user pods - Response:
1. send alert 2. isolate the container by e.g. moving it to a new network 3. pause the container 4. restart the container, i.e. kill and restart processes 5. kill the container, i.e. kill processes without restart
- What can you do today:
* Make it part of your security plan * Try out open source options * Deploy early - not after the first incident * Rehearse an event
Everything You Need to Know About Using GPUs with Kubernetes / Rohit Agarwal, Google
Room fullness index: about 20% full (the second to last slot, so some thinning to be expected)
Links:
- The presentation is about the how, not the why or the when of using GPUs - Containers: package once, run anywhere - EXCEPT when one of your dependencies is a kernel module - Using NVIDIA GPUs requires the NVIDIA kernel module and some user-level libraries
* The cluster admin takes care of driver and kernel module installation
- The version of the user-level libraries needs to match the kernel module -> not practical to package user libraries in the container image - First attempt: alpha.kubernetes.io/nvidia-gpu
* let the user deal with the dependencies - just expose the GPU to the container * users install kernel module and libraries on host, use hostPath volumes to access the libraries * works, but is terrible and not portable * this was in-tree, which is not optimal: what about AMD GPUs, Intel GPUs, Xilinx FPGAs etc.? * deprecated in 1.10, removed in 1.11
- Second attempt: device plugins (e.g. nvidia.com/gpu)
* support generic devices * vendor specific code out-of-tree * enable portable PodSpec * resources.limits.nvidia.com/gpu: 2 * device plugin APIs handle access to libraries * no host specific stuff in the PodSpec * The cluster admin installs the device plugin * resource quotas in 1.10 * introduced in 1.8, beta in 1.10, the current way to use GPUs
- Don't build images with user-level shared libraries - Do build images with the CUDA toolkit - Request GPU resources in the pod spec - Multiple GPU types in cluster? No specific support for asking for a specific GPU type - need to use nodeSelectors instead. - Keep up with the driver version required by the latest CUDA release - your users are going to run newer CUDA versions and they should work - GPU monitoring: memory_used, memory_total, duty_cycle collected by cAdvisor using NVML (added in 1.9) - Dedicate nodes for GPU workloads: GPUs are expensive - don't let pods not requiring GPUs fill these nodes - In commercial public clouds: aggressively scale down GPU nodes - What's missing:
* No GPU support in minikube * No fine grained quota control: can't control quota by GPU type * No support for GPU sharing * Not aware of GPU topology * Autoscaling support not ideal
- Recommendations for base images? NVIDIA's CUDA images.
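A sketch of the portable, device-plugin style GPU request described above, built with the core/v1 Go types; the image, the GPU count and the accelerator node label are assumptions for illustration only.

```go
// Assumed example: request GPUs via the nvidia.com/gpu device-plugin resource,
// with no host-specific library paths in the PodSpec.
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "cuda-job"},
		Spec: corev1.PodSpec{
			// GPU type selection has no first-class support, so a nodeSelector
			// on a (hypothetical) node label is used instead.
			NodeSelector: map[string]string{"accelerator": "nvidia-tesla-p100"},
			Containers: []corev1.Container{{
				Name:  "trainer",
				Image: "nvidia/cuda:9.0-base", // ships the CUDA toolkit, not the driver libraries
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						"nvidia.com/gpu": resource.MustParse("2"),
					},
				},
			}},
		},
	}
	out, _ := json.MarshalIndent(pod, "", "  ")
	fmt.Println(string(out))
}
```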
Multi-tenancy in Kubernetes / David Oppenheimer, Google
Room fullness index: about 90% full (last session slot of the conference)
Links:
- Multi-tenancy: isolation between tenants within a cluster - Why multi-tenancy within a cluster?
* no need to manage 100s/1000s of clusters * no need to pay control plane cost per tenant * less resource fragmentation * no need to create a cluster for each new tenant
- Cluster per tenant is not practical in many cases - Control plane isolation - Container isolation - Use case: enterprise
* All users from the same org (semi-trusted) * Tenant == namespace == team * Many different semi-trusted apps * May be OK with vanilla container isolation * May want to restrict container capabilities * Inter-pod communication depends on the app topology (single- vs. multi-namespace apps)
- Use case: Kubernetes as a service / Platform as a Service
* SaaS where the software is Kubernetes * Untrusted users running untrusted code * Tenant == a consumer + their API objects * User can create namespaces and CRUD non-policy objects within their namespace(s) * Resource quota based on how much the user paid * Untrusted code -> sandbox pods (gVisor/Kata) or sole-tenant nodes * Strong network isolation between namespaces belonging to different tenants
- Use case: SaaS multi-tenant app
* Single instance of the application * Consumer interacts only within the application * Tenant is internal to the app, opaque to K8s
- Use case: SaaS single-tenant app
* Consumer interacts only with one app * Each consumer has their own app instance * Tenant == one application instance * Code semi-trusted, may include untrusted code (plugins) -> sandbox pods or sole-tenant nodes * Pod communication: within namespace + to shared infra
- Multi-tenancy features in Kubernetes: auth-related
* RBAC authorizer: which users/groups/SAs can do which operations on which API resources in which namespaces * PodSecurityPolicy - what kind of pods can users create? Privileged vs. non-privileged etc. * NetworkPolicy - which other pods can pods communicate with?
- Multi-tenancy features in Kubernetes: scheduling-related
* ResourceQuota + ResourceQuota admission controller - how many resources can a namespace use? * resource request + scheduler and eviction manager - prevent scheduler from packing too many pods on a node * limit + cgroups: limit can actually be greater than resource request -> overcommitment * taints/tolerations - dedicate certain nodes to specific users * anti-affinity - enforce sole-tenant nodes, for example, or create exclusive pods that are the only pod on their node * currently in alpha: priority and pre-emption for pods * ResourceQuota divided by priority: you get X RAM and Y CPUs at priority Z - someone else might get their quota at higher priority * SchedulingPolicy
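A small assumed sketch of two of the scheduling-related knobs above: a per-namespace ResourceQuota and a toleration matching a hypothetical tenant node taint, expressed with the core/v1 Go types and printed as JSON.

```go
// Assumed example: per-tenant ResourceQuota plus a toleration for tenant-dedicated nodes.
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	quota := corev1.ResourceQuota{
		ObjectMeta: metav1.ObjectMeta{Name: "team-a-quota", Namespace: "team-a"},
		Spec: corev1.ResourceQuotaSpec{
			Hard: corev1.ResourceList{
				corev1.ResourceRequestsCPU:    resource.MustParse("10"),
				corev1.ResourceRequestsMemory: resource.MustParse("32Gi"),
				corev1.ResourcePods:           resource.MustParse("50"),
			},
		},
	}

	// Toleration matching a hypothetical "tenant=team-a:NoSchedule" taint on dedicated nodes.
	tol := corev1.Toleration{
		Key:      "tenant",
		Operator: corev1.TolerationOpEqual,
		Value:    "team-a",
		Effect:   corev1.TaintEffectNoSchedule,
	}

	for _, obj := range []interface{}{quota, tol} {
		out, _ := json.MarshalIndent(obj, "", "  ")
		fmt.Println(string(out))
	}
}
```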
- Security Profile: improve usability of K8s security/multi-tenancy features
* currently you need to understand K8s features deeply to securely operate a multi-tenant cluster, the aim is to make this easier * create a small menu of security profiles for different use cases
- Open policy agent
09:00 Keynote: Kubeflow ML on Kubernetes - David Aronchick, Product Manager, Cloud AI and Co-Founder of Kubeflow, Google & Vishnu Kannan, Sr. Software Engineer, Google
- google uses ML to lower PUE - ML is hard
- hosted ML is easy in the start
- KubeFlow comes with a set of integrated tools - has CRDs - TPU: tensor processing unit, a special chip for Tensorflow from Google
09:20 Keynote: Running with Scissors - Liz Rice, Technology Evangelist, Aqua Security - drop caps you don't need - demo of running nginx as non root - user namespaces are still missing -
09:40 Keynote: Scaling Deep Learning Models in Production Using Kubernetes - Sahil Dua, Software Developer, Booking.com
- booking.com: 15M bookings per day, 1.4M properties -> lots of data for training - DL: image tagging, translations, ads bidding - image tagging is domain specific, public taggers do not cut it for booking - HDFS (external?) for storing the training data - HDFS also gets training process output streams for checking the progress - prediction: models are kept out of the container images
10:00 Keynote: Crossing the River by Feeling the Stones - Simon Wardley, Researcher, Leading Edge Forum - we won the infra war, and are losing the higher level war to AWS Lambda
Rook Deep Dive – Bassam Tabbara, Tony Allen & Jared Watts, Upbound (Intermediate Skill Level) - extends k8s by CRDs - in CNCF - deployed by an operator - pluggable providers - minio just added as a provider - minio: object storage - minio operator: tini as entrypoint to catch interrupts - prometheus metrics through ceph MGR - watch latencies, make sure deep scrubbing is on - CRI-O FTW - stolon: postgresql for kubernetes https://github.com/sorintlab/stolon - idea: sidecar containers to take backups - demo:
- loads of applications (nextcloud, gitlab, ...) running on rook - serving block storage and fs to the pods - ELK on block storage
- problems with glusterfs locking, cephfs worked fine. running a HA gitlab on a shared cephfs - ceph dashboard available from grafana.com -
Using Kubernetes Local Storage for Scale-Out Storage Services in Production - Michelle Au, Google & Ian Chakeres, Salesforce (Intermediate Skill Level) - salesforce runs storage on k8s, 10PB energized (? power on?) - use cases: clustered systems, semi-persistent caching - beta in 1.10 - you can recycle local PVs automatically - stage 1: prepare the volumes on nodes, including formatting - stage 2: create PVs, normal PV lifecycle - salesforce
- nodeprep daemonset takes care of node init (after k8s install!) - lv-provisioner ds waits for prep daemon labelling
- data locality (from HDFS, ceph, ...) can be provided to scheduler by labels
Securing Serverless Functions via Kubernetes Objects - Sebastien Goasguen, Bitnami (Advanced Skill Level) - kubeless - CRD for Functions and Triggers - mapping from external events to functions are mapped in k8s api through trigger CRD - normal RBAC, namespacing etc applies - labels can be used to trigger ingress creation and network policies - don't run your function server as root - check: falco from sysdig
- used in a demo to trigger pod deletion from an event produced by falco when bitcoin mining was detected
- kubectl supports basic http auth ootb, with nginx-ingress-controller - kong ingress controller is also supported - auth0 - accessing cloud: AWS supports annotation - GCP: export GOOGLE_APPLICATION_CREDENTIALS - cognito user pool - istio:
- supports secured ingress - supports rbac between pods
Federated Prometheus Monitoring at Scale - Nandhakumar Venkatachalam & LungChih Tung, Oath Inc (Intermediate Skill Level) - 2k server nodes, tens of thousands of pods - relabel metric endpoints with a datacenter label through a job - etcd /metrics proxied from masters, because nodes can't call etcd - selective aggregated metrics pulled to permanent storage - one year retention time for the aggregated data - prometheus instances run as pods (of course they do, this was not even directly mentioned) - aggregation rules
- many available from prometheus operator team - defined for all levels in hierarchy - datacenter,namespace,controller,pod,container
- alert rules
- node down, cpu per namespace > 75, operators down, ... - alert, if prometheus goes down? slaves by master, master by cron
- dashboards per category: namespace, deployment, controllers, scheduler, API server,
kubelet, etcd, package version (k8s, docker, dbus, rpcbind,...), prometheus, ...
- all dashboards are available on github - warroom filled with monitors, slack integration for alerts, paging - two federated prometheuses doing the aggregation in two separate data centers - security? nginx-ingress+tls, network encrypted between datacenters
Building images on Kubernetes - don't create your own mechanism - a security nightmare - OpenShift build
- no CRI-O - privileged pods needed
- container builder interface
- CRD - security problems: needs docker sock
- argoproj
- CRD - dind - security problems
- google container builder
- NOT on-cluster - VM sandbox - future: kaniko, FTL, more k8s native
- K8s needs a build API - caching is important, running isolated builds in VMs loses that - VMs are good at isolation - check: kaniko
Automating GPU infrastructure for Kubernetes and Container Linux - nvidia driver + library installation is picky - container linux is really stripped down, so kernel modules are hard to compile - cos-gpu-installer: ds - cool demo: live object detection: laptop webcam -> gke -> detection -> live display, rate: 4 fps
General
- multitenancy hurts/prohibits k8s native apps. major drag.
- fairly many demoers use vscode
- "demo gods" chant everytime a live demo shown