AHM 2016 Ganeti Tutorial


What is Ganeti

Ganeti is a cluster virtual server management software tool built on top of existing virtualization technologies such as Xen or KVM and other open source software. It is open source software developed mainly by Google, with significant community contributions. At Google, Ganeti is used for managing staff support services like office servers and remote desktops for developers (access to sensitive data!).

It is well suited for hosting services at small to medium scale, with little overhead, in the 2-50 nodes per cluster range.

Tutorial setup

You have received USB keys with a text file detailing which hostnames you should use for your team, as well as ssh keys that are enabled for direct root logins to the hardware. Since the USB keys use a "compatible with all laptops" vfat filesystem, permissions are not preserved. Start by copying the keys and fixing the permissions:

cp -r team-V ~/gnt-tut-keys
chmod 700 ~/gnt-tut-keys
chmod 600 ~/gnt-tut-keys/id_rsa

Then you can log in directly as root on the hardware with the following (remember to use the hostnames from the text file whenever there are hostnames in lines like this):

ssh -i ~/gnt-tut-keys/id_rsa root@d-m12.ndgf.org

Later on we will need direct logins from the master node to all other nodes, so it is appropriate to:

scp -i ~/gnt-tut-keys/id_rsa ~/gnt-tut-keys/id_rsa* root@d-m12.ndgf.org:.ssh/

Please note that these are real servers on real production networks, and while I'll wipe the servers after we're done, please don't make catastrophic security mistakes ("let's change the root password to 'admin' and allow password-based logins!") or do anything malicious.

Installation tutorial

This is a condensed version of [1] where I've already made the choice of doing KVM virtual machines with DRBD and using bridged networks. In addition we're using a packaged version of Ganeti with proper dependencies instead of a manual install.

We're also taking one shortcut: using the host kernel for the virtual machines. This is due to grub being difficult for debootstrap instances - in a real production environment one would either script this well, instantiate based on template images, or hook into a full-blown automated installer (e.g. pxeboot the instances and let them install like real servers). Setting this up from scratch is out of scope for this tutorial though (and would take days to weeks).

We are installing on three nodes per team, and using default names for network interfaces, LVM volume groups, etc to keep things simple. There are two networks on each node, a public bridged network and a private "secondary" network in rfc1918 space for replication and migration. I've already configured the network [2], since if you make mistakes there you'd need access to HPC2N systems in order to get hold of the console of the hardware.

The Ganeti commands usually look something like gnt-task verb --options, and adding --help gives help for that command. For instance: gnt-instance add --help

First some steps that need to be done on all nodes in the cluster:

Setup DRBD

First we need to set up DRBD [3], without the normal DRBD helper tools (Ganeti will manage the volumes):

echo "options drbd minor_count=128 usermode_helper=/bin/true" > /etc/modprobe.d/drbd.conf
echo "drbd" >> /etc/modules
depmod -a
modprobe drbd
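
If you want to check that the module loaded, you can for instance look at:

lsmod | grep drbd
cat /proc/drbd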

Setup LVM

The default name for the volume group that Ganeti automatically creates instance volumes in is xenvg, and on this hardware it is supposed to be created on /dev/sdb. Setting this up [4] on our hardware then becomes:

pvcreate /dev/sdb
vgcreate xenvg /dev/sdb
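
You can confirm that the volume group is in place with for instance:

vgs xenvg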

Install Ganeti

We're installing a slightly newer version of Ganeti than is available in the main repository, from the Debian maintainer's Ubuntu PPA. There are newer versions available from source or as Debian packages; if you want to run a large Ganeti shop you might want to consider Debian as a base OS, since that's what most of the developers and other contributors are running.

add-apt-repository ppa:pkg-ganeti-devel/lts
apt-get update
apt-get install ganeti

These three steps (DRBD, LVM, Install) apply to all nodes in the cluster, please do them all before proceeding to:

Initialize the cluster

On one of the nodes, we create the cluster [5]. The last argument is the name of the cluster, which is also a service IP that is held by the current cluster master. '-s' gives the secondary IP of this host on the private network [6]; the internal IP has a DNS name like d-m12-i.ndgf.org, but you need to use the IP in numerical form.

gnt-cluster init --enabled-hypervisors=kvm -s 172.19.126.nnn v-gnt.tut.ndgf.org
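
If you want a quick sanity check that the cluster answers and see which node currently holds the master role, you can for instance run:

gnt-cluster getmaster
gnt-cluster info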

We set some properties of the cluster separately; they could also be folded into one big init line, but for clarity we set the kernel and initrd to boot the host kernel:

gnt-cluster modify -H kvm:kernel_path=/boot/vmlinuz-3.16.0-59-generic,initrd_path=/boot/initrd.img-3.16.0-59-generic

The default limits are a bit too restrictive for tiny test guests; memory and disk sizes are in megabytes if you provide them without units:

gnt-cluster modify --ipolicy-bounds-specs min:cpu-count=1,disk-count=1,disk-size=256,memory-size=128,nic-count=1,spindle-use=1/max:cpu-count=8,disk-count=16,disk-size=1048576,memory-size=32768,nic-count=8,spindle-use=12

And some default sizes for guests so we don't have to repeat ourselves too much when creating guests:

gnt-cluster modify --ipolicy-std-specs disk-size=512,memory-size=256

Add nodes

From the master (where you just ran gnt-cluster init), add the rest of your nodes [7]:

gnt-node add -s 172.19.126.nnn d-m13.ndgf.org
gnt-node add -s 172.19.126.nnn d-m14.ndgf.org

For this to work, you need to be able to log in directly from the master to the new nodes. In our setup that's handled by the key on the USB stick you were handed; you just need to copy id_rsa and id_rsa.pub to /root/.ssh/ on the master node (which the scp command in the tutorial setup already did).
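
A quick way to verify that the direct login works is to run something like this from the master node (again with a hostname from your text file):

ssh root@d-m13.ndgf.org hostname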

Adding a node will replace the new node's ssh keys with the master's keys and do all kinds of other things to the node. For a production setup it is recommended to understand what is done, or to handle the ssh configuration yourself while still fulfilling Ganeti's requirement of passwordless login from all master candidate nodes to all other nodes, etc.

We can see all the nodes by:

gnt-node list

Configure OS for guests

Create an Ubuntu Trusty [8] debootstrap config file /etc/ganeti/instance-debootstrap/variants/trusty.conf with the following content:

MIRROR="http://se.archive.ubuntu.com/ubuntu/"
SUITE="trusty"
EXTRA_PKGS="acpid,udev,linux-image-virtual,openssh-server,vim"
COMPONENTS="main,restricted,universe,multiverse"
ARCH="amd64"

And add the variant to /etc/ganeti/instance-debootstrap/variants.list:

default
trusty

Copy both files out to all the cluster nodes [9]:

gnt-cluster copyfile /etc/ganeti/instance-debootstrap/variants.list
gnt-cluster copyfile /etc/ganeti/instance-debootstrap/variants/trusty.conf

See that it exists:

gnt-os list

Create guests

First we create the instance. I use the team letter v and a guest number 0 that doesn't exist; choose an appropriate guest name from the text file:

gnt-instance add -t drbd -s 512 -o debootstrap+trusty v-g0.tut.ndgf.org

Wait 1-2 minutes. Afterwards we can see that it exists with:

gnt-instance list

Then we need to fix a few things inside the guest, so grab the console:

gnt-instance console v-g0.tut.ndgf.org

Hit enter to get a login prompt and log in as root, then fix a few things (you exit the console with 'Control-]' [10]):

passwd
mkdir .ssh
vim .ssh/authorized_keys
vim /etc/network/interfaces
ifup eth0
^]

Paste your team ssh key into root's authorized_keys, and a typical /etc/network/interfaces looks like:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
  address 109.105.126.nnn/24
  gateway 109.105.126.129
iface eth0 inet6 static
  address 2001:948:58:1::nnn/64
  gateway 2001:948:58:1::1

Guest migration

Login to your guest and start something continuously updating, like dstat or top - or ping it from the outside. Then in another terminal login as root on the master node and run:

gnt-instance migrate v-g0.tut.ndgf.org

See if you can notice the pause in updates as the instance is live migrated [11] between its primary and secondary node.

Balancing and evacuation

This is using the hbal cluster balancing tool, for another strategy see "Evacuating nodes" in the reference documentation.

Create another handful of guests according to the recipe in #Create guests - create 5 more so you get 6 in total. You can skip configuring the network etc. through the console for most of them; it doesn't matter for this particular step. The resource allocator will spread the primary and secondary copies of the data evenly, and run roughly as many instances on each node. Migrate a couple of the instances:

gnt-instance migrate v-g8.tut.ndgf.org
gnt-instance migrate v-g9.tut.ndgf.org

See where everything is currently running:

gnt-instance list

The gnt-instance migrate command moves the running VM between its primary and secondary node, back and forth if you repeat it.

And then run the balancing tool hbal to see what it would like to do to even out the load:

hbal -L

This tool can also be used to evacuate a node that you want to take out of service:

hbal -L -O d-m12.ndgf.org

This would make it migrate data and instances around to fully free up the named node (d-m12 in this example); moving all the disks around takes extra time and effort compared with gnt-instance migrate. Try it with the execute flag (this will take a few minutes):

hbal -X -L -O d-m12.ndgf.org

If there are any issues, just re-run it until no actions are taken. The node should then be free to be taken out of service or rebooted, unless it is the master node. To move the master role, see gnt-cluster master-failover. If you create a new instance (#Create guests) now, it is likely to end up on the node you just evacuated.

Afterwards, compare

gnt-instance list

to how it looked previously.

Optionally, if you want to rebalance the cluster back, you can run:

hbal -X -L

All this instance migration is a bit on the slow side, in production I recommend at least 10Gbit/s secondary network for moving data around. This is also the reason for keeping both memory and disk sizes as small as possible in this tutorial.

Changing RAM and CPU of a guest

Sometimes your initial guess at how big a machine should be doesn't match reality. To change this after initialization, modify the guest. First check its current parameters:

gnt-instance info v-g8.tut.ndgf.org

There are several parameters regarding memory, but they are all in the backend section, and we're interested in maxmem since that's what Ganeti will choose except under special circumstances [12].

gnt-instance modify -B maxmem=256 v-g8.tut.ndgf.org

And then we need to reboot from outside the guest to make it pick up on the new value:

gnt-instance reboot v-g8.tut.ndgf.org

Afterwards you can check with free or top and see that the memory is indeed 256M.
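
For example, inside the guest:

free -m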

The same way you can change the number of CPUs available to the guest:

gnt-instance modify -B vcpus=3 v-g0.tut.ndgf.org
gnt-instance reboot v-g0.tut.ndgf.org

In case you have a homogeneous cluster with high CPU usage, it might also be worth setting the KVM hypervisor parameter for native CPU features:

gnt-instance modify -H cpu_type=host v-g0.tut.ndgf.org

Note the difference in the flags in /proc/cpuinfo - for some workloads this is very beneficial. The downside is that you cannot live-migrate a guest with cpu_type=host between different CPU models.
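
One way to compare is to look at the CPU flags inside the guest before and after (like other hypervisor parameter changes, this only takes effect at the next restart of the guest):

grep flags /proc/cpuinfo | head -1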

Adding and growing disks

Now if we realize that the root disk is too small on one of our guests, what can we do about that? Well, we could nuke and reinstall, but since we are managing services here we could start by adding a disk:

gnt-instance modify --disk add:size=256 v-g0.tut.ndgf.org
gnt-instance reboot v-g0.tut.ndgf.org

Then, when logging in to the guest, the new disk should be available as vdb or similar; run lsblk for a full view.

But instead of adding a disk, we could also:

gnt-instance info v-g9.tut.ndgf.org

See what disks that instance has. In this case we want to enlarge disk/0, so grow it by 1G:

gnt-instance grow-disk v-g9.tut.ndgf.org 0 1G
gnt-instance reboot v-g9.tut.ndgf.org

Log in to the guest and

fdisk /dev/vda

Delete the partition and recreate it, with the same starting block and new ending block, then:

resize2fs /dev/vda1

And see how much space there is now! (Sometimes you need to reboot to re-read the partition table)
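
For example, to check the root filesystem size before and after:

df -h /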

This is not exactly convenient, so for a production setup it might be a good idea to use LVM inside the guest. Then you can just pvresize etc. to get more space.
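
A rough sketch of that approach inside a guest, assuming a hypothetical layout where the grown partition /dev/vda2 is the physical volume of a volume group vg0 and the root filesystem lives on /dev/vg0/root:

pvresize /dev/vda2
lvextend -r -l +100%FREE /dev/vg0/root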

After this and after changing the sizes of guests, it might be interesting to see how hbal would want to redistribute the cluster.

Self-test

To check if something is wrong, a good first approach is to

gnt-cluster verify

It will run a bunch of sanity checks and warn if something looks wrong.

The Watcher

There is a regular cron job called The Watcher [13] that checks the intended state of guests against the current state. This is the cron job in /etc/cron.d/ganeti that is installed with the ganeti package:

*/5 * * * * root [ -x /usr/sbin/ganeti-watcher ] && /usr/sbin/ganeti-watcher

So every 5 minutes things should get restarted. You can test this by finding an appropriate guest, preferably one for which you have configured networking. Start pinging it, then look for the kvm process that runs that guest; the process looks something like this:

qemu-system-x86_64 -enable-kvm -name v-g0.tut.ndgf.org [...]

Kill it and see the pongs stop in your other window. Then wait up to five minutes and see if they automatically resume. Tip: If you're impatient, kill it shortly before an even 5 minute number on the clock.
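
One blunt way to find and kill that process on the guest's primary node, assuming the instance name v-g0.tut.ndgf.org as above:

pgrep -af 'name v-g0.tut.ndgf.org'
pkill -f 'name v-g0.tut.ndgf.org'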

This makes for an interesting time to run

gnt-cluster verify

And see that it actually reports errors when something is broken!

A graphical console through VNC

Sometimes even though we run servers, we might need a graphical console due to bad software. For that case you can hook up a VNC console to the guest. Note that access to this console goes via the primary node for that guest, not through the Ganeti master node.

First you need to define a VNC console, binding to loopback only:

gnt-instance modify -H vnc_bind_address=127.0.0.1 v-g9.tut.ndgf.org

Figure out which port and primary host the guest has from:

gnt-instance info v-g9.tut.ndgf.org

Then either forward an appropriate port with ssh, or use the built-in support in a VNC client, like:

env VNC_VIA_CMD="ssh -f -i ~/gnt-tut-keys/id_rsa -L %L:%H:%R %G sleep 20" xtightvncviewer -via root@d-m13.ndgf.org 127.0.0.1:nnnnn
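
Alternatively, a plain ssh port forward could look roughly like this, where 11000 stands in for whatever VNC port you found above and d-m13 is the guest's primary node:

ssh -f -N -i ~/gnt-tut-keys/id_rsa -L 5900:127.0.0.1:11000 root@d-m13.ndgf.org
xtightvncviewer 127.0.0.1:0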

Installing a proper kernel

On a running system, it is quite possible to install grub (grub-pc package) and then

gnt-instance modify -H kernel_path=,initrd_path= v-g0.tut.ndgf.org
gnt-instance reboot v-g0.tut.ndgf.org

And make the guest boot its own kernel. Try it; it might take a little bit of fiddling with grub to get it going. Change back to the cluster initialization values above if you break booting.
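
Inside the guest, the grub part could look roughly like this (a sketch, assuming the guest's disk shows up as /dev/vda):

apt-get install grub-pc
grub-install /dev/vda
update-grub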

You can tell that the guest booted its own kernel by comparing kernel versions: the hardware runs a newer kernel thanks to a non-default Trusty kernel package, so the guest's own kernel should be an older one.

A final experiment

After you feel totally done with everything, you can try to just power off the master node and see how much the watcher will automatically resolve, and how much you need to fix manually to get a fully redundant 2-node cluster. Some hints are gnt-cluster master-failover and gnt-cluster verify. Please note that after you power a node off, you have no way of powering it on again, so don't do this until you feel done with everything else.
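
One possible sequence, as a rough sketch to be run on one of the surviving nodes (it needs to be a master candidate; d-m12 stands for the node you powered off):

gnt-cluster master-failover
gnt-node modify --offline yes d-m12.ndgf.org
gnt-cluster verify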