AHM 2016 Ganeti Tutorial
What is Ganeti
Ganeti is a cluster virtual server management software tool built on top of existing virtualization technologies such as Xen or KVM and other open source software. It is open source software developed mainly by Google, but with significant community contributions. At Google, Ganeti is used for managing staff support services like office servers and remote desktops for developers (access to sensitive data!).
It is well suited to hosting services at small to medium scale, with little overhead, in the range of 2-50 nodes per cluster.
Tutorial setup
You have received USB keys with a text file detailing which hostnames you should use for your team, as well as ssh keys that are enabled for direct root logins to the hardware. Since the USB keys use a "compatible with all laptops" vfat filesystem, permissions are not preserved. Start by copying the keys and fixing the permissions:
cp -r team-V ~/gnt-tut-keys
chmod 700 ~/gnt-tut-keys
chmod 600 ~/gnt-tut-keys/id_rsa
Then you can log in directly as root on the hardware (remember to use the hostnames from the text file whenever hostnames appear in lines like this):
ssh -i ~/gnt-tut-keys/id_rsa root@d-m12.ndgf.org
Later on we will need direct logins from the master node to all other nodes, so it is appropriate to:
scp -i ~/gnt-tut-keys/id_rsa ~/gnt-tut-keys/id_rsa* root@d-m12.ndgf.org:.ssh/
Please note that these are real servers on real production networks, and while I'll wipe the servers after we're done, please don't make catastrophic security mistakes ("let's change the root password to 'admin' and allow password-based logins!") or do anything malicious.
Installation tutorial
This is a condensed version of [1], where I've already made the choice of KVM virtual machines with DRBD and bridged networking. In addition, we're using a packaged version of Ganeti with proper dependencies instead of a manual install.
We're also taking one shortcut: using the host kernel for the virtual machines. This is due to grub being difficult for debootstrap instances - in a real production environment one would either script this well, instantiate from template images, or hook into a full-blown automated installer (e.g. pxeboot the instances and let them install like real servers). Setting this up from scratch is out of scope for this tutorial, though (and would take days to weeks).
We are installing on three nodes per team, using default names for network interfaces, LVM volume groups, etc. to keep things simple. There are two networks on each node: a public bridged network, and a private "secondary" network in rfc1918 space for replication and migration. I've already configured the network [2], since if you made mistakes there you'd need access to HPC2N systems in order to get hold of the console of the hardware.
The Ganeti commands usually look something like gnt-task verb --options, and if you append --help you get help for that command. For instance: gnt-instance add --help
First some steps that need to be done on all nodes in the cluster:
Setup DRBD
First we need to set up DRBD [3], without the normal DRBD helper tools (Ganeti will manage the volumes).
echo "options drbd minor_count=128 usermode_helper=/bin/true" > /etc/modprobe.d/drbd.conf echo "drbd" >> /etc/modules depmod -a modprobe drbd
Setup LVM
The default name for the volume group Ganeti automatically creates instances from is xenvg, and on this hardware it is supposed to be created on /dev/sdb. Setting this up [4] on our hardware then becomes:
pvcreate /dev/sdb
vgcreate xenvg /dev/sdb
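If you want to double-check the physical volume and volume group before handing them over to Ganeti, the standard LVM reporting commands are enough:
pvs
vgs xenvg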
Install Ganeti
We're installing a slightly newer version of Ganeti than what is available in the main repository, from the Debian maintainer's Ubuntu PPA. There are newer versions available from source or as Debian packages; if you want to run a large Ganeti shop, you might want to consider Debian as a base OS, since that's what most of the developers and other contributors are running.
add-apt-repository ppa:pkg-ganeti-devel/lts
apt-get update
apt-get install ganeti
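To check which version you actually got (and that it came from the PPA rather than the main repository), something like this should do; the exact version string will differ:
apt-cache policy ganeti
gnt-cluster --version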
These three steps (DRBD, LVM, Install) apply to all nodes in the cluster, please do them all before proceeding to:
Initialize the cluster
On one of the nodes, we create the cluster [5]. The last argument is the name of the cluster, which is also a service IP that is held by the current cluster master. '-s' gives a secondary IP on the private network [6] for this host; the internal IP has a DNS name like d-m12-i.ndgf.org, but you need to use the IP in numerical form.
gnt-cluster init --enabled-hypervisors=kvm -s 172.19.126.nnn v-gnt.tut.ndgf.org
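At this point the cluster exists with this single node as master. If you like, a quick sanity check is:
gnt-cluster getmaster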
We set some properties of the cluster separately; they could also be loaded into one big init line, but for clarity we set the kernel and initrd to boot the host kernel here:
gnt-cluster modify -H kvm:kernel_path=/boot/vmlinuz-3.16.0-59-generic,initrd_path=/boot/initrd.img-3.16.0-59-generic
The default limits are a bit too restrictive for tiny test guests; memory and disk sizes are in megabytes if you provide them without units:
gnt-cluster modify --ipolicy-bounds-specs min:cpu-count=1,disk-count=1,disk-size=256,memory-size=128,nic-count=1,spindle-use=1/max:cpu-count=8,disk-count=16,disk-size=1048576,memory-size=32768,nic-count=8,spindle-use=12
And some default sizes for guests so we don't have to repeat ourselves too much when creating guests:
gnt-cluster modify --ipolicy-std-specs disk-size=512,memory-size=256
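You can review the resulting cluster-wide settings (hypervisor parameters, instance policy, default specs) at any time with:
gnt-cluster info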
Add nodes
From the master (where you just ran gnt-cluster init), add the rest of your nodes [7]:
gnt-node add -s 172.19.126.nnn d-m13.ndgf.org
gnt-node add -s 172.19.126.nnn d-m14.ndgf.org
For this to work, though, you need to be able to log in directly from the master to the new nodes. In our setup, that's handled by the key on the USB stick you were handed; you just need to copy id_rsa and id_rsa.pub to /root/.ssh/ on the master node.
This will replace the ssh keys of the new nodes with the master's keys and do all kinds of things to the nodes. For a production setup it is recommended to understand what is done, or to handle the ssh configuration yourself while still fulfilling Ganeti's requirement of passwordless login from all master candidate nodes to all other nodes, etc.
We can see all the nodes by:
gnt-node list
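The default columns are fairly terse; gnt-node list also takes -o with a custom field list if you want to see instance counts and free resources per node (field names as documented in the gnt-node man page, adjust if your version differs):
gnt-node list -o name,pinst_cnt,sinst_cnt,mfree,dfree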
Configure OS for guests
Create an Ubuntu Trusty [8] debootstrap config file /etc/ganeti/instance-debootstrap/variants/trusty.conf with the following content:
MIRROR="http://se.archive.ubuntu.com/ubuntu/"
SUITE="trusty"
EXTRA_PKGS="acpid,udev,linux-image-virtual,openssh-server,vim"
COMPONENTS="main,restricted,universe,multiverse"
ARCH="amd64"
And add the variant to /etc/ganeti/instance-debootstrap/variants.list:
default
trusty
Copy them out to all the cluster nodes [9]:
gnt-cluster copyfile /etc/ganeti/instance-debootstrap/variants.list
gnt-cluster copyfile /etc/ganeti/instance-debootstrap/variants/trusty.conf
See that it exists:
gnt-os list
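If the trusty variant does not show up, gnt-os diagnose gives a per-node view of what Ganeti thinks of each OS definition, which usually points at the missing or broken file:
gnt-os diagnose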
Create guests
First we create the instance. I use the team letter v and a guest number 0 that doesn't exist; choose an appropriate guest from the text file:
gnt-instance add -t drbd -s 512 -o debootstrap+trusty v-g0.tut.ndgf.org
Wait 1-2 minutes. Afterwards we can see that it exists with:
gnt-instance list
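If you want to see where the instance landed, gnt-instance list also takes -o with a custom field list, for example (field names from the gnt-instance man page):
gnt-instance list -o name,pnode,snodes,status,oper_ram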
Then we need to fix a few things. Grab the console:
gnt-instance console v-g0.tut.ndgf.org
Hit enter to get a login prompt and log in as root, then fix a few things (exit the console with 'Control-]' [10]):
passwd
mkdir .ssh
vim .ssh/authorized_keys
vim /etc/network/interfaces
ifup eth0
^]
Paste your team ssh key into root's authorized_keys. A typical /etc/network/interfaces looks like:
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 109.105.126.nnn/24
    gateway 109.105.126.129

iface eth0 inet6 static
    address 2001:948:58:1::nnn/64
    gateway 2001:948:58:1::1
Guest migration
Log in to your guest and start something that updates continuously, like dstat or top - or ping it from the outside. Then, in another terminal, log in as root on the master node and run:
gnt-instance migrate v-g0.tut.ndgf.org
See if you can notice the pause in updates as the instance is live migrated [11] between its primary and secondary node.
Balancing and evacuation
This uses the hbal cluster balancing tool; for another strategy, see "Evacuating nodes" in the reference documentation.
Create another handful of guests according to the recipe in #Create guests: make 5 more so you have 6 in total. You can skip the step of configuring the network through the console for most of them; it doesn't matter for this particular step. The resource allocator will spread the primary and secondary copies of the data evenly, and run roughly as many instances on each node. Migrate a couple of the instances:
gnt-instance migrate v-g8.tut.ndgf.org
gnt-instance migrate v-g9.tut.ndgf.org
See where everything is currently running:
gnt-instance list
The gnt-instance migrate command moves the running VM between its primary and secondary node, back and forth if you repeat it.
Then run the balancing tool hbal to see what it would like to do to even out the load:
hbal -L
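If you would rather review the individual commands hbal proposes than have it execute them, the -C option prints the command list (as documented in the hbal man page):
hbal -L -C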
This tool can also be used to evacuate a node that you want to take out of service:
hbal -L -O d-m12.ndgf.org
This would make it migrate data and instances around to fully free up the node d-XXX; moving all the disks around takes extra time and effort compared with gnt-instance migrate. Try it with the execute flag (this will take a few minutes):
hbal -X -L -O d-m12.ndgf.org
If there are any issues, just re-run it until no actions are taken. The node should then be free to be taken out of service or rebooted, unless it is the master node. To move the master, see gnt-cluster master-failover. If you create a new instance (#Create guests) now, it is likely to end up on the node you just evacuated.
Afterwards, compare
gnt-instance list
to how it looked previously.
Optionally, if you want to rebalance the cluster back, you can run:
hbal -X -L
All this instance migration is a bit on the slow side; in production I recommend at least a 10Gbit/s secondary network for moving data around. This is also the reason for keeping both memory and disk sizes as small as possible in this tutorial.
Changing RAM and CPU of a guest
Sometimes your initial guess on how big a machine is supposed to be doesn't match reality. To change this after creation, first check the current parameters of the guest:
gnt-instance info v-g8.tut.ndgf.org
There are several parameters regarding memory, but they are all in the backend section, and we're interested in maxmem since that's what Ganeti will use except under special circumstances [12].
gnt-instance modify -B maxmem=256 v-g8.tut.ndgf.org
And then we need to reboot from outside the guest to make it pick up on the new value:
gnt-instance reboot v-g8.tut.ndgf.org
Afterwards you can check with free or top and see that the memory is indeed 256M.
The same way you can change the number of CPUs available to the guest:
gnt-instance modify -B vcpus=3 v-g0.tut.ndgf.org
gnt-instance reboot v-g0.tut.ndgf.org
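To verify inside the guest after the reboot, either of these will do:
nproc
grep -c ^processor /proc/cpuinfo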
In case you have a homogeneous cluster with high CPU usage, it might also be worth setting the KVM hypervisor parameter for native CPU features:
gnt-instance modify -H cpu_type=host v-g0.tut.ndgf.org
Note the difference in the flags in /proc/cpuinfo - for some workloads this is very beneficial. The downside is that you cannot live-migrate a guest with cpu_type=host between different CPU models.
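A simple way to compare is to run this inside the guest before and after the change (and reboot), and compare the output:
grep ^flags /proc/cpuinfo | head -1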
Adding and growing disks
Now, if we realize that the root disk is too small on one of our guests, what can we do about that? Well, we could nuke and reinstall, but since we are managing services here, we could start by adding a disk:
gnt-instance modify --disk add:size=256 v-g0.tut.ndgf.org
gnt-instance reboot v-g0.tut.ndgf.org
Then, when logging in to the guest, the new disk should be available as vdb or so; run lsblk for a full view.
But instead of adding a disk, we could also:
gnt-instance info v-g9.tut.ndgf.org
See what disks that instance has; in this case we want to enlarge disk/0, and grow it by 1GB:
gnt-instance grow-disk v-g9.tut.ndgf.org 0 1G
gnt-instance reboot v-g9.tut.ndgf.org
Log in to the guest and
fdisk /dev/vda
Delete the partition and recreate it, with the same starting block and new ending block, then:
resize2fs /dev/vda1
And see how much space there is now! (Sometimes you need to reboot to re-read the partition table)
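As an alternative to the manual fdisk dance, the growpart tool from the cloud-guest-utils package can move the partition end for you - assuming that package is installable in your guest (it lives in the Ubuntu universe component, which we enabled in the debootstrap variant):
apt-get install cloud-guest-utils
growpart /dev/vda 1
resize2fs /dev/vda1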
Juggling partition tables by hand is not exactly convenient, so for a production setup it might be a good idea to use LVM in the guest. Then you can just pvresize etc. to get more space.
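For illustration only - assuming a guest whose root filesystem sits on LVM with the physical volume placed directly on the grown disk (the volume group and logical volume names here are hypothetical) - the resize would then look roughly like:
pvresize /dev/vdb                          # PV directly on the disk, no partition table to edit
lvextend -l +100%FREE /dev/rootvg/root     # "rootvg" and "root" are made-up names
resize2fs /dev/rootvg/root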
After this and after changing the sizes of guests, it might be interesting to see how hbal would like to redistribute the cluster.
Self-test
To check if something is wrong, a good first approach is to
gnt-cluster verify
It will run a bunch of sanity checks and warn if something looks wrong.
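A related, narrower check that looks specifically at instance disks is:
gnt-cluster verify-disks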
The Watcher
There is a regular cron job called The Watcher [13] that checks the intended state of guests against their current state. This is the cron job in /etc/cron.d/ganeti that is installed with the ganeti package:
*/5 * * * * root [ -x /usr/sbin/ganeti-watcher ] && /usr/sbin/ganeti-watcher
So every 5 minutes things should get restarted. You can test this by finding an appropriate guest, preferably one that you have configured networking for. Start pinging it, then look for the kvm process that runs that guest; the process looks something like this:
qemu-system-x86_64 -enable-kvm -name v-g0.tut.ndgf.org [...]
Kill it and see the pongs stop in your other window. Then wait up to five minutes and see if they automatically resume. Tip: If you're impatient, kill it shortly before an even 5 minute number on the clock.
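Concretely, finding and killing the right process on the guest's primary node might look like this (the grep pattern is just an example, adjust the instance name):
ps aux | grep 'qemu.*v-g0.tut.ndgf.org'
kill <PID>    # replace <PID> with the number from the ps output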
This makes for an interesting time to run
gnt-cluster verify
and see that it actually reports errors when something is broken!
A graphical console through VNC
Sometimes even though we run servers, we might need a graphical console due to bad software. For that case you can hook up a VNC console to the guest. Note that access to this console goes via the primary node for that guest, not through the Ganeti master node.
First you need to define a VNC console, binding to loopback only:
gnt-instance modify -H vnc_bind_address=127.0.0.1 v-g9.tut.ndgf.org
Figure out which port and primary host the guest has from:
gnt-instance info v-g9.tut.ndgf.org
Then either forward an appropriate port with ssh, or use the built-in support in a VNC client, like:
env VNC_VIA_CMD="ssh -f -i ~/gnt-tut-keys/id_rsa -L %L:%H:%R %G sleep 20" xtightvncviewer -via root@d-m13.ndgf.org 127.0.0.1:nnnnn
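The plain ssh port-forward variant of the same thing might look roughly like this, with nnnnn being the VNC port you read from the instance info (local port 5900 corresponds to VNC display :0):
ssh -i ~/gnt-tut-keys/id_rsa -N -L 5900:127.0.0.1:nnnnn root@d-m13.ndgf.org &
xtightvncviewer 127.0.0.1:0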
Installing a proper kernel
On a running system, it is quite possible to install grub (the grub-pc package) and then
gnt-instance modify -H kernel_path=,initrd_path= v-g0.tut.ndgf.org
gnt-instance reboot v-g0.tut.ndgf.org
and make the guest boot its own kernel. Try it; it might take a little bit of fiddling with grub to get it going. Change back to the cluster initialization values above if you break booting.
You can tell that the guest booted a native kernel from the kernel version: the hardware runs a newer kernel thanks to a non-default Trusty kernel package, so the guest should show an older one.
A final experiment
After you feel totally done with everything, you can try to just poweroff the master node and see how much the watcher will automatically resolve, and how much you need to fix manually to get a fully redundant 2-node cluster. Some hints are gnt-cluster master-failover and gnt-cluster verify. Please note that after you power a node off, you have no way of powering it on again, so don't do this until you feel done with everything else.