User:Feystorm28387

= Me =

My real name is Patrick Hemmer. My main job is as a nix admin for an email provider. I love designing things and solving problems, both of which I'm extremely good at. My HA cluster guide is just one example of my boredom. I decided to take up the challenge of getting minecraft into pacemaker since it seemed nobody else had done it.

I do have a life outside computers, so I probably won't check my talk page very much (read: never). But if you need/want to get ahold of me, you can email me at mcw [@t] stormcloud9 [d0t] net

= High-Availability cluster =

This guide will provide the steps for creating a HA (high availability) minecraft server.

A high availability cluster is one designed to keep services running with the absolute minimum of downtime. Upon completion of this guide, minecraft will be managed by HA clustering software such that should a server fail, the software crash, you wish to shut a server down, or any other number of events, the minecraft server will stay running.

This guide is a simple active/passive setup where the second server in the cluster does not serve up any services. In another version of this guide, I will provide instructions on how to run an active/active cluster with each server running a separate minecraft world.

This guide was created on a machine running Gentoo linux, and thus some of the commands may not work on your distro (namely the package management) and some paths may be different.

Prerequisites
 * 2 servers on the same network
 * An IP to use to front the server.
 * Linux or unix variant capable of running Corosync & Pacemaker.
 * Perl. Some of the scripts used are written in perl.
 * Moderate understanding of Linux & server management. HA clustering is an advanced topic. I have tried to take care of much of the hard stuff, but when things break (and they always seem to do so), having a general knowledge of how things work will allow you to solve most issues.

Installation
All the commands in this section should be run on both servers except where noted.

DRBD
DRBD is a block device replication daemon. It allows us to have a drive on each server that is identical. Think of it as RAID-1 across multiple servers.

emerge drbd drbd-kernel

You don't have to install drbd-kernel, as DRBD is included in recent kernel versions. However, DRBD recommends that the DRBD userspace version match the kernel module version, so I find it easier to just use the drbd-kernel package instead of having to worry about it every time I change kernels.

Pacemaker
Pacemaker is the utility that will manage all the services and ensure they stay running.

emerge pacemaker corosync
mkdir /etc/ocf
ln -s /etc/ocf /usr/lib/ocf/resource.d/etc
wget -O /etc/ocf/minecraft http://stormcloud9.net/minecraft/ocf/minecraft
chmod a+x /etc/ocf/minecraft
wget -O /usr/local/bin/ping_minecraft.pl http://stormcloud9.net/minecraft/ping.pl
chmod a+x /usr/local/bin/ping_minecraft.pl
wget -O /usr/local/bin/mux_server http://stormcloud9.net/minecraft/mux_server
chmod a+x /usr/local/bin/mux_server
wget -O /usr/local/bin/mux_client http://stormcloud9.net/minecraft/mux_client
chmod a+x /usr/local/bin/mux_client

Minecraft server
On gentoo, the minecraft server is available in a layman overlay, and it seems to be kept fairly up-to-date, so we'll be using this.

emerge layman
layman -L # we already know the overlay we want, but this makes layman download the list of available overlays
layman -a java-overlay
echo "source /var/lib/layman/make.conf" >> /etc/make.conf
emerge minecraft-server

Run the following on one server

useradd -d /var/lib/minecraft -g games -s /sbin/nologin -r minecraft
getent passwd minecraft | awk -F: '{ print $3 }'

Take the uid printed by the second command above (getent) and use it in place of <uid> in the following command on the other server:

useradd -d /var/lib/minecraft -g games -s /sbin/nologin -r -u <uid> minecraft
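The uid has to match on both servers because the replicated filesystem stores numeric owner IDs rather than user names. A minimal sketch for double-checking this follows. `uid_of` is a hypothetical helper name, not one of the scripts used in this guide, and it reads a passwd-format file directly (defaulting to /etc/passwd) instead of going through getent.

```shell
# uid_of USER [PASSWD_FILE]
# Print the numeric uid of USER from a passwd-format file.
# Hypothetical helper for illustration only; defaults to /etc/passwd.
uid_of() {
    awk -F: -v user="$1" '$1 == user { print $3 }' "${2:-/etc/passwd}"
}
```

Running `uid_of minecraft` on each server should print the same number. If it prints nothing, the user does not exist in that file.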

Configuration
In all the examples in this section the following configuration values will be used. These values may not be the same as in your setup, so substitute as appropriate.

Server1 IP: 192.168.2.11

Server2 IP: 192.168.2.12

Minecraft IP: 192.168.2.21

Minecraft data device: /dev/sdb

I may also refer to server1 as node1, and server2 as node2.

In this guide I am storing data on an actual block device. You can use a file as the storage medium for the data, but you get less performance as you have to go through the filesystem layer. However as you may be limited in your options, if you wish to use a file, run the following commands on each server

dd if=/dev/zero of=/var/lib/minecraft/disk.img bs=1M count=1024

That will create a 1 GB file, which is actually pretty small. Change the `1024` value to the size in MB you wish it to be. Then, in all places where /dev/sdb is referenced, substitute /var/lib/minecraft/disk.img

DRBD
Create the file /etc/drbd.d/minecraft.res with the following contents (on both nodes):

resource minecraft {
    protocol B;
    on server1 {
        device      minor 1;
        disk        /dev/sdb;
        address     192.168.2.11:7789;
        meta-disk   internal;
    }
    on server2 {
        device      minor 1;
        disk        /dev/sdb;
        address     192.168.2.12:7789;
        meta-disk   internal;
    }
}

You might get slightly better performance with `protocol A` in the above configuration at the increased risk of data inconsistency in the event of a node failure. The choice is up to you. Consult the DRBD protocol documentation for the behavior of each.
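Since the resource file differs between setups only in the host names, addresses and backing device, it can be generated rather than typed. The following is a rough sketch, not part of the original scripts: `drbd_resource_conf` is a hypothetical helper that prints the same layout as above with your values filled in. As far as I know, the names after `on` need to match what `uname -n` prints on each server, so check that before using it.

```shell
# drbd_resource_conf HOST1 IP1 HOST2 IP2 DISK
# Print a DRBD resource definition for the "minecraft" resource.
# Hypothetical helper; port 7789, minor 1 and protocol B are taken
# from the example configuration in this guide.
drbd_resource_conf() {
    cat <<EOF
resource minecraft {
    protocol B;
    on $1 {
        device      minor 1;
        disk        $5;
        address     $2:7789;
        meta-disk   internal;
    }
    on $3 {
        device      minor 1;
        disk        $5;
        address     $4:7789;
        meta-disk   internal;
    }
}
EOF
}
```

For the values used in this guide, that would be `drbd_resource_conf server1 192.168.2.11 server2 192.168.2.12 /dev/sdb > /etc/drbd.d/minecraft.res`, run on both nodes.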


 * Create the device metadata

drbdadm create-md minecraft


 * Start DRBD

/etc/init.d/drbd start

Do not add DRBD to any runlevel; we do not want it to start automatically on boot, since pacemaker will be responsible for starting it later. We are starting it now only so that we can configure it.

If the drives on both nodes contain no data, you can skip the initial sync that will happen when the resource is brought up with the following command. This just saves time and is not required.
 * Start replication

drbdadm new-current-uuid --clear-bitmap minecraft

On the node that you want to become the primary for the drive (data on the other node will be overwritten with data from this one), run:

drbdadm primary --force minecraft


 * Create the filesystem

mkfs.xfs /dev/drbd/by-res/minecraft


 * Configure minecraft data directory

mount /dev/drbd/by-res/minecraft /var/lib/minecraft
chown -R minecraft:games /var/lib/minecraft
ln -s /dev/null /var/lib/minecraft/server.log
umount /var/lib/minecraft

We are done configuring DRBD at this point, so shut it back down (both nodes again).
 * Shut down DRBD

/etc/init.d/drbd stop

Service configuration
Create /etc/corosync/corosync.conf with the following contents:

compatibility: none

totem {
    version: 2
    secauth: off
    threads: 0
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.2.0
        mcastaddr: 226.94.1.2
        mcastport: 5405
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    to_syslog: yes
    syslog_facility: local2
    syslog_priority: debug
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

amf {
    mode: disabled
}

The `bindnetaddr` value must be set to the network address of your network. If your servers have 2 network interfaces, the ideal solution is to set up a crossover cable between them and use that as the first interface, then create a second `interface {}` section with `ringnumber` set to 1 for the non-crossover interface. Having 2 paths for the servers to talk to each other helps avoid a split brain situation in the event that the primary network goes down. You can also use this crossover link for DRBD if you wish and keep the DRBD replication data off the network.

Also, I prefer to keep the `syslog_priority` at debug, and then filter out the debug level on the syslog daemon. This is because you can't change the priority once corosync/pacemaker is started, but you can change the syslog filter. So if you're having issues you wish to troubleshoot, you don't have to shut down pacemaker to change the log level.
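The network address for `bindnetaddr` is simply the server's IP with the host bits masked off. If you are unsure what that is for your network, here is a small sketch in plain shell arithmetic. `network_addr` is a hypothetical helper name, and the /24 prefix in the usage example is an assumption that matches the `cidr_netmask="24"` used elsewhere in this guide.

```shell
# network_addr IP PREFIX
# Print the IPv4 network address for a dotted-quad IP and a prefix length.
# Hypothetical helper for illustration; does no input validation.
network_addr() {
    addr=$1
    prefix=$2
    # Split the dotted quad into its four octets.
    o1=${addr%%.*}; rest=${addr#*.}
    o2=${rest%%.*}; rest=${rest#*.}
    o3=${rest%%.*}
    o4=${rest#*.}
    # Pack the octets into one 32-bit integer.
    ip_int=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
    # Build a mask with the top PREFIX bits set.
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    fi
    net=$(( ip_int & mask ))
    # Unpack back into dotted-quad form.
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}
```

With the example values, `network_addr 192.168.2.11 24` prints 192.168.2.0, which is the value used for `bindnetaddr` above.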

Create /etc/corosync/service.d/pcmk with the following contents:

service {
    name: pacemaker
    ver: 0
}

Pacemaker is fronted by corosync. Pacemaker is the actual brains of the cluster management, but corosync is used as the communications layer.
 * Start up pacemaker

/etc/init.d/corosync start
rc-update add corosync default

Resource configuration
Run the following command until you see one of the nodes listed as 'Online'. This can take several minutes: whenever both nodes have been down at the same time, startup can take a while. If only one node is down at a time, the service comes up immediately.
 * Wait for pacemaker to start

crm status
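If you would rather not re-run the command by hand, the wait can be scripted. This is only a sketch: `wait_until` is a hypothetical generic retry helper, not something shipped with pacemaker, and the usage example below assumes the status output contains the word 'Online' once a node is up, as described above.

```shell
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until it succeeds.
# Returns 0 on success, 1 once at least TIMEOUT seconds have been
# spent sleeping. Both values must be whole seconds.
# Hypothetical helper for illustration only.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$(( elapsed + interval ))
    done
    return 0
}
```

For example, `wait_until 600 5 sh -c 'crm status | grep -q Online'` would poll every 5 seconds for up to 10 minutes.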


 * Enter crm configuration mode

crm configure

The following are all the configuration commands that need to be entered into pacemaker. I will provide a brief explanation of each command without going beyond the scope of this guide. Just type each command in at the configure prompt.

property stonith-enabled="false"

Pacemaker will refuse to operate without a stonith configuration unless this property is set.

property no-quorum-policy="ignore"

If the other node dies, pacemaker won't do anything, because it can't determine whether it lost network or the other node lost network. This setting disables that behavior.

property default-resource-stickiness="INFINITY"

Whenever a node fails, and the resources migrate over to the other node, this keeps the resources from migrating back when the node comes back online. This is desirable as all your clients will get disconnected when the resource migrates.

property cluster-recheck-interval="30s"

This setting won't make sense yet, but we will use it so that if something causes minecraft to die on one node and it moves to the other, then after 30 seconds it will be able to move back to the first node should the second node fail.

primitive drbd ocf:linbit:drbd params drbd_resource="minecraft" meta migration-threshold="1" failure-timeout="30s" op monitor interval="10s"

This tells pacemaker to manage our DRBD replication and check it every 10 seconds to make sure it's operational. If the resource fails once, pacemaker will shut everything down and move to the other node, and after 30 seconds the resource will be allowed back on this node again.

ms drbd-ms drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"

This tells pacemaker that the `drbd` resource is a master/slave resource, meaning that one node is the primary (sends replication data) and the other node is the secondary (receives replication data).

primitive filesystem ocf:heartbeat:Filesystem params device="/dev/drbd/by-res/minecraft" directory="/var/lib/minecraft" fstype="xfs" options="noatime" meta migration-threshold="1" failure-timeout="30s"

Here we tell pacemaker to mount the DRBD drive to /var/lib/minecraft. We also set 'noatime' for a minor performance gain (very very minor, but can't hurt).

primitive ip ocf:heartbeat:IPaddr2 params ip="192.168.2.21" nic="eth0" cidr_netmask="24"

This tells pacemaker to bring up the IP 192.168.2.21 on the node that is running minecraft.

primitive minecraft ocf:etc:minecraft params mux_server_path="/usr/local/bin/mux_server" mux_client_path="/usr/local/bin/mux_client" mux_socket_path="/var/run/minecraft.mux" jar_path="/usr/share/minecraft-server/lib/minecraft-server.jar" datadir="/var/lib/minecraft" user="minecraft" mem_max="1024M" log_facility="local0" log_tag="minecraft" ping_path="/usr/local/bin/ping_minecraft.pl" op monitor interval="10s" meta migration-threshold="1" failure-timeout="30s" target-role="Stopped"

And here is where we actually start minecraft itself. We use the mux_server script to start it under the 'minecraft' user. Hopefully most of the options there are easy enough to figure out. Also, if you configured minecraft to listen on a port other than the default '25565', you will need to specify that like 'check_port="25566"' after the `ping_path` parameter. The `target-role="Stopped"` is so that we can generate the world. This process takes a while, and pacemaker will think minecraft is dead because it's not responding. If you copied a pre-existing world to /var/lib/minecraft earlier, you can leave this parameter off and skip the world generation steps below.

colocation minecraft_1 inf: minecraft ip
colocation minecraft_2 inf: minecraft filesystem
colocation minecraft_3 inf: filesystem drbd-ms:Master

These say that 'minecraft' must run on the same node that 'ip' is running on, that 'minecraft' must run on the same node that the filesystem is mounted on, and so forth.

order order_startup inf: drbd-ms:promote filesystem:start ip:start minecraft:start
order order_stop inf: minecraft:stop ip:stop filesystem:stop drbd-ms:demote

These should be obvious. They tell pacemaker what order to start & stop the resources.

commit

This commits all the changes we just made. At this point, if you did everything properly, all the resources except minecraft should start up. If you have an error in the configuration, you can type `edit` to find and fix whatever was broken (it will use whatever your EDITOR environment variable is set to).

Type `exit` or CTRL+D or CTRL+C to get out.

At this point we need to create the initial minecraft world (spawn area). We could do this by just setting the pacemaker timeout really high until the world has finished generating and then lowering it, but I think this is simpler.

crm status

See which node is currently running the filesystem and run the next commands on that server:

cd /var/lib/minecraft
/usr/local/bin/mux_server --user=minecraft -s /var/run/minecraft.mux -c "java -Xms64M -Xmx1024M -jar /usr/share/minecraft-server/lib/minecraft-server.jar nogui" -f local0 -t minecraft

Once the world is done generating, send the `stop` command to shut the server down. Now tell pacemaker to start minecraft up:

crm resource start minecraft

That's it, you're done!

Frosting
The information in this section is not required, and goes above and beyond the core goal of getting the cluster up and running.

The information here is provided in a more general sense than the rest of the guide. Commands and such may not match up exactly with your system, but you should get a good idea of what's being done.

syslog-ng
I use syslog-ng for managing all my logging everywhere. It's powerful and flexible. The cluster will log quite a bit of information, so tweaking the logging configuration is almost a requirement.

minecraft
In the guide, we configured mux_server to log its data to the local0 facility. So now we set up a log filter to handle this data.

Create /etc/syslog-ng/minecraft.conf with the following contents:

filter f_minecraft { facility(local0) and program("minecraft"); };
rewrite r_minecraft { subst('^\> \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ', '> ', value('MESSAGE') type('pcre')); };
destination d_minecraft { file("/var/log/minecraft.log"); };
log { source(s_local); filter(f_minecraft); rewrite(r_minecraft); destination(d_minecraft); };

Add the following to /etc/syslog-ng/syslog-ng.conf:

@include "minecraft.conf"
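To see what the `r_minecraft` rewrite rule does, the same substitution can be tried on a sample line with sed. This is only an illustration: `strip_mc_timestamp` is a hypothetical name, the sample line in the usage note is made up, and the reason for the rule is presumably that syslog already adds its own timestamp, so the one in the message is redundant.

```shell
# strip_mc_timestamp
# Read log lines on stdin and drop a "YYYY-MM-DD HH:MM:SS " timestamp
# that directly follows a leading "> ", mirroring the r_minecraft
# rewrite rule. Lines that do not match pass through unchanged.
# Hypothetical helper for illustration only.
strip_mc_timestamp() {
    sed -E 's/^> [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} /> /'
}
```

For example, piping the line `> 2011-03-05 12:34:56 [INFO] Done` through `strip_mc_timestamp` leaves `> [INFO] Done`.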

Pacemaker
In my syslog configuration for pacemaker, I like to split out all the services to separate files for easy reading, and then keep a 'messages' file with everything combined for timeline viewing. Tweak as you will.

Create /etc/syslog-ng/ha.conf with the following contents:

filter f_ha { facility(local2); };
destination d_ha { file("/var/log/ha/messages"); file("/var/log/ha/$PROGRAM"); };
log { source(s_local); filter(f_ha); destination(d_ha); };

Add to /etc/syslog-ng/syslog-ng.conf:

@include "ha.conf"

Auto save
Create a cron job with something like the following:

/usr/local/bin/mux_client -s /var/run/minecraft.mux --send "say World save in 15 seconds" && sleep 15 && /usr/local/bin/mux_client -s /var/run/minecraft.mux --send "save-all"
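Since the cron job will normally be installed on both nodes, it may be worth making it do nothing on the node that is not running minecraft. The following is a sketch under one assumption I have not spelled out elsewhere: that the mux socket only exists on the node where mux_server is currently running. `autosave` is a hypothetical wrapper name. The mux_client options are the same ones used in the command above.

```shell
# Paths taken from this guide; override via the environment if yours differ.
MUX_CLIENT=${MUX_CLIENT:-/usr/local/bin/mux_client}
MUX_SOCKET=${MUX_SOCKET:-/var/run/minecraft.mux}

# autosave
# Announce and trigger a world save, but only if the mux socket exists.
# Assumption: no socket means minecraft is not running on this node.
autosave() {
    [ -S "$MUX_SOCKET" ] || return 0
    "$MUX_CLIENT" -s "$MUX_SOCKET" --send "say World save in 15 seconds" &&
        sleep 15 &&
        "$MUX_CLIENT" -s "$MUX_SOCKET" --send "save-all"
}
```

Put this in a script that ends with a call to `autosave`, and point the cron job at that script instead of the raw command.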

Minecraft
You can connect to the minecraft console with the following command on the host that is currently running the service

/usr/local/bin/mux_client -s /var/run/minecraft.mux

CTRL+C to exit the minecraft console (will not shut down the server).

You can have as many client connections as you want.

Pacemaker
Pacemaker is a fairly advanced bit of software. If all goes well, you should never have to mess with it, but in the event that you have to, or you wish to learn more about it, I recommend the Clusters from Scratch documentation.


 * Check pacemaker status

crm status

This will show you if the services are running, and which node they are currently running on


 * Real-time status

crm_mon -f

The -f tells it to show fail counts so you can see when a resource failed on a node

There are multiple ways of doing this, but the way I prefer is to put the node in standby. Once the node is in standby, pacemaker will shut down all resources on that node and then start them up on the other node.
 * Move service to other node

crm node standby server1

To bring the node back online

crm node online server1

If you want to stop minecraft, but leave the filesystem mounted & DRBD running (to do maintenance or something)
 * Stop the minecraft service

crm resource stop minecraft

DRBD
DRBD is pretty good about maintaining itself. Any issues like a server shutdown or crash should be automatically handled when pacemaker starts DRBD back up. However in the event of more serious issues the documentation covers most scenarios.


 * Check DRBD status

cat /proc/drbd

On a healthy cluster, you should see "ds:UpToDate/UpToDate". This means that both the drives are currently in sync.

If the node you are running the command on is the primary, you will see "ro:Primary/Secondary". If the node is secondary, it will be "ro:Secondary/Primary".
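These two checks are easy to script for monitoring. A minimal sketch, assuming a single DRBD resource as in this guide and that the `ro:` and `ds:` fields appear on one line as quoted above. Both function names are hypothetical, and both read /proc/drbd-style text on stdin.

```shell
# drbd_in_sync
# Succeed if the input reports both disks as UpToDate.
# Assumes a single resource; with several, any one in sync would match.
drbd_in_sync() {
    grep -q 'ds:UpToDate/UpToDate'
}

# drbd_local_role
# Print this node's role, which is the part of the ro: field before
# the slash (for example Primary or Secondary).
drbd_local_role() {
    sed -n 's/.* ro:\([A-Za-z]*\)\/.*/\1/p'
}
```

Usage would be something like `drbd_in_sync < /proc/drbd || echo "DRBD is not in sync"` and `drbd_local_role < /proc/drbd`.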

mux_server/mux_client
I wrote the mux_server and mux_client scripts for several reasons, and they offer significant advantages over other utils like `screen`.
 * `screen` suffers from an issue where you can't script any commands to it until you actually connect a screen client at least once.
 * mux_client allows you to send a command to the server and then wait for a response (pipe to grep, head, etc). Very useful for scripting or some type of web/management UI.
 * You can have several mux_client sessions connected at the same time (yes, screen can do this too).
 * The mux_server process can send all data received from clients and output from minecraft to a log. Minecraft does create a log file, but you can't see any commands that were sent to it by a client.
 * When connected with mux_client, it is very clear which output came from the minecraft server and which came from other mux_client sessions.

ping_minecraft.pl
The ping_minecraft.pl script actually connects to the server, sends a command, and looks for a valid response to verify the server is operational, and not just running or accepting connections. When configuring the resource in pacemaker, you can specify a `check_ip` value of the IP that is fronting minecraft; however, there seems to be an issue with this. When pacemaker goes to start minecraft, it first checks whether it's already running, and this check is done via the ping script. The issue is that for some reason you will not get a 'connection refused' message when connecting to the closed port on this IP within 21 seconds of it being brought up. Our start timeout is set to 20 seconds, so this is obviously a problem. When `check_ip` is not specified, the script instead uses '127.0.0.1', which does not have this delay issue. You could specify `check_ip` and raise the timeout if you want, but it's your call.

Further help
I would be willing to help with any issues you encounter along this guide. However, please note there are some things I cannot do, namely helping you with the particulars of your Linux distribution. I have extensive and ongoing experience with Gentoo and RedHat Enterprise, but it has been years (if ever) since I've touched other distros, and I cannot help you with installing packages or anything specific to those distributions (but I can probably help with the RedHat family like CentOS or Fedora).

So with that said, if you wish to get ahold of me, just send an email to mcw {@t} stormcloud9 {d0t} net. I do have a full time job, so I may not be able to help right away, but I will try my best to get back within 12 hours, even if it's to tell you I can't help.

You can also try #linux-ha on irc.freenode.net. There are a fair number of people who hang out there that have experience with Pacemaker and DRBD.