OpenBSDRestartingJabber

How to restart a Jabber server

Please note: I don't know how much of this page is site-specific. Use the information on this page with caution.

Overview

We have a Jabber server on an OpenBSD machine at work, and it crashes regularly. Below is the procedure to restart it cleanly.

The interesting part of this exercise is that the Jabber server often crashes not because of Jabber itself, but because it uses MySQL as a back-end. Let's just say I don't like MySQL a lot. Actually, I hate it. Use PostgreSQL if you want a real database, people, OK? Even plain text files are better than this pile of sh*t.

(Sorry, that's the end of my rant.)

One very specific thing about OpenBSD is that it does not give you a shell script to start each service. For instance, you won't find any /etc/init.d/ scripts like the ones you would find under Red Hat. So "restarting" a daemon (= service) really means either (a) writing your own script or (b) killing the daemon and starting it again by hand. The best place to start a service at boot is the /etc/rc.local file, which is executed when the machine starts.

1. How can you tell the Jabber server has crashed?

That's rather simple. A functional Jabber server spawns five or six component processes (router, resolver, sm, c2s and s2s, plus mu-conference on this machine). Looking through the process list should give you a quick answer.

For instance:

bash-3.2# ps auxwww | grep -i jabber
_jabberd 32048  0.0  0.1   392   400 p1  I     12:16PM    0:00.01 -sh -c -sh (sh)
_jabberd  4567  0.0  0.7  1824  3780 p1  S     12:16PM    0:00.14 perl -w -x /usr/local/sbin/jabberd
_jabberd  2205  0.0  0.5  1072  2612 p1  S     12:16PM    0:00.08 /usr/local/bin/../libexec/jabberd/router \
                                                                   -c /etc/jabberd/router.xml
_jabberd 20706  0.0  0.5   880  2784 p1  S     12:16PM    0:00.04 /usr/local/bin/../libexec/jabberd/c2s \
                                                                   -c /etc/jabberd/c2s.xml
_jabberd 13150  0.0  0.6   880  2832 p1  S     12:16PM    0:00.03 /usr/local/bin/../libexec/jabberd/sm \
                                                                   -c /etc/jabberd/sm.xml
_jabberd 20193  0.0  0.5   776  2476 p1  S     12:16PM    0:00.02 /usr/local/bin/../libexec/jabberd/s2s \
                                                                   -c /etc/jabberd/s2s.xml
_jabberd 15735  0.0  0.5   720  2384 p1  S     12:16PM    0:00.07 /usr/local/bin/../libexec/jabberd/resolver \
                                                                   -c /etc/jabberd/resolver.xml
_jabberd  4370  0.0  0.1   564   396 p1  S     12:17PM    0:00.01 -sh -c -sh (sh)
_jabberd 23294  0.0  0.5   952  2344 p1  S     12:17PM    0:00.02 /usr/local/bin/mu-conference -c /etc/jabberd/muc-jcr.xml

The above is - obviously - a functional Jabber server.

The following, on the other hand, is clearly not a functional server:

bash-3.2# ps auxwww | grep -i jabber
_jabberd 28536  0.0  0.1   408   396 p1  I     12:01PM    0:00.01 -sh -c -sh (sh)
_jabberd 15201  0.0  0.4   684  2212 p1  S     12:01PM    0:00.07 /usr/local/bin/mu-conference \
                                                                   -c /etc/jabberd/muc-jcr.xml

See? Simple!
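
If you'd rather not count lines by eye, the same check can be sketched as a small shell function. This is only a sketch: the function name missing_jabber_components is made up, and the component paths are assumed to match the ps output shown above, so adjust them for your own install.

```shell
#!/bin/sh
# Sketch: list the jabberd components that are absent from a process listing.
# The function name is hypothetical; the paths are the ones visible in the
# ps output above and may differ on another machine.
missing_jabber_components() {
    # Read the whole process listing from stdin once.
    listing=$(cat)
    for component in router resolver sm c2s s2s mu-conference; do
        case "$component" in
            mu-conference) pattern="/bin/mu-conference " ;;
            *)             pattern="/libexec/jabberd/$component " ;;
        esac
        # grep -F treats the pattern as a fixed string, not a regex.
        if ! printf '%s\n' "$listing" | grep -qF -- "$pattern"; then
            echo "$component"
        fi
    done
}

# Usage: ps auxwww | missing_jabber_components
# No output means every component was found in the listing.
```

Feeding it the second listing above (the crashed server) would print every component name except mu-conference.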

Whenever you have a problem, on any UNIX, with any application, the first thing to do is check the logs!

Here is how to do it for the Jabber server:

bash-3.2# cd /usr/local/jabberd-2.0s11/var/jabberd/log
bash-3.2# pwd
/usr/local/jabberd-2.0s11/var/jabberd/log

bash-3.2# ls -alhF
total 3012
drwxrwxr-x  2 _jabberd  _jabberd   512B May 25 09:47 ./
drwxrwx--x  5 _jabberd  _jabberd   512B May 30  2008 ../
-rw-r--r--  1 _jabberd  _jabberd   3.0K May 28 12:17 c2s.log
-rw-r--r--  1 _jabberd  _jabberd   764K May 25 09:43 c2s.log.bz2
-rw-r--r--  1 _jabberd  _jabberd   7.4K May 28 12:17 mu-conference.log
-rw-------  1 root      _jabberd   183B May 28 12:17 nohup.out
-rw-r--r--  1 _jabberd  _jabberd  94.8K May 28 12:17 resolver.log
-rw-r--r--  1 _jabberd  _jabberd  31.9K May 28 12:17 router.log
-rw-r--r--  1 _jabberd  _jabberd   288K May 28 12:17 s2s.log
-rw-r--r--  1 _jabberd  _jabberd   233K May 28 12:17 sm.log

Now, look around a little bit, and you will find this interesting information in the logs above:

bash-3.2# tail -n 25 ./sm.log
Thu May 28 11:55:41 2009 [notice] starting up
Thu May 28 11:55:41 2009 [notice] id: jabbersrv.bigcorp.com
Thu May 28 11:55:41 2009 [info] process id is 15695, written to \
/usr/local/jabberd-2.0s11/var/jabberd/pid/sm.pid
Thu May 28 11:55:41 2009 [error] mysql: connection to database failed: \
Can't connect to local MySQL server through socket '/var/run/mysql/mysql.sock' (2)
Thu May 28 11:55:41 2009 [notice] initialisation of storage driver 'mysql' failed
Thu May 28 11:55:41 2009 [error] failed to initialise one or more storage drivers, aborting

Aha! Here is the problem: the database probably crashed and, since Jabber uses a database as a back-end, it cannot run properly and it shuts itself down. This is also confirmed by taking a look at the nohup.out file, which says essentially the same thing, but in a less detailed way.
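
Rather than tailing each log in turn, you can pull the error lines out of all of them at once. A minimal sketch, assuming every component marks its errors with the literal string [error] as sm.log does above; the function name show_log_errors is hypothetical.

```shell
#!/bin/sh
# Sketch: print the last few [error] lines of every *.log file in a directory.
# Assumes errors are tagged with the literal string "[error]", as in sm.log.
show_log_errors() {
    log_dir=$1
    for log in "$log_dir"/*.log; do
        # Skip the unexpanded glob when the directory holds no .log files.
        [ -f "$log" ] || continue
        errors=$(grep -F '[error]' "$log" | tail -n 5)
        if [ -n "$errors" ]; then
            echo "== $log"
            printf '%s\n' "$errors"
        fi
    done
}

# Usage: show_log_errors /usr/local/jabberd-2.0s11/var/jabberd/log
```

Logs without any error line are simply not mentioned, so a silent run means nothing was flagged.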

Now that you have determined whether or not your Jabber server has crashed, you can restart it properly, starting with the database.

2. Restarting the database

Of course, this only applies if your version of Jabber uses MySQL as a back-end. Please refer to your local documentation to make sure you know which back-end is used.

First, let's identify the processes:

bash-3.2# ps auxwww | grep -i mysql
root     24191  0.0  0.1   396   420 C0  I+    Mon09AM    0:00.01 /bin/sh /usr/local/bin/mysqld_safe \
--datadir=/var/mysql --pid-file=/var/mysql/jabbersrv.bigcorp.com.pid
_mysql    2311  0.0  0.8 11732  3896 C0  S+    Mon09AM    0:01.00 /usr/local/libexec/mysqld \
 --basedir=/usr/local --datadir=/var/mysql --user=_mysql --pid-file=/var/mysql/jabbersrv.bigcorp.com.pid \
--socket=/var/mysql/mysql.sock
bash-3.2# kill -9 24191 2311
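
A side note on the kill -9 above: it gives the database no chance to shut down cleanly. If you have a few seconds to spare, a gentler variant is to send TERM first and only fall back to KILL when the process refuses to go away. The sketch below assumes that your mysqld exits on TERM within the timeout; the function name stop_gently and the 10-second default are made up.

```shell
#!/bin/sh
# Sketch: ask a process to stop with TERM, wait, and use KILL as a last resort.
# The function name and the default timeout are assumptions.
stop_gently() {
    pid=$1
    timeout=${2:-10}
    # If TERM cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only tests whether the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null
}

# Usage, with the PIDs taken from the ps output above:
#   stop_gently 24191    # mysqld_safe (the wrapper) first
#   stop_gently 2311     # then mysqld itself
```

Stopping the mysqld_safe wrapper first matters because it may otherwise start a new mysqld as soon as the old one dies.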

To restart, we are going to have a look at the /etc/rc.local file, as noted above, as it contains the commands necessary to start the database. On my machine, these commands are:

bash-3.2# grep -i mysql /etc/rc.local
/bin/mkdir -p /var/run/mysql
/bin/ln -s /var/mysql/mysql.sock /var/run/mysql

So, let's do this:

bash-3.2# /bin/mkdir -p /var/run/mysql
bash-3.2# /bin/ln -s /var/mysql/mysql.sock /var/run/mysql
bash-3.2# /bin/sh /usr/local/bin/mysqld_safe --datadir=/var/mysql --pid-file=/var/mysql/jabbersrv.bigcorp.com.pid &
[1] 21478
bash-3.2# Starting mysqld daemon with databases from /var/mysql

Please note the third command (starting with /bin/sh), which is actually taken from the ps auxwww output above.

Simple enough. Let's check that the database is running again:

bash-3.2# ps auxwww | grep -i mysql
root     21478  0.0  0.1   488   444 p1  S     12:16PM    0:00.01 /bin/sh /usr/local/bin/mysqld_safe \
--datadir=/var/mysql --pid-file=/var/mysql/jabbersrv.bigcorp.com.pid
_mysql     669  0.0  0.8 11628  3960 p1  S     12:16PM    0:00.02 /usr/local/libexec/mysqld \
--basedir=/usr/local --datadir=/var/mysql --user=_mysql --pid-file=/var/mysql/jabbersrv.bigcorp.com.pid \
--socket=/var/mysql/mysql.sock
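
Seeing the processes is not quite the same as having a usable socket, and the socket is what Jabber complained about in sm.log. Here is a small sketch that waits until a path exists; the function name wait_for_path and the timeout values are assumptions. It uses test -e, which follows the symlink, so it only succeeds once the real socket is there (test -S would be stricter).

```shell
#!/bin/sh
# Sketch: wait until a path exists, checking once per second.
# The function name and the default timeout are assumptions.
wait_for_path() {
    path=$1
    timeout=${2:-30}
    while [ "$timeout" -gt 0 ]; do
        # -e follows symlinks, so a dangling link does not count.
        [ -e "$path" ] && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # One last check; its result is the function's exit status.
    [ -e "$path" ]
}

# Usage: wait_for_path /var/run/mysql/mysql.sock 30 || echo "MySQL socket still missing"
```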

OK, now we know the database has restarted. Let's move on to the main attraction!

3. Restarting the Jabber server

We will simply do exactly the same thing as for the database daemon:

bash-3.2# grep -i jabber /etc/rc.local
nohup su - _jabberd -c "/usr/local/sbin/jabberd" &
nohup su - _jabberd -c "/usr/local/bin/mu-conference -c /etc/jabberd/muc-jcr.xml" &

bash-3.2# nohup su - _jabberd -c "/usr/local/sbin/jabberd" &
[2] 32048
bash-3.2# sending output to nohup.out

bash-3.2# nohup su - _jabberd -c "/usr/local/bin/mu-conference -c /etc/jabberd/muc-jcr.xml" &
[3] 4370
bash-3.2# sending output to nohup.out

Now, check if the processes are all there:

bash-3.2# ps -auxwwww | grep -i jabber
_jabberd 32048  0.0  0.1   392   400 p1  I     12:16PM    0:00.01 -sh -c -sh (sh)
_jabberd  4567  0.0  0.7  1824  3780 p1  S     12:16PM    0:08.65 perl -w -x /usr/local/sbin/jabberd
_jabberd  2205  0.0  0.5  1100  2652 p1  S     12:16PM    0:00.50 /usr/local/bin/../libexec/jabberd/router \
-c /etc/jabberd/router.xml
_jabberd 20193  0.0  0.5   776  2484 p1  S     12:16PM    0:00.07 /usr/local/bin/../libexec/jabberd/s2s \
-c /etc/jabberd/s2s.xml
_jabberd 13150  0.0  0.6   972  2952 p1  S     12:16PM    0:00.29 /usr/local/bin/../libexec/jabberd/sm \
-c /etc/jabberd/sm.xml
_jabberd 20706  0.0  0.6  1036  2900 p1  S     12:16PM    0:00.44 /usr/local/bin/../libexec/jabberd/c2s \
-c /etc/jabberd/c2s.xml
_jabberd 15735  0.0  0.5   736  2416 p1  S     12:16PM    0:00.09 /usr/local/bin/../libexec/jabberd/resolver \
-c /etc/jabberd/resolver.xml
_jabberd  4370  0.0  0.1   564   396 p1  I     12:17PM    0:00.01 -sh -c -sh (sh)
_jabberd 23294  0.0  0.5   964  2364 p1  S     12:17PM    0:00.17 /usr/local/bin/mu-conference \
-c /etc/jabberd/muc-jcr.xml

Now, that looks very good indeed!

Check the logs (sm.log, as above) to make sure this is the case:

Thu May 28 12:16:57 2009 [notice] starting up
Thu May 28 12:16:57 2009 [notice] id: cyrakus.sungard-finance.fr
Thu May 28 12:16:57 2009 [info] process id is 13150, written to \
/usr/local/jabberd-2.0s11/var/jabberd/pid/sm.pid
Thu May 28 12:16:57 2009 [notice] initialised storage driver 'mysql'
Thu May 28 12:16:57 2009 [notice] version: jabberd sm 2.0s11
Thu May 28 12:16:57 2009 [notice] attempting connection to router at 127.0.0.1, port=5347
Thu May 28 12:16:57 2009 [notice] connection to router established
Thu May 28 12:16:58 2009 [notice] ready for sessions
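
The same check can be scripted. This sketch looks only at the most recent start-up recorded in the log and succeeds if it got as far as "ready for sessions". The function name sm_ready is hypothetical, and the message texts are assumed to be exactly those shown in the excerpts on this page.

```shell
#!/bin/sh
# Sketch: succeed only if the latest start-up in sm.log reached "ready for sessions".
# The message texts are copied from the log excerpts above.
sm_ready() {
    awk '
        /\[notice\] starting up/       { ready = 0 }   # a new start-up resets the flag
        /\[notice\] ready for sessions/ { ready = 1 }
        END { exit (ready ? 0 : 1) }
    ' "$1"
}

# Usage:
#   sm_ready /usr/local/jabberd-2.0s11/var/jabberd/log/sm.log && echo "sm is up"
```

Resetting the flag on every "starting up" line matters because the log still contains the earlier, successful start-ups from before the crash.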

You now have a functional Jabber server!
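
If you get tired of typing all this, the start-up commands can be collected into one script, which is option (a) from the overview. The following is a sketch only, not something I run in production: the names run and start_jabber_stack are made up, the 10-second pause is a guess, ln -sf is used instead of ln -s so that an existing link does not cause an error, and the script assumes the old processes have already been stopped. By default it only prints what it would do; set DRY_RUN=0 to really run the commands.

```shell
#!/bin/sh
# Sketch: start MySQL, then Jabber, using the commands from this page.
# Prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

start_jabber_stack() {
    # Database first (section 2). -f replaces a stale link (my addition).
    run /bin/mkdir -p /var/run/mysql
    run /bin/ln -sf /var/mysql/mysql.sock /var/run/mysql
    run sh -c '/bin/sh /usr/local/bin/mysqld_safe --datadir=/var/mysql --pid-file=/var/mysql/jabbersrv.bigcorp.com.pid &'
    # Crude pause so that MySQL has time to create its socket (assumed value).
    run sleep 10
    # Then Jabber and the conference component (section 3).
    run sh -c 'nohup su - _jabberd -c "/usr/local/sbin/jabberd" &'
    run sh -c 'nohup su - _jabberd -c "/usr/local/bin/mu-conference -c /etc/jabberd/muc-jcr.xml" &'
}

# Usage: call start_jabber_stack after loading this file, as root.
```

Keeping the dry-run mode as the default is deliberate: the paths and the pid-file name are specific to my machine, so it is worth reading the printed commands before letting the script loose on yours.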

And that's probably all you need to know about how to restart a Jabber server! ;-)

PLEASE NOTE: To keep the screen output readable, I have split some of the long output lines above across several lines. On your machine, of course, each line broken with a '\' would appear on a single line.