Kea DHCP service failing to open network socket on boot

System – Kea IPv4 DHCP server (version 2.2.0) running on Debian 12, serving addresses on a second network interface (a PCIe card).

On reboot the server starts, but warns that it could not open a socket on the configured interface, and so does not serve any addresses. Running

systemctl status kea-dhcp4-server.service

shows the error:

kea-dhcp4[582]: WARN DHCPSRV_OPEN_SOCKET_FAIL failed to open socket: the interface enp3s0 is not running
kea-dhcp4[582]: INFO DHCP4_OPEN_SOCKETS_FAILED maximum number of open service sockets attempts: 0, has been exhausted without success

Restarting the service by hand afterwards works normally, so this is a boot-time ordering problem: the interface is not yet up when Kea starts. It is a known problem. There are various suggestions for fixing the ordering, but the simplest (albeit slightly inelegant) way to get it working is to tell Kea to retry a number of times, using the service-sockets-max-retries and service-sockets-retry-wait-time options. So the start of the configuration looks like this:

"Dhcp4": {
    "interfaces-config": {
        "interfaces": [ "enp3s0" ],
        "dhcp-socket-type": "raw",
        "service-sockets-max-retries": 100,
        "service-sockets-retry-wait-time": 5000
    },

This retries every 5 seconds (the wait time is given in milliseconds) up to 100 times, which should give the interface plenty of time to come up. Note that, annoyingly, no message is logged when a retry succeeds.
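For reference, the ordering-based alternative would be something along these lines (a sketch I haven't adopted here; it only helps if whatever manages enp3s0 actually signals network-online.target):

# Sketch: make the Kea unit wait for the network to be genuinely online
sudo systemctl edit kea-dhcp4-server.service
# In the override that opens, add:
#   [Unit]
#   Wants=network-online.target
#   After=network-online.target
# ...and make sure a wait-online service is enabled for your network stack, e.g.:
sudo systemctl enable systemd-networkd-wait-online.service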

Running svnadmin verify as root causes permission issues on Berkeley DB repositories

After a system crash I ran svnadmin verify on the relevant repositories to check they were OK; this was done as root. Afterwards normal network access (via Apache) was broken, with

Internal error: Berkeley DB error for filesystem

appearing in the logs. The fix was to chown -R www-data:www-data the broken repositories. It looks like running svnadmin verify as root changes the ownership of something inside an SVN repository (at least for Berkeley DB ones).
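In concrete terms (repository path hypothetical):

chown -R www-data:www-data /srv/svn/myrepo

and in future it is probably safer to run the verify as the Apache user in the first place, so the ownership never changes:

sudo -u www-data svnadmin verify /srv/svn/myrepo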

Lesson learned – check normal SVN access after doing this sort of thing!

OpenProject Apache reverse proxy with https secure connection

These are some notes on setting up OpenProject on a back-end server (let's call it backsrv.example.com) and accessing it via a front-end system (frontsrv.example.com). Normally we'd terminate SSL at the reverse proxy, and there is some documentation on that. In this case I wanted to do things properly and protect the login credentials all the way, which means using an https connection between the reverse proxy and the back-end server as well.

Firstly, the reverse proxy has to trust the SSL certificate that the back end uses. There are several ways to go about this. I chose to set up a local certificate authority using the easy-rsa scripts (using another small virtual machine set up only for this purpose). For one connection this is probably overkill, but for multiple backends in the future it will make the administration a lot easier.

  • Set up CA
    • Debian 10, install easy-rsa package, do required setup.
  • Copy CA root certificate to frontsrv
    • For Debian systems, copy to /usr/local/share/ca-certificates/ and run update-ca-certificates
  • Create a CSR on backsrv, copy it to the CA, sign it, and copy the resulting certificate back to backsrv (see the sketch after this list). Put the cert and key in sensible places (the key in /etc/ssl/private/, the cert in /etc/ssl/local-certs/). Make sure the permissions are correct.
  • Configure Apache on backsrv and check cert works (for OpenProject edit /etc/openproject/installer.dat to put in the correct certificate paths and run openproject configure to update the config).
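For reference, the easy-rsa 3 workflow looks roughly like this (a sketch rather than an exact transcript; it assumes the Debian easy-rsa package, which provides the make-cadir helper, and uses openssl on backsrv so nothing extra needs installing there):

# On the CA VM: create a working directory and the CA itself
make-cadir ~/ca && cd ~/ca
./easyrsa init-pki
./easyrsa build-ca                      # CA certificate ends up in pki/ca.crt

# On frontsrv (after copying pki/ca.crt across): trust the new CA
cp ca.crt /usr/local/share/ca-certificates/local-ca.crt
update-ca-certificates

# On backsrv: generate a key and CSR
openssl req -new -newkey rsa:2048 -nodes \
    -keyout /etc/ssl/private/backsrv.example.com.key \
    -out /tmp/backsrv.example.com.req \
    -subj "/CN=backsrv.example.com"

# Back on the CA VM: import and sign the request
./easyrsa import-req /tmp/backsrv.example.com.req backsrv.example.com
./easyrsa sign-req server backsrv.example.com   # result: pki/issued/backsrv.example.com.crt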

Set up Apache to do proxy stuff on frontsrv. Here’s the beginning fragment of default-ssl.conf that should work:

<IfModule mod_ssl.c>
        <VirtualHost _default_:443>
                ServerAdmin webmaster@localhost

                DocumentRoot /var/www/html

                RequestHeader edit Destination ^https http early

                SSLProxyEngine on
                SSLProxyCheckPeerName off

                # To the OpenProject server on backsrv
                ProxyPass /openproject https://backsrv.example.com/openproject
                ProxyPassReverse /openproject https://backsrv.example.com/openproject
                <Location /openproject>
                        ProxyPreserveHost On
                        Require all granted
                </Location>

You also need to go to the OpenProject web interface admin area: under System Settings – General, change the Host name to the reverse proxy and set the protocol to https. It will complain if there's a hostname mismatch (case sensitive, even!). You may also want to go to Email – Email notifications and change the Emission email address to be consistent.

Don't forget: SSLProxyEngine on is needed!

For OpenProject the subdirectory locations on the front and back ends do need to match.

The ProxyPreserveHost On is required per the OpenProject documentation. Unfortunately, that means it tries to match the name frontsrv.example.com to the back end cert, and the SSL handshake fails. This is the reason for the SSLProxyCheckPeerName off directive – it disables checking the certificate CN or Subject Alternative Names.

Apparently SSLProxyCheckPeerName off can go in a <Proxy>...</Proxy> matching block with Apache 2.4.30 or newer, which would be nicer. As it is, this turns it off for the whole vhost, which is a small lessening of security.

I suppose in principle we could create the certificate for the back end with the name of the front end, or add it to the SANs. I haven’t tried this and it seems like it could be a recipe for confusion and subtle bugs.

ANSI control codes in Jekyll output breaking emails

Also see Publishing websites with Jekyll, Apache and SVN

If you send console output via email (like, say, the output of jekyll build as part of an SVN post-commit hook script) and the text contains ANSI control characters (e.g. colour codes), this can break things. In this case the mail command (Debian 9, default exim) was only sending text up to the first ANSI code, which meant that the jekyll build error messages (which are yellow and red) were missing.

To fix this, pipe the text through ansi2txt (which comes with the colorized-logs package in Debian and Ubuntu). This strips out all ANSI control codes, making the text safe for email.

(After this I pipe it through unix2dos to convert to CRLF line endings, as this appears to be the standard for email. On Debian this comes with the dos2unix package.)
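A quick sanity check of the filter (test string made up):

printf '\033[1;31mERROR:\033[0m build failed\n' | ansi2txt
# -> ERROR: build failed   (colour codes stripped)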

The last line in the hook script then becomes

echo "$LOGVAR" | /usr/bin/ansi2txt | /usr/bin/unix2dos | mail -s "$REPOS_BASENAME build $REV" "$BUILD_EMAIL"

FQDNs in FlexNet license files

When querying FlexNet licenses using lmutil or similar from systems with a different DNS suffix, make sure the SERVER line in the license file contains the FQDN of the license server. If it doesn't, you can find that lmutil complains that the lmgrd process is not running, even though the licensed application itself runs fine.

For example, with a line in the licence file like

SERVER servername 4eca3b4b8326 1055

running

lmutil lmstat -a -c 1055@servername.physics.gla.ac.uk

results in a HOST_NOT_FOUND error. Running the Client ANSLIC_ADMIN Utility gives the same error. However, ANSYS fires up as normal.

To fix this, put the FQDN in the license file (and refresh the licence server afterwards!):

SERVER servername.physics.gla.ac.uk 4eca3b4b8326 1055
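Something like the following should then pick up the change and confirm it (assuming lmutil is on the path and the port is as above):

lmutil lmreread -c 1055@servername.physics.gla.ac.uk
lmutil lmstat -a -c 1055@servername.physics.gla.ac.uk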

The lmutil queries should then work as normal.

Monitoring GPU temperatures with nvidia-smi and Check MK (OMD)

In the previous post on this subject we used code from the Technische Universität Kaiserslautern to monitor our GPUs using OMD checkmk (now checkmk raw). With some new RTX 2080s installed this broke, as nvidia-smi reports nothing at all for ECC errors on these cards (rather than 0, as the previous cards did). The solution was to remove the ECC checking completely.

The new scripts are:

On the client system in /usr/lib/check_mk_agent/local/ (or plugins/)

#!/bin/bash
# Emit one line per GPU: index, name, fan %, GPU %, memory %, temperature, power draw, power limit
if which nvidia-smi >/dev/null; then
   echo '<<<nvidia_smi>>>'
   nvidia-smi -q -x > /tmp/.check_mk_nvidia_smi
   cards=$(xml_grep --text_only 'nvidia_smi_log/attached_gpus' /tmp/.check_mk_nvidia_smi | tr -d ' ')
   IFS=$'\n' names=($(xml_grep --text_only 'nvidia_smi_log/gpu/product_name' /tmp/.check_mk_nvidia_smi | tr -d ' '))
   IFS=$'\n' fan_speed=($(xml_grep --text_only 'nvidia_smi_log/gpu/fan_speed' /tmp/.check_mk_nvidia_smi | tr -d ' '))
   IFS=$'\n' gpu_utilization=($(xml_grep --text_only 'nvidia_smi_log/gpu/utilization/gpu_util' /tmp/.check_mk_nvidia_smi | tr -d ' '))
   IFS=$'\n' mem_utilization=($(xml_grep --text_only 'nvidia_smi_log/gpu/utilization/memory_util' /tmp/.check_mk_nvidia_smi | tr -d ' '))
   IFS=$'\n' temperature=($(xml_grep --text_only 'nvidia_smi_log/gpu/temperature/gpu_temp' /tmp/.check_mk_nvidia_smi | tr -d ' '))
   IFS=$'\n' power_draw=($(xml_grep --text_only 'nvidia_smi_log/gpu/power_readings/power_draw' /tmp/.check_mk_nvidia_smi | tr -d ' '))
   IFS=$'\n' power_limit=($(xml_grep --text_only 'nvidia_smi_log/gpu/power_readings/power_limit' /tmp/.check_mk_nvidia_smi | tr -d ' '))

   for i in $(seq 1 $cards) ; do
       index=$(($i - 1))
       fan_speed[$index]=${fan_speed[$index]/\%/}
       gpu_utilization[$index]=${gpu_utilization[$index]/\%/}
       mem_utilization[$index]=${mem_utilization[$index]/\%/}
       temperature[$index]=${temperature[$index]/C/}
       power_draw[$index]=${power_draw[$index]/W/}
       power_limit[$index]=${power_limit[$index]/W/}
       echo "$index ${names[$index]} ${fan_speed[$index]} ${gpu_utilization[$index]} ${mem_utilization[$index]} ${temperature[$index]} ${power_draw[$index]} ${power_limit[$index]}"
   done
fi

Don't forget to make it executable! You also need xml_grep installed.
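For example (assuming the plugin was saved as nvidia_smi):

chmod +x /usr/lib/check_mk_agent/local/nvidia_smi
apt install xml-twig-tools        # provides xml_grep on Debian/Ubuntu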

On the OMD server, at /omd/sites/omd_XYZ/local/share/check_mk/checks/

#!/usr/bin/python
# -*- encoding: utf-8; py-indent-offset: 4 -*-
# +------------------------------------------------------------------+
# |             ____ _               _        __  __ _  __           |
# |            / ___| |__   ___  ___| | __   |  \/  | |/ /           |
# |           | |   | '_ \ / _ \/ __| |/ /   | |\/| | ' /            |
# |           | |___| | | |  __/ (__|   <    | |  | | . \            |
# |            \____|_| |_|\___|\___|_|\_\___|_|  |_|_|\_\           |
# |                                                                  |
# | Copyright Mathias Kettner 2012             mk@mathias-kettner.de |
# +------------------------------------------------------------------+
#
# This file is part of Check_MK.
# The official homepage is at http://mathias-kettner.de/check_mk.
#
# check_mk is free software;  you can redistribute it and/or modify it
# under the  terms of the  GNU General Public License  as published by
# the Free Software Foundation in version 2.  check_mk is  distributed
# in the hope that it will be useful, but WITHOUT ANY WARRANTY;  with-
# out even the implied warranty of  MERCHANTABILITY  or  FITNESS FOR A
# PARTICULAR PURPOSE. See the  GNU General Public License for more de-
# ails.  You should have  received  a copy of the  GNU  General Public
# License along with GNU Make; see the file  COPYING.  If  not,  write
# to the Free Software Foundation, Inc., 51 Franklin St,  Fifth Floor,
# Boston, MA 02110-1301 USA.

#######################################
# Check developed by
#######################################
# Dr. Markus Hillenbrand
# University of Kaiserslautern, Germany
# hillenbr@rhrk.uni-kl.de
#######################################

# the inventory functions

def inventory_nvidia_smi_fan(info):
    inventory = []
    for line in info:
        if line[2] != 'N/A':
           inventory.append( ("GPU"+line[0], "", None) )
    return inventory
def inventory_nvidia_smi_gpuutil(info):
    inventory = []
    for line in info:
        if line[3] != 'N/A':
           inventory.append( ("GPU"+line[0], "", None) )
    return inventory
def inventory_nvidia_smi_memutil(info):
    inventory = []
    for line in info:
        if line[4] != 'N/A':
           inventory.append( ("GPU"+line[0], "", None) )
    return inventory
def inventory_nvidia_smi_temp(info):
    inventory = []
    for line in info:
        if line[5] != 'N/A':
           inventory.append( ("GPU"+line[0], "", None) )
    return inventory
def inventory_nvidia_smi_power(info):
    inventory = []
    for line in info:
        if line[6] != 'N/A' and line[7] != "N/A":
           inventory.append( ("GPU"+line[0], "", None) )
    return inventory

# the check functions

def check_nvidia_smi_fan(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
           value = int(line[2])
           perfdata = [('fan', value, 90, 95, 0, 100 )]
           if value > 95:
              return (2, "CRITICAL - %s fan speed is %d%%" % (line[1], value), perfdata)
           elif value > 90:
              return (1, "WARNING - %s fan speed is %d%%" % (line[1], value), perfdata)
           else:
              return (0, "OK - %s fan speed is %d%%" % (line[1], value), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_gpuutil(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
           value = int(line[3])
           perfdata = [('gpuutil', value, 100, 100, 0, 100 )]
           return (0, "OK - %s utilization is %s%%" % (line[1], value), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_memutil(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
           value = int(line[4])
           perfdata = [('memutil', value, 100, 100, 0, 100 )]
           if value > 95:
              return (2, "CRITICAL - %s memory utilization is %d%%" % (line[1], value), perfdata)
           elif value > 90:
              return (1, "WARNING - %s memory utilization is %d%%" % (line[1], value), perfdata)
           else:
              return (0, "OK - %s memory utilization is %d%%" % (line[1], value), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_temp(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
           value = int(line[5])
           perfdata = [('temp', value, 80, 90, 0, 95 )]
           if value > 90:
              return (2, "CRITICAL - %s temperature is %dC" % (line[1], value), perfdata)
           elif value > 80:
              return (1, "WARNING - %s temperature is %dC" % (line[1], value), perfdata)
           else:
              return (0, "OK - %s temperature is %dC" % (line[1], value), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_power(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
           draw = float(line[6])
           limit = float(line[7])
           value = draw * 100.0 / limit
           perfdata = [('power', draw, limit * 0.8, limit * 0.9, 0, limit )]
           if value > 90:
              return (2, "CRITICAL - %s power utilization is %d%% of %dW" % (line[1], value, limit), perfdata)
           elif value > 80:
              return (1, "WARNING - %s power utilization is %d%% of %dW" % (line[1], value, limit), perfdata)
           else:
              return (0, "OK - %s power utilization is %d%% of %dW" % (line[1], value, limit), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

# declare the check to Check_MK

check_info['nvidia_smi.fan']     = (check_nvidia_smi_fan,     "%s fan speed"      , 1, inventory_nvidia_smi_fan)
check_info['nvidia_smi.gpuutil'] = (check_nvidia_smi_gpuutil, "%s utilization"    , 1, inventory_nvidia_smi_gpuutil)
check_info['nvidia_smi.memutil'] = (check_nvidia_smi_memutil, "%s memory"         , 1, inventory_nvidia_smi_memutil)
check_info['nvidia_smi.temp']    = (check_nvidia_smi_temp,    "%s temperature"    , 1, inventory_nvidia_smi_temp)
check_info['nvidia_smi.power']   = (check_nvidia_smi_power,   "%s power"          , 1, inventory_nvidia_smi_power)

To get the pretty indicators put this in /omd/sites/omd_XYZ/share/check_mk/web/plugins/perfometer/

#!/usr/bin/python

def perfometer_nvidia_smi_fan(row, check_command, perf_data):
    varname, value, unit, warn, crit, minn, maxx = perf_data[0]
    perc_used = 100 * (float(value) / float(maxx))
    perc_free = 100 - float(perc_used)
    return str(value)+" %", '<table><tr>' \
                               + perfometer_td(perc_used, '#0f8') \
                               + perfometer_td(perc_free, '#fff') \
                               + '</tr></table>'
def perfometer_nvidia_smi_gpuutil(row, check_command, perf_data):
    varname, value, unit, warn, crit, minn, maxx = perf_data[0]
    perc_used = 100 * (float(value) / float(maxx))
    perc_free = 100 - float(perc_used)
    return str(value)+" %", '<table><tr>' \
                               + perfometer_td(perc_used, '#0f8') \
                               + perfometer_td(perc_free, '#fff') \
                               + '</tr></table>'
def perfometer_nvidia_smi_memutil(row, check_command, perf_data):
    varname, value, unit, warn, crit, minn, maxx = perf_data[0]
    perc_used = 100 * (float(value) / float(maxx))
    perc_free = 100 - float(perc_used)
    return str(value)+" %", '<table><tr>' \
                               + perfometer_td(perc_used, '#0f8') \
                               + perfometer_td(perc_free, '#fff') \
                               + '</tr></table>'
def perfometer_nvidia_smi_temp(row, check_command, perf_data):
    varname, value, unit, warn, crit, minn, maxx = perf_data[0]
    perc_used = 100 * (float(value) / float(maxx))
    perc_free = 100 - float(perc_used)
    return str(value)+" C", '<table><tr>' \
                               + perfometer_td(perc_used, '#0f8') \
                               + perfometer_td(perc_free, '#fff') \
                               + '</tr></table>'
def perfometer_nvidia_smi_power(row, check_command, perf_data):
    varname, value, unit, warn, crit, minn, maxx = perf_data[0]
    perc_used = 100 * (float(value) / float(maxx))
    perc_free = 100 - float(perc_used)
    return str(value)+" W", '<table><tr>' \
                               + perfometer_td(perc_used, '#0f8') \
                               + perfometer_td(perc_free, '#fff') \
                               + '</tr></table>'

perfometers['check_mk-nvidia_smi.fan']     = perfometer_nvidia_smi_fan
perfometers['check_mk-nvidia_smi.gpuutil'] = perfometer_nvidia_smi_gpuutil
perfometers['check_mk-nvidia_smi.memutil'] = perfometer_nvidia_smi_memutil
perfometers['check_mk-nvidia_smi.temp']    = perfometer_nvidia_smi_temp
perfometers['check_mk-nvidia_smi.power']   = perfometer_nvidia_smi_power
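Once the files are in place, re-inventory the host from the OMD site so the new GPU services are picked up (site and host names here are placeholders):

su - omd_XYZ
cmk -II gpuhost      # rescan the host for services
cmk -R               # restart the monitoring core with the new configuration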

Publishing websites with Jekyll, Apache and SVN

Now I've got this working to some extent, here are some notes about setting up Jekyll with SVN and Apache:

Server – Debian 9 Stretch, normal command-line-only install. Set up the system to use an email server (the campus smarthost in our case).

Install SVN and Apache and set up accordingly.

Install Jekyll:

apt install jekyll

Create an SVN repository for the site files.

Create new project directories at a temporary location, e.g.

jekyll new /tmp/newsite

Commit these files to the SVN repository (I normally check out the repository on my local workstation, copy the directory from /tmp on the server into the working copy, add the files and commit), then delete the directory in /tmp.
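That step looks something like this on the workstation (repository URL hypothetical):

svn checkout https://svnserver.example.com/svn/newsite newsite-wc
scp -r server:/tmp/newsite/* newsite-wc/
cd newsite-wc
svn add --force .        # adds all the new, unversioned files
svn commit -m "Initial Jekyll site skeleton"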

On the server, create the actual website file location by exporting from the SVN via a temporary location:

svn export file:///path/to/repository /tmp/buildfiles
jekyll build --source /tmp/buildfiles --destination /var/www/sitename
rm -Rf /tmp/buildfiles

Configure Apache to serve from /var/www/sitename. In our case we ultimately wanted to serve multiple sites through a reverse proxy, so we used a vhost serving on an alternate port. This can be a handy testing configuration – you don’t have to worry about fiddling with the other website settings. For example, using port 8081:

<VirtualHost *:8081>

    ServerAdmin webmaster@localhost
    DocumentRoot /var/www/sitename

</VirtualHost>

(Remember to change ports.conf to listen on the new port!)
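For example (quick-and-dirty append; tidy /etc/apache2/ports.conf by hand if you prefer):

echo "Listen 8081" | sudo tee -a /etc/apache2/ports.conf
sudo systemctl reload apache2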

Test by pointing a web browser at server:8081.

Once that’s all working, set up the post-commit hook script to automatically build the site on a commit. Our current setup is:

#!/bin/sh

# POST-COMMIT HOOK

REPOS="$1"
REV="$2"
REPOS_BASENAME=$(/usr/bin/basename "$REPOS")
TMP_SVN_EXPORT="/tmp/$REPOS_BASENAME"

# These two need to be configured!
PUBLIC_WWW="/var/www/sitename"
BUILD_EMAIL="your.email@this.address"

"$REPOS"/hooks/mailer.py commit "$REPOS" $REV "$REPOS"/hooks/mailer.conf

LOGVAR=$(/export0/svn_config/jekyll_build.sh "$REPOS" $REV "$TMP_SVN_EXPORT" "$PUBLIC_WWW" 2>&1)

echo "$LOGVAR" | /usr/bin/unix2dos | mail -s "$REPOS_BASENAME build $REV" "$BUILD_EMAIL"

(Note that on Debian unix2dos comes with the dos2unix package. It is needed because plain-text email expects CRLF line terminators, as specified in RFC 2822.)

The jekyll_build.sh script called from the hook is:

#!/bin/sh

REPOS="$1"
REV="$2"
TMP_SVN_EXPORT="$3"
PUBLIC_WWW="$4"

/usr/bin/svn export --quiet file:///"$REPOS" "$TMP_SVN_EXPORT"
/usr/bin/jekyll build --source "$TMP_SVN_EXPORT" --destination "$PUBLIC_WWW"
rm -Rf "$TMP_SVN_EXPORT"

Note that the build process runs under the Apache user account, so set permissions appropriately. Also, when troubleshooting remember that on Debian 9 the Apache process is configured by default to use a private /tmp directory!
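In practice that mostly means the destination directory has to be writable by the Apache user, e.g.:

chown -R www-data:www-data /var/www/sitename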

This works for our current needs, although it isn’t optimised. Improvements would be:

  • Unify the setup for the commit email and build email scripts.
  • Build the site in the background (although you’d have to tweak how the logging output works in that case).

Of course, the professionals would use something like a combination of GitLab and Jenkins to automate this stuff properly…


Private /tmp directories in Debian 9 Stretch with Apache

In Debian 9 Stretch Apache is configured to use systemd's PrivateTmp feature by default. This means that the Apache tmp directory actually lives in /tmp/systemd-private-BIGLONGSTRING--apache2.service-STRING.

So if you are running an SVN server that uses Apache for serving, anything written to /tmp in the hook scripts ends up in the private directory rather than the normal userspace one.
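If you need to find what the hooks wrote, or genuinely want the shared /tmp back, something like this works (the second option weakens the isolation systemd provides, so think twice):

# Let root expand the glob; the private directories are only readable by root
sudo sh -c 'ls /tmp/systemd-private-*-apache2.service-*/tmp/'

# Or disable the feature for Apache with a drop-in override
sudo systemctl edit apache2
#   [Service]
#   PrivateTmp=false
sudo systemctl restart apache2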

OneDrive for Business High CPU

Had an issue where OneDrive for Business (installed with Office 365) was constantly using one CPU core. None of the fixes involving clearing the cache or resetting the client worked. The problem seems to have started with recent updates (possibly after Office 365 1712 8827.2148). What did work was the solution in this thread:

https://social.technet.microsoft.com/Forums/en-US/c968088a-cabb-45bb-b171-0fe937ac1e1c/onedrive-for-business-uses-high-cpu-since-office-365-1712-88272148?forum=sharepointgeneral

Condensed version: stop using the old client (groove.exe) and use the personal client instead, which now seems to work with business accounts as well (at least in the latest Windows 10 version).

(Note that this doesn't apply to connections to on-site hosted SharePoint drives – a fix for those is apparently coming sometime.)

The sequence is:

  1. Stop the OneDrive for Business client (right-click the system tray icon and choose exit, or kill it otherwise).
  2. Disable it from starting (use msconfig as an easy way to do this).
  3. Remove the existing OneDrive for Business folders (move them to a backup location).
  4. Open the personal OneDrive settings.
  5. Add an account and connect to the business account.