Dell Latitude 7490 freezing when unplugging USB3 WD15 dock

Setup – Dell Latitude 7490 running Ubuntu 18.04 Bionic and Dell WD15 USB-C dock.

Problem – system freezes when dock unplugged.

This problem started after updates. The solution found was to revert from kernel 4.15.0-44-generic to the previous kernel (4.15.0-43-generic). Did this by setting GRUB to remember the last boot selection – change /etc/default/grub with:

GRUB_DEFAULT=saved
GRUB_SAVEDEFAULT=true

and run

update-grub

Then hit Esc at the loading screen to get to the GRUB menu and pick the older kernel. With the settings above, GRUB remembers that choice on later boots.
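The edit to the GRUB defaults file (normally /etc/default/grub on Ubuntu) can be sketched in Python as below. This is only an illustration of the change: the helper name set_grub_options and the sample file contents are made up for the example, and it works on a string rather than touching the real file.

```python
def set_grub_options(config_text, options):
    """Return config_text with every KEY in options set to its value.

    Hypothetical helper: existing assignments are rewritten in place,
    repeated assignments of the same key are dropped, and keys that were
    missing are appended at the end.
    """
    seen = set()
    out = []
    for line in config_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in options:
            if key not in seen:
                out.append("%s=%s" % (key, options[key]))
                seen.add(key)
            # a later duplicate of an already-written key is skipped
        else:
            out.append(line)
    for key, value in options.items():
        if key not in seen:
            out.append("%s=%s" % (key, value))
    return "\n".join(out) + "\n"


# Made-up starting contents, not taken from a real system.
sample = "GRUB_DEFAULT=0\nGRUB_TIMEOUT=10\n"
updated = set_grub_options(
    sample, {"GRUB_DEFAULT": "saved", "GRUB_SAVEDEFAULT": "true"}
)
print(updated)
```

update-grub still has to be run afterwards so that the change ends up in the generated grub.cfg.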

Making NIS authentication work with Ubuntu 16, Debian 8, Fedora 28 etc.

After updating a system to systemd 235, NIS logins either don’t work at all (usually for GUI logins) or take a long time (console or ssh, sometimes). The culprit is a line in systemd-logind.service:

IPAddressDeny=any

This sandboxes the service and doesn’t allow it to talk to the network. Unfortunately this also blocks NIS lookups done via the glibc NSS API, since those need network access from inside the service. See the links at https://github.com/systemd/systemd/pull/7343

The quick solution is to turn off the sandboxing, either by commenting out or changing the line in systemd-logind.service, or creating a drop-in snippet that overrides it. This can be done by creating a file /etc/systemd/system/systemd-logind.service.d/IPAddress_clear.conf with the contents:

[Service]
IPAddressDeny=

The file can be called anything you like, as long as the name ends in .conf.
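Why an empty IPAddressDeny= line is enough: for list-type settings like this one, systemd treats an empty assignment as "reset the list", so the drop-in cancels the vendor unit's value. The Python below is a toy model of that rule only. The function name is made up, and it ignores section headers and everything else a real unit parser does.

```python
def effective_list_setting(key, *unit_texts):
    """Toy model of a list-type unit setting merged across files.

    unit_texts are given in load order (vendor unit first, drop-ins after).
    Non-empty assignments add to the list; an empty assignment resets it.
    """
    values = []
    for text in unit_texts:
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";", "[")):
                continue  # skip blanks, comments and section headers
            name, sep, value = line.partition("=")
            if not sep or name.strip() != key:
                continue
            value = value.strip()
            if value == "":
                values = []  # empty assignment clears earlier entries
            else:
                values.extend(value.split())
    return values


vendor_unit = "[Service]\nIPAddressDeny=any\n"
drop_in = "[Service]\nIPAddressDeny=\n"

print(effective_list_setting("IPAddressDeny", vendor_unit))
print(effective_list_setting("IPAddressDeny", vendor_unit, drop_in))
```

With only the vendor unit the list contains "any"; with the drop-in loaded after it the list is empty, which is the state we want.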

Then restart things:

systemctl daemon-reload
systemctl restart systemd-logind.service

You can check that the drop-in is being loaded with

systemctl status systemd-logind.service

In the output you should see something like:

   Loaded: loaded (/lib/systemd/system/systemd-logind.service; static; vendor preset: enabled)
  Drop-In: /etc/systemd/system/systemd-logind.service.d
  └─IPAddress_clear.conf

The other test is to see if NIS logins work correctly, of course…

The slightly slower solution is to use nscd to cache the lookup requests; it apparently does so in a way that plays nicely with the sandboxing. The much slower solution is to switch to using sssd or similar and ditch NIS once and for all…

Note – this may also affect systemd-udevd.

Making Debian 9 (Stretch) configure GRUB to boot partitions with UUIDs

As per this bug:

Debian Bug report logs – #852323

The Debian 9 installer (up to v9.5 at least) does not always configure GRUB to find the boot partition using its UUID, but leaves it pointing to /dev/sdb or whatever. This can be a problem if you change disks in the system. In particular, if you install from a USB stick and then remove it when the system reboots after the install, the /dev name of the disk can change. The result is an unbootable system. This can be a bit fiddly to fix (although the first thing that’s always worth trying is to reverse the changes you made and see if it boots, e.g. plug the install USB stick back in to the same USB slot).

You can check this by looking at the /boot/grub/grub.cfg file. A quick check is to list the kernel lines and see whether root= is given as a UUID or as a /dev name:

grep "/boot/v" /boot/grub/grub.cfg
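The same check can be sketched in Python. This is a rough heuristic, not a full grub.cfg parser: the function names are made up, the sample menu entries and the UUID are invented for the example, and only plain /dev/sdX, /dev/hdX and /dev/vdX style names are flagged.

```python
import re

# Heuristic for "plain disk device name" such as /dev/sdb1 (assumption:
# these are the names that shift when a disk is added or removed).
DEVICE_NAME = re.compile(r"^/dev/(sd|hd|vd)[a-z]+\d*$")


def kernel_roots(grub_cfg_text):
    """Return (kernel_path, root_value) for each linux line with a root= argument."""
    results = []
    for line in grub_cfg_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] not in ("linux", "linux16", "linuxefi"):
            continue
        for field in fields[2:]:
            if field.startswith("root="):
                results.append((fields[1], field[len("root="):]))
    return results


def device_name_roots(grub_cfg_text):
    """Return only the entries whose root= is a plain device name."""
    return [(k, r) for k, r in kernel_roots(grub_cfg_text) if DEVICE_NAME.match(r)]


# Invented sample: one entry as the installer may leave it, one using a UUID.
sample = """
menuentry 'broken' {
        linux   /boot/vmlinuz-4.9.0-7-amd64 root=/dev/sdb1 ro quiet
        initrd  /boot/initrd.img-4.9.0-7-amd64
}
menuentry 'fixed' {
        linux   /boot/vmlinuz-4.9.0-7-amd64 root=UUID=0a1b2c3d-0000-4000-8000-000000000000 ro quiet
}
"""

for kernel, root in device_name_roots(sample):
    print("%s still uses %s" % (kernel, root))
```

If this prints nothing for your real grub.cfg, the entries are not using plain disk device names.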

The fix is easy. Let the system reboot while leaving the install media in place (obviously make sure you don’t boot from the install media again!). Log in to the system and run (as root/using sudo)

update-grub

Compare the grub.cfg before and after. Then test by removing the install media and rebooting.

FreeNAS kernel python3.6 was killed: out of swap space

Ran into an error with FreeNAS (version 11.1-RELEASE) where the system was complaining about running out of swap space (first noticed there was a problem when getting repeated emails about ‘Unauthorized system reboot’). The error on the console and in /var/log/messages is:

servername kernel: pid 26118 (python3.6), uid 0, was killed: out of swap space

Fortunately NFS was still working, and console login (using the second console) worked. Used this to enable ssh:

service sshd start

and then I could log in remotely, which is more civilised. The problem was that there was no swap:

# swapinfo
Device 1K-blocks Used Avail Capacity

Checking the disks showed there were 2 GB swap partitions on all the data drives (default configuration), but none on the two boot disks:

# gpart show
=> 40 286749400 mfisyspd0 GPT (137G)
 40 1024 1 bios-boot (512K)
 1064 286748368 2 freebsd-zfs (137G)
 286749432 8 - free - (4.0K)

=> 40 286749400 mfisyspd1 GPT (137G)
 40 1024 1 bios-boot (512K)
 1064 286748368 2 freebsd-zfs (137G)
 286749432 8 - free - (4.0K)

=> 40 7814037088 mfisyspd2 GPT (3.6T)
 40 88 - free - (44K)
 128 4194304 1 freebsd-swap (2.0G)
 4194432 7809842688 2 freebsd-zfs (3.6T)
 7814037120 8 - free - (4.0K)

=> 40 7814037088 mfisyspd3 GPT (3.6T)
 40 88 - free - (44K)
 128 4194304 1 freebsd-swap (2.0G)
 4194432 7809842688 2 freebsd-zfs (3.6T)
 7814037120 8 - free - (4.0K)

=> 40 7814037088 mfisyspd4 GPT (3.6T)
 40 88 - free - (44K)
 128 4194304 1 freebsd-swap (2.0G)
 4194432 7809842688 2 freebsd-zfs (3.6T)
 7814037120 8 - free - (4.0K)

=> 40 7814037088 mfisyspd5 GPT (3.6T)
 40 88 - free - (44K)
 128 4194304 1 freebsd-swap (2.0G)
 4194432 7809842688 2 freebsd-zfs (3.6T)
 7814037120 8 - free - (4.0K)

=> 40 234441568 nvd0 GPT (112G)
 40 88 - free - (44K)
 128 4194304 1 freebsd-swap (2.0G)
 4194432 230247168 2 freebsd-zfs (110G)
 234441600 8 - free - (4.0K)

=> 40 234441568 nvd1 GPT (112G)
 40 88 - free - (44K)
 128 4194304 1 freebsd-swap (2.0G)
 4194432 230247168 2 freebsd-zfs (110G)
 234441600 8 - free - (4.0K)

The swap could be re-enabled with

# swapon /dev/mfisyspd2p1

etc.
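Rather than typing each device by hand, the list of swap partitions can be pulled out of the gpart show output. A minimal Python sketch, assuming the column layout shown above and the usual GPT naming of provider plus "p" plus partition index; the function name is made up and the sample is a cut-down copy of the output above.

```python
def swap_devices(gpart_output):
    """Return /dev paths of all freebsd-swap partitions in `gpart show` output."""
    devices = []
    provider = None
    for line in gpart_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        if fields[0] == "=>":
            # header line: => start size provider scheme (size)
            provider = fields[3]
        elif provider and fields[3] == "freebsd-swap":
            # partition line: start size index type (size)
            devices.append("/dev/%sp%s" % (provider, fields[2]))
    return devices


sample = """\
=> 40 286749400 mfisyspd0 GPT (137G)
 40 1024 1 bios-boot (512K)
 1064 286748368 2 freebsd-zfs (137G)
 286749432 8 - free - (4.0K)

=> 40 7814037088 mfisyspd2 GPT (3.6T)
 40 88 - free - (44K)
 128 4194304 1 freebsd-swap (2.0G)
 4194432 7809842688 2 freebsd-zfs (3.6T)
 7814037120 8 - free - (4.0K)

=> 40 234441568 nvd0 GPT (112G)
 40 88 - free - (44K)
 128 4194304 1 freebsd-swap (2.0G)
 4194432 230247168 2 freebsd-zfs (110G)
 234441600 8 - free - (4.0K)
"""

for device in swap_devices(sample):
    print("swapon %s" % device)
```

This only prints the swapon commands; it does not run them.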

This works:

# swapinfo -h
Device 1K-blocks Used Avail Capacity
/dev/nvd0p1 2097152 516M 1.5G 25%
/dev/nvd1p1 2097152 77M 1.9G 4%
/dev/mfisyspd2p1 2097152 12M 2.0G 1%
/dev/mfisyspd3p1 2097152 11M 2.0G 1%
/dev/mfisyspd4p1 2097152 10M 2.0G 1%
/dev/mfisyspd5p1 2097152 10M 2.0G 1%
Total 12582912 636M 11G 5%

The question is – did these fall off at some point, or were they not mounted at boot (a couple of months ago?). Will experiment with a test system.

Monitoring GPU temperatures with nvidia-smi and Check MK (OMD)

The Nvidia monitoring setup described at https://elwe.rhrk.uni-kl.de/howto/ worked in Check MK 1.2.8, but fails in 1.4. It works again after two modifications to the check script /omd/yoursite/local/share/check_mk/checks/nvidia_smi:

Remove the grouping of nvidia_smi.errors1 and 2 (I can live with this as our GTX1070 doesn’t report this anyway).

Remove the unicode degree characters from the temperature output, as this seems to cause the system to choke on the textual output.

Needed to delete and recreate the host to get it to work properly – possibly unicode characters hanging around in the generated graph definitions or similar?
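In the script below the degree sign was simply taken out of the format strings (the temperature is printed as "%dC"). If text coming from elsewhere might still contain non-ASCII characters, a more general clean-up could look like the sketch below. The function name and the sample status line are made up for illustration and are not part of the check.

```python
def ascii_only(text):
    """Drop every non-ASCII character (such as the degree sign) from text."""
    return text.encode("ascii", "ignore").decode("ascii")


# Hypothetical status line containing U+00B0 DEGREE SIGN.
status = u"OK - GeForce GTX 1070 temperature is 45\u00b0C"
print(ascii_only(status))
```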

#!/usr/bin/python
# -*- encoding: utf-8; py-indent-offset: 4 -*-
# +------------------------------------------------------------------+
# |             ____ _               _        __  __ _  __           |
# |            / ___| |__   ___  ___| | __   |  \/  | |/ /           |
# |           | |   | '_ \ / _ \/ __| |/ /   | |\/| | ' /            |
# |           | |___| | | |  __/ (__|   <    | |  | | . \            |
# |            \____|_| |_|\___|\___|_|\_\___|_|  |_|_|\_\           |
# |                                                                  |
# | Copyright Mathias Kettner 2012             mk@mathias-kettner.de |
# +------------------------------------------------------------------+
#
# This file is part of Check_MK.
# The official homepage is at http://mathias-kettner.de/check_mk.
#
# check_mk is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation in version 2. check_mk is distributed
# in the hope that it will be useful, but WITHOUT ANY WARRANTY; with-
# out even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE. See the GNU General Public License for more de-
# tails. You should have received a copy of the GNU General Public
# License along with GNU Make; see the file COPYING. If not, write
# to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor,
# Boston, MA 02110-1301 USA.

#######################################
# Check developed by
#######################################
# Dr. Markus Hillenbrand
# University of Kaiserslautern, Germany
# hillenbr@rhrk.uni-kl.de
#######################################
# Tweaked by Jamie Scott
# University of Glasgow
# Jamie.Scott@glasgow.ac.uk
#######################################

# the inventory functions

def inventory_nvidia_smi_fan(info):
    inventory = []
    for line in info:
        if line[2] != 'N/A':
            inventory.append( ("GPU"+line[0], "", None) )
    return inventory

def inventory_nvidia_smi_gpuutil(info):
    inventory = []
    for line in info:
        if line[3] != 'N/A':
            inventory.append( ("GPU"+line[0], "", None) )
    return inventory

def inventory_nvidia_smi_memutil(info):
    inventory = []
    for line in info:
        if line[4] != 'N/A':
            inventory.append( ("GPU"+line[0], "", None) )
    return inventory

def inventory_nvidia_smi_errors1(info):
    inventory = []
    for line in info:
        if line[5] != 'N/A':
            inventory.append( ("GPU"+line[0], "", None) )
    return inventory

def inventory_nvidia_smi_errors2(info):
    inventory = []
    for line in info:
        if line[6] != 'N/A':
            inventory.append( ("GPU"+line[0], "", None) )
    return inventory

def inventory_nvidia_smi_temp(info):
    inventory = []
    for line in info:
        if line[7] != 'N/A':
            inventory.append( ("GPU"+line[0], "", None) )
    return inventory

def inventory_nvidia_smi_power(info):
    inventory = []
    for line in info:
        if line[8] != 'N/A' and line[9] != "N/A":
            inventory.append( ("GPU"+line[0], "", None) )
    return inventory

# the check functions

def check_nvidia_smi_fan(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
            value = int(line[2])
            perfdata = [('fan', value, 90, 95, 0, 100 )]
            if value > 95:
                return (2, "CRITICAL - %s fan speed is %d%%" % (line[1], value), perfdata)
            elif value > 90:
                return (1, "WARNING - %s fan speed is %d%%" % (line[1], value), perfdata)
            else:
                return (0, "OK - %s fan speed is %d%%" % (line[1], value), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_gpuutil(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
            value = int(line[3])
            perfdata = [('gpuutil', value, 100, 100, 0, 100 )]
            return (0, "OK - %s utilization is %s%%" % (line[1], value), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_memutil(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
            value = int(line[4])
            perfdata = [('memutil', value, 100, 100, 0, 100 )]
            if value > 95:
                return (2, "CRITICAL - %s memory utilization is %d%%" % (line[1], value), perfdata)
            elif value > 90:
                return (1, "WARNING - %s memory utilization is %d%%" % (line[1], value), perfdata)
            else:
                return (0, "OK - %s memory utilization is %d%%" % (line[1], value), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_errors1(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
            value = int(line[5])
            if value > 500:
                return (2, "CRITICAL - %s single bit error counter is %d" % (line[1], value))
            if value > 100:
                return (1, "WARNING - %s single bit error counter is %d" % (line[1], value))
            else:
                return (0, "OK - %s single bit error counter is %d" % (line[1], value))
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_errors2(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
            value = int(line[6])
            if value > 500:
                return (2, "CRITICAL - %s double bit error counter is %d" % (line[1], value))
            if value > 100:
                return (1, "WARNING - %s double bit error counter is %d" % (line[1], value))
            else:
                return (0, "OK - %s double bit error counter is %d" % (line[1], value))
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_temp(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
            value = int(line[7])
            perfdata = [('temp', value, 80, 90, 0, 95 )]
            if value > 90:
                return (2, "CRITICAL - %s temperature is %dC" % (line[1], value), perfdata)
            elif value > 80:
                return (1, "WARNING - %s temperature is %dC" % (line[1], value), perfdata)
            else:
                return (0, "OK - %s temperature is %dC" % (line[1], value), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

def check_nvidia_smi_power(item, params, info):
    for line in info:
        if "GPU"+line[0] == item:
            draw = float(line[8])
            limit = float(line[9])
            value = draw * 100.0 / limit
            perfdata = [('power', draw, limit * 0.8, limit * 0.9, 0, limit )]
            if value > 90:
                return (2, "CRITICAL - %s power utilization is %d%% of %dW" % (line[1], value, limit), perfdata)
            elif value > 80:
                return (1, "WARNING - %s power utilization is %d%% of %dW" % (line[1], value, limit), perfdata)
            else:
                return (0, "OK - %s power utilization is %d%% of %dW" % (line[1], value, limit), perfdata)
    return (3, "UNKNOWN - GPU %s not found in agent output" % item)

# declare the check to Check_MK

check_info['nvidia_smi.fan'] = (check_nvidia_smi_fan, "%s fan speed" , 1, inventory_nvidia_smi_fan)
check_info['nvidia_smi.gpuutil'] = (check_nvidia_smi_gpuutil, "%s utilization" , 1, inventory_nvidia_smi_gpuutil)
check_info['nvidia_smi.memutil'] = (check_nvidia_smi_memutil, "%s memory" , 1, inventory_nvidia_smi_memutil)
#check_info['nvidia_smi.errors1'] = (check_nvidia_smi_errors1, "%s errors single" , 0, inventory_nvidia_smi_errors1)
#check_info['nvidia_smi.errors2'] = (check_nvidia_smi_errors2, "%s errors double" , 0, inventory_nvidia_smi_errors2)
check_info['nvidia_smi.temp'] = (check_nvidia_smi_temp, "%s temperature" , 1, inventory_nvidia_smi_temp)
check_info['nvidia_smi.power'] = (check_nvidia_smi_power, "%s power" , 1, inventory_nvidia_smi_power)

#checkgroup_of['nvidia_smi.errors1'] = 'hw_errors'
#checkgroup_of['nvidia_smi.errors2'] = 'hw_errors'
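To see what the power check does with one reading, here is the same threshold logic pulled out into a standalone Python sketch. The field order (index, name, fan, GPU utilization, memory utilization, two error counters, temperature, power draw, power limit) is read off the indices used in the script above. The function name and the sample values are made up; this is not real agent output.

```python
def power_state(draw_watts, limit_watts):
    """Mirror the thresholds of check_nvidia_smi_power for a single reading.

    Returns (state, percent) where state is 0 (OK), 1 (WARNING) or
    2 (CRITICAL) and percent is the draw as a percentage of the limit.
    """
    percent = draw_watts * 100.0 / limit_watts
    if percent > 90:
        state = 2
    elif percent > 80:
        state = 1
    else:
        state = 0
    return state, percent


# Hypothetical agent line, already split into fields.
line = ["0", "GeForce GTX 1070", "35", "12", "8", "N/A", "N/A", "61", "127.5", "150.0"]

state, percent = power_state(float(line[8]), float(line[9]))
print(state, percent)
```

Here 127.5 W out of a 150 W limit is 85%, which is above the 80% warning threshold but below the 90% critical one.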

ResourceSpace cron and database notes

Ran into a couple of issues today:

Note: system setup is Debian 9 with standard options (Apache 2.4, PHP 7.0, MariaDB 10.1)

Cron

The documentation implies you should run cron_copy_hitcount.php as a cron job. However, the new correct way seems to be to run batch/cron.php, which runs a bunch of sub-jobs. I’ve got this set up in cron.daily as:

#!/bin/sh
wget -q -r http://localhost/resourcespace/batch/cron.php

We’ll see if this works. Certainly running it directly by browsing to it seems to work.

LDAP

Trying to activate the simpleldap plugin threw up two problems:

php-ldap wasn’t installed – easy enough. Note apache needs a restart after installing…

Second error was a problem with the database – the plugin couldn’t create a table, with error

Specified key was too long; max key length is 767 bytes

This seems to be because when I created the database the character set used was utf8mb4 (collation utf8mb4_general_ci), which in the worst case uses 4 bytes per character. If you try to create an index key on a 255-character column, that can need 255 × 4 = 1020 bytes, which is over the limit.

The solution was to change the database to use utf8_general_ci. This allowed the plugin to create the simpleldap_groupmap table with utf8_general_ci. The rest of the database is still utf8mb4_general_ci, but as it has been created already without an issue we should be ok.
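The arithmetic behind this, as a small Python sketch. It assumes the indexed column is 255 characters long, as above, and uses the worst-case widths of 3 bytes per character for utf8 and 4 for utf8mb4; the function names are only for illustration.

```python
MAX_KEY_BYTES = 767  # limit quoted in the error message


def key_bytes(chars, bytes_per_char):
    """Worst-case size in bytes of an index key over `chars` characters."""
    return chars * bytes_per_char


def max_indexable_chars(bytes_per_char, limit=MAX_KEY_BYTES):
    """Longest column, in characters, that still fits under the limit."""
    return limit // bytes_per_char


for charset, width in (("utf8", 3), ("utf8mb4", 4)):
    needed = key_bytes(255, width)
    fits = needed <= MAX_KEY_BYTES
    print(charset, needed, fits, max_indexable_chars(width))
```

So a 255-character key just fits with utf8 (765 bytes), while with utf8mb4 the longest key that fits is 191 characters.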

User missing from login screen – OSX with FileVault

Situation: new MacBook with OSX Sierra. Set up with an admin account, enable FileVault (taking note of recovery key obviously!) and install the necessary. Create account for end user and give it to them. All is well (after getting some USB-A to USB-C converters…)

User restores all his stuff from a Time Machine backup to the account on the new system – this overwrites all the current user settings. After rebooting the system, his account has disappeared from the login screen.

Solution: Log on as the other administrative user (luckily we have one!) and open System Preferences – Security & Privacy – FileVault. A notice at the bottom of the dialog box informs you that some users are not enabled to use FileVault, with a button to enable them. This brings up a list showing the missing user. To enable the user, their password needs to be entered.