Proxmox clustering and multicast (also, DNS)

After much trouble getting a new Proxmox cluster up and running, two things helped:

Putting the SAN IP addresses in the hosts file, avoiding DNS dependencies (especially when one of the systems isn’t in there yet…). This put me on track: http://blog.rhavenindustrys.com/2013/04/curious-proxmox-clustering-fix.html

Important bit:

127.0.0.1 localhost.localdomain localhost
169.254.0.1 proxmox1.local proxmox1 pvelocalhost
169.254.0.2 proxmox2.local proxmox2

10.10.5.101 proxmox1.example.com
10.10.5.102 proxmox2.example.com

This associates the short aliases with the network used by corosync, while leaving the fully-qualified names pointing at the outside world.
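A quick way to check which address a short name actually resolves to is getent, which honours /etc/hosts (names here are from the example above):

getent hosts proxmox1

This should print the 169.254.0.1 line from the hosts file, not the DNS address.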

Then I tried the tests detailed at: https://pve.proxmox.com/wiki/Troubleshooting_multicast,_quorum_and_cluster_issues

The multicast test failed – i.e. running

omping -c 10000 -i 0.001 -F -q <list of all nodes>

on both nodes at the same time resulted in 100% loss. Fixed this by disabling IGMP snooping on the SAN VLAN. The ExtremeXOS command is:

disable igmp snooping <vlanname>
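With snooping disabled, re-running the same omping test on both nodes should report close to 0% loss (node names as in the hosts file above):

omping -c 10000 -i 0.001 -F -q proxmox1 proxmox2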

Hey presto, after getting the second node to join properly:

pvecm add 192.168.xxx.xxx -force

it gets quorum immediately. I suspect this was behind a lot of the historical problems with getting quorum to work on this switch.

Configuring Proxmox hosts (and other postfix installs) to send email via smarthost V2

In an earlier post (below) I suggested using the Satellite system option. However, this seems to behave like exim’s “mail sent by smarthost; no local mail” option – i.e. even local mail to root tries to go via the smarthost, which then complains. The “Internet with smarthost” option is probably the better choice (equivalent to exim’s “mail sent by smarthost; received via SMTP or fetchmail”).

N.B. The normal Proxmox setup seems to be for postfix to use /etc/aliases directly. Double-check this file!
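For reference, a minimal sketch of the relevant /etc/postfix/main.cf lines that the “Internet with smarthost” choice should end up producing (hostnames are placeholders, not from an actual install):

# mail for these names is delivered locally, so mail to root stays on the box
mydestination = proxmox1.example.com, localhost.localdomain, localhost
# everything else is relayed via the smarthost
relayhost = smarthost.example.com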

updatedb.mlocate hangs on stale NFS mounts (Proxmox)

After properly configuring postfix on the Proxmox hosts to send email, I started noticing daily emails to root:

/etc/cron.daily/mlocate:
 /usr/bin/updatedb.mlocate: `/var/lib/mlocate/mlocate.db' is locked (probably by an earlier updatedb)
 run-parts: /etc/cron.daily/mlocate exited with return code 1

After logging on to one of the boxes, I killed (-9) the updatedb process and ran it with verbose output:

updatedb.mlocate -v

The output indicated it was hanging on one of the /mnt directories. Looking at the output of mount, I noticed a couple of old NFS shares that should have been deleted (and which didn’t appear in the GUI). Unmounting these shares made updatedb run happily.

If the unmount complains that the device is busy, check that the updatedb.mlocate process has been killed.
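As a preventive tweak (my addition, not part of the original fix), updatedb can be told to skip NFS mounts entirely in /etc/updatedb.conf – the stock Debian file already has these variables, just make sure the values include:

PRUNEFS="NFS nfs nfs4"
PRUNEPATHS="/tmp /var/spool /media /mnt"

With /mnt pruned, the daily cron job won’t touch network mounts at all, stale or not.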

Configuring Proxmox hosts (and other postfix installs) to send email via smarthost

Proxmox uses postfix as its MTA. To configure this to send email via a smarthost, run

dpkg-reconfigure postfix

For the general type we want “Satellite system”.

SMTP relay host – the usual smarthost for your site.

Mailbox size – the suggested 51200000 should be more than big enough, seeing as nothing should end up in the system mailbox anyway.

Once this is done, edit /etc/aliases. It should look like:

postmaster: root
nobody: root
hostmaster: root
webmaster: root
www: root

Add a line at the end:

root: Your.Email@example.com

Regenerate the database with newaliases (or possibly postalias <filename> on other distributions).

Test with something like

echo "test" | mail -s "test mail sent to root" root

Windows Server 2003 change to multiprocessor HAL

According to http://technet.microsoft.com/en-us/library/cc782277(v=ws.10).aspx this should be simple. Trying it on a Server 2003 VM (that had been converted from Virtual Server 2005) didn’t offer the other HALs as options. The answer is at http://www.pimp-my-rig.com/2008/08/article-acpi-uniprocessor-to.html – use the command-line devcon.exe tool and the .cmd script at http://www.pimp-my-rig.com/2008/10/acpi-multiprocessor-hal-upgrade-script.html .

Devcon can be downloaded from Microsoft at http://support.microsoft.com/kb/311272

The script is:

@echo off
@title "Upgrading to ACPI Multi-Processor HAL.."
cls

echo ====================================================
echo Upgrading to ACPI Multi-Processor HAL..
echo ====================================================
echo.
echo please wait..

rem Strip the hardware IDs for all the other HAL variants from the root device...
devcon sethwid @ROOT\PCI_HAL\0000 := !E_ISA_UP !ACPIPIC_UP !ACPIAPIC_UP !ACPIAPIC_MP !MPS_UP !MPS_MP !SGI_MPS_MP !SYSPRO_MP !SGI_MPS_MP > nul
devcon sethwid @ROOT\ACPI_HAL\0000 := !E_ISA_UP !ACPIPIC_UP !ACPIAPIC_UP !ACPIAPIC_MP !MPS_UP !MPS_MP !SGI_MPS_MP !SYSPRO_MP !SGI_MPS_MP > nul
rem ...add back just the ACPI multiprocessor ID...
devcon sethwid @ROOT\PCI_HAL\0000 := +ACPIAPIC_MP > nul
devcon sethwid @ROOT\ACPI_HAL\0000 := +ACPIAPIC_MP > nul
rem ...and update the device from hal.inf so the multiprocessor HAL gets installed
devcon update %windir%\inf\hal.inf ACPIAPIC_MP > nul

echo.
echo ====================================================
echo Script Completed: press any key to reboot..
echo ====================================================

pause > nul

devcon reboot
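To see what the script actually changed, devcon can list the hardware IDs on the HAL device before and after running it (ROOT\ACPI_HAL\0000 is the instance used above; on some machines it’s ROOT\PCI_HAL\0000 instead):

devcon hwids @ROOT\ACPI_HAL\0000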

Migration of virtual machines from Proxmox 1.9 to 2.1

Procedure

  1. Create a storage area that both clusters can see (e.g. NFS on a FreeNAS box).
  2. Back up the VM from the 1.9 system to the backup area.
  3. SSH to the backup area and move the backup tgz file from the root of the share (where 1.9 backs up) to the dump directory (which should have been created when the storage was connected to 2.1).
  4. Restore from the backup in 2.1 (you may want to keep the same VMID, to avoid inconsistent disk image numbers) – see the sketch after this list.
  5. Change hardware if required (the VM won’t start if it points to a non-existent CD image; change the network to the appropriate bridge).
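A rough command-line sketch of steps 2 and 4 (VMID 101, the paths, and the archive name are placeholders – the real archive name includes a timestamp):

vzdump --compress --dumpdir /mnt/backup 101

on the 1.9 node, then on the 2.1 node:

qmrestore /mnt/pve/backup/dump/vzdump-qemu-101.tgz 101

Passing the same VMID to qmrestore keeps the disk image numbers consistent.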

Network interface in Windows VMs

Not sure at the moment whether changing the bridge affects anything, or whether it’s just down to the migration, but Windows sees the network interface as a new device, so it configures it via DHCP. Check via the console!

More Proxmox 2.1 setup notes – corosync

To change the primary (cluster) interface: edit /etc/hosts to change the IP address returned when cman does a lookup of the system hostname.

e.g.

127.0.0.1 localhost.localdomain localhost
192.168.40.6 Hildasay.physics.gla.ac.uk Hildasay pvelocalhost

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

With this config, cman will return 192.168.40.6 as the address totem should bind to for its multicast traffic.
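Whether totem really bound to that address can be checked with corosync’s own status tool (a standard corosync utility; the “id” line of the ring status should show 192.168.40.6):

corosync-cfgtool -s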

There may be certificate problems doing this. Probably best in future to install Proxmox with the initial interface on the SAN (or other appropriate private network).

Proxmox 2.1 setup notes

Proxy

In addition to the web interface setting (for the cluster), apt needs to be configured as well. Add to somewhere appropriate (e.g. /etc/apt/apt.conf.d/70debconf):

Acquire::http::Proxy "http://wwwcache.gla.ac.uk:8080";
Acquire::ftp::Proxy "http://wwwcache.gla.ac.uk:8080";
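To check that apt actually picks these up:

apt-config dump | grep -i proxy
apt-get update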

Time

Have to set up NTP on each system as before. Edit /etc/ntp.conf to add:

server login.physics.gla.ac.uk iburst

and restart the service

/etc/init.d/ntp restart

Check with ntpq -p , output will look something like:

     remote           refid      st t  when poll reach   delay   offset  jitter
==============================================================================
*puck.physics.gl 130.159.196.117  3 u    41   64    17   0.244    0.154   0.115
 s02.be.it2go.eu .STEP.          16 u     -   64     0   0.000    0.000   0.000
 utl-ntp.evo.hlm .STEP.          16 u     -   64     0   0.000    0.000   0.000
 218-32-169-193. .STEP.          16 u     -   64     0   0.000    0.000   0.000
 wan1.dgeb.info  .STEP.          16 u     -   64     0   0.000    0.000   0.000

Time needs to be reasonably accurate before attempting a cluster join!
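If the clock is a long way out, it can be stepped once before joining, reusing the NTP server above (ntpdate is old-fashioned but still present on these systems; stop ntp first so the two don’t fight over the clock):

/etc/init.d/ntp stop
ntpdate login.physics.gla.ac.uk
/etc/init.d/ntp start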

Cluster

Instructions at http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster

Quick guide

Log in to a node. To create the cluster:

pvecm create YOUR-CLUSTER-NAME

Use a unique name. This name cannot be changed later!

To check the state of cluster:

pvecm status

To add a node, log in to that node and run:

pvecm add IP-ADDRESS-CLUSTER

Where the address is that of one of the existing cluster nodes. It will ask for the root password of the node you are connecting to.

Remember to add the new node’s address to NFS shares, etc.
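On a plain Linux NFS server that means an /etc/exports entry along these lines (path, network, and options here are placeholders; a FreeNAS box does the equivalent via its GUI):

/export/proxmox 192.168.40.0/24(rw,no_root_squash)

then exportfs -ra to apply it.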