After many problems getting a new Proxmox cluster up and running, two things have helped:
Putting the SAN IP addresses in the hosts file, which avoids DNS dependencies (especially when one of the systems isn’t in DNS yet…). This put me on track: http://blog.rhavenindustrys.com/2013/04/curious-proxmox-clustering-fix.html
Important bit:
127.0.0.1 localhost.localdomain localhost
169.254.0.1 proxmox1.local proxmox1 pvelocalhost
169.254.0.2 proxmox2.local proxmox2
10.10.5.101 proxmox1.example.com
10.10.5.102 proxmox2.example.com
This ties the short aliases to the network used by corosync, while the fully qualified example.com names stay on the addresses facing the outside world.
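To confirm which address a given name ends up on, a small lookup against the hosts file can help. This is only a sketch: `hosts_lookup` is a hypothetical helper of my own, not something Proxmox ships, and it assumes the standard hosts layout of one address followed by names on each line.

```shell
# hosts_lookup NAME [FILE]: print the address that a hosts-format file
# assigns to NAME. Hypothetical helper, not part of Proxmox.
hosts_lookup() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }   # strip comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "$file"
}
```

With the entries above, `hosts_lookup proxmox1` should print the corosync-side address and `hosts_lookup proxmox1.example.com` the outside one. `getent hosts proxmox1` asks the system resolver the same question, which is closer to what the cluster software will actually see.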
I then ran the tests detailed at: https://pve.proxmox.com/wiki/Troubleshooting_multicast,_quorum_and_cluster_issues
The multicast test failed – i.e. running
omping -c 10000 -i 0.001 -F -q <list of all nodes>
on both nodes at the same time resulted in 100% loss. I fixed this by disabling IGMP snooping on the SAN VLAN. The ExtremeXOS command is:
disable igmp snooping <vlanname>
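When re-running the omping test after a switch change, the pass/fail check can be scripted instead of eyeballed. The sketch below is an assumption-laden helper of my own (`multicast_ok` is a made-up name): it assumes omping's summary lines look like `multicast, xmt/rcv/%loss = 10000/10000/0%, ...`, so check that against the real output on your version before relying on it.

```shell
# multicast_ok: read omping summary text on stdin and succeed only if every
# "multicast" summary line reports 0% loss. Hypothetical helper; the line
# layout it matches is an assumption about omping's output format.
multicast_ok() {
    awk '
        /multicast, xmt\/rcv\/%loss = / {
            seen = 1
            split($0, parts, "%loss = ")   # parts[2] starts with "xmt/rcv/loss%"
            split(parts[2], counts, "/")   # counts[3] begins with the loss figure
            if (counts[3] + 0 > 0) bad = 1 # "+ 0" keeps only the leading number
        }
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Example use on each node, with the node names from the hosts file above:
#   omping -c 10000 -i 0.001 -F -q proxmox1 proxmox2 | multicast_ok && echo "multicast OK"
```

The function also fails when no multicast line shows up at all, so an omping run that never got going is not mistaken for a pass.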
Hey presto, after getting the second node to join properly:
pvecm add 192.168.xxx.xxx -force
it gets quorum immediately. I suspect IGMP snooping was behind many of the historical problems with getting quorum to work on this switch.