Target: Host
Previous Status: Green
New Status: Red
Alarm Definition:
([Event alarm expression: vSphere HA agent on a host has an error; Status = Red] OR [Event alarm expression: vSphere HA detected a network isolated host; Status = Red] OR [Event alarm expression: vSphere HA detected a network-partitioned host; Status = Red] OR [Event alarm expression: vSphere HA detected a host failure; Status = Red] OR [Event alarm expression: Host has no port groups enabled for vSphere HA; Status = Red] OR [Event alarm expression: vSphere HA agent is healthy; Status = Green])
Event details:
vSphere HA detected that host (host) is in a different network partition than the master (Cluster) in Datacenter
I had been getting this message randomly over the last couple months on some of my datacenter hosts. These alerts didn’t seem to be causing any problems within the cluster, but I wanted to get to the bottom of this. I opened a ticket with VMware and uploaded the logs from both the host and vCenter, but they didn’t see anything out of the ordinary. On the second webex with VMware I noticed a couple strange things with the management network that might be the cause.
- The first thing I noticed was that the NICs were set for “Auto Negotiate”. I originally set up our environment on ESXi 4 before upgrading to ESXi 5.1. When I initially set this up I hard coded (KB1004089) these to 1000GB/Full. I am wondering if at some point during the upgraded that they defaulted back. On our switches it was set at 1000GB/Full so it is important that we set this on the host NICs to 1000GB/Full as well.
- The second thing that I noticed that in the Management network that I had the Load Balancing set to “Route based on IP hash”. The problem here is that for this to work correctly you need a port channel configured (I do not have this configured this way). This might be the cause of the HA problem if the traffic is going across these NICs is getting confused because of the Load Balancing configuration. I changed this to “Route based on the originating virtual port ID”, which makes the traffic go out on the port that it came in on. There is a good read found here…http://blogs.vmware.com/kb/2013/03/troubleshooting-network-teaming-problems-with-ip-hash.html.
This case is still ongoing with VMware and I should know in the next couple weeks if this solves my problem; my gut tells me it will.