Last week I did a write-up on how to replace MAC-Based Forwarding (MBF) with Policy Based Routing (PBR). This week I want to give you some background on why this should be your general best practice, using Hyper-V as the example.
I was facing a strange NetScaler troubleshooting engagement the other day. Services were randomly flapping and the overall connectivity to the applications published through NetScaler was bad.
NetScaler was a separate physical box but the same problem should apply to any form-factor.
At first I was blaming the network. But further analysis showed it only seemed to affect a small subset of services: those hosted on Hyper-V. So I started blaming Hyper-V, but it turned out it wasn’t that simple.
Drilling deeper into the issue, I could see a significantly high number of TCP retransmissions on the affected sessions. It took me some time, but then I noticed that two different MAC addresses were involved in one session: one Intel-tagged and one Microsoft-tagged, with Microsoft being the VM’s MAC and Intel being the current host’s physical NIC.
Interestingly enough, most issues seemed to follow a pattern:
- Microsoft MAC speaks to NetScaler happily
- Intel MAC comes into the picture
- Session struggles
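To spot this pattern without eyeballing every frame, a small script can flag any IP address that shows up with more than one source MAC. The following is only a minimal sketch: it assumes you have already exported (source IP, source MAC) pairs from your trace, and the function name, addresses and MACs are made up for illustration.

```python
from collections import defaultdict

def find_mac_changes(frames):
    """Return {src_ip: [MACs in order first seen]} for IPs seen with more than one source MAC."""
    seen = defaultdict(list)
    for src_ip, src_mac in frames:
        mac = src_mac.lower()
        if mac not in seen[src_ip]:
            seen[src_ip].append(mac)
    return {ip: macs for ip, macs in seen.items() if len(macs) > 1}

# Hypothetical sample data, not taken from the real trace.
# 00:15:5d is the Microsoft prefix Hyper-V uses for dynamic VM MACs;
# 02:00:00:aa:bb:cc is a placeholder standing in for the host's physical NIC.
frames = [
    ("10.0.0.21", "00:15:5d:01:02:03"),
    ("10.0.0.21", "00:15:5d:01:02:03"),
    ("10.0.0.21", "02:00:00:aa:bb:cc"),
    ("10.0.0.22", "00:15:5d:01:02:04"),
]

print(find_mac_changes(frames))
```

Any IP in the result is a candidate for the behavior described here; an IP that always uses the same MAC is not reported.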
So it seems Hyper-V bounces MACs in certain scenarios. While this is not necessarily a problem on its own, it can become one. Meet MAC-Based Forwarding.
Ok, blaming MBF is only one part of the problem. The other part is that Hyper-V apparently expects responses strictly on the virtual MAC while sending out traffic sourced from both MACs.
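With MBF enabled, NetScaler skips the usual route and ARP lookup and sends its return traffic to whatever MAC address the incoming frame came from. Combine this with the behavior above and Hyper-V will drop packets the moment it decides to use the physical MAC, and that’s exactly what I saw in the trace.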
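The interaction is easier to see in a toy model. The sketch below is a deliberately simplified illustration and not how NetScaler or Hyper-V are implemented: it assumes MBF simply answers to the source MAC of the last frame received, that a normal lookup resolves the VM's IP to its virtual MAC, and that the VM only accepts frames addressed to that virtual MAC. All names and addresses are hypothetical.

```python
VM_IP = "10.0.0.21"
VM_MAC = "00:15:5d:01:02:03"    # virtual NIC of the VM (made-up address)
HOST_MAC = "02:00:00:aa:bb:cc"  # placeholder for the host's physical NIC

# Assumption: ARP resolves the VM's IP to its virtual MAC.
ARP_TABLE = {VM_IP: VM_MAC}

def reply_dst_mac(request_src_mac, dst_ip, mbf_enabled):
    """Pick the destination MAC for the reply frame."""
    if mbf_enabled:
        # MBF: answer to the MAC the request came from, no lookup.
        return request_src_mac
    # No MBF: resolve the destination IP the normal way.
    return ARP_TABLE[dst_ip]

def vm_accepts(dst_mac):
    """Assumption: the VM only accepts frames sent to its virtual MAC."""
    return dst_mac == VM_MAC

def run(mbf_enabled):
    """Send one frame from each MAC and record whether the reply is accepted."""
    results = []
    for src_mac in (VM_MAC, HOST_MAC):
        dst = reply_dst_mac(src_mac, VM_IP, mbf_enabled)
        results.append(vm_accepts(dst))
    return results

print("MBF on: ", run(True))   # [True, False]
print("MBF off:", run(False))  # [True, True]
```

With MBF on, the reply to the frame sourced from the physical MAC goes back to the physical MAC and is dropped. With a regular lookup, both replies land on the virtual MAC.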
I’ve seen the same behavior with some routers using VRRP in the past. They sent all traffic out of the physical interface but expected all responses on the virtual interface.
However, this was only a small subset of the routers I’ve encountered, not all that use VRRP or other HA mechanisms. Still, it shows that this is a sporadically recurring issue between NetScaler, MBF and some vendors.
In the case of Hyper-V, the problem can be solved on both ends of the connection.
Fellow blogger Dale Scriven pointed out in his article on the vHorizon blog that this can also be solved by changing the Hyper-V NIC teaming load balancing setting from “Dynamic” to “Hyper-V Port”. Read the full story here: NetScaler VPX and Hyper-V flapping
Personally, I prefer to solve it at the NetScaler and prevent the “unexpected” responses to the sudden MAC change. This can be done by disabling MAC-Based Forwarding (MBF) and enabling Policy Based Routing (PBR).
Now, I don’t know who’s right or wrong here, NetScaler or Hyper-V. I couldn’t find sufficient evidence in the RFCs I’ve skimmed to blame either of them. But I do know that I’ve avoided MBF for the last few years, and my PBR concept has never confronted me with the nasty network surprises that I occasionally run into with MBF.