I recently had a piece published in the Cloud Computing Journal. In it, I talked about how a PMTUD black hole can cause a particularly subtle set of issues in hybrid cloud environments where cloud resources are connected to a corporate office or another datacenter via IPsec tunnels. A PMTUD black hole causes some (but not all) traffic to silently fail to make it through the tunnel.
PMTUD (Path Maximum Transmission Unit Discovery) is a mechanism for discovering the largest packet you can send along the path between two hosts without it being fragmented. It is fragile because it relies on intermediary routers to send back information about this maximum size. If those informational packets are dropped along the way, the sender never learns the maximum packet size, and any traffic that exceeds it falls into a PMTUD black hole. The result is that file transfers or large responses from web servers fail or hang, while your ssh session and small web pages work just fine.
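To make that asymmetry concrete, here is a toy Python model (not real networking code). It assumes a path whose smallest MTU is 1400 bytes and uses made-up packet sizes for each kind of traffic; the names `send_df_packet`, `Outcome` and `traffic` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    DELIVERED = "delivered"
    SHRINK = "router's ICMP tells sender to shrink packets"
    BLACK_HOLED = "silently dropped, sender never finds out"


def send_df_packet(size: int, path_mtu: int, icmp_blocked: bool) -> Outcome:
    """Toy model of one packet sent with the Don't Fragment bit set.

    path_mtu is the smallest MTU of any link on the path.
    """
    if size <= path_mtu:
        return Outcome.DELIVERED
    # Too big: the router drops the packet and reports back via ICMP,
    # unless a firewall between the router and the sender filters that ICMP.
    return Outcome.BLACK_HOLED if icmp_blocked else Outcome.SHRINK


# Illustrative packet sizes in bytes; real traffic varies.
traffic = {
    "ssh keystroke": 120,
    "small web page": 900,
    "file transfer": 1500,
}

for name, size in traffic.items():
    result = send_df_packet(size, path_mtu=1400, icmp_blocked=True)
    print(f"{name:15} {size:5} bytes -> {result.value}")
```

With `icmp_blocked=False`, the 1500-byte case returns `Outcome.SHRINK` instead, which is PMTUD working as intended.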
Here’s more from the article:
PMTUD … is a protocol/algorithm defined in RFC 1191 that determines the best packet size for IPv4 datagrams flowing between any two given hosts. In this way, it attempts to optimize traffic through the Internet by using the largest possible datagram that doesn’t require intermediary routers to fragment the traffic (since fragmentation and, more importantly, reassembly are expensive operations for routers to perform).
It works like this … if I’m a host (say, an FTP server) that wants to send the largest packet I possibly can to another host (say, an FTP client), I start with what I know: the maximum transmission unit (MTU) of my underlying Ethernet interface (normally 1500 bytes). I then send a packet of that size with the Don’t Fragment (DF) bit set in the IPv4 header. If a router in the path would need to fragment this packet to forward it to the next hop (because its outbound interface has an MTU of, say, 1450 bytes), it drops the packet and sends back an Internet Control Message Protocol (ICMP) Destination Unreachable message indicating that fragmentation was needed but the DF bit was set. This ICMP message also carries the MTU of the next hop (in this case, 1450 bytes), and the originating host resends the data in packets of that size. This can happen several times if, for example, another router farther down the path has an even smaller MTU on its outbound interface. Eventually the packet reaches the destination host at the optimal size for the path, and all subsequent ‘large’ packets use that size as well.
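The loop described above can be sketched as a small Python simulation. It is a model under simplifying assumptions, not a real implementation: routers are hypothetical `Hop` objects, and the ICMP message is reduced to its 8-byte header (type 3, code 4, checksum, 16 unused bits, then the 16-bit Next-Hop MTU field that RFC 1191 defines). The checksum is left at zero, and the copy of the dropped packet's header that a real ICMP message carries is omitted. Function and router names are illustrative.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple

ICMP_DEST_UNREACHABLE = 3
CODE_FRAG_NEEDED_DF_SET = 4


def build_frag_needed(next_hop_mtu: int) -> bytes:
    """8-byte ICMP header: type, code, checksum (0 here), unused, Next-Hop MTU."""
    return struct.pack("!BBHHH", ICMP_DEST_UNREACHABLE, CODE_FRAG_NEEDED_DF_SET,
                       0, 0, next_hop_mtu)


def parse_next_hop_mtu(msg: bytes) -> Optional[int]:
    """Return the Next-Hop MTU if msg is a 'fragmentation needed' message."""
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", msg[:8])
    if icmp_type == ICMP_DEST_UNREACHABLE and code == CODE_FRAG_NEEDED_DF_SET:
        return mtu
    return None


@dataclass
class Hop:
    name: str
    outbound_mtu: int  # MTU of the interface this router forwards out of


def send_with_df(size: int, path: List[Hop]) -> Optional[bytes]:
    """Return None if the packet arrives, else the first blocking router's ICMP."""
    for hop in path:
        if size > hop.outbound_mtu:
            return build_frag_needed(hop.outbound_mtu)
    return None


def discover_path_mtu(path: List[Hop], interface_mtu: int = 1500) -> Tuple[int, List[int]]:
    """Start at the local interface MTU and shrink on each ICMP reply."""
    size = interface_mtu
    attempts = [size]
    while (reply := send_with_df(size, path)) is not None:
        next_mtu = parse_next_hop_mtu(reply)
        if next_mtu is None or next_mtu >= size:
            raise RuntimeError("unusable ICMP reply")
        size = next_mtu
        attempts.append(size)
    return size, attempts


path = [Hop("office-router", 1500), Hop("isp-router", 1450), Hop("tunnel-gw", 1400)]
print(discover_path_mtu(path))  # (1400, [1500, 1450, 1400])
```

Each pass through the loop corresponds to one "packet too big, here is the next-hop MTU" exchange; if the ICMP replies never arrive, `send_with_df` would never return a message to act on, which is exactly the black-hole case.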
Nowadays, PMTUD often comes into play with tunneled traffic. Because a tunnel inserts extra headers into every packet, its effective MTU is the original Ethernet MTU minus the size of those additional headers.
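As a rough illustration, here is that arithmetic for an IPsec ESP tunnel. The header sizes are assumptions for one common configuration (IPv4 outer header, AES-CBC with a 16-byte IV and 16-byte block, HMAC-SHA1-96 with a 12-byte integrity check value, and an optional 8-byte UDP header for NAT traversal); other cipher suites give different numbers. ESP also pads the encrypted payload to the cipher block size, so the result isn't quite a fixed subtraction. `esp_tunnel_mtu` is a hypothetical helper.

```python
def esp_tunnel_mtu(link_mtu: int = 1500, outer_ip: int = 20, nat_t_udp: int = 0,
                   esp_header: int = 8, iv: int = 16, block: int = 16,
                   icv: int = 12) -> int:
    """Largest inner IPv4 packet that fits in one ESP tunnel-mode packet.

    Assumed layout: outer IP | [UDP] | ESP header | IV | encrypted | ICV.
    Encrypted region = inner packet + padding + 2 trailer bytes
    (pad length, next header), and must be a multiple of `block`.
    """
    # Bytes left for the encrypted region after fixed per-packet overhead.
    budget = link_mtu - outer_ip - nat_t_udp - esp_header - iv - icv
    encrypted = budget - budget % block  # round down to a whole number of blocks
    return encrypted - 2  # leave room for the 2-byte ESP trailer


print(esp_tunnel_mtu())             # 1438
print(esp_tunnel_mtu(nat_t_udp=8))  # 1422
```

Under these assumptions, a full 1500-byte packet from the office LAN sent with DF set has to be shrunk to 1438 bytes (or 1422 with NAT traversal) before it can cross the tunnel, and getting the sender to do that depends on PMTUD working.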
But there’s a problem with PMTUD …
Read the full article online at Cloud Computing Journal: Don’t Let Your Hybrid Cloud Collapse into a Black Hole