IPv6 doom: the rogue RA bug

Malcolm Scott

Last updated 24th January 2011

In this article I describe a problem which I and others have seen in the wild many times, whereby a single broken host will severely cripple the ability of other hosts on the local network to reach IPv6-capable (dual-stack) servers, regardless of whether the local network has deployed IPv6. This is primarily aimed at network administrators, although the problem affects many people.

Background, from my perspective

I run some servers at the University of Cambridge. We have supported IPv6 for a few years, but most clients still have only IPv4 connectivity. A fairly regular complaint from students on college networks is "your site takes ages to load", and this almost always translates to "my browser is trying to access your server via IPv6, but its local network provides only IPv4, so I have to wait for the IPv6 connection to time out before I fall back to IPv4". This complaint mainly comes from Mac OS X users, occasionally Linux, never Windows. If this page was slow to load for you, you likely just experienced this very problem.

Of course, this shouldn't happen. On an IPv4-only network, clients should have no IPv6 address other than a link-local one, and should automatically use IPv4 even if (as is probably the case) their operating system is IPv6-capable.

Why is this happening?

All Windows hosts running Vista or later and some running XP will set up 6to4 and/or Teredo tunneling (seemingly preferring 6to4 but falling back to Teredo when behind a NAT). This gets them IPv6 connectivity even when the local network does not provide an IPv6 router. This by itself does not affect other users.

Some people use Windows Internet Connection Sharing (ICS) on their home LAN, then plug their computer into the college network in Cambridge. Even if they have ensured that ICS is disabled on the network interface they use in Cambridge, a bug in ICS means that Windows will announce itself as an IPv6 router on all interfaces, i.e. to everyone on the local network (or the part of it covered by the same broadcast domain). It will announce the IPv6 address prefix it has obtained from 6to4 (or perhaps from Teredo, although I have not seen this in the wild). Other hosts on the local network will then assign themselves an address from this prefix and use the ICS host as their IPv6 default router.
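To make the addressing concrete, here is a minimal Python sketch, assuming classic 6to4 and modified EUI-64 stateless autoconfiguration. A 6to4 host with public IPv4 address a.b.c.d derives the prefix 2002:aabb:ccdd::/48; a neighbour that accepts a /64 from it appends an interface identifier built from its own MAC address. The IPv4 address, the MAC address, the choice of subnet 0 and the function names are hypothetical, and many systems use randomised (privacy) interface identifiers rather than EUI-64.

```python
import ipaddress


def sixtofour_prefix(ipv4: str) -> ipaddress.IPv6Network:
    """Return the 2002::/48 prefix that a 6to4 host derives from its public IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # 0x2002 fills the top 16 bits; the 32-bit IPv4 address fills the next 32 bits.
    network = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((network, 48))


def eui64_interface_id(mac: str) -> int:
    """Modified EUI-64 interface identifier for a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Insert ff:fe between the two halves and flip the universal/local bit.
    eui = bytearray(octets[:3] + b"\xff\xfe" + octets[3:])
    eui[0] ^= 0x02
    return int.from_bytes(eui, "big")


def slaac_address(prefix64: ipaddress.IPv6Network, mac: str) -> ipaddress.IPv6Address:
    """Address a neighbour would autoconfigure from an advertised /64."""
    if prefix64.prefixlen != 64:
        raise ValueError("SLAAC needs a /64 prefix")
    return ipaddress.IPv6Address(int(prefix64.network_address) | eui64_interface_id(mac))


if __name__ == "__main__":
    prefix = sixtofour_prefix("192.0.2.4")        # hypothetical ICS host address
    subnet = next(prefix.subnets(new_prefix=64))   # hypothetical: subnet 0
    print(prefix)                                  # 2002:c000:204::/48
    print(slaac_address(subnet, "00:1b:63:84:45:e6"))  # 2002:c000:204:0:21b:63ff:fe84:45e6
```

Running the same arithmetic backwards (taking the 32 bits after 2002:) recovers the advertising host's IPv4 address, which is why a 6to4 prefix is a useful clue when hunting for the culprit.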

(It is possible something other than ICS is behaving similarly; more data needed. I also do not know which versions of ICS are responsible; I fear that all post-XP are.)

What does this break?

It causes problems for many users on affected networks when they attempt to access dual-stack servers (i.e. ones which present both an IPv4 and an IPv6 address in DNS). Because their machines now have an IPv6 address and default route, many of them try IPv6 first; the rogue router typically fails to deliver that traffic, so each connection stalls until the IPv6 attempt times out and the client falls back to IPv4. One such site is the SRCF, but there are others. On World IPv6 Day, 8th June 2011, many sites will enable IPv6 in this manner and users will face problems connecting to them. More generally, we should expect an increasing number of sites to switch to dual-stack over the coming year due to the impending exhaustion of the IPv4 address pool; several large sites such as Heise.de have already gone dual-stack on a permanent basis.

To reiterate, this breaks connectivity for everybody on the LAN, not just those with misbehaving ICS.
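Why does a broken route show up as a slow page rather than an error? A client that walks through the addresses returned by DNS one at a time only reaches IPv4 after its IPv6 attempt gives up. The Python sketch below simulates that; the function names, the addresses (from documentation ranges) and the 20-second timeout are hypothetical, and real clients' timeouts and ordering vary. Python's own socket.create_connection() works through getaddrinfo() results in the same one-at-a-time way.

```python
from typing import Callable, Iterable, Tuple

# try_connect(address, timeout) -> (succeeded, seconds_spent)
Connector = Callable[[str, float], Tuple[bool, float]]


def connect_in_order(candidates: Iterable[str], try_connect: Connector,
                     timeout: float) -> Tuple[str, float]:
    """Try each address in the order given; return the winner and total time spent."""
    elapsed = 0.0
    for address in candidates:
        ok, spent = try_connect(address, timeout)
        elapsed += spent
        if ok:
            return address, elapsed
    raise ConnectionError("no candidate address was reachable")


def simulated_network(address: str, timeout: float) -> Tuple[bool, float]:
    """Hypothetical LAN with a rogue RA: IPv6 packets vanish, IPv4 works."""
    if ":" in address:       # IPv6: connection attempts are never answered
        return False, timeout
    return True, 0.05        # IPv4: a normal round trip


if __name__ == "__main__":
    winner, waited = connect_in_order(
        ["2001:db8::80", "192.0.2.80"], simulated_network, timeout=20.0)
    print(winner, round(waited, 2))
```

The delay the user sees is the whole IPv6 timeout, paid on every new connection, which matches the "your site takes ages to load" complaint.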

Is it all doom and gloom?

The light at the end of the tunnel is RFC3484, whose default address-selection rules mean that clients should prefer IPv6 over IPv4 only when the IPv6 source address is not a 6to4 one, as 6to4 is an unreliable transition mechanism. (6to4 can still be used when connecting to an IPv6-only host.) In my experience, the router advertisements from ICS use a 6to4 prefix, so an RFC3484-compliant client will not let them cause problems. (This is, of course, treating the symptom rather than the cause: removing broken software from your network is the only long-term solution.)
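The 6to4 behaviour falls out of RFC3484's default policy table. Rule 5 of destination address selection prefers a destination whose label matches the label of the source address that would be used to reach it, and 6to4 addresses (2002::/16) have a label of their own, so a native IPv6 destination reached from a 6to4 source loses to an IPv4 destination reached from an IPv4 source. This Python sketch models only rules 5 and 6 (label, then precedence), not the full ten-rule algorithm; the function names are hypothetical and the addresses are placeholders, apart from a Teredo-format example address.

```python
import ipaddress
from typing import List, Tuple

# RFC3484 section 2.1 default policy table: (prefix, precedence, label).
POLICY = [
    (ipaddress.IPv6Network("::1/128"), 50, 0),
    (ipaddress.IPv6Network("::/0"), 40, 1),
    (ipaddress.IPv6Network("2002::/16"), 30, 2),
    (ipaddress.IPv6Network("::/96"), 20, 3),
    (ipaddress.IPv6Network("::ffff:0:0/96"), 10, 4),
]


def lookup(addr: str) -> Tuple[int, int]:
    """(precedence, label) of the longest matching prefix; IPv4 is looked up as ::ffff:a.b.c.d."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        ip = ipaddress.IPv6Address("::ffff:" + str(ip))
    _, precedence, label = max(
        (net.prefixlen, prec, lab) for net, prec, lab in POLICY if ip in net)
    return precedence, label


def order_destinations(pairs: List[Tuple[str, str]]) -> List[str]:
    """Sort (destination, source) pairs: matching label first (rule 5), then higher precedence (rule 6)."""
    def key(pair: Tuple[str, str]) -> Tuple[bool, int]:
        dst, src = pair
        dst_prec, dst_label = lookup(dst)
        src_label = lookup(src)[1]
        return (dst_label != src_label, -dst_prec)
    return [dst for dst, _ in sorted(pairs, key=key)]


if __name__ == "__main__":
    # Only a 6to4 source for IPv6: IPv4 comes first.
    print(order_destinations([("2001:db8::80", "2002:c000:204::1"),
                              ("192.0.2.80", "192.0.2.4")]))
    # A Teredo source (2001::/32) has no entry of its own, so it falls under ::/0.
    print(order_destinations([("2001:db8::80", "2001:0:4136:e378:8000:63bf:3fff:fdd2"),
                              ("192.0.2.80", "192.0.2.4")]))
```

The second case shows the Teredo caveat: under the RFC3484 table a Teredo source carries the same label as a native destination, so IPv6 still wins.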

RFC3484 is implemented by all recent operating systems. In practical terms, this means that Windows users are largely immune from the effects of this bug, but all users of Mac OS X versions older than 10.6.5 will be affected, as will users of oldish (pre-2010, approximately) Linux distributions on networks which use NAT. (Following RFC3484, glibc treats private IPv4 addresses as site-local in scope, so a host with a private IPv4 address ranks IPv4 destinations below IPv6 ones reachable from a 6to4 address; many distributions have swapped this around using a custom /etc/gai.conf, e.g. Ubuntu since 10.04 Lucid Lynx, Debian since Squeeze, Fedora since 13 Goddard, Mandriva since 2010.1 Spring, openSUSE since 11.3, Gentoo since 2010-04-25.) Some software which does its own DNS resolution will also be affected regardless of the operating system, for example older versions of Opera.
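The NAT case is decided earlier, by rule 2 (prefer matching scope). RFC3484 section 3.2 gives the private prefixes 10/8, 172.16/12 and 192.168/16 site-local scope, so a private IPv4 source does not match the global scope of a public IPv4 destination, while a 6to4 source does match a global IPv6 destination; rule 2 therefore picks IPv6 before labels are ever compared. Treating private addresses as global scope (glibc's gai.conf offers scopev4 entries for this) leaves rule 2 undecided, so rule 5 can then prefer IPv4. This Python sketch models rule 2 only; the private_is_site_local flag stands in for the two configurations and, like the function names and addresses, is hypothetical.

```python
import ipaddress
from typing import Optional, Tuple

LINK_LOCAL, SITE_LOCAL, GLOBAL = 2, 5, 14   # scope values used by RFC3484
RFC1918 = [ipaddress.IPv4Network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def scope(addr: str, private_is_site_local: bool) -> int:
    """Scope of an address; private IPv4 is site-local only if the flag says so."""
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback or ip.is_link_local:
        return LINK_LOCAL
    if ip.version == 4:
        if private_is_site_local and any(ip in net for net in RFC1918):
            return SITE_LOCAL
        return GLOBAL
    return SITE_LOCAL if ip.is_site_local else GLOBAL


Pair = Tuple[str, str]  # (destination, source that would be used to reach it)


def rule2(a: Pair, b: Pair, private_is_site_local: bool) -> Optional[str]:
    """Return the destination rule 2 prefers, or None if it cannot decide."""
    a_match = scope(a[0], private_is_site_local) == scope(a[1], private_is_site_local)
    b_match = scope(b[0], private_is_site_local) == scope(b[1], private_is_site_local)
    if a_match and not b_match:
        return a[0]
    if b_match and not a_match:
        return b[0]
    return None


if __name__ == "__main__":
    six = ("2001:db8::80", "2002:c000:204::1")   # native destination, 6to4 source
    four = ("192.0.2.80", "10.1.2.3")            # public destination, NATed source
    print(rule2(six, four, private_is_site_local=True))   # IPv6 chosen
    print(rule2(six, four, private_is_site_local=False))  # undecided: later rules apply
```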

Unfortunately, there is a sting in the tail of RFC3484: it does not deprioritise Teredo addresses. Windows sets up Teredo tunneling in some cases (when it is behind IPv4 NAT?), and ICS could announce a Teredo prefix in its router advertisement. This is just a hypothesis; I have not seen it happen, possibly because there is no NAT on the networks I've been watching. If it does happen, even RFC3484-compliant hosts will try to use the bad route, and the only solution will likely be to remove the rogue router.

What can I (as a network administrator) do?

If your users report long delays connecting to a small number of sites, have them test their connectivity at test-ipv6.com. That site is user-friendly but quite thorough, and explicitly tests for this particular problem, amongst others. Note that IPv4-only hosts with no bad IPv6 routes score above 0, because a lack of IPv6 is not an immediate problem; if a user sees a score of 0, then you have run into the problem I describe, or something else equally bad.

Ban ICS. It has no place on your LAN anyway, and stating this might make users remember to turn it off. (This is of course impractical in some situations, but in Cambridge we are used to college IT departments publishing lists of banned software.)

Watch out for the problem. Monitor for broken routes, for example by regularly checking the IPv6 addresses of a host on each of your LANs (with that host's own automatic 6to4 and Teredo support turned off, to avoid confusion). If the host has assigned itself an unexpected IPv6 address, i.e. one which does not start with the (harmless) link-local prefix fe80:, you need to go and find the source of your stray router advertisements: hunt down the culprit, and turn off ICS if it is enabled. (If ICS is not to blame, I would very much appreciate knowing what you find: email me.) There is a shortcut which may help you track down the culprit: if the address starts with 2002:, it is a 6to4 address, and the two groups after 2002: are the advertising host's IPv4 address in hexadecimal (for example, 2002:c000:204:: corresponds to 192.0.2.4). (If the address starts with 2001:0:, that is Teredo; bad luck.)
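The address checks described above are easy to script. This Python sketch assumes you already have the monitoring host's IPv6 addresses as strings, and uses the standard ipaddress module, whose sixtofour and teredo properties decode the embedded IPv4 addresses. A Teredo address yields only a Teredo server and the client's public (NAT-mapped) IPv4 address, which identifies the NAT rather than the machine behind it, hence the "bad luck". The function name and the optional list of your own prefixes are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def classify(addr: str,
             own_prefixes: Optional[Iterable[ipaddress.IPv6Network]] = None) -> str:
    """Say what kind of IPv6 address a monitoring host has picked up."""
    ip = ipaddress.IPv6Address(addr.split("%")[0])   # strip any %zone suffix
    if ip.is_link_local:                             # fe80::/10
        return "link-local (harmless)"
    if any(ip in net for net in (own_prefixes or [])):
        return "expected (your own prefix)"
    if ip.sixtofour is not None:                     # 2002::/16 embeds an IPv4 address
        return f"6to4: look for the host using IPv4 address {ip.sixtofour}"
    if ip.teredo is not None:                        # 2001:0::/32
        server, client = ip.teredo
        return f"Teredo: server {server}, client's public IPv4 {client}"
    return "unexpected: investigate"


if __name__ == "__main__":
    for a in ("fe80::21b:63ff:fe84:45e6%en0", "2002:c000:204::1",
              "2001:0:4136:e378:8000:63bf:3fff:fdd2"):
        print(a, "->", classify(a))
```

Anything other than "link-local" or "expected" is worth an alert.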

You may find ramond useful (although I have not tried it myself yet). This is a tool "designed to 'clear' (by sending spoofed zero lifetime adverts) rogue-routes sent by users running 6to4 gateways on a campus network." This is not a complete solution as rogue routes may still be present on clients for brief periods. Furthermore, Mac OS X is known to treat zero-lifetime RAs incorrectly, and to still install a default route; given the majority of affected users are running Mac OS X it is unclear how much ramond can solve the problem. However, you could use ramond to monitor for the rogue RA problem (and others). The supplied example configuration does not recognise Teredo addresses, but this should be simple to add.
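Monitoring of the kind ramond offers comes down to parsing router advertisements. This Python sketch decodes an ICMPv6 Router Advertisement (RFC 4861, section 4.2) starting at the ICMPv6 type byte, and reports the router lifetime (zero in the spoofed adverts described above) plus any 6to4 (2002::/16) or Teredo (2001::/32) prefixes. It is not ramond's code or configuration format; capturing the packets (for example with a raw ICMPv6 socket or a packet-capture library) is left out, and the names are hypothetical.

```python
import ipaddress
import struct
from typing import List, NamedTuple

RA_TYPE = 134            # ICMPv6 Router Advertisement
OPT_PREFIX_INFO = 3      # Prefix Information option
SIXTOFOUR = ipaddress.IPv6Network("2002::/16")
TEREDO = ipaddress.IPv6Network("2001::/32")


class RouterAdvert(NamedTuple):
    router_lifetime: int                    # seconds; 0 means "not a default router"
    prefixes: List[ipaddress.IPv6Network]   # from Prefix Information options


def parse_ra(icmp: bytes) -> RouterAdvert:
    """Decode an RA whose first byte is the ICMPv6 type field."""
    if len(icmp) < 16 or icmp[0] != RA_TYPE:
        raise ValueError("not a Router Advertisement")
    # Bytes 6-7 of the fixed 16-byte header hold the router lifetime.
    (lifetime,) = struct.unpack_from("!H", icmp, 6)
    prefixes = []
    offset = 16
    while offset + 2 <= len(icmp):
        opt_type, opt_units = icmp[offset], icmp[offset + 1]
        opt_end = offset + opt_units * 8    # option length is in units of 8 bytes
        if opt_units == 0 or opt_end > len(icmp):
            raise ValueError("malformed option")
        if opt_type == OPT_PREFIX_INFO and opt_units == 4:
            prefix_len = icmp[offset + 2]
            raw_prefix = icmp[offset + 16:offset + 32]
            prefixes.append(ipaddress.IPv6Network((raw_prefix, prefix_len), strict=False))
        offset = opt_end
    return RouterAdvert(lifetime, prefixes)


def rogue_prefixes(ra: RouterAdvert) -> List[str]:
    """Prefixes that look like a 6to4 or Teredo tunnel rather than a real router."""
    return [str(p) for p in ra.prefixes
            if p.subnet_of(SIXTOFOUR) or p.subnet_of(TEREDO)]
```

Unlike ramond's supplied example configuration, this check covers Teredo prefixes as well as 6to4 ones.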

Some newer models of switch have the ability to filter router advertisement packets such that only those from authorised sources (i.e. official IPv6 routers) are allowed. If your switches can do this, great—however I envy your budget.

Some people have chosen to respond to this problem by filtering out all IPv6 traffic from their network. This is not a good solution. You will need to provide IPv6 connectivity at some point; eventually someone will be forced to set up an IPv6-only server or service. Similarly, turning off IPv6 on all hosts is a sledgehammer solution which will come back to bite you for the same reason.

Further reading

ARIN IPv6 Wiki: Customer problems that could occur

Any other insights?

I (along with other IPv6 operators, with whom I am in contact) am still learning about this problem. I would like to know any of your thoughts about it. Please get in touch.

Changelog