I just wasted an entire day trying to inform Comcast about a problem with their network. This problem caused a service outage for me, and I'm sure that it has affected hundreds of other customers. Unfortunately, it will never be fixed. Senior engineers at Comcast are not aware of the problem, and they will never be aware, because it is impossible to inform them. Comcast's first and second level support staff don't understand the problem and have been trained to aggressively blow off anyone who attempts to report it to them. As far as I can tell, there is absolutely no way to get through the first and second level support barrier to someone who actually understands DHCP. Comcast's support staff does not know when to escalate something that they do not understand.

I am posting this mostly as a personal catharsis, having spent an entire day being told that I don't know what I'm talking about by people who barely know the first thing about how the Internet works. Pushing this further is not worth the frustration. Perhaps someone else who is experiencing the same problem will come upon this blog post in a Google search and will be saved the same frustration. That is the only thing that I can do at this point.

The problem manifests as follows: some devices are intermittently unable to obtain a DHCP lease. What makes this complicated is that other devices ARE still able to obtain a lease. In my case, my router stopped getting IPs from Comcast, but I could get an IP with my laptop. The router had been working fine as my gateway for months and had no problems getting IPs, and then one day I woke up and it wasn't working anymore. My router could not get a lease, but my laptop could get a lease if I plugged it into my cable modem directly.

The naive assumption when confronted with this set of circumstances is that the problem is with the router. The network is obviously able to hand a lease out. The router must just not be asking for one properly or accepting one as it should.

The first time I encountered this behavior on Comcast's network, I bought into this assumption and went out and purchased a new router. Then, a few months later, it happened to my new router as well. In this case, the naive assumption is wrong; both routers are working properly. Internet protocols are complicated, and sometimes they fail in subtle ways that defy naive assumptions. Unfortunately, it is impossible to get Comcast to look at this problem more carefully, because their low level technical support staff don't understand how to look at it more carefully and believe that the naive assumption is the only possible explanation.

Because Comcast's network has this problem, people likely call up technical support on a regular basis complaining about it, and they are told that their routers must be broken. Comcast's technical support staff has gotten good at arguing people down about their "broken routers" because they see it all the time. Of course, they are seeing it all the time because the problem is with their network, and not the routers. If you press them, they'll explain that they don't support your router, and if you want support you'll have to pay extra for "home network service" with a router they supply. So, not only do they remain blind to the problem, but in some cases it becomes a revenue generating opportunity for them, although I wonder whether the rate at which they have to replace the routers they support is higher than it ought to be.

Here is an example of a valid DHCP transaction between my laptop and Comcast's network. Some repeated packets have been removed for the sake of simplicity.
Source          Destination       Protocol  Info
0.0.0.0         255.255.255.255   DHCP      DHCP Discover
192.168.100.1   192.168.100.10    DHCP      DHCP Offer
0.0.0.0         255.255.255.255   DHCP      DHCP Request
0.0.0.0         255.255.255.255   DHCP      DHCP Discover
76.97.XXX.X     76.97.XXX.XXX     DHCP      DHCP Offer
0.0.0.0         255.255.255.255   DHCP      DHCP Request
76.97.XXX.X     76.97.XXX.XXX     DHCP      DHCP ACK
The DHCP Discover message is my laptop asking for an address. When the cable modem is first plugged in, it offers an address of 192.168.100.10. The laptop sends back a DHCP Request to confirm the use of that IP, and that Request is ignored by the cable modem. Eventually the laptop gives up on asking for 192.168.100.10 and sends out a general Discover message again. By now the cable modem has connected to the cable network, and Comcast's DHCP servers respond to the Discover with an Offer. My laptop answers that Offer with a Request, which the network ACKs.

Here is an example of Comcast's network failing to provide a DHCP lease to my router:
0.0.0.0         255.255.255.255   DHCP      DHCP Discover
192.168.100.1   192.168.100.10    DHCP      DHCP Offer
0.0.0.0         255.255.255.255   DHCP      DHCP Request
192.168.100.1   192.168.100.10    DHCP      DHCP ACK
192.168.100.10  192.168.100.1     DHCP      DHCP Request
192.168.100.10  255.255.255.255   DHCP      DHCP Request
0.0.0.0         255.255.255.255   DHCP      DHCP Discover
Here we see an interesting difference. The cable modem responds to the DHCP Request for 192.168.100.10 with an ACK! Why does the cable modem send the router an ACK, when no ACK was sent to the laptop? I'm not sure. I think the reason might be that the router is faster. It takes the laptop over a second to send its Request after the Offer, but the router sends its Request in about a millisecond. It's possible that by the time the laptop has sent its Request, the cable network is connected, and so the cable modem isn't bothering to offer DHCP anymore, but the router gets its Request out faster and receives an ACK.

The DHCP lease from the cable modem is very short, so the router continues to send out Requests and Discovers after the cable network is connected. The cable network ought to respond to these messages. One important difference is that in this case the router is asking for 192.168.100.10 rather than any address, because it received an ACK for that address before. That may be why the DHCP server is ignoring its requests, but Comcast's DHCP servers ought to anticipate this because of the way their cable modems work.

Regardless, it's clear that the router speaks DHCP and is capable of taking leases. The differences between the two scenarios are differences in the behavior of the cable modem and Comcast's DHCP servers. Therefore, the problem clearly lies in one of those two places. The naive assumption about the router being broken is clearly wrong.

At the end of a day full of arguing with Comcast's first level staff, I finally received a voice mail message from a second level support person who offered that I could replace the cable modem (but that there could not possibly be a problem with Comcast's DHCP servers). I might replace the modem, but I doubt it will make a difference. I don't think the cable modem is the problem. While I was troubleshooting this, I plugged my old router back in, and it was able to get a lease just fine.
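To make the comparison between the two captures concrete, here is a minimal Python sketch that summarizes each trace. It is only an illustration. The `Packet` type, the `summarize` helper, and the result keys are hypothetical names of my own. The packet lists are transcribed by hand from the two captures, with the redacted addresses left as they are. Treating 192.168.100.1 as the cable modem's own DHCP server is an assumption drawn from those captures.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    msg: str  # "Discover", "Offer", "Request" or "ACK"


# Assumption: 192.168.100.1 is the cable modem's built-in DHCP server.
MODEM_IP = "192.168.100.1"
BCAST = "255.255.255.255"

# Hand-transcribed from the laptop capture (redacted addresses kept as-is).
LAPTOP_TRACE = [
    Packet("0.0.0.0", BCAST, "Discover"),
    Packet("192.168.100.1", "192.168.100.10", "Offer"),
    Packet("0.0.0.0", BCAST, "Request"),
    Packet("0.0.0.0", BCAST, "Discover"),
    Packet("76.97.XXX.X", "76.97.XXX.XXX", "Offer"),
    Packet("0.0.0.0", BCAST, "Request"),
    Packet("76.97.XXX.X", "76.97.XXX.XXX", "ACK"),
]

# Hand-transcribed from the router capture.
ROUTER_TRACE = [
    Packet("0.0.0.0", BCAST, "Discover"),
    Packet("192.168.100.1", "192.168.100.10", "Offer"),
    Packet("0.0.0.0", BCAST, "Request"),
    Packet("192.168.100.1", "192.168.100.10", "ACK"),
    Packet("192.168.100.10", "192.168.100.1", "Request"),
    Packet("192.168.100.10", BCAST, "Request"),
    Packet("0.0.0.0", BCAST, "Discover"),
]


def summarize(trace):
    """Report who ACKed and how many trailing client messages got no reply."""
    modem_ack = any(p.msg == "ACK" and p.src == MODEM_IP for p in trace)
    upstream_ack = any(p.msg == "ACK" and p.src != MODEM_IP for p in trace)

    # Count client messages sent after the last server reply (Offer or ACK).
    unanswered_tail = 0
    for p in reversed(trace):
        if p.msg in ("Offer", "ACK"):
            break
        unanswered_tail += 1

    return {
        "modem_ack": modem_ack,
        "upstream_ack": upstream_ack,
        "unanswered_tail": unanswered_tail,
    }


if __name__ == "__main__":
    print("laptop:", summarize(LAPTOP_TRACE))
    print("router:", summarize(ROUTER_TRACE))
```

Run against the two traces, the sketch reports an ACK from the upstream server and nothing left unanswered for the laptop. For the router it reports an ACK only from the modem, followed by three client messages that nobody answered. That is the same difference described above, just stated mechanically.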
For some reason, Comcast's DHCP servers are ignoring Discovers from my router's MAC address. I have no idea why that would be the case, but that is the most reasonable explanation. Perhaps someone with a bit more experience with DHCP might immediately understand why if they read this explanation. It might be that their servers think my device already has a lease for a different IP. This might cause them to ignore new lease requests, particularly when they are for a different, specific address (192.168.100.10).

Fortunately, the condition seems to be temporary. Now that I have two routers, I can just switch them when one is being blocked. Problem solved, sort of, although I imagine there are a lot of other people out there who don't understand why they have a router fail every couple of months.

There is a lesson in here about how not to do technical support, but I'm not sure exactly what it is. Certainly, support call resolutions ought to be tracked for frequency. If unexpected answers like "Customer's COTS router doesn't work" end up passing a certain threshold, this ought to prompt further investigation. When it comes to support call resolutions, strange and frequent probably means wrong.

Anyway, thanks Comcast for running a broken network, repeatedly insulting my intelligence, and wasting a huge amount of my time. I'd replace you with a DSL circuit, but you and I both know that the phone company is just as bad, and they force you to buy a POTS line, which ought to be illegal.
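As a footnote to the point about tracking support call resolutions for frequency, here is a minimal Python sketch of what such a check could look like. Everything in it is an assumption for illustration: the `flag_suspicious` function, the resolution labels, the sample call log, and the 20% threshold are all made up, and I have no idea how Comcast actually records its calls.

```python
from collections import Counter


def flag_suspicious(resolutions, unexpected, threshold):
    """Return the 'unexpected' resolution labels whose share of all calls
    is strictly greater than `threshold` (a fraction between 0 and 1)."""
    total = len(resolutions)
    if total == 0:
        return []
    counts = Counter(resolutions)
    return sorted(
        label for label in unexpected if counts[label] / total > threshold
    )


# Hypothetical call log: each entry is the resolution a support agent recorded.
CALLS = [
    "modem rebooted",
    "customer router broken",
    "billing question",
    "customer router broken",
    "modem rebooted",
    "outage in area",
    "customer router broken",
    "modem rebooted",
    "billing question",
    "modem rebooted",
]

# Resolutions that should be rare if the network is healthy (an assumption).
UNEXPECTED = {"customer router broken"}

if __name__ == "__main__":
    # 3 of 10 calls blame the customer's router, which exceeds a 20% threshold.
    print(flag_suspicious(CALLS, UNEXPECTED, threshold=0.2))
```

With this made-up log, "customer router broken" accounts for 3 of 10 calls and gets flagged at a 20% threshold. Strange and frequent, so somebody ought to look.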