
Internet, Technology, Webcomics

That’s why there’s no “CSI: TCP/IP Network Debugging”

November 16th, 2005 by Jeff | Dump Core

Just an amusing anecdote that occurred yesterday within the sacred, hallowed virtual halls of Keenspot. None of the names have been changed because… everyone is guilty.

In addition to the public comic sites, Keenspot has a number of private internal sites to provide information and communication channels for its members who are scattered across the globe. One of these sites just happens to be a Wiki that stores information about how the various Autokeen tags work, how and when people get paid, how to set up a Keenspot print book, etc. In a highly unofficial capacity, I’ve become the wikimaster of sorts at Keen, largely because I’m the one who has taken the time to learn the entire Wiki markup and enter the largest volume of information. This site is a highly valuable resource for us, especially when somebody new is “Spotted” (that is, invited to join Keen), because we can just point them to the Newbies section of the Wiki and say, “Have fun.”

However, recently the Wiki had to be moved from one hosting service to another, as Keenspot is slowly but steadily moving away from its old ISP. There was also a question of security as a Wiki, by its very nature, is designed to be edited by anyone. Needless to say, since the Wiki contains somewhat delicate internal information like financial disbursements, we want to make it freely available to our members but not available to those outside the organization. So the Wiki was down for quite some time as the Keen Tech Crew determined where to move the database, the best way to secure it, and… well… when they could dig up the spare time to devote to it. (Sometimes, when there are a lot of forest fires to put out, the little sparks get left to smolder for a while.)

Recently, though, the Wiki was reopened. So I heartily typed in the URL to check it out, as there have been several articles I’ve wanted to add since it went down. However, when I tried to reach the site, I could never connect. I started pulling out all of my network analysis tricks. The DNS resolved to an IP address, but pings never responded. A traceroute seemed to die at one of Keenspot’s ISP’s internal routers. I could reach other Keen sites without a problem, so I determined the trouble must be over on Keen’s side, or at least at their ISP. I posted all the evidence on the private members forum and waited to hear back from Keen’s chief whipping boy and co-founder, Darren “Gav” Bleuel.
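For the curious, the first two checks above (does the name resolve, and does anything answer at that address) can be roughed out in a few lines of Python. This is only a minimal sketch, not what I actually ran: the function names are made up for illustration, and since a real ICMP ping usually needs raw-socket privileges, a plain TCP connection attempt stands in for it here.

```python
import socket


def resolve(hostname):
    """Return the sorted IPv4 addresses that a hostname resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def tcp_reachable(address, port, timeout=3.0):
    """Return True if a TCP connection to (address, port) succeeds in time."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


def diagnose(hostname, port=80):
    """Run both checks (illustrative helper) and report what was found."""
    try:
        addresses = resolve(hostname)
    except socket.gaierror as exc:
        return {"dns": None, "error": str(exc)}
    return {
        "dns": addresses,
        "tcp": {addr: tcp_reachable(addr, port) for addr in addresses},
    }
```

A result where `dns` lists an address but every `tcp` entry is `False` matches the symptom described here: the name resolves fine, yet nothing comes back from the machine itself.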

Darren looked over the post and was befuddled. He asked if I could reach Keenswag, Keenspot’s online store, because the Wiki was housed on the same physical machine. Sure enough, I couldn’t. I hadn’t been able to reach Keenswag in months… ever since it had been moved to this new machine. Now that the Wiki was there, my network traffic to it was going into a black hole as well. What made this even more confusing was that Chris Crosby, Keen’s other founder, hadn’t been able to reach Keenswag either… with virtually the same symptoms.

Ah, a mystery. Please don your deerstalker caps here.

Darren began to hypothesize. His theory was that certain machines (i.e., Windows) have an annoying knack for choosing a single network path to a given machine and not wanting to let go of it. The Internet is designed to be flexible, so if one router goes down, alternate paths can be resolved. However, once a Windows box finds a given path it wants to continue using it, even if more efficient paths (or paths that actually work, if a router in the old path completely goes down) can be found. It’s an odd theory, but I’ve heard it before and seen evidence to support it. If both my machine and the Crosbys’ machine locked themselves into a path to the old server which was no longer up, they wouldn’t see the new machine.

This didn’t explain, however, why my DNS was resolving to the correct IP, and why my Linux box (which should have a more robust TCP/IP networking stack) was producing the same result. I even SSHed into the GPF server and pinged the Wiki machine and got the same IP. So that couldn’t have been the problem. The Crosbys are physically in South Dakota, while I’m in North Carolina; that makes it unlikely (not impossible, but improbable) that we were going through the same backbone router and thus the same path to the machine. Darren said that the Crosbys were able to reach the site through their AOL account, so I tried using my work machine. Although I use the same cable modem as my home machines to access the company network via VPN, all outbound traffic from the company goes through a proxy server, so it should have another route. Like clockwork, I was able to reach the site through there.

By now, Dan Shive was in on the sleuthing. Bear in mind, of course, that none of the three of us are networking experts. Dan and I are both software guys, used to writing high-level languages that call lower level APIs, while Darren is a nuclear physicist who just happened to have system administration thrust upon him. I know the fundamentals of how TCP/IP networking works, but I’m certainly no guru. All three of us were racking our brains to find out why one set of IPs could reach the site while others couldn’t. And if the Crosbys and I were having problems, who’s to say how many others might be having problems? With Keenswag “down,” this could theoretically affect revenue.

I came up with another theory: What if someone DDoSed the ISP’s subnet, and one of the machines attacking them was in the same subnet that my ISP assigned to me? As a defense mechanism, Keen’s ISP blocked all traffic from that subnet, killing the attack but effectively cutting off anyone in that subnet from reaching theirs. Perhaps it wasn’t a DDoS, which is more of a brute-force attack, but more of an individual hacker trying to crack their systems. That’s akin to banning a block of IPs on a forum; you stop a hacker or spammer from messing up the board, but you can potentially keep legitimate users from getting in as well. At Keen, we only choose IP banning as a last resort, and when we do it’s only temporary. Maybe Keen’s ISP had enough problems out of my cable modem’s IP pool that they kept the block up indefinitely.
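To make the collateral damage in that theory concrete, here is a small sketch using Python’s standard `ipaddress` module. The blocked range and all of the addresses are invented (they come from ranges reserved for documentation), and `is_blocked` is a hypothetical helper, not anything Keen or its ISP actually runs.

```python
import ipaddress

# Hypothetical block list: one whole /24 banned because of a single bad host.
BLOCKED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]


def is_blocked(address):
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


attacker = "198.51.100.23"   # the machine that caused the ban (made up)
bystander = "198.51.100.77"  # an innocent neighbor in the same pool (made up)
outsider = "203.0.113.5"     # someone on a different network (made up)

print(is_blocked(attacker))   # True
print(is_blocked(bystander))  # True: blocked along with the attacker
print(is_blocked(outsider))   # False
```

The bystander never did anything wrong, but because the rule matches on the network rather than the individual address, it gets shut out just the same.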

Darren didn’t seem to agree. While Keen had been the target of a recent DDoS, it was against the forum server, not Keenswag; those are two separate machines. Plus, he didn’t give Keen’s ISP that much credit. In the past, they haven’t been very helpful when it’s come to thwarting DDoS attacks.

I started trying a few more experiments based on his suggestions when Dan suddenly IMed me. While I was typing a response, Darren had posted a new message to the forum. He had checked ifconfig on the Keenswag/Wiki machine and the broadcast IP and netmask were set incorrectly. He asked me to give it another try. Sure enough, I was able to reach the machine just fine. Undoubtedly, the faulty netmask was, purely by chance, blocking both my IP and the Crosbys’ from accessing the machine.
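I never saw the actual bad values, so the following is only a guess at the mechanism, with entirely made-up addresses. One common way a wrong netmask produces this kind of selective black hole is when the mask is too wide: the server then believes some faraway hosts sit on its own local segment, tries to answer them directly instead of through its gateway, and the replies go nowhere. Python’s `ipaddress` module makes the arithmetic easy to see; `on_link` is a hypothetical helper name.

```python
import ipaddress


def on_link(interface, remote):
    """Return True if the host would treat `remote` as being on its own subnet.

    `interface` is the host's address plus netmask, e.g. "203.0.113.10/255.255.255.0".
    """
    iface = ipaddress.ip_interface(interface)
    return ipaddress.ip_address(remote) in iface.network


# All values below are assumptions for illustration only.
correct = "203.0.113.10/255.255.255.0"  # what the server should have had
faulty = "203.0.113.10/255.0.0.0"       # a mask that is far too wide

unlucky_client = "203.45.67.89"  # shares only the first octet with the server
lucky_client = "198.51.100.7"    # shares nothing with the server

for label, config in (("correct", correct), ("faulty", faulty)):
    iface = ipaddress.ip_interface(config)
    print(label, "broadcast:", iface.network.broadcast_address)
    print("  unlucky client treated as local:", on_link(config, unlucky_client))
    print("  lucky client treated as local:  ", on_link(config, lucky_client))
```

With the faulty mask, only the unlucky client is mistaken for a local machine, so only its traffic vanishes while everyone else connects normally. That would fit a failure that hits two unrelated users in different states and spares most of the rest of the world.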

I IMed the results back to Dan, to which he quipped, “How very anticlimactic.”

“That’s why there’s no ‘CSI: TCP/IP Network Debugging,’” I responded.
