
Friday, May 2. 2008

broken cables, leaky switches

I had one hell of a networking problem crop up at a client the other day. I was called in to troubleshoot an odd problem where some machines suddenly could not get to the internet... sometimes.

Damn I just LOVE problems that start like this. I poked around and checked their router and rebooted their shittier-than-turds Dell VLAN switches and came up with nothing.

The router has several interfaces, and the only ones being affected were those with VLANs on them, so I was fairly suspicious of those Dell switches, since they've given me trouble with VLANs before. But the symptoms here were downright bizarre.

I did some packet monitoring and some other experimentation and came up with the following observations:

- Local traffic between subnets is mostly passed just fine (there was one case where we saw some packet loss from one VLAN subnet to another, but it could have been a fluke).
- From the affected machines, Internet-bound traffic goes into the switch and never comes out; it doesn't even make it to the router.
- Moving devices from one switch port to another would sometimes make them work, or, if they were working, make them fail. But the problem never stayed put on any particular port.
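For anyone chasing a similar now-you-see-it problem, here is a minimal sketch of how observations like these could be logged over time instead of eyeballed. It is not what was run on site. The names tcp_probe, loss_by_scope and run are made up for illustration, and it assumes each target has a TCP port you can connect to. It probes a set of targets tagged "local" or "internet" and reports the fraction of failed attempts per tag.

```python
import socket
import time
from collections import defaultdict


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def loss_by_scope(samples):
    """Take (scope, ok) pairs and return {scope: fraction of failed probes}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for scope, ok in samples:
        totals[scope] += 1
        if not ok:
            failures[scope] += 1
    return {scope: failures[scope] / totals[scope] for scope in totals}


def run(targets, rounds=10, interval=1.0, probe=tcp_probe):
    """Probe every (scope, host, port) target once per round, then summarize."""
    samples = []
    for _ in range(rounds):
        for scope, host, port in targets:
            samples.append((scope, probe(host, port)))
        time.sleep(interval)
    return loss_by_scope(samples)
```

Calling run() with a list such as [("local", "192.0.2.10", 445), ("internet", "198.51.100.7", 443)] (placeholder addresses, swap in real ones) from an affected machine and from a healthy one would show whether the loss really is confined to Internet-bound traffic. The probe argument is swappable so the summary logic can be checked without touching the network.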

After several hours of troubleshooting and head-scratching, I had but one pathetically weak theory. I knew that a new device had been connected to the network, and we eventually found that after unplugging this device, the problem seemed to gradually clear up. Plugging the device back in, however, did not reintroduce the problem, at least not within a reasonable period of time. The new device was connected across a hallway using a patch cable that was getting walked on. Running the cable under the hallway rug was stupid, yes, and I didn't immediately associate it with this strange network-wide problem, but it became my main hypothesis because everything else had been eliminated.

So we tested it. We went and started jumping up and down on the cable and, sure enough, some of the machines on the network lost Internet connectivity. Unplugging the cable and rebooting the switches restored it.

So there you have it. Bad patch cables can seriously fuck up your network, especially if you're using Dell switches. But wait, how does one broken cable connected from a switch to just one device cause only certain packets with certain destinations to disappear? I HAVE NO FUCKING CLUE!