Know your tools!


It’s Wednesday afternoon and your Skype chat with Grandpa suddenly gets choppy. It’s Monday morning and your colocated Asterisk server loses its connection for half a second, causing a huge influx of calls. Or worse, your SSH session to your server is simply slow for no good reason. Before sending that brief and offensive email to your ISP or hosting company with one line of broken English and three pages of traces, know your tools.

Traceroute and similar software such as MTR and Tcptrace are great tools, but for the love of god, use them properly! You may not know this, but when you submit that ticket to your ISP (or host), you’re making a first impression that will affect how much attention your issue receives. This could make the difference between an immediate resolution and hours and hours of frustration. The first thing the ISP will try to determine is whether they’re having a major network issue, meaning whether there’s an influx of complaints from other users. If not, it’s an uphill battle. Their network is the best in the world until you prove otherwise. To do this, you must present yourself as technically savvy and show them you’re not just another newb sending an email that looks something like ‘NETWORK IS SLOW AGAIN SEE TRACE BELOW!!’

To troubleshoot the issue you need a few key pieces of information: your source address, the destination address, the ability to run commands at the source, and hopefully the ability to run them at the destination as well. Remember that all communication is round trip. A trace or MTR in one direction only tells half the story. Most ISPs and hosts today use a good amount of asymmetric routing, so the return path may not match the forward path.

Start with the basic troubleshooting:

  • Run a ping to the destination and verify that you’re actually seeing packet loss.
  • Run the same from the destination back to you. This is very important. If you don’t see any loss in that direction, then focus on the path from the source to the destination.
  • Run an MTR and look for nodes that show loss. Be very careful here. Many routers rate-limit ICMP packets, and just because they’re showing loss does NOT mean they’re having a problem. If a hop shows loss, and every hop after it displays the same percentage of loss or higher, then you can be confident this is the hop with an issue.
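
The rule in that last bullet is easy to get wrong when you’re staring at a wall of numbers, so here is a minimal Python sketch of it. It assumes you have already copied the per-hop loss percentages out of an MTR report into a list; the function name first_suspect_hop and the tolerance parameter are made up for illustration and are not part of MTR.

```python
def first_suspect_hop(loss_by_hop, tolerance=0.0):
    """Return the 0-based index of the first hop whose loss carries
    through to every later hop, or None if no hop qualifies.

    loss_by_hop: loss percentages, one per hop, in path order.
    tolerance:   hypothetical slack (in percentage points) to allow
                 for measurement noise between hops.
    """
    for i, loss in enumerate(loss_by_hop):
        if loss <= 0:
            continue  # no loss at this hop, nothing to blame
        later_hops = loss_by_hop[i + 1:]
        # Loss that vanishes further down the path is most likely
        # ICMP rate limiting on this router, not a real problem.
        if all(later >= loss - tolerance for later in later_hops):
            return i
    return None


# Hop 3 (index 2) shows 40% loss, but later hops are clean: rate limiting.
print(first_suspect_hop([0, 0, 40, 0, 0, 0]))
# Loss starts at index 3 and persists to the destination: a real suspect.
print(first_suspect_hop([0, 30, 0, 10, 10, 12]))
```

The first call reports no suspect and the second points at index 3. Treat the result as a place to start looking, not as proof; you still want the reverse-direction run before you blame anyone.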

Why be so reactive? What if there is loss at one of the hops? How do you know it wasn’t there yesterday? If you have an important link, it’s in your best interest to monitor it with software designed for the job, such as SmokePing or SolarWinds, or better yet, use a third-party service. These tools will alert you the second performance starts to degrade. Third-party services such as Gomez or Webmetrics can monitor your connectivity from popular locations around the world, and even from obscure countries you may not have known existed. Using these tools takes the guesswork out of hastily tracing and MTRing when there is an issue. Instead, you’ll be using traces to confirm what your monitors are showing you. You can also copy and paste the graphs and details from these tools directly to your ISP. Show them that for the last 6 months the average speed was X, then an hour ago, from 15 locations around the globe, it suddenly dropped. Trust me, they’re going to take you much more seriously, and you’ve also given them a wealth of technical information to help troubleshoot the problem more quickly.
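
To make the “baseline versus right now” idea concrete, here is a small Python sketch of the comparison a monitor is doing for you. It is not how SmokePing or any of the services above work internally; the function name is_degraded and the 1.5x threshold are assumptions chosen purely for illustration, and the latency numbers are invented.

```python
from statistics import median


def is_degraded(baseline_ms, recent_ms, factor=1.5):
    """Return True if recent latency looks clearly worse than the baseline.

    baseline_ms: historical round-trip times in milliseconds.
    recent_ms:   the latest round-trip times in milliseconds.
    factor:      hypothetical threshold; recent median must exceed
                 factor * baseline median to count as degraded.
    """
    if not baseline_ms or not recent_ms:
        raise ValueError("need at least one sample in each list")
    # Medians are less sensitive to a single stray slow ping than means.
    return median(recent_ms) > factor * median(baseline_ms)


history = [20, 21, 19, 22, 20]   # made-up long-term samples
print(is_degraded(history, [21, 20, 22]))   # about the same as usual
print(is_degraded(history, [45, 50, 48]))   # more than double the baseline
```

The first call prints False and the second prints True. The point is the same one you want to make to your ISP: a number on its own means little, but a number compared against months of history is hard to argue with.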