Tuesday, 5 March 2013

nslookup vs /etc/hosts

TLDR: nslookup command does NOT use /etc/hosts for domain name to IP resolution.

Recently I needed to simulate outage of Disqus server providing REST endpoints for commenting.

Because client application, using this service, was deployed into horrible CMS server, changing URL for Disqus service was quite painful. OS on the server is Red Hat Enterprise Linux, so I decided to simply add entry into /etc/hosts translating "disqus.com" domain name to 127.0.0.1 (localhost). Purpose was to hijack DNS resolution into real public Disqus IP address, obtained from DNS server, to make disqus.com calls fail. Dead simple.

I've added line 127.0.0.1 disqus.com into /etc/hosts and executed

nslookup disqus.com
expecting to see the difference, but there wasn't any and nslookup was still returning Disqus public IP address. Suspecting DNS caching for causing this, I've started looking for a way to flush DNS translation cache. Linux standard way seems to have nscd daemon running and refreshing it
nscd -I hosts
should do the trick. But surprisingly, nscd was not running on that server. Neither did any of named, bind, rscd, dnsmasq...

Probably only way to flush dns cache in this situation could be restart of whole networking subsystem (/etc/init.d/network restart), but that wasn't something I could do on that server.

Then, just to be double sure, I executed

ping disqus.com
and spotted that it is pinging 127.0.0.1 as I wanted! But nslookup, executed again, was still showing public Disqus IP address!
As it has turned out that, nslookup is always doing DNS server lookup and it is ignoring your /etc/hosts file

Lesson learned - Don't test your /etc/hosts changes with nslookup. Use ping instead.

It is still some DNS caching done in Java, because ping reflected IP changes immediately, but application running in Java server changed IP after while. I haven't measured it precisely, but it seemed to be about one minute. As it turned out, this delay was actually caused by Varnish and has nothing to do with Java... sorry

All of this happened of RHEL 5.7 (cat /etc/*-release)

PS: Just for completeness. To make sure that /etc/hosts file will take precedence over DNS servers, in process of hostname resolution, following should be true
  • File /etc/host.conf should contain line order hosts,bind
  • File /etc/nsswitch.conf should contain line hosts: files dns

Monday, 4 March 2013

ConnectTimeoutTest

From time to time I need to simulate one special type of timeout and that is connect timeout.

It differs from more common read timeout, which may happen when connection is already opened, request is sent, but it takes too long to get response. Connection refused is also different beast, meaning that server actively refused your request, probably while you have been accessing wrong host or port. You should not ignore this connect timeout just because it is less common. When it hits, it will get you down you down as quickly as read timeout.

Connect timeout can occur during connection opening process. That is after connection attempt was not refused and before any possible read timeout.

Fairly rare occasions lead to this type of timeout. It may be firewall misconfiguration silently throwing away your packets, network failure or server can be dying down being unable to even to open socket correctly. Any of those conditions is quite hard to meet so when you need to write some piece of software anticipating this type of timeout, some deterministic way to induce it on demand will come handy.

Basic idea is to create ServerSocket with one position long backlog. Backlog is basically request queue and if we make artificial request to fill it up, any consequent request is doomed.

Update: Backlog queue implementation is differs between platforms. This does work on Mac OS X and does NOT on Linux or Windows. Damn it!

Test method uses standard java HttpURLConnection to make the request and it's setConnectionTimeout to set connection timeout to some rasonable value.

Here comes the complete test case:

Oddly enough, different exception (java.net.ConnectException: Operation timed out) is thrown after 75 seconds in case that connection timeout is NOT set, comparing to case when it IS set (java.net.SocketTimeoutException: connect timed out).

Bonus code! Setup code for Apache Httpclient might look like:

HttpConnectionParamBean connectionBean = new HttpConnectionParamBean(httpParams);
connectionBean.setConnectionTimeout(getConnectTimeout());//httpParams.setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, 1000L);
connectionBean.setSoTimeout(getReadTimeout());//httpParams.setParameter(CoreConnectionPNames.SO_TIMEOUT, 5000L);
DefaultHttpClient httpClient = new DefaultHttpClient(httpParams);

Enjoy your timeouts!