Monday, June 14, 2010

Blocking Sends When ActiveMQ Broker is Down

As it turns out, even if you have enabled useAsyncSend on your ActiveMQ connection, your sends will block if the broker goes down (or if the broker was not running when the client first attempts a send). Don't waste any cycles (like I did) tweaking the delivery mode, whether the session is transacted, or other connection settings: the "problem" comes from the failover transport (my broker URL is failover://(tcp://localhost:61616)). This transport automatically retries the connection until it succeeds, and your client blocks for the duration of those attempts. This may not be the behavior you want. I can't give you a satisfying solution just yet, but I can give you some hooks for coming up with something on your own (and if you do, please let me know!).

One so-called solution is of course to stop using the failover transport and use plain TCP instead. But this is heavy-handed: the failover transport serves an excellent purpose in the face of broker failures, and I'm reluctant to give it up just to handle an exceptional condition. If you do go that route, you'll receive something like a java.net.ConnectException from the TCP transport when the broker is down; you can catch that and do whatever makes sense to you. But that's a questionable workaround. Instead, here's some advice from the ActiveMQ website:

If you use failover, and a broker dies at some point, your sends will block by default. Using TransportListener can help with this regard. It is best to set the Listener directly on the ActiveMQConnectionFactory so that it is in place before any request that may require an network hop. Additionally you can use timeout option which will cause your current send to fail after specified timeout. The following URL, for example

failover:(tcp://primary:61616)?timeout=3000

will cause send to fail after 3 seconds if the connection isn't established. The connection will not be killed, so you can try sending messages later at some point using the same connection (presumably some of your brokers will be available again). 
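
To make the URL options concrete, here is a small sketch that assembles a failover URL from a broker list and an option map. FailoverUrls and build are hypothetical helper names of mine, not part of ActiveMQ. The timeout option comes from the quote above; jms.useAsyncSend is, as far as I know, how the useAsyncSend connection property is spelled as a URL option, so treat that name as an assumption and check it against your ActiveMQ version.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of ActiveMQ): builds a failover broker URL.
class FailoverUrls {

    // brokerUris: e.g. ["tcp://primary:61616"]
    // options:    e.g. {"timeout": "3000"}; use an ordered map (LinkedHashMap)
    //             if the order of the options in the URL matters to you.
    static String build(List<String> brokerUris, Map<String, String> options) {
        String url = "failover:(" + String.join(",", brokerUris) + ")";
        if (options.isEmpty()) {
            return url;
        }
        // Options are appended as ?key=value&key=value
        StringJoiner query = new StringJoiner("&", "?", "");
        options.forEach((key, value) -> query.add(key + "=" + value));
        return url + query;
    }
}
```

Calling build with the single broker tcp://primary:61616 and the single option timeout=3000 reproduces the URL from the quote; adding further brokers to the list gives the failover transport more than one place to reconnect to.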

Now, I first tried the timeout parameter approach. As advertised, your application gets an exception when the timeout expires. But this introduces its own performance problem: if you have thousands (or even dozens) of messages, each one timing out after, say, 2-3 seconds, your throughput will be hurt badly (1,000 sends that each wait out a 3-second timeout add up to 50 minutes of blocking). So I bailed out on this approach and took a closer look at the TransportListener. Here's what I implemented in my publisher class:

    // TransportListener callbacks, implemented by the publisher class.

    public void onCommand(Object command) { /* EMPTY */ }

    public void onException(IOException error) {
        Logging.publisher.warn("Transport exception encountered", error);
    }

    // "Interupted" (one "r") is the spelling used by the ActiveMQ interface.
    public void transportInterupted() {
        Logging.publisher.warn("Transport interrupted! Is Broker down?");
    }

    public void transportResumed() {
        Logging.publisher.warn("Transport connectivity is restored");
    }

My test program received callbacks for all of these methods, as expected, except for onException. The ActiveMQ javadocs aren't much help here ("An unrecoverable exception has occured on the transport"). From this point, I tried combining the timeout parameter with the transport callbacks, e.g.:

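    // Broker down: shrink the send timeout so failing sends return quickly.
    public void transportInterupted() {
        this.amqFactory.setSendTimeout(100);
    }

    // Broker back: put the timeout back.
    public void transportResumed() {
        this.amqFactory.setSendTimeout(-1);
    }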

My strategy here was to maximize throughput in the face of dozens of failing sends by reducing the timeout to a bare minimum while the broker is down, and then restoring the original value when the broker comes back up. But I had no success: the timeout stayed at the value set in the URL (which I consider too high in this context) even after transportInterupted (allegedly) changed it. I'm not willing to set my initial timeout to a low value either, since that may cause exceptions even under nominal operating conditions. Not good.

From here, I leave it to you to find some clever use of the transport listener that can deal with a broker that goes down. Good luck, and let me know.
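
One direction that might be worth exploring (a sketch only, not verified against a live broker): let the transport callbacks flip a flag, and have the publisher skip or divert sends while the flag says the broker is down, so that most sends never reach the blocking call. BrokerGate, trySend and skippedCount are hypothetical names. The class below uses only the JDK so it can run without a broker; in a real publisher it would implement org.apache.activemq.transport.TransportListener (which also requires onCommand and onException) and be set on the ActiveMQConnectionFactory, as the ActiveMQ advice quoted earlier recommends. The initial state passed to the constructor is also an assumption you would have to choose for yourself.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical fail-fast gate driven by the transport callbacks.
class BrokerGate {
    private final AtomicBoolean brokerUp;
    private final AtomicLong skipped = new AtomicLong();

    // initiallyUp: what to assume before the first callback arrives (assumption).
    BrokerGate(boolean initiallyUp) {
        this.brokerUp = new AtomicBoolean(initiallyUp);
    }

    // Same names (and spelling) as the ActiveMQ TransportListener callbacks.
    public void transportInterupted() { brokerUp.set(false); }

    public void transportResumed() { brokerUp.set(true); }

    boolean isBrokerUp() { return brokerUp.get(); }

    // Number of sends that were skipped because the broker looked down.
    long skippedCount() { return skipped.get(); }

    // Runs the send only while the broker is believed to be up.
    // Returns false (without calling send) if the send was skipped.
    boolean trySend(Runnable send) {
        if (!brokerUp.get()) {
            skipped.incrementAndGet();
            return false;
        }
        send.run();
        return true;
    }
}
```

The publisher would wrap each producer send in trySend and decide what to do with a false return (drop, log, or buffer the message). A send that is already in flight when the broker dies can still block, so this would probably need to be paired with a generous timeout in the URL rather than replace it.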

