Friday, April 5, 2013

Fixing code and binary incompatibilities for cross Scala version library development

Scala is a fantastic language that unfortunately has a tradition of having no binary compatibility between versions. The result is that library developers have to go through a lot of pain to release their software for multiple Scala versions. Even though minor versions are binary compatible starting with Scala 2.9, the situation has worsened with Scala 2.10 because there are now some code incompatibilities as well.

This post shows some techniques for library developers to build releases against multiple Scala versions, taking care of both binary and code incompatibilities.

SBT — Simple Build Tool

The only viable option I know of for cross-building against multiple Scala versions is SBT (Simple Build Tool). I am going to assume you are somewhat familiar with SBT. The most important cross build settings in your build.sbt are (full version on GitHub):

scalaVersion := "2.10.1"

crossScalaVersions := Seq("2.9.1", "2.9.1-1", "2.9.2", "2.10.1")

crossVersion := CrossVersion.binary

The scalaVersion key sets the Scala version to use by default; the crossScalaVersions key lists all Scala versions to use during cross builds.

The last setting has the effect that the correct Scala version is appended to the name of your artifact. ‘Correct’ in this case means the full version for Scala versions 2.9.x and lower, or just the first two numbers for 2.10.0 and later. So if you have a setting name := "libname", the generated artifacts will be named libname_2.9.1, libname_2.9.1-1, libname_2.9.2 and libname_2.10.
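To make the naming rule concrete, here is a minimal sketch of it in plain Scala. ArtifactNames, crossSuffix and artifactName are hypothetical names of my own and not part of SBT, which computes the suffix internally. The sketch only covers the Scala 2.x versions discussed in this post.

```scala
// Hypothetical helper for illustration only; SBT computes this internally.
object ArtifactNames {

  /** Cross-version suffix for a Scala 2.x version string such as "2.9.1-1" or "2.10.1". */
  def crossSuffix(scalaVersion: String): String = {
    val parts = scalaVersion.split("""\.""")
    val major = parts(0).toInt
    val minor = parts(1).toInt
    if (major == 2 && minor >= 10) s"$major.$minor" // binary version only
    else scalaVersion                               // full version for 2.9.x and lower
  }

  /** Artifact name as it would appear in the repository. */
  def artifactName(name: String, scalaVersion: String): String =
    name + "_" + crossSuffix(scalaVersion)
}
```

Feeding it the four versions from crossScalaVersions above gives the four artifact names listed in the previous paragraph.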

Kick off a cross build by prepending ‘+’ to your command, e.g. sbt +test.

Code incompatibilities

Scala 2.10 brings some nasty code incompatibilities. The popular Akka library, for example, has partly moved into the main Scala library. The consequence is that code for Scala 2.9 needs to depend on Akka and import akka.dispatch.Future, while code for Scala 2.10 needs no additional dependencies and imports scala.concurrent.Future.

Another example is the set of changes around concurrent maps. In 2.9 one needs to do new java.util.concurrent.ConcurrentHashMap[A, B](1024).asScala to get a scala.collection.mutable.ConcurrentMap. In Scala 2.10 you are better off with scala.collection.concurrent.TrieMap.empty to get a scala.collection.concurrent.Map. All interfaces stay the same while all names changed.

Dependency incompatibilities

To define dependencies based on the current Scala version you can use the following trick:

libraryDependencies <++= (scalaVersion) { v: String =>
  if (v.startsWith("2.10"))
    Seq("com.yammer.metrics" % "metrics-core" % "2.1.5",
        "org.specs2" %% "specs2" % "1.13" % "test")
  else if (v.startsWith("2.9"))
    Seq("com.yammer.metrics" % "metrics-core" % "2.1.5",
        "com.typesafe.akka" % "akka-actor" % "2.0.5",
        "org.specs2" %% "specs2" % "1.12.3" % "test")
  else
    Seq()
}

Fixing code incompatibilities

If code needs to differ between Scala versions, the easiest way is to have multiple source roots. E.g.:

libname/
  build.sbt
  src/
    main/
      scala/
      scala_2.9/
      scala_2.10/
    test/

Add the following to your build.sbt to make it possible:

// The following prepends src/main/scala_2.9 or src/main/scala_2.10 to the compile path.
unmanagedSourceDirectories in Compile <<= (unmanagedSourceDirectories in Compile, sourceDirectory in Compile, scalaVersion) {
  (sds: Seq[java.io.File], sd: java.io.File, v: String) =>
    val mainVersion = v.split("""\.""").take(2).mkString(".")
    val extra = new java.io.File(sd, "scala_" + mainVersion)
    (if (extra.exists) Seq(extra) else Seq()) ++ sds
}

Example code for 2.9 (full version on GitHub):

package nl.grons.sentries.cross

// Needed for the .asScala conversion below.
import scala.collection.JavaConverters._

object Concurrent {
  type Future[+A] = akka.dispatch.Future[A]
  val Future = akka.dispatch.Future
  val Await = akka.dispatch.Await

  type CMap[A, B] = scala.collection.mutable.ConcurrentMap[A, B]
  def defaultConcurrentMap[A, B](): CMap[A, B] =
    new java.util.concurrent.ConcurrentHashMap[A, B](1024).asScala
}

Example code for 2.10 (full version on GitHub):

package nl.grons.sentries.cross

object Concurrent {
  type Future[+A] = scala.concurrent.Future[A]
  val Future = scala.concurrent.Future
  val Await = scala.concurrent.Await

  type CMap[A, B] = scala.collection.concurrent.Map[A, B]
  def defaultConcurrentMap[A, B](): CMap[A, B] =
    scala.collection.concurrent.TrieMap.empty
}

The rest of the code can now use the type aliases and references from here. E.g. nl.grons.sentries.cross.Concurrent.Future refers to Akka for Scala 2.9 and to the standard library for Scala 2.10.
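As a rough illustration of what such client code could look like, here is a self-contained sketch against the standard library variant of Concurrent (so Scala 2.10 or later). CrossUsage and cachedLength are made-up names, and the package declaration is left out to keep the sketch in one piece. Note the assumption that an execution context is in scope; on 2.9 the Akka equivalents of the execution context and Duration would be needed, so those two would have to be aliased in the same way.

```scala
import scala.concurrent.ExecutionContext.Implicits.global // assumption: the global pool is good enough

// Same aliases as the 2.10 variant shown above.
object Concurrent {
  type Future[+A] = scala.concurrent.Future[A]
  val Future = scala.concurrent.Future
  val Await = scala.concurrent.Await

  type CMap[A, B] = scala.collection.concurrent.Map[A, B]
  def defaultConcurrentMap[A, B](): CMap[A, B] =
    scala.collection.concurrent.TrieMap.empty
}

// Hypothetical client code: it only mentions the aliases, never Akka or scala.concurrent.
object CrossUsage {
  import Concurrent._

  /** Asynchronously computes the length of `s`, caching the result in `cache`. */
  def cachedLength(cache: CMap[String, Int], s: String): Future[Int] =
    Future { cache.getOrElseUpdate(s, s.length) }
}
```

Because CrossUsage only refers to Future and CMap through the Concurrent object, swapping in the 2.9 source root changes what those names point to without touching this file.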

Conclusions

With some hackery, SBT allows you to define dependencies and source roots based on the current Scala version. This allows you to overcome Scala’s incompatibilities if you are a library developer who builds releases for multiple Scala versions.

The techniques described in this post were developed for Sentries. The code is on GitHub.

Wednesday, February 6, 2013

Breaking the Circuit Breaker

The circuit breaker is this wonderful pattern to protect your application against resources that fail slowly. The idea is that you stop trying to use a resource when it has too many failures. Regular retries test the resource and will make the resource available again. The benefit is that your application can react quickly to a failed resource instead of hogging CPU, threads, network, etc. while you are waiting to find out the resource is unavailable.

So what's wrong?

It's the metaphor. In the classical description a circuit breaker has 3 states: the open state, the closed state and the half-open state. So what does it mean when the circuit breaker is open? When is a bridge open? When you can drive over it, or when you can sail through it? Only when you look at the first image may you see that a traditional open circuit breaker stops the flow of electricity. To us that translates to no usage of the resource. In the 'closed' state electricity flows, which translates to having access to our resource. Now read that again and see if you can remember that!

Then we have a half-open state? Again, look at the first image. For such a switch half-open is still open. (A half-open bridge lets no traffic through at all, but that is another topic.) Why do we need the half-open state anyway? In the classical description we attempt to use the resource once while in this state. If it fails just once, we go back to the open state. This seems like a good idea, but let us think of modern networked applications. In such applications many requests are done simultaneously. So as soon as we switch to the half-open state for a retry, many, maybe hundreds of requests will immediately try to use the resource, even if it is still down. This is exactly what we were trying to prevent!

Stop!

Although the circuit breaker is a great invention, I think we need a new metaphor, or at least some new terminology.

No more half-open

The first thing we can do is get rid of the half-open state. Instead, when it's time to retry, we just let 1 client through to the resource. While that check is in progress we keep denying access to the resource for other clients; we stay in the same state. Only when that single check succeeds do we switch to the state in which we allow full access to the resource.

No more open

The second thing we need to do is to end the confusion on what it means to be 'open'. Instead I propose we call this state the broken state. No further explanation required. Good. In this state we do the regular retries.

Finally, to make things symmetric, I propose to rename the 'closed' state to flow state as all requests are granted.

Metaphor

Above I proposed new terminology but I failed to provide a new metaphor. Unfortunately metaphors are hard to find and too easy to get wrong. Perhaps a good metaphor should be related to the fact that we are limiting the number of errors we tolerate from a resource. If you have an idea, please let me know in a comment. I hope you liked my little rant. Any comments are always welcome.

—   ❧   —

Postscript: Sentries and the circuit breaker

The Sentries library contains a highly optimized circuit breaker implementation for Scala programs. The ideas in this article developed while writing Sentries. Feel free to have a look. As you can see there are only 2 states, the FlowState and the BrokenState. Note that the retryAt in BrokenState is a val; it cannot be changed after initialization. When it is time to retry we replace the broken state with a new instance (in method attemptResetBrokenState).
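For readers who do not want to dig through the library, here is a minimal sketch of the two-state idea. It is not the actual Sentries code: the class TwoStateBreaker, its parameters and the choice to count consecutive failures are assumptions made for illustration. Only the names FlowState, BrokenState, retryAt and attemptResetBrokenState are borrowed from the description above.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

object TwoStateBreaker {
  sealed trait State
  // All requests are granted; we only count consecutive failures.
  final class FlowState extends State { val failures = new AtomicInteger(0) }
  // Requests are denied until retryAt; retryAt is immutable.
  final class BrokenState(val retryAt: Long) extends State
}

// Illustrative sketch; `now` is injectable so the clock can be faked in tests.
final class TwoStateBreaker(failLimit: Int, retryDelayMillis: Long,
                            now: () => Long = () => System.currentTimeMillis()) {
  import TwoStateBreaker._

  private val state = new AtomicReference[State](new FlowState)

  def isBroken: Boolean = state.get.isInstanceOf[BrokenState]

  /** Runs `resource` if allowed; returns None when access is denied. */
  def apply[T](resource: => T): Option[T] = state.get match {
    case flow: FlowState =>
      try {
        val result = resource
        flow.failures.set(0)
        Some(result)
      } catch {
        case e: Exception =>
          if (flow.failures.incrementAndGet() >= failLimit)
            state.compareAndSet(flow, new BrokenState(now() + retryDelayMillis))
          throw e
      }
    case broken: BrokenState =>
      if (now() >= broken.retryAt && attemptResetBrokenState(broken)) {
        // We are the single client that is let through; everybody else still sees a broken state.
        val result = resource // a failure propagates and we stay broken until the new retryAt
        state.set(new FlowState)
        Some(result)
      } else None
  }

  // Replaces the broken state with a fresh instance; only one caller can win this race.
  private def attemptResetBrokenState(seen: BrokenState): Boolean =
    state.compareAndSet(seen, new BrokenState(now() + retryDelayMillis))
}
```

The compare-and-set in attemptResetBrokenState is what replaces the half-open state: exactly one caller swaps in the new BrokenState and gets to test the resource, while all concurrent callers keep being denied.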

Sunday, September 2, 2012

Introducing Sentries

I recently needed an easy-to-use circuit breaker in a Scala program. What I found was okay, but not nearly good enough for high volume, highly concurrent applications. So I set out to make it better.

After some iterations of changing I realized that the circuit breaker could be combined with other stuff like rate limiting, monitoring and such. Now, three months later, Sentries is ready for the world.

Here is an example usage:

Please visit Sentries on GitHub and let me know what you think.

Update 2012-09-04: Version 0.2 will be available in Maven central around 2012-09-04 18:00 GMT.

Update 2012-10-08: Version 0.3 has just landed in Maven central. Its only feature is the new 'clean()' method on the sentry registry which is useful during testing.

Update 2013-02-06: Version 0.5 is out. This is the first to support Scala 2.10. Breaking change is that all durations are now of type akka.util.Duration or scala.concurrent.duration.Duration (depending on the Scala version) instead of Long.

Friday, April 27, 2012

Getting size of a file in shell script

This is really too silly. There seems to be no consistent way to get the size of a file (in bytes) on multiple platforms. Here is a solution that works in bash on Linux (tested under Debian and Ubuntu) and BSD (tested under OSX):

# Echoes the size of a file (first argument).
# Tested under Linux (Debian) and BSD (OSX).
function filesize() {
  echo $(stat --format=%s "$1" 2>/dev/null || stat -f '%z' "$1" 2>/dev/null)
}

# Example usage:
echo $(filesize readme.txt)

Monday, January 16, 2012

X forwarding as root

It is often useful to run jconsole on a remote (production) machine. One basically has 2 options to do so. First, you can pierce your firewall and let your application listen on the appropriate JMX and RMI ports. This, however, is always tricky (in particular the RMI part). In addition, creating the connection string that jconsole accepts is not nice at all.

The other option is to use X forwarding. Here I describe how to do this from Mac OS X, with jconsole running on any Unix server that has an SSH server and sudo installed, and no root password.

  1. Start X11 (you'll need to install it from the installation CDs).
  2. Start iterm2.
  3. Login to the target system with ssh -X remote_user@system
  4. Switch to root with sudo -s.
  5. Get X authorities with xauth merge ~remote_user/.Xauthority
  6. Start jconsole

Tuesday, December 13, 2011

Breaking a Java HashSet

Can it be?

Set<String> set1 = new HashSet<String>(5);
Set<String> set2 = new HashSet<String>(5);
// add a bunch of strings to both sets
assert set1.equals(set2) == false;

Yes it can!

We actually had this problem in an integration test. The cause was that the strings were added to one of the sets concurrently. Interestingly, the sets seemed to be the same: when printed they contained the same strings, just in a different order (which is also interesting). However, the internals were apparently so damaged that an equals invocation returned false.

We replaced the HashSet with a ConcurrentSkipListSet, and all was fine again.

Conclusion: Okay, so this is a boring conclusion, but just be careful with using normal collections in concurrent situations.
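A minimal sketch of the safe variant, written in Scala against the Java collections. ConcurrentFill and fillConcurrently are hypothetical names, and this is not the code from our integration test. The point is only that a set shared between threads should be a concurrent implementation such as java.util.concurrent.ConcurrentSkipListSet.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical helper: adds all items to `target` from several threads at once.
object ConcurrentFill {
  def fillConcurrently(target: java.util.Set[String], items: Seq[String], threads: Int): Unit = {
    val pool = Executors.newFixedThreadPool(threads)
    items.foreach { item =>
      pool.execute(new Runnable { def run(): Unit = target.add(item) })
    }
    pool.shutdown()
    // Wait until every add has completed before the caller inspects the set.
    pool.awaitTermination(1, TimeUnit.MINUTES)
  }
}
```

Passing a ConcurrentSkipListSet as target is safe. Passing a plain java.util.HashSet is not, and may or may not reproduce the corruption described above, depending on timing.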

Monday, August 15, 2011

DNS for Version 99 is off-line

To alleviate some pain with using the commons-logging framework, I created version 99. It was hosted at the hostname no-commons-logging.zapto.org courtesy of no-ip.com. Unfortunately, due to lack of traffic I had to affirm usage of the hostname once per month. Though this was slightly annoying, the real pain is that I missed the last deadline, and that recreation is out as dashes in hostnames are no longer allowed.

As of 2011-08-15 16:11:00 GMT no-ip did credit to its name; no ip for no-commons-logging.zapto.org.

As 1) the current situation is disruptive for any version-99 user anyway, 2) there is a better workaround using the provided scope, I decided to not put back version 99 online under another hostname.

If you really need to use version 99 for some more time, just add an entry to your /etc/hosts file:

83.163.41.27 no-commons-logging.zapto.org

As promised, I'll keep version 99 for 5 years, that is until October 2012. Please check back here for IP address changes.