hur.st's bl.aagh

BSD, Ruby, Rust, Rambling

Micro-Optimising in JRuby

Calling Java from JRuby for fun and profit

[ruby] [jruby] [java]

One of the neatest bits of JRuby is the simple way you can call out to Java. There's a lot of Java out there, and you can wrap it up in nice little Ruby interfaces with just a few lines of code.

Let's illustrate with some trivial examples, hooking up bits of Java to Ruby and seeing how well they perform compared to the equivalent pure Ruby.

Current time in ISO format

We rarely think about the cost of generating a timestamp, and that's probably quite fair given I can make nearly 130,000 of them every second:

Time.now.utc.iso8601    128.908k (± 3.9%) i/s -    644.166k in   5.006521s
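The figures in this post look like output from the benchmark-ips gem, though the harness itself isn't shown. As a rough stand-in that needs only the standard library, the sketch below counts how many times a block runs in a fixed time window. The `iterations_per_second` helper and its `seconds` and `batch` parameters are made up for illustration, and it does no warm-up, so expect it to understate what a warmed-up JVM can do.

```ruby
require 'time' # provides Time#iso8601 on older Rubies

# Hypothetical helper: run the block for roughly `seconds` and return
# the number of iterations per second. The clock is only checked once
# per batch so that timing overhead stays small.
def iterations_per_second(seconds: 1.0, batch: 1_000)
  count = 0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  elapsed = 0.0
  while elapsed < seconds
    batch.times { yield }
    count += batch
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
  count / elapsed
end

rate = iterations_per_second(seconds: 0.5) { Time.now.utc.iso8601 }
puts format('Time.now.utc.iso8601 %.1fk i/s', rate / 1000)
```

The absolute numbers will vary a lot between machines and Ruby implementations; it's the ratios between candidates that matter.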

But can we do better? And with how much effort? Turns out, yes, and not much:

# Seconds are included so the output matches Time.now.utc.iso8601
ISODateFormatter = java.time.format.DateTimeFormatter
                                   .ofPattern("yyyy-MM-dd'T'HH:mm:ssX")
                                   .withZone(java.time.ZoneOffset::UTC)
Instant = java.time.Instant

def java_time_now_iso8601
  ISODateFormatter.format(Instant.now)
end

Half a dozen lines of Ruby buy us nearly five times faster timestamps:

    ISODateFormatter    608.627k (± 2.4%) i/s -      3.053M in   5.020228s

Or we can exploit the fact that Instant's default string representation is documented as ISO 8601 (albeit a slightly different variant that includes fractional seconds):

    Instant.now.to_s    832.573k (± 2.8%) i/s -      4.179M in   5.024787s
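For a feel of how the two variants differ, here's the same comparison sketched in plain Ruby, using an arbitrary fixed timestamp so the output is predictable. This only illustrates the formats; it isn't what Instant does internally, and how many fractional digits Instant prints depends on the clock and the value.

```ruby
require 'time' # provides Time#iso8601 on older Rubies

# A fixed instant: 2018-09-01 12:34:56.789 UTC
# (the last argument to Time.utc is microseconds)
stamp = Time.utc(2018, 9, 1, 12, 34, 56, 789_000)

whole_seconds = stamp.iso8601    # => "2018-09-01T12:34:56Z"
with_millis   = stamp.iso8601(3) # => "2018-09-01T12:34:56.789Z"

puts whole_seconds
puts with_millis
```

Both are valid ISO 8601, but anything that parses or compares these strings needs to cope with the extra digits.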

It would of course take quite an idiosyncratic application for this to make a meaningful difference, but maybe it's a small enough tweak to live in a high-performance Logger.

What about something a bit more practical?

Format number with commas

123456789 is a lot more readable as 123,456,789, and some applications have a lot of numbers to format. A typical FreshBSD page has on the order of a thousand; some have tens of thousands.

Here's a traditional pure-Ruby helper you might find in any Rails application:

DELIMITED_REGEX = /(\d)(?=(\d\d\d)+(?!\d))/

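def number_with_delimiter(number, delimiter: ',', separator: '.')
  # Split into integer and fractional parts; only the integer part is grouped
  left, right = number.to_s.split('.')
  return unless left
  left.gsub!(DELIMITED_REGEX) do |digit_to_delimit|
    "#{digit_to_delimit}#{delimiter}"
  end
  # Rejoin with the decimal separator, not the thousands delimiter
  [left, right].compact.join(separator)
end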
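The regex is doing all the work here, so it's worth seeing it in isolation. It matches any digit that is followed by one or more complete groups of three digits and then no further digit; each of those digits gets a delimiter after it. A small sketch (the `GROUPING_REGEX` and `group_digits` names are just for this example):

```ruby
# Same pattern as DELIMITED_REGEX above, renamed for this sketch
GROUPING_REGEX = /(\d)(?=(\d\d\d)+(?!\d))/

# Insert a comma after every digit the regex matches.
# The lookahead is zero-width, so each match is a single digit.
def group_digits(digits)
  digits.gsub(GROUPING_REGEX) { |digit| "#{digit}," }
end

puts group_digits('999')       # prints 999
puts group_digits('1234')      # prints 1,234
puts group_digits('123456789') # prints 123,456,789
```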

Let's benchmark it, using a random distribution of numbers within a few ranges:

number_with_delimiter(0-100)
                        340.342k (± 2.3%) i/s -      1.703M in   5.007786s
number_with_delimiter(0-10000)
                        175.644k (± 2.6%) i/s -    883.404k in   5.033629s
number_with_delimiter(0-1000000)
                        138.307k (± 2.5%) i/s -    694.350k in   5.024122s

Around 200,000 per second. I'm not going to lose sleep over that, but some pages are sure to be spending a significant fraction of a second just in this little helper.
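As for the inputs, the post doesn't show how they were generated. One way to set it up, sketched under that assumption, is to pre-generate the numbers for each range so the cost of the random number generator stays out of the measurement. The seed, sample size and constant names below are arbitrary choices for illustration.

```ruby
SAMPLE_SIZE = 1_000
RANGES = [0..100, 0..10_000, 0..1_000_000].freeze

# Seeded generator so every run benchmarks the same numbers
rng = Random.new(42)

# Map each range to an array of random integers drawn from it
INPUTS = RANGES.each_with_object({}) do |range, inputs|
  inputs[range] = Array.new(SAMPLE_SIZE) { rng.rand(range) }
end.freeze

INPUTS.each do |range, numbers|
  puts "#{range}: #{numbers.size} samples, min=#{numbers.min} max=#{numbers.max}"
end
```

Inside the benchmark loop you'd then cycle through `INPUTS[range]` rather than calling `rand` on every iteration.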

What can a few lines of Java interfacing buy us?

JavaNumberFormatter = java.text.NumberFormat
                        .getInstance(java.util.Locale.forLanguageTag("en-GB"))

def java_number_format(number)
  JavaNumberFormatter.format(number)
end

Well, we didn't manage to match the semantics precisely, since the format is defined by the locale rather than a string literal, but for our needs it's just fine. Is it any faster?

java_number_format(0-100)
                          1.334M (± 1.8%) i/s -      6.667M in   4.998806s
java_number_format(0-10000)
                          1.177M (± 2.4%) i/s -      5.884M in   5.002741s
java_number_format(0-1000000)
                          1.122M (± 2.1%) i/s -      5.636M in   5.026434s

Uh, yeah: about 4x faster on the smallest numbers, rising to about 8x on the largest. Developers of JRuby spreadsheet applications, rejoice.
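If the same helper also has to run on MRI, one option is to pick the implementation at load time. This is only a sketch: `format_integer`, `PORTABLE_FORMATTER` and `THOUSANDS_REGEX` are hypothetical names, and it sticks to integers because that's where the two code paths can be expected to agree.

```ruby
# Hypothetical portable helper: java.text.NumberFormat on JRuby,
# a regex-based fallback everywhere else.
if RUBY_PLATFORM == 'java'
  require 'java'

  PORTABLE_FORMATTER = java.text.NumberFormat
                           .getInstance(java.util.Locale.forLanguageTag('en-GB'))

  def format_integer(number)
    PORTABLE_FORMATTER.format(number)
  end
else
  THOUSANDS_REGEX = /(\d)(?=(\d{3})+(?!\d))/

  def format_integer(number)
    # '\1,' re-emits the matched digit followed by a comma
    number.to_s.gsub(THOUSANDS_REGEX, '\1,')
  end
end

puts format_integer(123_456_789)
```

One caveat with sharing a single formatter in a constant: Java's documentation describes decimal formats as generally not synchronized, so a multi-threaded application may want one instance per thread.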

Respecting the Commons

Let's try something a bit different: Jaro-Winkler distance, a measure of how similar two strings are. It's used by RuboCop for finding candidates for typos.
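For the curious, here's a compact pure-Ruby sketch of the textbook calculation: the Jaro score counts characters that match within a sliding window and penalises transpositions, and Winkler's adjustment adds a bonus for a shared prefix of up to four characters. The method names and the 0.1 scaling factor are conventional choices rather than anything taken from a particular library, and real implementations differ in details (some only apply the prefix bonus above a threshold, for example).

```ruby
# Jaro similarity: 1.0 for identical strings, 0.0 for nothing in common
def jaro_similarity(a, b)
  return 1.0 if a == b
  return 0.0 if a.empty? || b.empty?

  # Characters only count as matching if they are at most `window` apart
  window = [[a.length, b.length].max / 2 - 1, 0].max
  a_matched = Array.new(a.length, false)
  b_matched = Array.new(b.length, false)
  matches = 0

  a.each_char.with_index do |char, i|
    lo = [i - window, 0].max
    hi = [i + window, b.length - 1].min
    (lo..hi).each do |j|
      next if b_matched[j] || b[j] != char
      a_matched[i] = b_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches.zero?

  # Matched characters that appear in a different order are transpositions
  a_seq = a.chars.select.with_index { |_, i| a_matched[i] }
  b_seq = b.chars.select.with_index { |_, j| b_matched[j] }
  transpositions = a_seq.zip(b_seq).count { |x, y| x != y } / 2

  m = matches.to_f
  (m / a.length + m / b.length + (m - transpositions) / m) / 3
end

# Winkler's adjustment: boost the score for a common prefix (max 4 chars)
def jaro_winkler_similarity(a, b, scaling: 0.1)
  jaro = jaro_similarity(a, b)
  prefix = a.chars.zip(b.chars).take(4).take_while { |x, y| x == y }.length
  jaro + prefix * scaling * (1 - jaro)
end

puts jaro_winkler_similarity('MARTHA', 'MARHTA').round(4) # prints 0.9611
```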

Wiring this one up is a bit more involved, because we need some dependencies. On the Ruby side, we'll use the jaro_winkler gem, which falls back to a pure-Ruby version on JRuby, and on the Java side, we'll use the venerable Apache Commons Text.

All we need to do is drop the .jar in our $LOAD_PATH and require it to have access to all its goodies:

require 'commons-text-1.4.jar'

module Similarity
  include_package 'org.apache.commons.text.similarity'
end

JaroWinklerDistance = Similarity::JaroWinklerDistance.new
JaroWinklerDistance.apply("MARTHA", "MARHTA") # => 0.9611111111111111

Eat your heart out, FFI. How much faster than the gem is it?

             rubygem     57.117k (± 2.9%) i/s -    287.280k in   5.033660s
             commons      1.216M (± 2.3%) i/s -      6.081M in   5.005071s

About 21 times faster: a handsome reward for so little effort.