hur.st's bl.aagh

BSD, Ruby, Rust, Rambling

Password Generation in Ruby and Rust

Writing the same small program in two different languages.

[ruby] [rust]

I've been doing a fair bit of Rust lately. Honestly, I haven't been so smitten with a language since I started writing Ruby back in 1999.

Rust is many of the things Ruby isn't—precompiled, screaming fast, meticulously efficient, static, explicit, type-safe. But it's also expressive and, above all, fun. I think this rare mix makes it a good companion language for Ruby developers, particularly with things like Helix and rutie making it easy to bridge the two.

The best way of learning is by doing, so why not avoid that and just read about me doing something instead?

The Exercise

I'm going to be making this simple password generator, once in Ruby so we have something familiar to reference, and once in Rust, to get a feel for what the same sort of code looks like there:

-% simplepass --separator . --length 4 --number 6 --dictionary /usr/share/dict/words
leprosy.hemispheroid.diagnosable.antlerless
omnivalent.nonstellar.Latinate.convenient
narghile.mortally.toytown.heteroeciousness
blastplate.spectrological.kenosis.cheddite
gyrose.gooserumped.rastik.jigger
cogency.widow.sealant.banausic

This is a nice little starter project, exercising a reasonable subset of a language without biting off more than we can chew.

Starting Up

If you don't already have Rust installed, rustup is more or less its equivalent of rbenv or rvm. Or if your OS offers a native package, by all means use that.

Once we're ready, we'll want to make a project using cargo:

-% cargo new simplepass && cd simplepass
     Created binary (application) `simplepass` project
-% cargo run
   Compiling simplepass v0.1.0 (file:///home/freaky/code/simplepass)
    Finished dev [unoptimized + debuginfo] target(s) in 1.00s
     Running `target/debug/simplepass`
Hello, world!

I won't hold your hand too much here—cargo will feel fairly familiar if you're used to gem and bundler.

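Tip: cargo install cargo-edit, which provides the cargo add command used below. (Cargo 1.62 and later include cargo add out of the box.)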

Argument Parsing

First we need to parse our command line, handling errors and providing a useful --help. Not by hand, obviously; we're not savages.

Ruby

There are lots of argument parsing libraries for Ruby, but I like to minimise run-time dependencies, and we have minimal needs, so let's just use the stdlib optparse:

require 'optparse'
require 'securerandom' # provides SecureRandom, used when generating passwords

Options = Struct.new(:length, :number, :separator, :dict)
                .new(4, 1, ' ', '/usr/share/dict/words')

OptionParser.new do |opts|
  opts.on('-l LEN', '--length LEN', Integer, 'Length of the password') do |v|
    Options.length = v
  end

  opts.on('-n NUM', '--number NUM', Integer, 'Number of passwords') do |v|
    Options.number = v
  end

  opts.on('-s SEP', '--separator SEP', 'Word separator') do |v|
    Options.separator = v
  end

  opts.on('-d FILE', '--dictionary FILE', 'Dictionary to use') do |v|
    Options.dict = v
  end
end.parse!(ARGV)

Could be a bit more declarative—we're having to bridge the gap between our options Struct and the flags by hand, but it's all pretty straightforward.

Usage: simplepass [options]
    -l, --length LEN                 Length of the password
    -n, --number NUM                 Number of passwords
    -s, --separator SEP              Word separator
    -d, --dictionary FILE            Dictionary to use

Rust

Rust's standard library is quite small, so we're going to need to slurp in a dependency for this unless we want to be bashing rocks together. Thankfully, Rust has both great dependency management and static linking of Rust dependencies by default, so all our crates end up in one self-contained executable.

We have a lot of choice, but my favourite by far is structopt:

-% cargo add structopt
      Adding structopt v0.2.10 to dependencies

Just like with bundle add editing Gemfile, this edits our Cargo.toml so Rust knows what we're talking about when we say:

#[macro_use] extern crate structopt;

This is a bit like gem 'structopt'—it tells Rust we're using a crate. We're also telling it we're going to be using the macros it defines.

Macros are Rust's metaprogramming special-sauce, allowing for flexible code generation at compile time—that's where our argument parsing code is going to come from, specialised code generated specifically for our purposes.

use structopt::StructOpt;

Next, we use the StructOpt trait to bring the methods it provides into scope. Traits are a little bit like Ruby mixins (groups of methods that can be added to other types), and they form the basis for a large chunk of the Rust type system.

For example, IO in Rust works in terms of Read, Write and Seek traits, which can be implemented by any type. Methods that use IO-capable types limit themselves to the traits they need, rather than to concrete types. You can think of this as a bit like explicit duck typing—you don't care if it's a File or a Socket or a StringIO, you care if it supports read(), write(), and seek().
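To make that concrete, here's a small std-only sketch. count_entries is a made-up helper, not part of the program. It asks only for Read, so a File and an in-memory byte slice (Rust's nearest thing to a StringIO) are equally welcome:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};

// Hypothetical helper: counts non-empty lines from *anything* implementing Read.
// It never names File, TcpStream or any other concrete type.
fn count_entries<R: Read>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in BufReader::new(reader).lines() {
        // Each line is itself an io::Result<String>; `?` passes errors up.
        if !line?.trim().is_empty() {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // A byte slice implements Read, much like Ruby's StringIO.
    let in_memory: &[u8] = b"apple\n\nbanana\n";
    println!("in memory: {}", count_entries(in_memory)?);

    // A File works just as well, if the usual dictionary is present.
    if let Ok(file) = File::open("/usr/share/dict/words") {
        println!("dictionary: {}", count_entries(file)?);
    }
    Ok(())
}
```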

You can also see a hint of refinements in this—traits are only available if you use them. You're free to implement your own traits on other types, without fear of polluting the global namespace.
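Here's a quick sketch of that scoping, using a hypothetical Shout trait implemented on the built-in str type. The method only exists in code that brings the trait into scope with use. That's roughly what using a refinement gets you in Ruby:

```rust
mod shouting {
    // Our own trait, implemented on a type we don't own. Code that doesn't
    // import it sees no change at all, unlike reopening String in Ruby.
    pub trait Shout {
        fn shout(&self) -> String;
    }

    impl Shout for str {
        fn shout(&self) -> String {
            format!("{}!", self.to_uppercase())
        }
    }
}

fn main() {
    // Remove this `use` and the call below fails to compile:
    // the method only exists where the trait is in scope.
    use crate::shouting::Shout;
    println!("{}", "hello".shout()); // prints "HELLO!"
}
```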

#[derive(StructOpt)]

derive is a way of asking Rust to generate code for us—in this case we're asking it to derive argument parsing code from the structure we're about to define, using the procedural macros slurped in from structopt.

#[structopt(name = "simplepass")]
struct Options {
    /// Length of the password
    #[structopt(short = "l", long = "length", default_value = "4")]
    length: usize,

    /// Number of passwords
    #[structopt(short = "n", long = "number", default_value = "1")]
    number: usize,

    /// Word separator
    #[structopt(short = "s", long = "separator", default_value = " ")]
    separator: String,

    /// Dictionary to use
    #[structopt(
        short = "d",
        long = "dictionary",
        default_value = "/usr/share/dict/words",
        parse(from_os_str)
    )]
    dict: std::path::PathBuf,
}

Now we define our struct, giving it named fields with appropriate types, decorating it with documentation comments (///) and using structopt() attributes to control the argument parsing code generation.
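structopt's derive comes from a third-party procedural macro, but the standard library has built-in derives that work the same way. Here's a std-only sketch using a made-up Settings struct rather than the real Options:

```rust
// Each name in derive(...) asks the compiler to generate that trait's impl.
#[derive(Debug, Clone, PartialEq, Default)]
struct Settings {
    length: usize,
    number: usize,
}

fn main() {
    let a = Settings { length: 4, number: 1 };
    let b = a.clone();                      // generated by Clone
    println!("{}", a == b);                 // generated by PartialEq: prints true
    println!("{:?}", a);                    // generated by Debug
    println!("{:?}", Settings::default());  // generated by Default: zeroed fields
}
```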

The only slightly tricky bit here is the filename handling. Rust Strings are always UTF-8, but filenames are OS-dependent. On Unix they can be almost any sequence of bytes except NUL (with / separating path components). On Windows they're a wonky 16-bit Unicode format that isn't even guaranteed to be valid UTF-16.

PathBuf is a type that abstracts away these details. We could use a String instead, but then the program would reject perfectly valid filenames that happen not to be valid UTF-8.
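Here's a Unix-only sketch of the failure PathBuf saves us from. The filename bytes are an assumption for illustration: café.txt in Latin-1, which isn't valid UTF-8. A Path can hold them fine, but they can't become a Rust string without loss:

```rust
// Unix-only: OsStrExt lets us build an OsStr from raw bytes.
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

fn main() {
    // 0xE9 on its own is not valid UTF-8, but it's a legal filename byte
    // on most Unix filesystems.
    let path = Path::new(OsStr::from_bytes(b"caf\xe9.txt"));

    // Converting to a &str can fail, so to_str() returns an Option.
    println!("as &str: {:?}", path.to_str()); // None

    // display() and to_string_lossy() substitute U+FFFD for output only;
    // the Path itself keeps the original bytes, so opening the file still works.
    println!("{}", path.display());
    println!("{}", path.to_string_lossy());
}
```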

Interestingly, our --help is a fair bit fancier: thanks to the Cargo.toml, structopt knows who I am and what version this is:

simplepass 0.1.0
Thomas Hurst <tom@hur.st>

USAGE:
    simplepass [OPTIONS]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -d, --dictionary <dict>        Dictionary to use [default: /usr/share/dict/words]
    -l, --length <length>          Length of the password [default: 4]
    -n, --number <number>          Number of passwords [default: 1]
    -s, --separator <separator>    Word separator [default:  ]

Dictionary Loading

Next up, we want to load the dictionary: a list of line-separated words. On Linux, BSD and similar systems you'll usually have one in /usr/share/dict/words, so we'll default to that.

As a bit of defensiveness, we'll strip whitespace, and ensure the words are both unique and non-empty.

Ruby

dict = begin
  File.readlines(Options.dict)
      .map(&:strip)
      .reject(&:empty?)
      .uniq
rescue SystemCallError => e
  abort("#{Options.dict}: #{e.message}")
end

Quite straightforward, but a little inefficient: we're making four separate Array instances here. One holds each line, one each stripped line (with a copy of the line), one the lines without blanks, and the last is an Array without any duplicates.

We can avoid this by being a little less idiomatic and mutating in place:

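dict = begin
  File.readlines(Options.dict).tap do |lines|
    lines.each(&:strip!)     # strip each line in place
    lines.reject!(&:empty?)  # drop blank lines from the same Array
    lines.uniq!              # and remove duplicates, still in place
  end
rescue SystemCallError => e
  abort("#{Options.dict}: #{e.message}")
end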

Interestingly, TruffleRuby ought to be able to do this sort of optimisation for us, eliding the temporary intermediate instances automatically without us having to sacrifice safety or looks.

Rust

This is a bit more involved, and a lot less familiar, so I'll decompose it some.

fn main() -> Result<(), String> {

Unlike Ruby, Rust needs an entry-point function for your application. Like C, it's called main. Also like C, error handling is done by returning things from functions, though Rust does it in a rather more structured way.

The bit after the -> is our return type, which probably looks a bit weird to you. Result is an enum, a so-called sum type: each value is exactly one of several possible variants. Two, in this case:

enum Result<T, E> {
  Ok(T),
  Err(E)
}

The <..> bits are the type parameters, and we're passing in () (read: nothing) for the Ok side and String for the Err side.

If we return an Err, Rust's built-in error handling for main will print Error: followed by the Debug form of our value to stderr (so our String appears in quotes), and exit with a non-zero status.
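To get a feel for working with Result directly, here's a sketch with a hypothetical parse_length helper. In the real program structopt does this parsing for us. str::parse hands back a Result, and match lets us handle each variant:

```rust
// Hypothetical helper, for illustration only: parse a length argument by
// hand, returning the same Result<_, String> shape that main() uses.
fn parse_length(arg: &str) -> Result<usize, String> {
    // str::parse itself returns a Result, so we match on its two variants.
    match arg.parse::<usize>() {
        Ok(0) => Err("length must be at least 1".to_string()),
        Ok(n) => Ok(n),
        Err(e) => Err(format!("{}: {}", arg, e)),
    }
}

fn main() {
    for arg in &["4", "four", "0"] {
        match parse_length(arg) {
            Ok(n) => println!("length: {}", n),
            Err(msg) => eprintln!("error: {}", msg),
        }
    }
}
```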

    let opts = Options::from_args();

Unlike Ruby, Rust demands we declare our variables explicitly with let. We can also specify a type (let opts: Options = ...) but Rust tries very hard to work it out from context.

from_args() is implemented in that structopt::StructOpt trait we slurped in earlier, building an instance of our Options struct from the command-line arguments.

    let dict = std::fs::read_to_string(&opts.dict)

This is where our Err can come from—opening the file and slurping it into a String.

We have to pass in the filename using a &—lending it a reference, so we retain ownership of the value itself—otherwise it would want to move into the function we're calling.

This is part of Rust's "big gamble", ensuring you're very precise about the ownership of data in your program. It can be tricky to get used to, but it pays off. You get efficient, predictable automatic resource management and safer, more explicit mutability. By virtue of that, you also get a guarantee that data races simply cannot happen in safe Rust.
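Here's a minimal sketch of lending versus moving, with two made-up functions. One borrows a &str, the other takes ownership of a String:

```rust
// Borrows: the caller lends a reference and keeps ownership of the String.
fn peek(path: &str) -> usize {
    path.len()
}

// Takes ownership: the String moves in and is freed when this returns.
fn consume(path: String) -> usize {
    path.len()
}

fn main() {
    let dict = String::from("/usr/share/dict/words");

    let n = peek(&dict); // lend it out with &
    println!("{} is {} bytes long", dict, n); // still ours to use

    let m = consume(dict); // move it in
    // println!("{}", dict); // won't compile: error[E0382], dict was moved
    println!("{}", m);
}
```

Uncommenting the marked line is a good way to meet the borrow checker's error messages.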

        .map_err(|e| format!("{}: {}", &opts.dict.display(), e))?;

So, that Result I mentioned? That's what read_to_string returns: not a String, but a Result<String, io::Error>. With map_err, we're asking the Result to transform the Err side of things. The rather clinical Err(io::Error) becomes a formatted Err(String) containing the filename.

As you might imagine, there is also a map() for transforming the Ok(String) side.

Finally we have the question mark operator. It's easy to miss, but fear not. If we left it out, dict would still be a Result rather than a String, and the compiler's type checks would complain as soon as we tried to treat it as one.

If you've ever looked at Go, you'll have seen if err != nil { return nil, err } just about everywhere. This pattern puts a lot of people off, considering how often you need to write it in any non-trivial application.

Rust recognises the pain of this, and reduces all that boilerplate down to a single character, ?. It will either return early from the entire function with the Err(String) for the caller to handle, or unwrap the Ok(String) to a plain String for our function to continue with.
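Here's the same loading step as a std-only sketch, next to a rough hand-written equivalent of ?. The function names are hypothetical. The real operator also passes the error through From::from, which does nothing here since both sides are String:

```rust
use std::fs;
use std::path::Path;

// With `?`: on Err, return early from load(); on Ok, unwrap to a String.
fn load(path: &Path) -> Result<String, String> {
    let text = fs::read_to_string(path)
        .map_err(|e| format!("{}: {}", path.display(), e))?;
    Ok(text)
}

// Roughly what `?` expands to, written out by hand.
fn load_by_hand(path: &Path) -> Result<String, String> {
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(e) => return Err(format!("{}: {}", path.display(), e)),
    };
    Ok(text)
}

fn main() {
    let missing = Path::new("/nonexistent/words");
    println!("{:?}", load(missing));
    println!("{:?}", load_by_hand(missing));
}
```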

An Interlude

If this is all a bit confusing, let's take a quick Ruby break, and imagine how Result might work in the context of a familiar language:

module Result
  def initialize(thing) @thing = thing end
  def map() self end
  def map_err() self end
  def unwrap() expect("unwrap failed") end
  def inspect() "#{self.class}(#{@thing.inspect})" end # so irb shows Ok(9) etc.
end

class Ok
  include Result

  def map() Ok.new(yield @thing) end
  def expect(str) @thing end
end

class Err
  include Result

  def map_err() Err.new(yield @thing) end
  def expect(str) abort(str) end
end

success = Ok.new("it worked")
failure = Err.new("it didn't work")
success.map(&:length).map_err(&:upcase) # => Ok(9)
failure.map(&:length).map_err(&:upcase) # => Err("IT DIDN'T WORK")
success.expect("it should have worked") # => "it worked"
success.unwrap                          # => "it worked"
failure.expect("it should have worked") # => aborts with "it should have worked"

It's worth thinking about this pattern, and the other methods you might implement. Perhaps you could have default values for failures, or chain together multiple Results, or even make them Enumerable? This is basically how errors work in Rust.

The ? operator would replace this sort of boilerplate:

dict = case result = File.read_to_string(file) # hypothetical method returning Ok or Err
       when Ok then result.unwrap
       when Err then return result
       end
# or, with an imaginary Ruby version of ?:
dict = File.read_to_string(file)?

If you're interested in seeing how the Result pattern might be used in Ruby, you might look at dry-monads.
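Rust's own Result already has the helpers the interlude hints at. Here's a std-only sketch using a hypothetical parse_num helper. It shows a default for failures (unwrap_or), chaining (and_then), and treating a Result as zero-or-one items, a little like an Enumerable:

```rust
// Hypothetical helper: parse a number, describing any failure as a String.
fn parse_num(s: &str) -> Result<usize, String> {
    s.parse::<usize>().map_err(|e| format!("{}: {}", s, e))
}

fn main() {
    // A default value for failures.
    let length = parse_num("nope").unwrap_or(4);

    // Chaining: and_then runs the next fallible step only if we're still Ok.
    let doubled = parse_num("21")
        .and_then(|n| n.checked_mul(2).ok_or_else(|| "overflow".to_string()));

    // Enumerable-ish: a Result iterates as zero or one items, so flatten()
    // keeps the Ok values and quietly skips the Errs.
    let total: usize = vec![parse_num("1"), parse_num("x"), parse_num("2")]
        .into_iter()
        .flatten()
        .sum();

    // Or collect many Results into one, stopping at the first Err.
    let all: Result<Vec<usize>, String> = ["1", "2", "x"].iter().map(|s| parse_num(s)).collect();

    println!("{} {:?} {} {:?}", length, doubled, total, all);
}
```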

Back to Rust

    let mut dict: Vec<&str> = dict
        .lines()
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect();

Wait, didn't we already use dict for a String? How is it now a mutable Vec<&str>, given Rust is statically typed?

While that is true, we're not changing the original variable here—we're shadowing it with a new variable with the same name. This is a relatively common pattern with Rust—reusing a simple descriptive name can, at times, be clearer than having to give every step in a transformation a brand new one.

lines() returns an iterator over slices of the String on line boundaries. Slices aren't standalone objects, but references to chunks of existing ones, making them very efficient—little more than a pointer and a length. They reference the original dict, and Rust will make sure they don't outlive it.

The call to map trims the slices, similar to Ruby's map(&:strip). Here we're referring to the trim method using its fully qualified name.

filter() is basically Ruby's select()—unfortunately standard Rust has no reject(), so we use Rust's syntax for a closure here instead, much like the Ruby select { |s| !s.empty? }.

Finally, we collect() into the final Vec<&str>: a vector (array) of string slices. It's important to note that nothing actually happens until this point. collect() drives the iterator, which is otherwise completely inert, like a Ruby lazy Enumerator.

Like a lazy Enumerator, there are no intermediate vectors here—each stage runs a step at a time: finding the next line, trimming the resulting slice, and if it isn't empty, pushing it onto the Vec.
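To see that laziness for ourselves, here's a sketch of the same pipeline with a counter on the trim step. clean_counting is a hypothetical helper. The Cell is only there so the closure can update the count while we keep reading it:

```rust
use std::cell::Cell;

// Hypothetical helper: runs the article's pipeline while counting trim() calls.
// Returns the cleaned words plus the count before and after collect().
fn clean_counting(text: &str) -> (Vec<&str>, usize, usize) {
    // A Cell lets the closure bump the counter while we can still read it.
    let trims = Cell::new(0);

    let pipeline = text
        .lines()
        .map(|s| {
            trims.set(trims.get() + 1);
            s.trim()
        })
        .filter(|s| !s.is_empty());

    // Building the pipeline did nothing: no line has been looked at yet.
    let before = trims.get();

    // collect() drives it, one line at a time, with no intermediate Vecs.
    let words: Vec<&str> = pipeline.collect();
    (words, before, trims.get())
}

fn main() {
    let (words, before, after) = clean_counting("apple\n  banana \n\ncherry\n");
    println!("{:?} (trims before collect: {}, after: {})", words, before, after);
}
```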

    dict.sort_unstable();
    dict.dedup();

Now we want to deduplicate the dictionary. In Ruby, uniq builds a hash table so it can remove all duplicates from arbitrary collections, but with considerable memory cost.

Rust's dedup() takes a much cheaper path: iterate over the collection and remove consecutive repeated elements. This is less flexible, but consumes very little memory.

Because it can only deduplicate consecutive items, we need to sort our dictionary first. sort_unstable() is fast and works in place without allocating, but it may reorder elements that compare equal (it's allowed to use a quicksort-style algorithm). Two equal string slices are indistinguishable, so that doesn't matter here. If it did, and we were willing to use more memory, we could use the stable sort() instead, a merge sort variant that may allocate a temporary buffer.

Alternatively, we could have used a similar approach to Ruby—collecting into a HashSet, for example. You might like to try that.
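Here's a sketch of both approaches side by side, with made-up function names. The HashSet version uses retain() with HashSet::insert. That keeps the first occurrence of each word in its original order, much like Ruby's uniq, at the cost of extra memory for the set:

```rust
use std::collections::HashSet;

// The approach from the article: sort so duplicates sit side by side, then dedup().
fn dedup_sorted(mut words: Vec<&str>) -> Vec<&str> {
    words.sort_unstable();
    words.dedup();
    words
}

// A HashSet-based alternative: insert() returns false for a word we've
// already seen, so retain() drops every repeat and keeps first occurrences.
fn dedup_hashed(mut words: Vec<&str>) -> Vec<&str> {
    let mut seen = HashSet::new();
    words.retain(|w| seen.insert(*w));
    words
}

fn main() {
    let words = vec!["pear", "apple", "pear", "fig", "apple"];

    // On its own, dedup() only removes consecutive repeats, so nothing changes here.
    let mut unsorted = words.clone();
    unsorted.dedup();
    println!("{:?}", unsorted);

    println!("{:?}", dedup_sorted(words.clone())); // ["apple", "fig", "pear"]
    println!("{:?}", dedup_hashed(words));         // ["pear", "apple", "fig"]
}
```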

Password Generation

Now we need to loop over our password count, securely pluck out entries from our dictionary, and join them with our separator, before printing the result.

Ruby

Options.number.times do
  password = Options.length.times.map do
    dict.sample(random: SecureRandom)
  end.join(Options.separator)

  puts password
end

That's quite pretty, don't you think? Each line has a specific meaning, mapping precisely to our task with minimal noise. Go Ruby.

We're careful to use SecureRandom, and not the default, relatively predictable random number generator, though I had to prove to myself that it would notice if I misspelled the keyword and left it at the default...

Rust

Again, we'll need a crate here, this time for random selection. rand is a de-facto standard for this:

extern crate rand;
use rand::Rng;

// later, in main()...

    let mut rng = rand::EntropyRng::new();

Just like with structopt, we tell Rust we're using the crate and use the Rng trait we need from it. Finally, we instantiate EntropyRng, its generic secure random generator.

    use std::iter::repeat_with;

    let mkpass = || {
        repeat_with(|| rng.choose(&dict).expect("dictionary shouldn't be empty"))
            .take(opts.length)
            .map(|s| *s)
            .collect::<Vec<&str>>()
            .join(&opts.separator)
    };

|| { .. } is how Rust spells lambda { || .. }, so we're making a block of code (a closure) and stuffing it into mkpass, capturing local variables from the environment like we might in Ruby.

repeat_with makes an iterator that calls the closure repeatedly. We imported it with use above, so we don't have to spell out std::iter::repeat_with. take() is just like the method of the same name in Ruby: it limits us to the first n elements.

But what's that map(|s| *s) doing? rng.choose() returns a reference to the item it selects, so we're getting a &&str instead of a &str. So we apply the dereference operator * to get back our &str.

Finally, we collect into a vec: this time using the beloved turbofish operator to specify the type of thing we want it to collect into, and then we join in a mostly-familiar way to get our final password.
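Since rand lives outside the standard library, this sketch swaps rng.choose() for a deterministic round-robin pick so it runs with std alone. That pick is definitely not secure. make_password is hypothetical, but its shape matches mkpass. copied() is the standard shorthand for that map(|s| *s):

```rust
use std::iter::repeat_with;

// Hypothetical stand-in for the article's mkpass: instead of rng.choose(),
// it hands out dictionary entries in turn. Deterministic, NOT secure.
fn make_password(dict: &[&str], length: usize, separator: &str) -> String {
    let mut i = 0;
    repeat_with(|| {
        // Like choose(), get() gives an Option<&&str>: a reference to an element.
        let word = dict.get(i).expect("dictionary shouldn't be empty");
        i = (i + 1) % dict.len();
        word
    })
    .take(length)
    .copied() // equivalent to .map(|s| *s): each &&str becomes a &str
    .collect::<Vec<&str>>()
    .join(separator)
}

fn main() {
    let dict = ["cogency", "widow", "sealant", "banausic"];
    println!("{}", make_password(&dict, 4, "."));
}
```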

    for password in repeat_with(mkpass).take(opts.number) {
        println!("{}", password);
    }

Finally, we iterate over repeated calls to the closure we just made, and print their result. There are other ways we could have written this: for example, iterating over a range, or using for_each. Give them a try, see which you prefer.
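For comparison, here's a sketch of the three styles. The passwords are collected rather than printed so the results are easy to compare. numbered() is a hypothetical stand-in for the rand-based mkpass:

```rust
use std::iter::repeat_with;

// Hypothetical stand-in for mkpass: each call returns the next numbered placeholder.
fn numbered() -> impl FnMut() -> String {
    let mut n = 0;
    move || {
        n += 1;
        format!("pw{}", n)
    }
}

// The article's version: an iterator of repeated calls, cut short by take().
fn via_repeat_with(number: usize, mkpass: impl FnMut() -> String) -> Vec<String> {
    repeat_with(mkpass).take(number).collect()
}

// Iterating over a range, ignoring the index.
fn via_range(number: usize, mut mkpass: impl FnMut() -> String) -> Vec<String> {
    let mut out = Vec::new();
    for _ in 0..number {
        out.push(mkpass());
    }
    out
}

// for_each on a range: the same again, in method-call style.
fn via_for_each(number: usize, mut mkpass: impl FnMut() -> String) -> Vec<String> {
    let mut out = Vec::new();
    (0..number).for_each(|_| out.push(mkpass()));
    out
}

fn main() {
    println!("{:?}", via_repeat_with(3, numbered()));
    println!("{:?}", via_range(3, numbered()));
    println!("{:?}", via_for_each(3, numbered()));
}
```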

Dubious Expectations

If you've been paying attention, that expect() in mkpass should be bugging you.

rng.choose(&dict).expect("dictionary shouldn't be empty")

We're explicitly advising Rust to panic if our expectation isn't met:

-% simplepass -d /dev/null
thread 'main' panicked at 'dictionary shouldn't be empty', libcore/option.rs:1000:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Panics are a bit like Exceptions—they can actually be caught—but they have meaning more like abort, as a safe means of exiting a program, or occasionally a thread, because something went unexpectedly wrong.

Often expect and unwrap are used during development as a placeholder for future error handling, but they can also be used as a run-time assertion if the programmer is sure a value will never be None or Err.
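Here's a sketch of both ideas, with a hypothetical first_word helper standing in for the expect() in mkpass. std::panic::catch_unwind turns a panic into an Err, assuming the default panic = "unwind" strategy. unwrap_or handles the None without panicking at all:

```rust
use std::panic;

// Hypothetical helper mirroring the expect() in mkpass.
fn first_word<'a>(dict: &[&'a str]) -> &'a str {
    dict.first().copied().expect("dictionary shouldn't be empty")
}

fn main() {
    let empty: Vec<&str> = Vec::new();

    // catch_unwind runs the closure and turns a panic into an Err.
    // The default panic hook still prints the message to stderr first.
    let caught = panic::catch_unwind(|| first_word(&empty));
    println!("caught a panic: {}", caught.is_err());

    // Handling the None instead of asserting it away:
    let word = empty.first().copied().unwrap_or("fallback");
    println!("{}", word);
}
```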

Think about how you might fix this bug.
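If you'd like something to check your answer against, here's one possible approach, sketched with a hypothetical check_dictionary helper. It rejects an empty dictionary before generating anything, so the problem gets reported like any other error from main():

```rust
// Hypothetical helper: fail early, with a proper error, on an empty dictionary.
fn check_dictionary(dict: &[&str], path: &str) -> Result<(), String> {
    if dict.is_empty() {
        return Err(format!("{}: no usable words in dictionary", path));
    }
    Ok(())
}

fn main() -> Result<(), String> {
    let path = "/usr/share/dict/words";
    let dict: Vec<&str> = vec!["cogency", "widow", "sealant"]; // stand-in for the loaded words
    check_dictionary(&dict, path)?;

    // Past this point choosing from `dict` can't return None, so an expect()
    // there states a real invariant instead of hiding a bug.
    println!("{} words available", dict.len());
    Ok(())
}
```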

Conclusion

So what was the point of all of this? Why write it in Rust if it's both more effort, and less pretty?

"Speed" is the easy answer, but Rust's only about twice as fast here—450ms vs 250ms on my ancient Xeon. It's about six times more memory-efficient too, but I'm not going to get worked up over 60MB vs 10MB. Sometimes—even often—Ruby is good enough.

For me, the most striking difference is the errors I encountered during development. For example:

#<Enumerator:0x000000080782a690>
simplepass.rb:39:in `block in <main>': undefined method `join' for nil:NilClass (NoMethodError)
        from simplepass.rb:36:in `times'
        from simplepass.rb:36:in `<main>'

From this quite straight-forward mistake:

  puts Options.length.times.map do
    dict.sample(random: SecureRandom)
  end.join(Options.separator)

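Specifically, Ruby first noticed something was wrong while executing the code. A do ... end block binds to puts rather than to map. So map returned a bare Enumerator, puts printed it and returned nil, and join was then called on that nil. By that point Ruby had parsed my arguments, slurped in the file and printed some junk output, and only then did it explode due to, effectively, a type error.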

While I certainly experienced a lot more errors while writing the Rust version, with the exception of that expect() panic (which I expected!), every single one happened before a single line of code was executed. In fact, most were reported in my text editor without me even having to do anything.

While Rust's no panacea against buggy code, it offers a degree of confidence not easily found when writing Ruby, without painstakingly-written test suites that cover every last conditional.