hur.st's bl.aagh

BSD, Ruby, Rust, Rambling

FlashFind

High performance multithreaded Find redux

[ruby]

Following the development of FastFind, I wanted to make something with a more pleasing API and better internal design.

The result is interesting, but still in need of work before I’d consider it for production use.

Usage

FlashFind uses the builder pattern to create reusable, chainable iterators:

# Walk /tmp, skipping errors and .git directories, yielding only readable entries
FlashFind.push('/tmp')
         .skip_errors
         .prune_directory('.git')
         .permit(&:readable?)
         .each { |entry| p entry }

Each step produces a new FlashFind instance, so you can build a generic finder which ignores version control directories and reuse it in different contexts:

IgnoreVcsFind = FlashFind.prune_directory('.git', '.svn', '.hg')
foofind = IgnoreVcsFind.push('foo')
barfind = IgnoreVcsFind.push('bar')
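
To illustrate why this kind of reuse is safe, here is a minimal sketch of an immutable builder. SketchFinder and its attributes are hypothetical names made up for this example; this is not FlashFind's actual implementation, only the general shape of the pattern.

```ruby
# Hypothetical sketch of an immutable builder; not FlashFind's real code.
class SketchFinder
  attr_reader :paths, :pruned

  def initialize(paths: [], pruned: [])
    @paths = paths.freeze
    @pruned = pruned.freeze
  end

  # Each builder method returns a new instance and leaves the receiver untouched.
  def push(*new_paths)
    self.class.new(paths: paths + new_paths, pruned: pruned)
  end

  def prune_directory(*names)
    self.class.new(paths: paths, pruned: pruned + names)
  end
end

base = SketchFinder.new.prune_directory('.git', '.svn', '.hg')
foo  = base.push('foo')
bar  = base.push('bar')

p base.paths # still empty: the push calls did not modify base
p foo.paths
p bar.paths
```

Because no step mutates its receiver, a shared base finder can be handed to unrelated code without any risk of one caller's settings leaking into another's.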

Yielded entries are a custom DirEntry type, which wraps both Pathname and File::Stat in a convenient unified interface.
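
As a rough idea of what such a wrapper can look like, the sketch below delegates path questions to a Pathname and file questions to a lazily fetched File::Stat. SketchEntry and its choice of delegated methods are assumptions for illustration; the real DirEntry interface may differ.

```ruby
require 'pathname'
require 'forwardable'
require 'tmpdir'

# Hypothetical sketch; not the real DirEntry class.
class SketchEntry
  extend Forwardable

  attr_reader :path

  # Path-related questions go to the Pathname...
  def_delegators :path, :basename, :extname, :to_s
  # ...and file-related questions go to the File::Stat.
  def_delegators :stat, :size, :mtime, :directory?, :file?, :readable?

  def initialize(path)
    @path = Pathname.new(path)
  end

  # lstat once, on first use, and cache the result.
  def stat
    @stat ||= path.lstat
  end
end

Dir.mktmpdir do |dir|
  file = File.join(dir, 'example.txt')
  File.write(file, 'hello')
  entry = SketchEntry.new(file)
  p entry.basename
  p entry.size
  p entry.file?
end
```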

Performance

Because the filtering is all done declaratively, or using blocks which are explicitly meant to be thread-safe, FlashFind can perform lstat calls and walk multiple directories in parallel.
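
The lstat half of that idea can be sketched with a queue and a small pool of threads. This is only an illustration under my own assumptions: parallel_lstat and its workers parameter are hypothetical, and FlashFind's real scheduling is certainly more involved.

```ruby
require 'tmpdir'

# Hypothetical sketch: lstat a list of paths using a fixed pool of worker threads.
def parallel_lstat(paths, workers: 4)
  queue = Queue.new
  paths.each { |path| queue << path }
  queue.close # once closed and drained, pop returns nil and the workers exit

  results = Queue.new
  threads = Array.new(workers) do
    Thread.new do
      while (path = queue.pop)
        begin
          results << [path, File.lstat(path)]
        rescue SystemCallError
          # Skip entries which have vanished or cannot be accessed.
        end
      end
    end
  end
  threads.each(&:join)

  # Drain the results queue into a path => File::Stat hash.
  Array.new(results.size) { results.pop }.to_h
end

Dir.mktmpdir do |dir|
  files = %w[a b c].map { |name| File.join(dir, name).tap { |f| File.write(f, name) } }
  stats = parallel_lstat(files)
  stats.each { |path, stat| puts "#{File.basename(path)}: #{stat.size} bytes" }
end
```

Results arrive in whatever order the workers finish, which is why any filter blocks run inside such workers need to be thread-safe.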

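Unfortunately, the more complex design comes with some performance cost, so it’s not necessarily faster than FastFind: in the runs below it wins on wall-clock time for the Maildir tree (1.65s vs 3.81s) but loses on the CVS tree (2.97s vs 2.40s), and uses more system CPU time in both: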

                               user     system      total        real
FlashFind(Maildir)         3.406250   5.210938   8.617188 (  1.645986)
FastFind(Maildir)          3.375000   2.875000   6.250000 (  3.805505)
Find(Maildir)              5.000000   5.593750  10.593750 ( 10.586446)

FlashFind(CVS)            10.148438  14.812500  24.960938 (  2.965290)
FastFind(CVS)              6.554688   8.992188  15.546875 (  2.404545)
Find(CVS)                  5.937500  10.031250  15.968750 ( 15.335115)

Like FastFind, only JRuby sees this sort of improved performance, though MRI can still benefit a little if you’re going to lstat the files you find.
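
The tables above have the layout printed by Ruby's standard Benchmark module. For anyone wanting to run a similar comparison, here is a minimal sketch using only the stdlib Find as the subject; count_with_find and the temporary directory are stand-ins I made up, not the harness used for the numbers above.

```ruby
require 'benchmark'
require 'find'
require 'tmpdir'

# Hypothetical helper: walk a tree with stdlib Find, lstat-ing every entry.
def count_with_find(dir)
  count = 0
  Find.find(dir) do |path|
    File.lstat(path)
    count += 1
  end
  count
end

Dir.mktmpdir do |dir|
  5.times { |i| File.write(File.join(dir, "file#{i}"), 'x') }

  # The argument to bm is the label column width.
  Benchmark.bm(20) do |bm|
    bm.report('Find(tmp)') { count_with_find(dir) }
  end
end
```

Other finders can be added as further bm.report lines against the same directory, so that each row measures the same tree.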