FlashFind
High performance multithreaded Find redux
Following the development of FastFind, I wanted to make something with a more pleasing API and better internal design.
The result is interesting, but still in need of work before I’d consider it for production use.
Usage
FlashFind uses the builder pattern to create reusable, chainable iterators:
# Walk /tmp, skipping errors and .git directories, yielding only readable entries
FlashFind.push('/tmp')
         .skip_errors
         .prune_directory('.git')
         .permit(&:readable?)
         .each { |entry| p entry }
Each step produces a new FlashFind instance, so you can produce a generic Finder which ignores .git directories and reuse it in different contexts:
IgnoreVcsFind = FlashFind.prune_directory('.git', '.svn', '.hg')
foofind = IgnoreVcsFind.push('foo')
barfind = IgnoreVcsFind.push('bar')
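The reuse shown above only works if each chained call leaves its receiver untouched. As a rough illustration of that idea, here is a minimal sketch of an immutable, chainable builder. The TinyFinder class and its internals are hypothetical and are not FlashFind's actual implementation; only the method names push and prune_directory are borrowed from the examples above.

```ruby
# Hypothetical sketch of an immutable, chainable builder.
# This is NOT FlashFind's real code, only an illustration of the pattern.
class TinyFinder
  attr_reader :roots, :pruned

  def initialize(roots: [], pruned: [])
    @roots = roots.freeze
    @pruned = pruned.freeze
    freeze # instances can never be modified after construction
  end

  # Each step returns a new instance rather than mutating the receiver.
  def push(*paths)
    self.class.new(roots: roots + paths, pruned: pruned)
  end

  def prune_directory(*names)
    self.class.new(roots: roots, pruned: pruned + names)
  end
end

base = TinyFinder.new.prune_directory('.git', '.svn', '.hg')
foo  = base.push('foo')
bar  = base.push('bar')

p base.roots # => []
p foo.roots  # => ["foo"]
p bar.roots  # => ["bar"]
```

Because every instance is frozen, a shared base finder can safely be kept in a constant and handed to several threads.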
Yielded entries are a custom DirEntry type, which wraps both Pathname and File::Stat in a convenient unified interface.
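To give a feel for what such a wrapper might look like, here is a small sketch that delegates path questions to Pathname and metadata questions to a cached File::Stat. SketchEntry and its method list are assumptions made for illustration; the real DirEntry interface may differ.

```ruby
require 'pathname'
require 'forwardable'
require 'tmpdir'

# Hypothetical stand-in for a DirEntry-style object; names are illustrative.
class SketchEntry
  extend Forwardable

  attr_reader :path

  # Path-related questions are answered by Pathname.
  def_delegators :path, :basename, :dirname, :extname, :to_s
  # Metadata questions are answered by a lazily fetched File::Stat.
  def_delegators :stat, :file?, :directory?, :symlink?, :readable?, :size, :mtime

  def initialize(path)
    @path = Pathname.new(path)
  end

  # lstat runs at most once per entry and does not follow symlinks.
  def stat
    @stat ||= @path.lstat
  end
end

Dir.mktmpdir do |dir|
  file = File.join(dir, 'note.txt')
  File.write(file, 'hello')

  entry = SketchEntry.new(file)
  p entry.basename.to_s # => "note.txt"
  p entry.file?         # => true
  p entry.size          # => 5
end
```

Caching the stat result means a filter such as permit(&:readable?) and a later size check would share a single system call in this sketch.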
Performance
Because the filtering is all done declaratively, or using blocks which are explicitly meant to be thread-safe, FlashFind both performs lstat calls and walks multiple directories in parallel.
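As a rough picture of what walking directories and calling lstat in parallel can mean, the sketch below processes a tree one level at a time and lets a small pool of threads handle that level's directories concurrently. The parallel_walk and drain helpers, the level-by-level scheduling, and the choice to drop pruned directories entirely are all assumptions made for this example; FlashFind's real scheduler is likely more sophisticated.

```ruby
require 'tmpdir'
require 'fileutils'

# Empty a queue into an array (only called once the workers have finished).
def drain(queue)
  Array.new(queue.size) { queue.pop }
end

# Hypothetical sketch: breadth-first walk where each level's directories are
# listed and lstat-ed by a pool of worker threads.
def parallel_walk(root, prune: [], workers: 4)
  results = Queue.new
  level = [root]

  until level.empty?
    jobs = Queue.new
    level.each { |dir| jobs << dir }
    jobs.close # pop returns nil once the closed queue is empty
    next_level = Queue.new

    threads = Array.new(workers) do
      Thread.new do
        while (dir = jobs.pop)
          begin
            children = Dir.children(dir)
          rescue SystemCallError
            next # skip directories that cannot be read
          end

          children.each do |name|
            path = File.join(dir, name)
            begin
              stat = File.lstat(path)
            rescue SystemCallError
              next # entry vanished or is inaccessible
            end
            next if stat.directory? && prune.include?(name)

            results << [path, stat]
            next_level << path if stat.directory?
          end
        end
      end
    end
    threads.each(&:join)

    level = drain(next_level)
  end

  drain(results)
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'src', '.git'))
  File.write(File.join(root, 'src', 'a.rb'), '')
  File.write(File.join(root, 'src', '.git', 'HEAD'), '')

  found = parallel_walk(root, prune: ['.git'])
  # Prints "src" and "src/a.rb"; the .git directory is never entered.
  puts found.map { |path, _stat| path.delete_prefix("#{root}/") }.sort
end
```

The result order is not deterministic, since workers finish in whatever order the operating system schedules them, which is why the usage example sorts before printing.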
Unfortunately the more complex design comes with some performance cost, so it’s not necessarily faster than FastFind:
                         user     system      total        real
FlashFind(Maildir)   3.406250   5.210938   8.617188 (  1.645986)
FastFind(Maildir)    3.375000   2.875000   6.250000 (  3.805505)
Find(Maildir)        5.000000   5.593750  10.593750 ( 10.586446)
FlashFind(CVS)      10.148438  14.812500  24.960938 (  2.965290)
FastFind(CVS)        6.554688   8.992188  15.546875 (  2.404545)
Find(CVS)            5.937500  10.031250  15.968750 ( 15.335115)
Like FastFind, only JRuby sees this sort of improved performance, though MRI can still benefit a little if you’re going to lstat the files you find.