
High performance multithreaded Find


FastFind is a drop-in replacement for Ruby's standard library Find module.

I wrote this for FreshBSD, to speed up scanning CVS repositories - walking 300,000 files takes a while, especially with an API that basically forces you to File.stat twice for each one.
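To make the double stat concrete, here is a minimal sketch using only the standard Find module. `total_file_size` is a made-up helper for illustration. The block only receives a path, so anything that needs the size, type or mtime has to lstat the entry again, on top of the lstat Find performs itself to decide whether to descend.

```ruby
require 'find'

# Sum the sizes of regular files under +root+ with the standard Find module.
# Find lstats each entry internally to decide whether it is a directory,
# but only the path is yielded, so the block has to stat it a second time.
def total_file_size(root)
  total = 0
  Find.find(root) do |path|
    st = File.lstat(path)        # second stat of the same entry
    total += st.size if st.file?
  end
  total
end
```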

FastFind manages a pool of worker threads so that, at least on JRuby, multiple lstat calls can be in flight at once. This significantly improves performance, particularly on SSDs:
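The idea of keeping several lstat calls in flight can be sketched with a plain Queue and a handful of threads. This is not FastFind's actual implementation, just an illustration of the technique: `parallel_lstat` and `POOL_SIZE` are names invented here, and the pool size of 8 is an arbitrary assumption.

```ruby
# Sketch only: lstat many paths concurrently with a fixed pool of threads.
# POOL_SIZE and parallel_lstat are illustrative names, not FastFind's API.
POOL_SIZE = 8

def parallel_lstat(paths, pool_size: POOL_SIZE)
  jobs    = Queue.new
  results = Queue.new
  paths.each { |path| jobs << path }
  jobs.close                     # pop returns nil once the queue is drained

  workers = Array.new(pool_size) do
    Thread.new do
      while (path = jobs.pop)
        begin
          results << [path, File.lstat(path)]
        rescue SystemCallError
          # Entry vanished or is unreadable; leave it out of the results.
        end
      end
    end
  end
  workers.each(&:join)

  # Collect the [path, stat] pairs into a Hash keyed by path.
  Array.new(results.size) { results.pop }.to_h
end
```

On an implementation without a global interpreter lock, such as JRuby, the workers can genuinely overlap their system calls, which is where a pool like this has a chance to pay off.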

               user     system      total        real
Find      32.890625  27.742188  60.632813 ( 47.518944)
FastFind  35.273438  41.742188  77.015625 (  8.140893)

The API is basically identical to Find's, except that the block may take a second argument. If it does, it is yielded a copy of the File::Stat that was used to determine whether the entry is a file or a directory.
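The shape of that two-argument block can be shown with a stand-in written against the standard library only. `find_with_stat` and `file_sizes` are hypothetical helpers, not part of FastFind, and the walker is single-threaded with no pruning or error handling. With the real library I would expect the equivalent call to look like `Find.find` with an extra block parameter, but treat that as an assumption.

```ruby
# Hypothetical stand-in, not FastFind itself: walk +root+ depth-first and
# yield each path together with the File::Stat used to classify it.
# A block that declares only |path| simply ignores the extra argument.
def find_with_stat(root)
  pending = [root]
  until pending.empty?
    path = pending.pop
    stat = File.lstat(path)
    yield path, stat
    next unless stat.directory?

    # Push children in reverse so they are visited in sorted order.
    Dir.children(path).sort.reverse_each do |name|
      pending.push(File.join(path, name))
    end
  end
end

# Using the second block argument means no extra File.lstat in the block.
def file_sizes(root)
  sizes = {}
  find_with_stat(root) do |path, stat|
    sizes[path] = stat.size if stat.file?
  end
  sizes
end
```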

MRI sadly has yet to benefit significantly from this library; the only real win there comes from the File::Stat block argument, which saves the second stat:

               user     system      total        real
Find      10.187500  22.351562  32.539062 ( 32.545201)
FastFind   9.039062  14.226562  23.265625 ( 23.277589)

Ho hum.