Blooming Ruby
Prototypical BitArray and BloomFilter library
Yet another Ruby bloom filter, and a new bit array implementation, including a speedy JRuby-specific version.
This was extracted from an experiment in leaked password list processing, before I moved on to Golomb Compressed Sets, and might be worth extracting into a proper gem at some point.
Synopsis
BitArray
There’s also a bit_array_jruby
which uses java.util.BitSet
under the hood.
# Array of 128 bits
ba = Blooming::BitArray.new(128)
ba.set!(1).set!(2).flip!(1).flip!(2).none? # true
ba.set!(1).set!(2).cardinality # => 2
ba.set?(1) # => true
ba.to_s # => raw bits
ba.each.to_a # => [false, true, true, false, false, ...]
ba2 = BitArray.new(ba.to_s) # a second BitArray identical to the first
ba2.clear! # and now it's empty
BloomFilter
Split into two classes - a flexible calculator for parameters, and the filter itself.
# Implied by above
# Size a filter for 1,000 items with a 0.0001 false-positive rate
bfp = Blooming::BloomFilterParams.new(n: 1000, p: 0.0001)
bf = bf.to_filter
bf.add
bf <<
bf.include? # => true
bf.add? # => false
bf.add? # => true
bf.add? # => false
bf.estimate_count # => 3-ish
bf.count # => alias for above
bf.empty? # => false
bf.clear!
bf.empty? # => true
1000.times { bf << i }
bf.saturated? # => true (adding more items violates desired false-positive rate)
bf.full? # => true (alias)