Added nstar and nstar_intersection functions as well as tests.#18
Added nstar and nstar_intersection functions as well as tests.#18adewes wants to merge 1 commit intojaybaird:masterfrom
Conversation
|
btw @jaybaird thanks for this project, it's really useful :) |
|
Andreas, Maybe I'm misunderstanding what this does, but why would you use this over using len? Or in the case of the intersection, doing the intersection and taking the len() there? I do see now actually that we don't update the count when we union/intersect filters, which is a bug and might better be solved by this. Can you point me to the relevant section in the Wikipedia article you're referencing so I can catch up? Thanks! |
|
Nevermind, I found it :) |
|
Hey @jaybaird, the purpose of |
|
@jaybaird is there anything you need from my side to merge this PR or come to a decision on whether to integrate this into the code base? |
|
I plan on taking a look at this this weekend. I'm still not 100% sure it's utility and there's a larger bug that I realized needs to be fixed but this may be the solution for. I'll keep you posted. |
|
hey @jaybaird , still planning to merge this? |
I've implemented two functions
nstarandnstar_intersectionthat estimate the number of elements in a Bloom filter as well as in an intersection of two Bloom filters (a separate function is required for the latter case since usingnstar()on the intersection directly would yield a wrong result). This is very useful in many circumstances, e.g. when performing similarity searches on hierarchical data structures.The implementation follows the Wikipedia article:
http://en.wikipedia.org/wiki/Bloom_filter
I've added tests and a short explanation as well, please let me know if you need anything else in order to merge this.