LeakWatch

Example: Bloom Filters

A Bloom filter is a probabilistic data structure that loosely mimics the behaviour of a set: it is possible to add elements to a Bloom filter, and to test elements for membership. Bloom filters use hash functions to record membership of added elements, so collisions are possible; this means that a test for an element's membership may return a false positive (but not a false negative). If an attacker is able to test the Bloom filter for element membership, its internal state therefore leaks some information about the elements that have been added, although the precise amount of information leaked will depend on the number of elements added.

In this example, a preselected number of integers between 0 and 7 are added to a Bloom filter with 24 bits of internal state and two defined hash functions. As the number of integers added to the Bloom filter increases, the probability of collisions occurring also increases; this means that if an attacker were to test the Bloom filter for membership of a particular integer, the attacker's certainty that the integer was actually added to the Bloom filter (rather than simply appearing to have been added, as the side-effect of a collision) would decrease. This is reflected in the table below, which shows the average amount of information about each added integer that leaks from the Bloom filter.

Number of integers added Corrected mutual information (bits) Leakage per integer added (bits)
1 2.81 2.8
2 5.08 2.5
3 6.69 2.2
4 6.71 1.7
5 6.89 1.4
6 6.93 1.2

The Java implementation of a Bloom filter in this example is taken from Javamex, showing how existing code can easily be adapted to work with LeakWatch.