(Guest post by Steve Uurtamo)
I presented a few algorithms for the detection of superspreaders, where superspreaders are defined to be sources in a networking data stream that contact many unique destinations. The paper was New algorithms for fast detection of superspreaders, by S. Venkataraman, D. Song, P. Gibbons, and A. Blum.
The basic algorithm is to store a uniformly chosen fraction of the pairs, using a hash function to make sure that we don’t count any
pair more than once. Assuming uniformity of the hash function, if we want to detect a
-superspreader, holding onto
of the resulting datastream with
elements should be sufficient, where, if superspreaders are as rare and as chatty as expected, the storage will be modest.
Modifications of this algorithm were presented for the sliding window model, where we are continuously monitoring a datastream over a fixed length window, and for parallel monitoring, where we need to combine the information in several different datastreams to detect superspreaders.
Experimental results with synthetic data showed that the algorithms in the paper were much more space-efficient than simply combining existing approximate counting and distinct counting stream algorithms together.