In today’s lecture I presented another algorithm for estimating frequent items, from [CCF02]. The main line of thought is hopefully clear. In Atri’s earlier lecture and in two of my lectures, we made use of a widely useful concept called pairwise independent hash functions. Since we will need this concept and its generalization, $k$-wise independent hash functions, in a later lecture, let me briefly describe what they are.
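Let $U$ be a set of “keys” and let $[m] = \{0, 1, \dots, m-1\}$. A family $\mathcal{H}$ of functions from $U$ to $[m]$ is called a ($k$-wise independent) $k$-universal family if, for any set of $k$ distinct keys $x_1, \dots, x_k \in U$, we have

$$\mathrm{Prob}_{h \in \mathcal{H}}\left[h(x_1) = \cdots = h(x_k)\right] \le \frac{1}{m^{k-1}},$$

where the probability is taken over uniform choices of $h$ from $\mathcal{H}$. And, the family is strongly $k$-universal if for any set of $k$ distinct keys $x_1, \dots, x_k \in U$ and any values $y_1, \dots, y_k \in [m]$ we have

$$\mathrm{Prob}_{h \in \mathcal{H}}\left[h(x_1) = y_1, \dots, h(x_k) = y_k\right] = \frac{1}{m^k}.$$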
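As a quick sanity check on the two definitions, writing $m$ for the size of the range and $h$ for the uniformly chosen function, a strongly $k$-universal family is also $k$-universal. The events "all $k$ keys hash to $y$" are disjoint for different $y$, so summing the strong condition over the $m$ possible common values gives

$$\mathrm{Prob}\left[h(x_1) = \cdots = h(x_k)\right] = \sum_{y \in [m]} \mathrm{Prob}\left[h(x_1) = y, \dots, h(x_k) = y\right] = m \cdot \frac{1}{m^k} = \frac{1}{m^{k-1}}.$$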
In an earlier lecture we used a $2$-universal family. The family of all functions from the key set $U$ to the range $[m]$ certainly fits the bill, but picking a random function from this huge family requires $|U| \log_2 m$ bits, which is too many for our purpose. The family we used needs only $O(\log |U|)$ random bits.
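To make the saving in random bits concrete, here is a minimal Python sketch of one standard construction: random polynomials of degree at most $k-1$ over $\mathbb{Z}_p$ for a prime $p$. It is not necessarily the family used in the lectures. It assumes the keys and the range are both $\{0, \dots, p-1\}$, and the names `poly_eval`, `draw_hash`, and `count_consistent` are made up for this illustration. Drawing a function costs only $k$ coefficients, roughly $k \log_2 p$ random bits. The brute-force counter checks the strong condition on a tiny instance: for $k$ distinct keys and any $k$ target values, exactly one of the $p^k$ polynomials is consistent, so the probability is $1/p^k$.

```python
import itertools
import random


def poly_eval(coeffs, x, p):
    """Evaluate coeffs[0] + coeffs[1]*x + ... modulo p using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def draw_hash(k, p, rng=random):
    """Draw h uniformly from the degree-(k-1) polynomial family over Z_p.

    Assumes p is prime and that keys lie in {0, ..., p-1}.
    Only k random coefficients are needed (about k * log2(p) bits).
    """
    coeffs = [rng.randrange(p) for _ in range(k)]
    return lambda x: poly_eval(coeffs, x, p)


def count_consistent(k, p, keys, values):
    """Count family members h with h(keys[i]) == values[i] for every i."""
    count = 0
    for coeffs in itertools.product(range(p), repeat=k):
        if all(poly_eval(coeffs, x, p) == y for x, y in zip(keys, values)):
            count += 1
    return count


if __name__ == "__main__":
    # Illustrative parameters: k = 3, p = 5, so the family has 5**3 = 125 members.
    print(count_consistent(3, 5, keys=(0, 1, 2), values=(4, 0, 3)))  # prints 1
```

The count is 1 because a polynomial of degree at most $k-1$ over a field is determined uniquely by its values at $k$ distinct points. To map into a smaller range one would typically reduce the result modulo $m$, at the cost of the guarantee holding only approximately.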
In the next two weeks, I will present several papers on estimating $F_0$ (the number of distinct elements) and some statistics on (multi)graphs. Our starting points will be the following two papers:
- Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In Proc. RANDOM 2002.
- G. Cormode and S. Muthukrishnan. Space efficient mining of multigraph streams. In Proc. PODS 2005.