Posted by: atri | May 6, 2008

April 25th talk

(Guest post by Denis Mindolin)

In my presentation, I showed the results from the following paper: Stabbing the Sky: Efficient Skyline Computation over Sliding Windows, ICDE ‘05.

The paper deals with the problem of computing skylines over data streams. Given a data set S of elements with d attributes (A_1,...A_d), a skyline of S is a set of all undominated elements of S. An element X dominates another element Y if for all i, X_{A_i} \leq Y_{A_i}, and for at least one i, X_{A_i} < Y_{A_i}. This problem is a topic of active research for the past thirty years. However, the paper I presented is one of the first ones considering skylines in the data stream framework.

The stream model considered in the paper is append only. A stream element is a vector having d components each of which is a number. It also assumed that there’s a sliding window of the most recent N elements, and the queries issued are

  1. Compute the skyline of the most recent n elements (n \leq N), and
  2. Compute the skyline of the elements between the most recent n_1 and n_2 (n_1 \leq n_2 \leq N).

To answer queries of the first type (n-of-N), the authors propose to keep in main memory a set of nonredundant elements R_N. The size of R_N is O(log^d N) for some restricted streams. In order to avoid the quadratic size of the dominance graph over R_N, it is proposed to keep at most two dominance relationships per element in R_N which are enough to answer n-of-N queries. This makes the dominance graph size linear in the size of R_N. To answer skyline queries in O(log |R_N|) time, the authors propose to store the graph as a set of intervals (O(|R_N|) space) which reduces the skyline problem to the already solved problem of answering stabbing queries.

The second query model ( (n_1,n_2)-of-N ) is handled in a similar way with an exception that here the entire set of the most recent N elements should be kept in main memory. As a result, the space requirement here is higher than in the first approach.

Leave a response

Your response:

Categories