Range query (data structures)

In data structures, a range query consists of preprocessing some input data into a data structure so that any number of queries on any subset of the input can be answered efficiently. In particular, there is an extensively studied group of problems where the input is an array of unsorted numbers and a query consists of computing some function, such as the minimum, on a specific range of the array.

Definition

A range query $q_{f}(A,i,j)$ on an array $A=[a_{1},a_{2},\ldots ,a_{n}]$ of $n$ elements of some set $S$, denoted $A[1,n]$, takes two indices $1\leq i\leq j\leq n$ and a function $f$ defined over arrays of elements of $S$, and outputs $f(A[i,j])=f(a_{i},\ldots ,a_{j})$.

For example, for $f=\sum$ and $A[1,n]$ an array of numbers, the range query $\sum _{i,j}A$ computes $\sum A[i,j]=(a_{i}+\ldots +a_{j})$ for any $1\leq i\leq j\leq n$. These queries can be answered in constant time using $O(n)$ extra space by storing prefix sums in an auxiliary array $B$, such that $B[i]$ contains the sum of the first $i$ elements of $A$ for every $0\leq i\leq n$ (so $B[0]=0$). Any query can then be answered as $\sum A[i,j]=B[j]-B[i-1]$.
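
The prefix-sum strategy above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function names `build_prefix_sums` and `range_sum` are chosen here for the example, and indices are 1-based to match the text.

```python
from itertools import accumulate

def build_prefix_sums(a):
    """Return B with B[0] = 0 and B[i] = a_1 + ... + a_i (O(n) time and space)."""
    return [0] + list(accumulate(a))

def range_sum(b, i, j):
    """Sum of a_i..a_j for 1 <= i <= j <= n, answered in O(1) as B[j] - B[i-1]."""
    return b[j] - b[i - 1]

a = [3, 1, 4, 1, 5]
b = build_prefix_sums(a)   # [0, 3, 4, 8, 9, 14]
print(range_sum(b, 2, 4))  # 1 + 4 + 1 = 6
```

Storing the extra entry $B[0]=0$ avoids a special case for queries that start at the first element.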

This strategy can be extended to any group operator $f$, that is, any operator for which the inverse $f^{-1}$ is well defined and easily computable.^[1] Finally, this solution can be extended to two-dimensional arrays with a similar preprocessing.^[2]
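
One common way to realize the two-dimensional extension for sums is a table of two-dimensional prefix sums combined by inclusion-exclusion. The sketch below assumes a rectangular grid of numbers and 1-based inclusive coordinates; the names `build_prefix_sums_2d` and `rect_sum` are illustrative and do not come from the cited papers.

```python
def build_prefix_sums_2d(grid):
    """P[r][c] holds the sum of grid rows 1..r and columns 1..c (row and column 0 are zero)."""
    rows, cols = len(grid), len(grid[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            p[r][c] = (grid[r - 1][c - 1]
                       + p[r - 1][c] + p[r][c - 1] - p[r - 1][c - 1])
    return p

def rect_sum(p, r1, c1, r2, c2):
    """Sum over rows r1..r2 and columns c1..c2 in O(1) by inclusion-exclusion."""
    return p[r2][c2] - p[r1 - 1][c2] - p[r2][c1 - 1] + p[r1 - 1][c1 - 1]

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
p = build_prefix_sums_2d(grid)
print(rect_sum(p, 2, 2, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

The subtraction steps are where the inverse of the operator is needed, which is why the same idea does not carry over to operators such as the minimum.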

Examples

Semigroup operators

Figure: Constructing the corresponding Cartesian tree to solve a range minimum query.

Figure: Range minimum query reduced to the lowest common ancestor problem.

When the function of interest in a range query is a semigroup operator, the notion of $f^{-1}$ is not always defined, so the strategy in the previous section does not work. Andrew Yao showed^[3] that there exists an efficient solution for range queries that involve semigroup operators. He proved that for any constant $c$, a preprocessing of time and space $\Theta (c\cdot n)$ allows answering range queries on lists where $f$ is a semigroup operator in $\Theta (\alpha _{c}(n))$ time, where $\alpha _{c}$ is a certain functional inverse of the Ackermann function.

Some semigroup operators admit slightly better solutions, for instance when $f\in \{\max ,\min \}$. Assume $f=\min$, so that $\min(A[1,n])$ returns the index of the minimum element of $A[1,n]$, and let $\min _{i,j}(A)$ denote the corresponding range minimum query. There are several data structures that answer a range minimum query in $O(1)$ time using a preprocessing of time and space $O(n)$. One such solution is based on the equivalence between this problem and the lowest common ancestor problem.

The Cartesian tree $T_{A}$ of an array $A[1,n]$ has as root $a_{k}=\min\{a_{1},a_{2},\ldots ,a_{n}\}$ and as left and right subtrees the Cartesian tree of $A[1,k-1]$ and the Cartesian tree of $A[k+1,n]$ respectively. A range minimum query $\min _{i,j}(A)$ is the lowest common ancestor in $T_{A}$ of $a_{i}$ and $a_{j}$. Cartesian trees can be constructed in linear time, and the lowest common ancestor problem can be solved in constant time using a preprocessing of time and space $O(n)$, so range minimum queries can be answered within the same bounds. The solution when $f=\max$ is analogous.
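
The reduction can be illustrated with a short Python sketch. It builds the Cartesian tree in linear time with a stack, then answers a query by walking parent pointers to the lowest common ancestor. The walk is a deliberately naive stand-in that takes time proportional to the tree height; a constant-time lowest common ancestor structure would be needed to reach the bounds stated above. Indices are 0-based here, and the function names are chosen for this example only.

```python
def build_cartesian_tree(a):
    """Return (root, parent) for the min-rooted Cartesian tree of a; -1 means no parent."""
    parent = [-1] * len(a)
    stack = []  # indices on the rightmost path, values increasing from bottom to top
    for k in range(len(a)):
        last = -1
        # Pop every node larger than a[k]; the last one popped becomes k's left child.
        while stack and a[stack[-1]] > a[k]:
            last = stack.pop()
        if last != -1:
            parent[last] = k
        if stack:
            parent[k] = stack[-1]  # k becomes the right child of the new stack top
        stack.append(k)
    return stack[0], parent

def lowest_common_ancestor(parent, u, v):
    """Naive LCA: collect the ancestors of u, then climb from v until one is hit."""
    ancestors = set()
    while u != -1:
        ancestors.add(u)
        u = parent[u]
    while v not in ancestors:
        v = parent[v]
    return v

def range_min_index(parent, i, j):
    """Index of the minimum of a[i..j] (inclusive), via the LCA of nodes i and j."""
    return lowest_common_ancestor(parent, i, j)

a = [3, 1, 4, 2]
root, parent = build_cartesian_tree(a)
print(root, range_min_index(parent, 2, 3))  # root is index 1; min of a[2..3] is at index 3
```

Each index is pushed and popped at most once, which is what makes the construction linear.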

Mode

The mode of an array $A$ is the element that appears the most in $A$. For instance, the mode of $A=[4,5,6,7,4]$ is 4. In case of ties, any of the most frequent elements may be picked as the mode. A range mode query consists of preprocessing $A[1,n]$ such that we can find the mode in any range of $A[1,n]$. Several data structures have been devised to solve this problem; some of the results are summarized in the following table.^[1]

Range Mode Queries
Space	Query Time	Restrictions
$O(n^{2-2\epsilon })$	$O(n^{\epsilon }\log n)$	$0\leq \epsilon \leq {\frac {1}{2}}$
$O\left({\frac {n^{2}\log \log n}{\log n}}\right)$	$O(1)$	none

In 2010, Jørgensen et al. proved a lower bound of $\Omega \left({\tfrac {\log n}{\log(Sw/n)}}\right)$ on the query time in the cell-probe model for any data structure that uses $S$ cells of $w$ bits.^[4]
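
For comparison, a range mode query can always be answered without any preprocessing by counting the elements of the range, at a cost linear in the range length per query. The sketch below shows this baseline only, not any of the data structures summarized above; the name `range_mode_naive` is illustrative and indices are 1-based as in the text.

```python
from collections import Counter

def range_mode_naive(a, i, j):
    """Return a most frequent element of a_i..a_j (1-based, inclusive) by direct counting."""
    counts = Counter(a[i - 1:j])
    return counts.most_common(1)[0][0]

print(range_mode_naive([4, 5, 6, 7, 4], 1, 5))  # 4 appears twice, every other element once
```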

Median

This particular case is of special interest since finding the median has several applications.^[5] The median problem, a special case of the selection problem, is solvable in $O(n)$ time using the median of medians algorithm.^[6] However, its generalization through range median queries is recent.^[7] A range median query $\operatorname {median} (A,i,j)$, where $A$, $i$ and $j$ have the usual meanings, returns the median element of $A[i,j]$. Equivalently, $\operatorname {median} (A,i,j)$ should return the element of $A[i,j]$ of rank $\left\lceil {\frac {j-i+1}{2}}\right\rceil$, since $A[i,j]$ contains $j-i+1$ elements. Range median queries cannot be solved by any of the methods discussed above, including Yao's approach for semigroup operators.^[8]

Two variants of this problem have been studied: the offline version, where all $k$ queries of interest are given in a batch, and the online version, where all the preprocessing is done up front. The offline version can be solved in $O(n\log k+k\log n)$ time and $O(n\log k)$ space.

The following pseudocode shows how to find the element of rank $r$ in $A[i,j]$, where $A$ is an unsorted array of distinct elements. To find the range median, we set $r=\left\lceil {\frac {j-i+1}{2}}\right\rceil$.^[7]

rangeMedian(A, i, j, r) {

  // only one candidate is left, so it is the answer
  if A.length() == 1 return A[1]

  // partition A around its median the first time A is visited
  if A.low is undefined then
    m = median(A)
    A.low  = [e in A | e <= m]
    A.high = [e in A | e > m]

  t = number of elements of A[i,j] that belong to A.low

  if r <= t return rangeMedian(A.low, i, j, r)
  else return rangeMedian(A.high, i, j, r - t)
}

Procedure rangeMedian partitions A, using A's median, into two arrays A.low and A.high, where the former contains the elements of A that are less than or equal to the median m and the latter contains the rest of the elements of A. If the number t of elements of $A[i,j]$ that end up in A.low is at least r, we keep looking for the element of rank r in A.low; otherwise we look for the element of rank $(r-t)$ in A.high. To find $t$, it is enough to find the maximum index $p\leq i-1$ such that $a_{p}$ is in A.low and the maximum index $l\leq j$ such that $a_{l}$ is in A.low; $t$ is then the difference between the positions of $a_{l}$ and $a_{p}$ within A.low.

The total cost of any query, without considering the partitioning part, is $O(\log n)$, since at most $\log n$ recursive calls are made and only a constant number of operations is performed in each of them (fractional cascading should be used to obtain the value of $t$). If a linear-time algorithm is used to find the medians, the total cost of preprocessing for $k$ range median queries is $O(n\log k)$. The algorithm can also be modified to solve the online version of the problem.^[7]
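
The procedure can be made concrete with the following Python sketch, written under several simplifying assumptions that are not part of the cited algorithm: each node stores the original positions of its elements explicitly, the median is found by sorting (`statistics.median_low`) rather than by a linear-time selection algorithm, and $t$ is computed by binary search on positions instead of fractional cascading, so a query costs $O(\log ^{2}n)$ once the partitions exist. The names `Node`, `range_select` and `range_median` are illustrative.

```python
from bisect import bisect_left, bisect_right
from statistics import median_low

class Node:
    """A sub-array in original order, together with the original (1-based) positions."""
    def __init__(self, positions, values):
        self.positions = positions  # strictly increasing original indices
        self.values = values        # values[k] is the element at positions[k]
        self.low = None             # partitions are built lazily on first visit
        self.high = None

def range_select(node, i, j, r):
    """Element of rank r (1-based) among the elements at original positions i..j."""
    if len(node.values) == 1:
        return node.values[0]
    if node.low is None:
        m = median_low(node.values)  # assumption: sorting-based median, for brevity
        low_pos, low_val, high_pos, high_val = [], [], [], []
        for p, v in zip(node.positions, node.values):
            if v <= m:
                low_pos.append(p)
                low_val.append(v)
            else:
                high_pos.append(p)
                high_val.append(v)
        node.low = Node(low_pos, low_val)
        node.high = Node(high_pos, high_val)
    # t = number of elements of A[i, j] that ended up in the low half
    t = bisect_right(node.low.positions, j) - bisect_left(node.low.positions, i)
    if r <= t:
        return range_select(node.low, i, j, r)
    return range_select(node.high, i, j, r - t)

def range_median(root, i, j):
    """Lower median of A[i, j], i.e. the element of rank ceil((j - i + 1) / 2)."""
    return range_select(root, i, j, (j - i + 2) // 2)

A = [4, 9, 1, 7, 3, 8, 2]  # distinct elements, as the pseudocode requires
root = Node(list(range(1, len(A) + 1)), list(A))
print(range_median(root, 2, 5))  # A[2,5] = [9, 1, 7, 3], lower median is 3
```

Because the partitions are cached on the nodes, later queries reuse the work done by earlier ones, which is the source of the amortized preprocessing bound mentioned above.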

References

1. Krizanc, Danny; Morin, Pat; Smid, Michiel H. M. (2003). "Range Mode and Range Median Queries on Lists and Trees". ISAAC: 517–526.
2. He, Meng; Munro, J. Ian; Nicholson, Patrick K. (2011). "Dynamic Range Selection in Linear Space". ISAAC: 160–169.
3. Yao, A. C. (1982). "Space-Time Tradeoff for Answering Range Queries". 14th Annual ACM Symposium on the Theory of Computing: 128–136.
4. Greve, M.; Jørgensen, A.; Larsen, K.; Truelsen, J. (2010). "Cell probe lower bounds and approximations for range mode". Automata, Languages and Programming: 605–616.
5. Har-Peled, Sariel; Muthukrishnan, S. (2008). "Range Medians". ESA: 503–514.
6. Blum, M.; Floyd, R. W.; Pratt, V. R.; Rivest, R. L.; Tarjan, R. E. (August 1973). "Time bounds for selection". Journal of Computer and System Sciences. 7 (4): 448–461. doi:10.1016/S0022-0000(73)80033-9.
7. Gfeller, Beat; Sanders, Peter (2009). "Towards Optimal Range Medians". ICALP (1): 475–486.
8. Bose, P.; Kranakis, E.; Morin, P.; Tang, Y. (2005). "Approximate range mode and range median queries". In Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science (STACS 2005), volume 3404 of Lecture Notes in Computer Science: 377–388.

External links

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

