eng
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Leibniz International Proceedings in Informatics
1868-8969
2019-03-12
24:1
24:16
10.4230/LIPIcs.STACS.2019.24
article
Constant-Time Retrieval with O(log m) Extra Bits
Dietzfelbinger, Martin
1
https://orcid.org/0000-0001-5484-3474
Walzer, Stefan
1
https://orcid.org/0000-0002-6477-0106
Technische Universität Ilmenau, Germany
For a set U (the universe), retrieval is the following problem. Given a finite subset S subseteq U of size m and f : S -> {0,1}^r for a small constant r, build a data structure D_f with the property that for a suitable query algorithm query we have query(D_f,x) = f(x) for all x in S. For x in U setminus S the value query(D_f,x) is arbitrary in {0,1}^r. The number of bits needed for D_f should be (1+epsilon)r m with overhead epsilon = epsilon(m) >= 0 as small as possible, while the query time should be small. Of course, the time for constructing D_f is relevant as well.
We assume fully random hash functions on U with constant evaluation time are available. It is known that with epsilon ~= 0.09 one can achieve linear construction time and constant query time, and with overhead epsilon_k ~= e^{-k} it is possible to have O(k) query time and O(m^{1+alpha}) construction time, for arbitrary alpha>0. Furthermore, a theoretical construction with epsilon =O((log log m)/sqrt{log m}) gives constant query time and linear construction time. Known constructions avoiding all overhead, except for a seed value of size O(log log m), require logarithmic query time.
In this paper, we present a method for treating the retrieval problem with overhead epsilon = O((log m)/m), which corresponds to O(1) extra memory words (O(log m) bits), and an extremely simple, constant-time query operation. The price to pay is a construction time of O(m^2). We employ the usual framework for retrieval data structures, where construction is effected by solving a sparse linear system of equations over the 2-element field F_2 and a query is effected by a dot product calculation. Our main technical contribution is the design and analysis of a new and natural family of sparse random linear systems with m equations and (1+epsilon)m variables, which combines good locality properties with high probability of having full rank.
Paying a larger overhead of epsilon = O((log m)/m^alpha), the construction time can be reduced to O(m^{1+alpha}) for arbitrary constant 0 < alpha < 1. In combination with an adaptation of known techniques for solving sparse linear systems of equations, our approach leads to a highly practical algorithm for retrieval. In a particular benchmark with m = 10^7 we achieve an order-of-magnitude improvement over previous techniques with epsilon = 0.24% instead of the previously best result of epsilon ~= 3%, with better query time and no significant sacrifices in construction time.
https://drops.dagstuhl.de/storage/00lipics/lipics-vol126-stacs2019/LIPIcs.STACS.2019.24/LIPIcs.STACS.2019.24.pdf
Retrieval
Hashing
Succinct Data Structure
Randomised Data Structure
Structured Gaussian Elimination
Method of Four Russians