Better randomized algorithms for k-server [under construction]

This blog post discusses the 2017 paper by Bubeck et al. on $k$-server using multiscale entropic regularization.

The Problem

The $k$-server problem is a prototypical problem to study in online algorithms. Given a metric space $(X, d)$, and requests (that correspond to points in the metric space) that arrive in an online fashion, the goal is to maintain a configuration of $k$ servers at points in $X$ such that:

  1. a server is moved to the request point at the end of every iteration, and
  2. the sum over all iterations of distances between consecutive configurations of servers is minimized. This is the total cost of “realizing” the algorithm.

Observe that the paging problem is a special instance of the $k$-server problem when the metric space is the uniform metric: pages are points, the $k$ cache slots are servers, and every page fault costs $1$.
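To make this cost accounting concrete, here is a minimal sketch (my own illustration, not code from the paper) of an online $k$-server simulator; with the uniform metric it is exactly paging. The names `serve_requests` and `make_greedy` are mine.

```python
def serve_requests(d, config, requests, choose_server):
    """Run an online k-server rule and tally the total movement cost.

    d:             symmetric distance function with d(x, x) == 0
    config:        initial list of k server positions
    requests:      online sequence of requested points
    choose_server: online rule mapping (config, request) -> server index
    """
    config = list(config)
    total_cost = 0.0
    for r in requests:
        i = choose_server(config, r)   # the online decision
        total_cost += d(config[i], r)  # pay the distance moved
        config[i] = r                  # a server now sits on the request
    return total_cost

def make_greedy(d):
    # Move the nearest server -- natural, but known to be non-competitive
    # in general (it can thrash between two nearby request points).
    def rule(config, r):
        return min(range(len(config)), key=lambda i: d(config[i], r))
    return rule

# On the line with k = 2: greedy shuttles one server between 4 and 6 forever,
# while the optimum parks a server at each point after paying once.
d = lambda x, y: abs(x - y)
print(serve_requests(d, [0, 10], [4, 6] * 10, make_greedy(d)))
```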

Some prior results

The $k$-server problem was introduced in Manasse et al., and the authors conjectured a deterministic $k$-competitive algorithm for all metric spaces. This bound must be essentially tight for deterministic algorithms because the closely related “paging problem” has a deterministic lower bound of $k$. This lower bound is simple to see because an adversarial request sequence can maintain a set of $k+1$ pages that are requested in a round-robin fashion such that the request at every iteration is the page not present in the cache of the deterministic algorithm. The optimal solution would only have a page fault once every $k$ iterations if it swapped out the correct page (the one whose next request is furthest in the future).
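This argument is easy to check empirically. Below is a small sketch of my own, with LRU standing in for an arbitrary deterministic algorithm: the adversary requests the unique uncached page at every step, so the online algorithm faults on all $T$ requests, while Belady's furthest-in-future rule faults only about $T/k$ times on the same sequence.

```python
from collections import OrderedDict

def adversary_vs_lru(k, T):
    """Adversary over k+1 pages: always request the one uncached page.
    LRU stands in for any deterministic algorithm; the same trick works
    against each of them, only the generated sequence differs."""
    pages = list(range(k + 1))
    cache = OrderedDict((p, None) for p in pages[:k])  # keys ordered by recency
    requests, faults = [], 0
    for _ in range(T):
        r = next(p for p in pages if p not in cache)   # the uncached page
        requests.append(r)
        faults += 1                                    # every request faults
        cache.popitem(last=False)                      # evict least recently used
        cache[r] = None
    return requests, faults

def belady_faults(requests, initial):
    """Offline optimum: on a fault, evict the page requested furthest in the future."""
    cache, faults = set(initial), 0
    for i, r in enumerate(requests):
        if r in cache:
            continue
        faults += 1
        def next_use(p):
            return next((j for j in range(i + 1, len(requests))
                         if requests[j] == p), float('inf'))
        cache.remove(max(cache, key=next_use))
        cache.add(r)
    return faults

# requests, lru_faults = adversary_vs_lru(k=5, T=1000)
# lru_faults == 1000, while belady_faults(requests, range(5)) is about 1000 / 5.
```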

Best known deterministic algorithm for $k$-server

In general metric spaces, the best known result is due to Koutsoupias and Papadimitriou, who presented a $(2k-1)$-competitive algorithm which they call the “work function” algorithm. Given some initial configuration $C_0$ and a sequence of requests $r_1, \dots, r_n$, the work function $w_{r_1,\dots,r_n}(C)$ for some configuration $C$ is the cost associated with the optimal set of moves starting from $C_0$ that end at $C$ such that the requests made at every iteration are served. The work function for all configurations for some set of requests can be computed using dynamic programming as:
\begin{equation}
w_{r_1,\dots,r_n}(C) = \min_{x \in C} \left[ w_{r_1,\dots,r_{n-1}} \big( C \setminus \{x\} \cup \{r_n\} \big) + d(x,r_n) \right]
\end{equation}
If at iteration $t$ the server configuration is $C_t$, then in iteration $t+1$ the work function algorithm picks the configuration $C_{t+1}$ that minimizes $w_{r_1,\dots,r_{t+1}}(C) + d(C_t, C)$, where $d(C_t, C)$ is the distance between configurations (the minimum-cost matching between their points) in the metric $d$. Intuitively, this decision gives equal weight to the historical goodness of the configuration (the first term) and the cost of moving the servers to it (the second term). This update rule can be computed in polynomial time because one need not check all possible configurations; it is sufficient to check those that move exactly one server to the request point. The competitiveness of the algorithm relies on the formulation of an appropriate potential function. As it would deviate from the scope of this article, further details of the proof are left out.
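Here is a minimal sketch of this dynamic program (my own illustration; the brute-force enumeration over all $\binom{|X|}{k}$ configurations and the permutation-based matching are only meant for tiny instances).

```python
from itertools import combinations, permutations

def work_function_table(points, d, C0, requests):
    """Tabulate w_{r_1..r_n}(C) for every k-point configuration C, using
    w_t(C) = min_{x in C} [ w_{t-1}((C - {x}) | {r_t}) + d(x, r_t) ]."""
    k = len(C0)
    configs = [frozenset(c) for c in combinations(points, k)]

    def match_cost(A, B):
        # Minimum-cost matching between two configurations; brute force over
        # permutations, which is fine for the tiny k this sketch targets.
        A = list(A)
        return min(sum(d(a, b) for a, b in zip(A, perm))
                   for perm in permutations(B))

    # Base case: the cost of moving the servers from C0 into position C.
    w = {C: match_cost(C0, C) for C in configs}

    for r in requests:
        new_w = {}
        for C in configs:
            best = float('inf')
            for x in C:
                prev = (C - {x}) | {r}
                if len(prev) == k:     # skip degenerate sets (r in C, x != r)
                    best = min(best, w[prev] + d(x, r))
            new_w[C] = best
        w = new_w
    return w
```

The work function algorithm's step is then the one-liner of minimizing `w[C] + match_cost(C_t, C)`, and as noted above, only the configurations that move a single server to the request point need to be considered.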

Of course, one can hope to improve the guarantees if randomization is allowed. Once again, from the paging problem, there is a lower bound of $\Omega(\log k)$. This result follows by constructing a “pursuer-evader” game. Consider the case where there are $k+1$ pages, and at every iteration the requested page is chosen uniformly at random. The probability that any online algorithm incurs a page fault is $\frac{1}{k+1}$. Therefore,
\begin{equation} \label{eq:001} \text{Expected number of faults in } t \text{ iterations} = \frac{t}{k+1} \end{equation}
The coupon collector problem tells us that it would take $\Theta(k \log k)$ iterations on average for all $k+1$ pages to be requested. The offline solution would simply keep uncached the page that will be requested last, and therefore would see a cost of $1$ every $\Theta(k \log k)$ iterations. On the other hand, the online solution would incur a cost of $\frac{t}{k+1}$, as seen from equation (\ref{eq:001}). This proves the lower bound.
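A quick simulation (again my own sketch, reusing `belady_faults` from the earlier snippet) makes the $\Omega(\log k)$ gap visible:

```python
import random

def randomized_gap(k, T, seed=0):
    """Uniformly random requests over k+1 pages: any online algorithm faults
    with probability 1/(k+1) per step, while the offline optimum faults about
    once every Theta(k log k) steps (one coupon-collector epoch)."""
    rng = random.Random(seed)
    requests = [rng.randrange(k + 1) for _ in range(T)]
    expected_online = T / (k + 1)                    # the displayed equation
    offline = belady_faults(requests, range(k))      # from the earlier sketch
    return expected_online / offline

# randomized_gap(5, 20_000) vs. randomized_gap(50, 20_000): the ratio
# grows with k roughly like log k, matching the lower bound.
```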

Hierarchically Separated Trees

HSTs are a class of metric spaces on which one can once again hope to prove strong bounds on the competitive ratio for $k$-server. HSTs form an important class of metric spaces due to the breakthrough result of Bartal [1] on probabilistic embedding of finite metric spaces into HSTs. His result states that for any finite metric space $X$, one can construct a set $\mathcal{H}$ of HST metric spaces, and a probability distribution $P_{HST}$ over this set, such that the expected distance between any pair of points suffers at most polylogarithmic distortion with respect to their true distance in $X$. Observe that this immediately implies that any good randomized algorithm $\mathrm{Alg}_1$ for $k$-server on HSTs would give a good randomized algorithm for $k$-server in general metric spaces. A simple strategy would be to sample an HST $Y$ from $\mathcal{H}$ as per $P_{HST}$, apply $\mathrm{Alg}_1$ on this HST, and perform the resultant action in $X$. Such an algorithm would have competitive ratio $\mathcal{O}(\mathrm{poly}(\log |X|)) \cdot \sigma_{\mathrm{Alg}_1}$, where $\sigma_{\mathrm{Alg}_1}$ is the competitive ratio of $\mathrm{Alg}_1$, because
\begin{align}
\mathcal{O} ( \mathrm{poly} (\log |X| ) ) \, \mathrm{Cost}_{opt}^{X} &\ge \mathbb{E}_{Y \sim P_{HST}} \mathrm{Cost}_{opt}^{Y} \qquad [\text{By Bartal's embedding}] \nonumber \\
&\overset{(i)}{\ge} \mathbb{E}_{Y \sim P_{HST}} \frac{1}{\sigma_{\mathrm{Alg}_1}} \mathbb{E}_{\mathrm{Alg}_1} \mathrm{Cost}_{\mathrm{Alg}_1}^{Y} \nonumber \\
&\ge \frac{1}{\sigma_{\mathrm{Alg}_1}} \mathbb{E} \, \mathrm{Cost}_{\mathrm{Alg}_1}^{X} \qquad [\text{Again, by Bartal's embedding}]
\end{align}
where $(i)$ follows from the competitiveness of $\mathrm{Alg}_1$.
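In code, this strategy is just “embed once, then replay the moves.” The sketch below assumes hypothetical interfaces: `sample_hst` draws an HST (plus a map back to $X$) from Bartal's distribution $P_{HST}$, and `Alg1` is a $\sigma_{\mathrm{Alg}_1}$-competitive $k$-server algorithm on HSTs; neither is a real library call.

```python
def k_server_via_hsts(X, requests, sample_hst, Alg1):
    """Embed once, then replay: sample an HST Y approximating X, run the
    HST algorithm on Y, and perform the corresponding moves in X."""
    Y, to_X = sample_hst(X)   # HST plus a map from its leaves back to X
    alg = Alg1(Y)             # sigma-competitive k-server algorithm on HSTs
    for r in requests:
        # Points of X correspond to leaves of Y, so requests carry over.
        server, destination = alg.serve(r)
        yield server, to_X(destination)   # pay for the same move in X
```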

The main contributions of Bubeck et al. are an $\mathcal{O}\big((\log k)^2\big)$-competitive algorithm for the $k$-server problem on HSTs and an $\mathcal{O}\big((\log k)^3 \log \Delta\big)$-competitive algorithm for general metric $k$-server, where $\Delta$ is the ratio of maximum to minimum non-zero distance between points in the metric space. These algorithms are conveyed through the framework of mirror descent. For an intuitive picture of $k$-server, check out my previous post.

Some differential geometry

In order to understand the arguments made by Bubeck et al., some context in differential geometry is required. It is an extremely vast and technical field, and with only a tiny bit of knowledge myself, I will avoid shooting myself in the foot by trying to be more comprehensive or precise than what is to follow.

Manifolds are spaces for which at every point, one can construct a neighborhood that can be continuously deformed (formally, a homeomorphism exists) to give a Euclidean space. Sometimes, manifolds come with the additional structure of differentiability: if one zooms in enough, the space looks Euclidean! Differentiable manifolds permit the definition of a unique “tangent space” at each point of the manifold. Given a differentiable manifold, a vector field maps each point of the manifold to a vector from its tangent space. A vector field is informally said to be “smooth” if small perturbations to a point on the manifold do not induce large perturbations to the corresponding tangent vector.

A Riemannian manifold is a manifold with additional structure, a smoothly varying inner product on each tangent space, that allows lengths and distances to be computed.

Figure 1: Visualization on the line

The general form of the Riemannian manifold tells you how to compute distances precisely at each point on the manifold using a smooth local inner product $\langle \cdot, \cdot \rangle_x$ defined on the tangent space. Intuitively, smoothness means that the angle between two vectors in the tangent space of a point $x$, and the angle between two slightly perturbed versions of them in the tangent space of some nearby point, shouldn't differ by much:
\begin{align}
&x \mapsto \langle F^1_x , F^2_x \rangle_x \quad \text{is smooth}, \nonumber \\
&\text{where } F^1_x \text{ and } F^2_x \text{ are smooth vector fields.}
\end{align}

A very simple visualization on the line is that of projecting a hilly surface onto flat ground (Figure 1). The physical distance between two points on the line is computed as the path length along the surface of the hills. In more than one dimension, however, this visualization is not adequate, in particular because the path between two points is not uniquely defined. The distance in this case is specified by the curve (geodesic) of shortest length, where the length is accumulated as per the smooth inner product at each point.
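To make the one-dimensional picture concrete (a worked example of my own, not from the paper): if the hilly surface is the graph of a smooth function $h$, the induced inner product and distance on the line are
\begin{align}
\langle u, v \rangle_x &= \big(1 + h'(x)^2\big)\, u\, v, \nonumber \\
d(a, b) &= \int_a^b \sqrt{1 + h'(x)^2}\, \mathrm{d}x,
\end{align}
which is exactly the arc length along the surface.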

Bregman divergence and mirror descent

The Bregman divergence can be thought of as inducing a Riemannian structure on the function's domain. Recall that mirror descent can be put into the framework of gradient descent on the manifold whose local inner product is given by the Hessian of the mirror map.
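For reference, the standard definitions (not specific to the paper): for a strictly convex, twice-differentiable $\Phi$, the Bregman divergence and the local inner product it induces are
\begin{align}
D_\Phi(y, x) &= \Phi(y) - \Phi(x) - \langle \nabla \Phi(x), y - x \rangle, \nonumber \\
\langle u, v \rangle_x &= u^\top \nabla^2 \Phi(x)\, v,
\end{align}
so that for $y$ close to $x$, $D_\Phi(y, x) \approx \frac{1}{2} \langle y - x, y - x \rangle_x$.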

  1. Y. Bartal. Probabilistic Approximation of Metric Spaces and its Algorithmic Applications. In Proc. of the 37th Ann. IEEE Symp. on Foundations of Computer Science, pages 184–193, October 1996.

Written on October 1, 2018