\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{xcolor}
\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\DeclareMathOperator*{\rk}{rank}
\DeclareMathOperator*{\poly}{poly}
\renewcommand{\th}{^\text{th}}
\newcommand{\eps}{\epsilon}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\R}{\mathbb{R}}
\newcommand{\handout}[5]{
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in { {\bf CS 388R: Randomized Algorithms } \hfill #2 }
\vspace{4mm}
\hbox to 5.78in { {\Large \hfill #5 \hfill} }
\vspace{2mm}
\hbox to 5.78in { {\em #3 \hfill #4} }
\textcolor{red}{\textbf{NOTE:} THESE NOTES HAVE NOT BEEN EDITED OR CHECKED FOR CORRECTNESS}
}
}
\end{center}
\vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribes: #4}{Lecture #1}}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{assumption}[theorem]{Assumption}
% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
\topmargin 0pt
\advance \topmargin by -\headheight
\advance \topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt
\evensidemargin \oddsidemargin
\marginparwidth 0.5in
\textwidth 6.5in
\linespread{1.1}
\parindent 0in
\parskip 1.5ex
\begin{document}
\lecture{14 --- October 19, 2017}{Fall 2017}{Prof.\ Eric Price}{Niels Kornerup, Aravind Gollakota}
\section{Overview}
In this lecture we finish our discussion of sampling and look at an efficient sampling-based algorithm for median-finding.
\section{Sampling}
Consider estimating the volume of a $d$-dimensional polytope given a way to tell if a point is inside it or not (i.e.\ a separation oracle).
\begin{itemize}
\item Let $p$ = probability a random point is in the polytope.
\item Algorithm: Try $n$ random points, see what fraction of them are in the polytope.
\item Estimate $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} Z_i$, where $Z_i = 1$ if the $i\th$ point is in the polytope and $Z_i = 0$ otherwise.
\end{itemize}
Question: how many samples do we need to get $\hat{p} \in (1 \pm \epsilon)p$?
\begin{itemize}
\item By a Chernoff bound (the $Z_i$ are independent), we get that $\Pr[|\sum Z_i - \E[\sum Z_i]| \geq \epsilon \cdot \E[\sum Z_i]] \leq 2e^{-\frac{\epsilon^2}{3} \E[\sum Z_i]}$.
\item Since $\E[\sum Z_i] = np$, this is equivalent to saying that $\Pr[|\hat{p} - p| \geq \epsilon p] \leq 2e^{-\frac{\epsilon^2 np}{3}}$.
\item This then implies that $n = \frac{3}{p \epsilon^2} \log(\frac{2}{\delta})$ samples suffice to get $\hat{p} \in (1\pm \epsilon)p$ with probability $1-\delta$.
\item But this requires that we already know $p$\dots
\end{itemize}
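To make the estimator concrete, here is a minimal Python sketch (the cross-polytope membership test below is an illustrative stand-in for the separation oracle, and the function names are ours):

```python
import random

def estimate_fraction(inside, sample_point, n):
    """Estimate p = Pr[a random point lands inside the body] from n samples."""
    hits = sum(1 for _ in range(n) if inside(sample_point()))
    return hits / n

# Illustrative oracle: the cross-polytope |x_1| + ... + |x_d| <= 1
# inside the cube [-1, 1]^d.  For d = 2 the true p is exactly 1/2.
d = 2
inside = lambda x: sum(abs(c) for c in x) <= 1.0
sample_point = lambda: [random.uniform(-1.0, 1.0) for _ in range(d)]

random.seed(0)
p_hat = estimate_fraction(inside, sample_point, n=100000)
```

By the Chernoff bound above, with $n = 10^5$ samples the estimate concentrates within a few multiples of $\sqrt{p/n} \approx 0.002$ around $p = 1/2$.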
What we really need is to ensure that $n \geq \frac{3}{p \epsilon^2} \log(\frac{2}{\delta})$ without knowing $p$, so instead of fixing $n$ in advance we keep sampling until we get $T$ hits.
\begin{itemize}
\item That is, we stop at the first $n$ such that $\sum_{i=1}^{n} Z_i \geq \frac{10}{\epsilon^2} \log(\frac{2}{\delta}) = T$
\item Letting $\hat{n}$ denote the number of samples used when we stop, we want $\hat{n} \in (1\pm \epsilon) T/p$
\item For any fixed $n'$, a Chernoff bound gives $\Pr[\sum_{i=1}^{n'} Z_i \not \in (1\pm \epsilon)n'p] \leq 2e^{-\frac{\epsilon^2}{3}n'p}$
\item First consider $n' = \frac{T}{(1+\epsilon)p}$, so that $\Pr[\hat{n} \leq n'] = \Pr[\sum_{i=1}^{n'} Z_i \geq T]$
\begin{itemize}
\item $\Pr[\sum_{i=1}^{n'} Z_i \geq (1 + \epsilon) n'p = T] \leq e^{-\frac{\epsilon^2}{3}n'p} = e^{-\frac{\epsilon^2 T}{3(1+\epsilon)}} \leq e^{-\epsilon^2\frac{T}{4}}$ for $\epsilon \leq \frac{1}{3}$
\item which is $\leq \delta/2$ if $T\geq \frac{4}{\epsilon^2} \log(\frac{2}{\delta})$
\end{itemize}
\item Now consider $n' = \frac{T}{(1-\epsilon)p}$, so that $\Pr[\hat{n} > n'] = \Pr[\sum_{i=1}^{n'} Z_i < T]$
\begin{itemize}
\item $\Pr[\sum_{i=1}^{n'} Z_i \leq (1 - \epsilon) n'p = T] \leq e^{-\frac{\epsilon^2}{2}n'p} = e^{-\frac{\epsilon^2 T}{2(1-\epsilon)}} \leq e^{-\epsilon^2\frac{T}{2}}$
\item which is $\leq \delta/2$ if $T\geq \frac{2}{\epsilon^2} \log(\frac{2}{\delta})$
\end{itemize}
\item Thus, if $T \geq \frac{4}{\epsilon^2} \log(\frac{2}{\delta})$, then with probability $1-\delta$ we get that $\frac{T}{(1+\epsilon)p} \leq \hat{n} \leq \frac{T}{(1-\epsilon)p}$, where $\hat{n}$ is the number of samples used.
\item This implies that $\frac{T}{\hat{n}} \in (1\pm \epsilon)p$, so $T/\hat{n}$ is our estimate.
\end{itemize}
To summarize, we've just shown that $T = O(\frac{1}{\eps^2} \log (\frac{1}{\delta}))$ coin flips suffice to approximate $p$ to within $\eps$ with probability $\geq 1 - \delta$.
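This stopping rule is easy to sketch in Python (a sketch following the analysis above, using the threshold $T = \lceil \frac{4}{\eps^2} \log \frac{2}{\delta} \rceil$; the function name is ours):

```python
import math
import random

def estimate_until_T_hits(coin, eps, delta):
    """Flip the coin until we see T heads; return T / (#flips) as the
    estimate of p.  No prior knowledge of p is needed."""
    T = math.ceil(4.0 / eps**2 * math.log(2.0 / delta))
    flips, hits = 0, 0
    while hits < T:
        flips += 1
        hits += coin()
    return T / flips

random.seed(1)
p = 0.05
p_hat = estimate_until_T_hits(lambda: random.random() < p, eps=0.1, delta=0.01)
# whp p_hat is within (1 +- eps) of p, after about T/p flips
```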
\section{Median-finding}
Given an array of numbers, our task is to find the median. Here are some approaches that first come to mind: \begin{description}
\item[Quickselect.] This is a Quicksort derivative, and works as follows. Pick a random pivot, and separate the elements into those bigger and those smaller than the pivot. The median must lie in the bigger of the two segments, so recursively search within this segment. (Technically the median has a different rank within this segment, so we keep track of it; the implementation is really a \textsc{FindElementOfRank}($i$) rather than \textsc{FindMedian}.\footnote{This applies to the rest of this lecture as well: our median-finding algorithms are really \emph{selection} algorithms.})
A fairly standard Quicksort-style analysis shows that the expected number of operations in this algorithm is $O(n)$. What about high probability bounds? Unfortunately, because we pick our pivot uniformly at random, they fail. Specifically, the probability of our first $k$ pivot choices all lying in the first $1/k$ fraction of the array is at least $1/k^k$, and when this happens our array still has size about $n(1 - 1/k)^k \approx n/e$ after those $k$ steps. Thus for every $k$ there is a $1/k^k$ chance of taking $\Omega(kn)$ time, and in particular a $1/n$ chance of taking $\Omega(\frac{n \log n}{\log \log n})$ time. This is almost as bad as sorting, and tells us that high probability bounds definitely do not hold.
\item[Deterministic median-of-medians.] There is actually a deterministic algorithm that works. We divide our array into chunks of say 5, take the median of each, and then take the median of these medians. We then use this as a pivot, again separating elements into two piles and recursing down the bigger one. It can be shown that this always uses only $O(n)$ comparisons. But from a practical point of view, the algorithm is not so simple to implement and also suffers from bad constants.
\end{description}
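For concreteness, Quickselect can be sketched in a few lines of Python (an out-of-place sketch with three-way partitioning to handle duplicates, not a tuned implementation):

```python
import random

def quickselect(arr, i):
    """Return the element of rank i (0-indexed) in arr: FindElementOfRank(i)."""
    pivot = random.choice(arr)
    smaller = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]
    if i < len(smaller):                  # rank-i element lies below the pivot
        return quickselect(smaller, i)
    if i < len(smaller) + len(equal):     # the pivot itself has rank i
        return pivot
    # Otherwise recurse on the larger side, adjusting the target rank.
    return quickselect(larger, i - len(smaller) - len(equal))

def median(arr):
    return quickselect(arr, (len(arr) - 1) // 2)
```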
So we discard these approaches and frame as our goal an algorithm that is simple, randomized, has good high probability bounds, and has small constants.
Let's try to use sampling. Here is a first attempt at an algorithm: \begin{itemize}
\item Choose a sample $S$ of size $s$ ($s \ll n/\log n$)
\item Directly sort and take the median of $S$ (call it $m$), and use it as pivot
\item Split the elements relative to the pivot, and recursively search for the median on the appropriate side
\end{itemize}
If $\ell$ is the number of elements on the median's side, then the running time for this algorithm may be expressed recursively as \[ T(n) = s \log s + n + T(\ell), \] where the $s\log s$ comes from sorting $S$ and the $n$ from splitting the elements relative to the pivot. We can try to sample such that $m$ is close to the real median, i.e.\ that $\rk(m) = (1 \pm \eps)n/2$ with high probability ($1 - 1/\poly(n)$---although note that $n$ drops quickly as we recurse down, and this needs to be dealt with carefully). Here we use $\rk(m)$ to denote $m$'s position in the sorted array. If we could do this at all steps of the recursion, then we'd ensure that $\ell \leq (1+\eps)n/2$ always and so \begin{align*}
T(n) &\leq n + T\Big(\frac{1+\eps}{2}n\Big) \\
&\leq n \sum_{i \geq 0} \Big(\frac{1+\eps}{2}\Big)^i = \frac{n}{1 - (1+\eps)/2} = \frac{2n}{1-\eps}.
\end{align*} (ignoring the lower-order $s \log s$ terms). This isn't bad, and we could set about formalizing this. But instead we will do something better, something that is both simpler and has a better constant than 2.
The key idea is that instead of recursing down the entirety of one half, we can actually try to use more fine-grained information about our sample's median $m$: if it is really within $(1\pm \eps)$ of the actual median, then we should really only be looking at $\eps n/2$ elements ``on either side'' of $m$. Formally, we will let $L$ be the $(1/2 - \eps)s\th$ smallest element in $S$, and $H$ the $(1/2 + \eps)s\th$ smallest. If $\rk(L) \leq n/2$ and $\rk(H) \geq n/2$, then the median will lie in $[L, H]$. We will see that we can make this happen with high probability. Notice that this idea is similar to our analysis of sampling.
Let's see what we can say about $\rk(m)$. In fact, more generally consider $\rk(x)$, where $x$ is the element of rank $\beta s$ in $S$ ($m$ corresponds to $\beta = 1/2$). We know that $\rk(x) \leq k$ occurs iff at least $\beta s$ elements in $S$ have rank $\leq k$. The natural thing is to use Chernoff to bound this probability. Let $X_i = 1$ if the $i\th$ sample has rank $\leq k$, so that $\E[X_i] = k/n$. Then \begin{align*}
\Pr[\rk(x) \leq k] &= \Pr[\sum_{i=1}^s X_i \geq \beta s] \\
&= \Pr \Big[ \sum_{i=1}^s X_i \geq \E[\sum_{i=1}^s X_i] + (\beta - \frac{k}{n})s \Big] \\
&\leq \exp(\frac{-2((\beta - \frac{k}{n})s)^2}{s}) = \exp(-2 (\beta - \frac{k}{n})^2 s).
\end{align*} To have this be $\leq 1/\poly(n)$, it suffices to have $\frac{k}{n} = \beta- \sqrt{\frac{\log n}{s}}$. Thus whp (with high probability, i.e.\ $\geq 1 - 1/\poly(n)$), $\rk(x) \geq n(\beta - \sqrt{\frac{\log n}{s}})$. Similarly (since we only used additive Chernoff) we can show that whp $\rk(x) \leq n(\beta + \sqrt{\frac{\log n}{s}})$. And so whp $\rk(x) = (\beta \pm \eps) n$ whenever $s \geq \frac{1}{\eps^2}\log n$.
So now fix $\eps = \sqrt{\frac{\log n}{s}}$. Let $L$ be the $(1/2 - \eps)s\th$ smallest element of $S$. Plugging $\beta = 1/2 - \eps$ into the above bound, we see that whp $\rk(L) \in [n/2 - 2 \eps n, n/2]$. Similarly, if $H$ is the $(1/2 + \eps)s\th$ smallest element of $S$, then whp $\rk(H) \in [n/2, n/2 + 2\eps n]$. When both of these happen, the true median is contained within $[L, H]$, an interval containing $\leq 4\eps n = o(n)$ elements, as desired.
To find the elements of $[L, H]$, we make a linear pass over all elements, classifying each as below $L$, inside $[L, H]$, or above $H$. If we compare each element to $L$ first and only to $H$ if necessary, then for roughly half the elements (specifically, $\geq n/2 - 2\eps n$ of them) we avoid the second comparison, since they are less than $L$. So the number of comparisons here is $3n/2 + o(n)$.
Once we've found $[L, H]$, we can find the true median by simply sorting it. This takes at most $O(4 \eps n \log (4\eps n))$ steps. Thus, including the original step of sorting $S$ itself, the total number of operations is \[ O(s \log s) + \frac{3n}{2} + O( \eps n \log (\eps n)). \] Plugging in $\eps = \sqrt{\log n/s}$, we can choose $s$ to minimize this quantity. Specifically, taking $s = n^{2/3}$ gives $3n/2 + O(n^{2/3} \log^{3/2} n) = 3n/2 + o(n)$. Thus our new algorithm is indeed simpler, since it avoids recursion, and has a better constant ($3/2$ instead of $2$).
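The whole scheme can be sketched end to end in Python (the function name is ours; for safety this sketch falls back to full sorting in the low-probability event that the median escapes $[L, H]$, so the output is always exact):

```python
import math
import random

def approx_median_select(arr):
    """Sampling-based median: sort a sample of size s = n^(2/3), take the
    (1/2 - eps)s-th and (1/2 + eps)s-th smallest sample points as L and H,
    and sort only the elements falling in [L, H]."""
    n = len(arr)
    s = max(4, int(n ** (2 / 3)))
    eps = math.sqrt(math.log(max(n, 2)) / s)
    sample = sorted(random.choices(arr, k=s))   # sample with replacement
    L = sample[max(0, int((0.5 - eps) * s))]
    H = sample[min(s - 1, int((0.5 + eps) * s))]
    below, middle = 0, []
    for x in arr:            # one linear pass, comparing to L first
        if x < L:
            below += 1
        elif x <= H:         # second comparison only when necessary
            middle.append(x)
    k = (n - 1) // 2         # 0-indexed rank of the median
    if not (below <= k < below + len(middle)):
        return sorted(arr)[k]        # whp this fallback is never taken
    return sorted(middle)[k - below]
```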
\end{document}