\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{xcolor}
\usepackage{graphicx}
\usepackage{tikz}
\usetikzlibrary{arrows}
\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\newcommand{\eps}{\epsilon}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\R}{\mathbb{R}}
\newcommand{\handout}[5]{
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in { {\bf CS 388R: Randomized Algorithms } \hfill #2 }
\vspace{4mm}
\hbox to 5.78in { {\Large \hfill #5 \hfill} }
\vspace{2mm}
\hbox to 5.78in { {\em #3 \hfill #4} }
\textcolor{red}{\textbf{NOTE:} THESE NOTES HAVE NOT BEEN EDITED OR CHECKED FOR CORRECTNESS}
}
}
\end{center}
\vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribe: #4}{Lecture #1}}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{assumption}[theorem]{Assumption}
% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
\topmargin 0pt
\advance \topmargin by -\headheight
\advance \topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt
\evensidemargin \oddsidemargin
\marginparwidth 0.5in
\textwidth 6.5in
\parindent 0in
\parskip 1.5ex
\begin{document}
\lecture{7 --- 09/21/2017}{Fall 2017}{Prof.\ Eric Price}{Isidoros Tziotis, Nathan Guermond}
\section{Overview}
In the last lecture we covered throwing balls into bins with two choices.
In this lecture we begin with the problem of approximating the mean of an unknown distribution by sampling, and then we turn our attention to hash tables, specifically Cuckoo Hashing and some of its properties. Cuckoo Hashing takes constant worst-case time for lookup and delete, and constant expected time for insertion. On the other hand, it requires linear space.
\section{Approximating the mean}
Consider the following problem: we have an unknown distribution $\mathcal D$ over $\R$ with unknown mean $\mu$ and variance $\sigma^2$. The goal is to determine an approximation $\hat\mu$ of $\mu$ by sampling, so that
$$\Pr[|\hat\mu-\mu|\leq \epsilon \sigma]\geq 1-\delta$$
for an appropriately chosen $\delta$.
Say we have independent random samples $X_1,\ldots,X_n\sim \mathcal D$. A simple solution would be to take the average $\hat\mu=Z=\frac{1}{n}\sum_{i=1}^n X_i$. Now, how far can $Z$ deviate from $\mu$? Chebyshev's inequality gives us the bound
$$\Pr[|Z-\mu|\geq t]\leq \frac{\sigma_Z^2}{t^2}=\frac{\sigma^2}{nt^2},$$ where $\sigma_Z^2=\frac{\sigma^2}{n}$ is the variance of $Z$, and $\sigma^2$ is the variance of each variable $X_i$.
Setting $t=\epsilon\sigma$, the above bound gives us
$$\Pr[|Z-\mu|\geq \epsilon\sigma]\leq \frac{1}{n\epsilon^2}=\delta,$$
so we should choose $n=\frac{1}{\epsilon^2\delta}$. Can we do better than this (i.e.\ is this bound tight)?
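The averaging estimator above is easy to sketch in code. The following is an illustrative Python sketch (the function name and sampling interface are ours, not from the notes): it draws $n=\lceil 1/(\epsilon^2\delta)\rceil$ samples, as the Chebyshev argument suggests, and returns their average.

```python
import math
import random

def mean_estimate(sample, eps, delta):
    """Estimate the mean of an unknown distribution to within
    eps * sigma, except with probability at most delta. By the
    Chebyshev argument, n = ceil(1 / (eps^2 * delta)) samples suffice.

    `sample` is a zero-argument function returning one draw.
    """
    n = math.ceil(1 / (eps ** 2 * delta))
    return sum(sample() for _ in range(n)) / n

# Example: estimate the mean of N(10, 1) to within 0.1 * sigma,
# with failure probability at most 0.1 (uses n = 1000 samples).
random.seed(0)
est = mean_estimate(lambda: random.gauss(10, 1), eps=0.1, delta=0.1)
```

Note the $\frac{1}{\delta}$ factor in the sample count; the rest of this section is about replacing it with $\log\frac{1}{\delta}$.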
Let us first see what happens for the Gaussian distribution: if $Z\sim \mathcal N(\mu,\sigma^2/n)$, then one can show that $$\Pr\left[|Z-\mu|\geq \frac{t\sigma}{\sqrt n}\right]\leq 2e^{-t^2/2},$$ so setting $\epsilon=\frac{t}{\sqrt n}$, it suffices to choose $n\geq \frac{2}{\epsilon^2}\log\frac{2}{\delta}$, an exponentially better dependence on $\delta$ than Chebyshev gives.
To answer whether this is tight in general, let us first consider examples for which Markov's inequality is tight. Consider the distribution in which $0$ is drawn with probability $1-p$ and $k>0$ is drawn with probability $p$. Then for a random variable $X$ with this distribution, $\mu=kp$ and Markov tells us
$$p=\Pr[X\geq k]\leq \frac{\mu}{k}=p,$$
which is tight. We can do the same with Chebyshev's inequality. Consider the distribution in which $\alpha=\frac{\sigma}{\sqrt p}$ and $-\alpha$ are each drawn with probability $p/2$, and $0$ is drawn with probability $1-p$. Then for a random variable $X$ with this distribution, the mean is $0$, the variance is $\alpha^2 p=\sigma^2$, and Chebyshev tells us
$$p=\Pr[|X|\geq \frac{1}{\sqrt p}\sigma]\leq \frac{\sigma^2}{\alpha^2}=p.$$
Now, for some chosen $\delta,n$ suppose we have the average $Z=\frac{1}{n}\sum_{i=1}^n X_i$ where each $X_i$ is distributed according to the preceding distribution with $p=\frac{2\delta}{n}$. Notice that
$$\Pr\left[|Z|\geq \frac{\sigma}{n\sqrt p}\right]\geq\Pr[\exists ! i\text{ s.t. }X_i\neq 0]=np(1-p)^{n-1}=2\delta\left(1-\frac{2\delta}{n}\right)^{n-1}\approx 2\delta e^{-2\delta}>\delta$$
for small $\delta$, where we used that if exactly one $X_i$ is nonzero, then $|Z|=\frac{\alpha}{n}=\frac{\sigma}{n\sqrt p}$.
Hence, in order to have
$$\Pr[|Z-\mu|\geq \epsilon \sigma]\leq\delta,$$
we need $\epsilon\geq \frac{1}{n\sqrt p}=\frac{1}{\sqrt{2\delta n}}$, and thus $n\geq \frac{1}{2\epsilon^2\delta}$. This shows that our original bound is tight up to constant factors.
We will now see what happens if, instead of the average, we take the median $Z$ of the $X_i$. Note that the median gives no $\epsilon$ dependence: for instance, if all the $X_i$ take values in $\pm 1$ (so $\mu=0$ and $\sigma=1$), then $Z$ also takes values in $\pm 1$ and $|Z-\mu|=1=\sigma$, no matter how many samples we take. Instead we bound the probability that $|Z-\mu|\geq 2\sigma$: for the median to fall outside $\mu\pm 2\sigma$, at least $n/2$ of the samples must fall outside $\mu\pm 2\sigma$.
Now notice that by Chebyshev, for each $i$,
$$\Pr[|X_i-\mu|\leq 2\sigma]\geq 3/4,$$
so letting $Y_i$ be the indicator of the event $|X_i-\mu|\leq 2\sigma$, we have
\begin{align*}
\Pr\left[\text{at most }\tfrac{n}{2}\text{ of the }X_i\text{ satisfy }|X_i-\mu|\leq 2\sigma\right]&=\Pr\left[\sum_{i=1}^n Y_i \leq \frac{n}{2}\right]\\
&\leq\Pr\left[\sum_{i=1}^n Y_i\leq \E\left[\sum_{i=1}^n Y_i\right]-\frac{n}{4}\right]\\
&\leq e^{-n/8}\leq \delta,
\end{align*}
where the first inequality uses $\E[\sum_{i=1}^n Y_i]\geq \frac{3n}{4}$ and the second is Hoeffding's inequality,
so we would need to choose $n\geq 8\log\frac{1}{\delta}$.
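The median estimator can be sketched as follows (an illustrative Python sketch with the same assumed sampling interface as before). Note it takes no $\epsilon$ at all: its accuracy is pinned at $2\sigma$, but the sample count depends only logarithmically on $\delta$.

```python
import math
import random
import statistics

def median_estimate(sample, delta):
    """Return an estimate within 2 * sigma of the mean, except with
    probability at most delta, using n = ceil(8 * ln(1/delta)) samples.
    There is no eps parameter: the accuracy is fixed at 2 * sigma."""
    n = math.ceil(8 * math.log(1 / delta))
    return statistics.median(sample() for _ in range(n))

# Example: 37 samples already give delta = 0.01.
random.seed(1)
med = median_estimate(lambda: random.gauss(0, 1), delta=0.01)
```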
Now, if we put it all together and combine the two methods and pick independent samples
\[\begin{array}{cccc}
X_{11}&X_{12}&\ldots&X_{1n}\\
X_{21}&X_{22}&\ldots&X_{2n}\\
\vdots&\vdots&\ddots&\vdots\\
X_{m1}&X_{m2}&\ldots&X_{mn},
\end{array}\]
then we estimate $\mu$ by taking $\hat\mu_i=\text{mean}_{j\in [n]} X_{ij}$, and $\hat \mu=\text{median}_{i\in [m]}\hat\mu_i$. First, notice that
using Chebyshev,
$$\Pr[|\hat\mu_i-\mu|\leq\epsilon\sigma]\geq 1-\frac{1}{n\epsilon^2}\geq 1-\delta_1,$$ where we must choose $n\geq \frac{1}{\delta_1\epsilon^2}$; we will see later that it suffices to take $\delta_1=\frac{1}{4}$.
We now consider the median of the $\hat\mu_i$'s. Let the random variable $Z_i=1$ if $|\hat\mu_i-\mu|\leq \epsilon\sigma$ and $0$ otherwise, then
\begin{align*}
\Pr[|\text{median}_{i\in[m]}\hat\mu_i-\mu|\leq \epsilon\sigma]&\geq \Pr[\sum_{i=1}^m Z_i>m/2]\\
&=1-\Pr[\sum_{i=1}^m Z_i \leq m(1-\delta_1)-m(\frac{1}{2}-\delta_1)]\\
&\geq 1-\Pr[\sum_{i=1}^m Z_i\leq \E[\sum_{i=1}^m Z_i] -m(\frac{1}{2}-\delta_1)]\\
&\geq 1-\exp(-2(m(\frac{1}{2}-\delta_1))^2/m)\\
&\geq 1-\delta_2
\end{align*}
where we would need $m\geq \frac{1}{2(1/2-\delta_1)^2}\log\frac{1}{\delta_2}$, so if we choose $\delta_1=\frac{1}{4}$, then we only need $m\geq 8\log\frac{1}{\delta_2}$.
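Putting it together, the median-of-means estimator can be sketched as follows (an illustrative Python sketch; with $\delta_1=\frac14$ as above, each group has $n=\lceil 4/\epsilon^2\rceil$ samples and there are $m=\lceil 8\ln(1/\delta)\rceil$ groups):

```python
import math
import random
import statistics

def median_of_means(sample, eps, delta):
    """Estimate the mean to within eps * sigma, except with probability
    at most delta, using m groups of n samples each:
    m = ceil(8 * ln(1/delta)) (taking delta_1 = 1/4 as in the notes),
    n = ceil(4 / eps^2)."""
    m = math.ceil(8 * math.log(1 / delta))
    n = math.ceil(4 / eps ** 2)
    group_means = [sum(sample() for _ in range(n)) / n for _ in range(m)]
    return statistics.median(group_means)

# Example: eps = 0.2, delta = 0.01 uses m = 37 groups of n = 100.
random.seed(2)
est = median_of_means(lambda: random.gauss(5, 1), eps=0.2, delta=0.01)
```

The total sample count is $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$, matching the Gaussian case up to constants.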
\section{Cuckoo Hashing}
\begin{itemize}
\item As we saw in previous lectures, if we create a hash table and place each element in a uniformly random bin, the maximum load (and hence the worst-case lookup time with chaining) is $O(\frac{\log n}{\log\log n})$ with high probability.
\item If instead we use the power of two choices, this drops to $O(\log\log n)$, which is much better.
\item Aiming, however, for constant worst-case lookup time, we turn our attention to Cuckoo Hashing.
\end{itemize}
In Cuckoo Hashing every cell of the hash table is viewed as a vertex, and every element is mapped by two different hash functions to two cells, so the element can be viewed as a (directed) edge between two vertices. Thus we have $n$ vertices (bins) and $m$ edges (balls). Each element may occupy either endpoint of its edge. If an element is mapped to two already occupied cells, we randomly evict one of the occupants, which is then reinserted at its other cell, and we repeat this process until an open cell is found.
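The insertion procedure can be sketched as follows. This is an illustrative Python sketch, not the exact scheme from lecture: it uses salted built-in hashing in place of truly random hash functions, always starts from the element's first cell rather than evicting a random occupant, and caps the number of evictions instead of detecting cycles exactly.

```python
import random

class CuckooHash:
    """Simplified cuckoo hash table: each key has two candidate cells;
    insertion evicts the current occupant when both are full."""

    def __init__(self, n):
        self.n = n
        self.table = [None] * n
        # Salts standing in for two independent random hash functions.
        self.salt1, self.salt2 = random.random(), random.random()

    def _cells(self, key):
        return (hash((self.salt1, key)) % self.n,
                hash((self.salt2, key)) % self.n)

    def lookup(self, key):
        # Worst-case O(1): only two cells can ever hold the key.
        return any(self.table[c] == key for c in self._cells(key))

    def insert(self, key, max_evictions=100):
        cell = self._cells(key)[0]
        for _ in range(max_evictions):
            if self.table[cell] is None:
                self.table[cell] = key
                return True
            # Evict the occupant and send it to its other cell.
            key, self.table[cell] = self.table[cell], key
            c1, c2 = self._cells(key)
            cell = c2 if cell == c1 else c1
        return False  # likely a cycle problem: rebuild with new hashes

# Example: 20 balls in 64 bins, i.e. load factor well below 1/2.
random.seed(3)
table = CuckooHash(64)
stored = [k for k in range(20) if table.insert(k)]
```

Deletion is also worst-case constant time: clear whichever of the two cells holds the key.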
{\bf But things can go sour for our algorithm if a barbell appears in the graph:} a component consisting of two cycles joined by a path has more edges than vertices, so its elements cannot all be placed.
\newpage
\begin{figure}
\centering
\caption{\label{fig:barbell}Our algorithm fails if a barbell occurs in the graph.}
\begin{tikzpicture}
\tikzset{vertex/.style={shape=circle,draw,minimum size=1.5em}}
\tikzset{edge/.style={->,>=latex'}}
\node[vertex](a0){};
\node[vertex](a1)[above right of=a0]{};
\node[vertex](a2)[below right of =a1]{};
\node[vertex](b0)[below of =a0]{};
\node[vertex](b1)[below right of=b0]{};
\node[vertex](b2)[above right of =b1]{};
\path[->]
(a0) edge node {} (a1)
(a1) edge node {} (a2)
(a2) edge node {} (a0)
(b0) edge node {} (a0)
edge node {} (b1)
(b1) edge node {} (b2)
(b2) edge node {} (b0);
\end{tikzpicture}
\end{figure}
In order to upper bound the probability that our algorithm fails, it suffices to bound the probability that any cycle appears in the graph (a barbell in particular contains a cycle).\\ \\{\bf Note: The analysis will borrow elements from Erd\H{o}s--R\'enyi $G(n,p)$ graphs and Galton--Watson processes.}\\ \\
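The failure event we are about to bound, some cycle among the $m$ random edges, is easy to check in simulation with union-find (an illustrative sketch; the function names are ours): an edge whose two endpoints are already in the same component closes a cycle.

```python
import random

def has_cycle(n, m, rng=random):
    """Throw m balls (edges) into n bins (vertices), each ball choosing
    two distinct random bins, and report whether any cycle appears.
    Union-find: an edge joining two already-connected vertices closes
    a cycle (two parallel edges count as a cycle of length 2)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(m):
        u, v = rng.sample(range(n), 2)  # one ball = one random edge
        ru, rv = find(u), find(v)
        if ru == rv:
            return True  # endpoints already connected: cycle closed
        parent[ru] = rv
    return False
```

For instance, since a forest on $n$ vertices has at most $n-1$ edges, `has_cycle(n, n)` always returns `True`, while for $m$ well below $n/2$ cycles are rare, matching the bound below.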
So what is the chance a cycle exists?
$$\Pr[\text{any cycle exists}] \leq \sum\limits_{i=2}^n\Pr[\text{any length }i\text{ cycle exists}]$$
Without loss of generality we can focus on undirected cycles. Since a cycle of length $i$ is specified by a sequence of $i$ vertices, there are at most $n^i$ cycles of length $i$, so we proceed:
$$\leq \sum\limits_{i=2}^n n^i\Pr[\text{specific cycle of length $i$ exists}] \leq \sum\limits_{i=2}^n n^i\Pr[\text{any particular set of $i$ edges exists}] $$
Focusing on the probability that a particular set of $i$ edges exists, we notice that there are at most $m^i$ ways to assign $i$ of the $m$ balls to these edges, and each such assignment occurs with probability ${n \choose 2}^{-i}$,
thus
$$\sum\limits_{i=2}^n n^i\Pr[\text{any particular set of $i$ edges exists}] \leq \sum\limits_{i=2}^n n^i m^i {n\choose 2}^{-i} = \sum\limits_{i=2}^n \left(\frac{2m}{n-1}\right)^i$$
If we have that $m\leq \alpha n$ for some constant $\alpha<\frac{1}{2}$, the series above is dominated by a geometric series with ratio at most a constant less than $1$, so
$$\Pr[\text{any cycle exists}]=O\!\left(\left(\frac{m}{n}\right)^2\right),$$
which is a constant less than $1$ for a suitably small load factor $\frac{m}{n}$. If the insertion procedure does fail, we simply rebuild the whole table with two fresh hash functions.
\end{document}