Official website.

Homework 1

Problem 1 (Expectation as the optimal estimator)

Let $X$ be a random variable with finite expectation. Show that the function $f(a) = \mathbb{E}(X - a)^2$ is minimized at $a = \mathbb{E} X$.

$$\mathbb{E}(X-a)^2=\mathbb{E}X^2-2a\mathbb{E}X+a^2=\text{Var}(X)+(a-\mathbb{E}X)^2,$$

which is minimized exactly at $a=\mathbb{E}X$.
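
As a quick numerical sanity check (not part of the proof), the following sketch estimates $f(a)=\mathbb{E}(X-a)^2$ by Monte Carlo and confirms the minimizer sits near $\mathbb{E}X$; the distribution, sample size, and grid are arbitrary choices, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=200_000)   # arbitrary distribution with E[X] = 2

a_grid = np.linspace(0.0, 4.0, 401)
f_vals = [np.mean((X - a) ** 2) for a in a_grid]   # Monte Carlo estimate of f(a)

a_star = a_grid[np.argmin(f_vals)]
print(f"empirical minimizer a* = {a_star:.3f}, sample mean = {X.mean():.3f}")
# Both values should be close to E[X] = 2.
```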

Problem 2 (Expectation of random vectors)

Let $X$ and $Y$ be random vectors in $\mathbb{R}^n$.

(a)

(Linearity) Prove that for any constants $a, b\in \mathbb{R}$, we have

$$\mathbb{E}(aX + bY) = a \mathbb{E}(X) + b \mathbb{E}(Y).$$

$$\mathbb{E}(aX+bY)=(\mathbb{E}(aX_i+bY_i))_{i=1}^n=(a\mathbb{E}(X_i)+b\mathbb{E}(Y_i))_{i=1}^n=a\mathbb{E}(X)+b\mathbb{E}(Y),$$

since the expectation of a random vector is taken coordinatewise and scalar expectation is linear.

(b)

(Multiplicativity) Prove that if $X$ and $Y$ are independent, then

$$\mathbb{E}\langle X, Y\rangle = \langle \mathbb{E} X, \mathbb{E} Y\rangle.$$

This generalizes the identity $\mathbb{E}(XY) = (\mathbb{E} X)(\mathbb{E} Y)$ for independent random variables.

$$\mathbb{E}\langle X, Y\rangle=\mathbb{E}\Big(\sum_{i=1}^n X_iY_i\Big)=\sum_{i=1}^n \mathbb{E}X_i\mathbb{E}Y_i=\langle \mathbb{E}X,\mathbb{E}Y\rangle.$$

Problem 3 (Expectation as optimal estimator: random vectors)

Let $X$ be a random vector with finite expectation. Show that the function $f(a) = \mathbb{E}\Vert X - a\Vert_2^2$ is minimized at $a = \mathbb{E} X$.

$$\mathbb{E}\Vert X-a\Vert^2_2=\mathbb{E}\langle X-a,X-a\rangle=\sum_{i=1}^n \mathbb{E}(X_i-a_i)^2,$$

and by Problem 1 each term is minimized at $a_i=\mathbb{E}X_i$, so $f$ is minimized at $a=\mathbb{E}X$.

Problem 4 (A variance-like identity)

For a random vector in $\mathbb{R}^n$, define $V(X) = \mathbb{E}\Vert X - \mathbb{E} X\Vert_2^2$.

(a)

Prove that

$$V(X) = \mathbb{E}\Vert X\Vert_2^2 -\Vert \mathbb{E} X\Vert_2^2.$$

$$\begin{align*} V(X)&=\mathbb{E}\Vert X-\mathbb{E}X\Vert^2_2=\mathbb{E}\langle X-\mathbb{E}X,X-\mathbb{E}X\rangle=\mathbb{E}\sum_{i=1}^n (X_i-\mathbb{E}X_i)^2\\\\ &=\sum_{i=1}^n \big(\mathbb{E}X_i^2-(\mathbb{E}X_i)^2\big)=\mathbb{E}\Vert X\Vert^2_2-\Vert\mathbb{E}X\Vert_2^2. \end{align*}$$

(b)

Prove that

$$V(X) = \frac{1}{2}\mathbb{E}\Vert X - X_0\Vert_2^2,$$

where $X_0$ is an independent copy of $X$.

$$\frac{1}{2}\mathbb{E}\Vert X-X_0\Vert_2^2=\frac{1}{2}\sum_{i=1}^n\mathbb{E}(X_i-X_{0,i})^2=\sum_{i=1}^n\big(\mathbb{E}X_i^2-\mathbb{E}(X_iX_{0,i})\big)=\sum_{i=1}^n\big(\mathbb{E}X_i^2-(\mathbb{E}X_i)^2\big)=\mathbb{E}\Vert X\Vert^2_2-\Vert\mathbb{E}X\Vert_2^2=V(X),$$

using that $X_{0,i}$ has the same distribution as $X_i$, that $\mathbb{E}(X_iX_{0,i})=(\mathbb{E}X_i)^2$ by independence, and part (a) for the last equality.
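
A minimal numerical check of both identities of this problem, assuming numpy; the distribution of $X$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 5, 500_000
mu = 1.0 + np.arange(n)                                   # true mean vector E[X]
X  = rng.exponential(1.0, size=(N, n)) + np.arange(n)     # arbitrary random vector in R^5
X0 = rng.exponential(1.0, size=(N, n)) + np.arange(n)     # independent copy of X

V      = np.mean(np.sum((X - mu) ** 2, axis=1))                # E||X - EX||^2
part_a = np.mean(np.sum(X ** 2, axis=1)) - np.sum(mu ** 2)     # E||X||^2 - ||EX||^2
part_b = 0.5 * np.mean(np.sum((X - X0) ** 2, axis=1))          # (1/2) E||X - X0||^2

print(V, part_a, part_b)   # the three values should agree up to Monte Carlo error
```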

Problem 5 (Additivity of variance)

(a)

Check that if $X$ and $Y$ are independent random vectors in $\mathbb{R}^n$ with zero means, then

$$\mathbb{E}\Vert X+Y\Vert_2^2=\mathbb{E}\Vert X\Vert_2^2+\mathbb{E}\Vert Y\Vert_2^2.$$

$$\mathbb{E}\Vert X+Y\Vert^2_2=\sum_{i=1}^n \mathbb{E}(X_i+Y_i)^2=\sum_{i=1}^n \big(\mathbb{E}X_i^2+2\mathbb{E}X_i\mathbb{E}Y_i+\mathbb{E}Y_i^2\big)=\mathbb{E}\Vert X\Vert_2^2+\mathbb{E}\Vert Y\Vert_2^2,$$

since $\mathbb{E}(X_iY_i)=\mathbb{E}X_i\mathbb{E}Y_i$ by independence and the cross terms vanish by the zero-mean assumption.

(b)

Prove that if random vectors $X_1,\cdots, X_k$ in $\mathbb{R}^n$ are independent, then

$$V(X_1 +\cdots + X_k) = V(X_1) +\cdots + V(X_k),$$

where $V(\cdot)$ is defined as $V(X) = \mathbb{E}\Vert X - \mathbb{E} X\Vert_2^2$.

$$\begin{align*} V(X_1+X_2+\cdots+X_k)&=\mathbb{E}\Big\Vert \sum_{j=1}^k X_j\Big\Vert_2^2-\Big\Vert \mathbb{E}\sum_{j=1}^k X_j\Big\Vert_2^2\\\\ &=\mathbb{E}\sum_{i=1}^n \Big(\sum_{j=1}^k X_{j,i}\Big)^2-\sum_{i=1}^n \Big(\sum_{j=1}^k \mathbb{E}X_{j,i}\Big)^2\\\\ &=\sum_{i=1}^n \Big(\sum_{j=1}^k\mathbb{E}X_{j,i}^2+2\sum_{j=1}^k\sum_{l=j+1}^k \mathbb{E}X_{j,i}\mathbb{E}X_{l,i}-\sum_{j=1}^k (\mathbb{E}X_{j,i})^2-2\sum_{j=1}^k\sum_{l=j+1}^k \mathbb{E}X_{j,i}\mathbb{E}X_{l,i}\Big)\\\\ &=\sum_{j=1}^k \sum_{i=1}^n \big(\mathbb{E}X_{j,i}^2-(\mathbb{E}X_{j,i})^2\big)=\sum_{j=1}^k V(X_j), \end{align*}$$

where independence was used to write the cross terms as $\mathbb{E}(X_{j,i}X_{l,i})=\mathbb{E}X_{j,i}\mathbb{E}X_{l,i}$ for $j\neq l$.

Homework 2

Problem 1 (Balancing vectors)

Let $x_1,\cdots, x_n$ be an arbitrary set of unit vectors in $\mathbb{R}^n$. Prove that there exist $\varepsilon_1,\cdots, \varepsilon_n\in \lbrace -1, 1\rbrace$ such that

$$\Vert \varepsilon_1x_1+\cdots+\varepsilon_nx_n\Vert_2\leq \sqrt{n}.$$

Choose $\varepsilon_1,\cdots,\varepsilon_n$ to be independent random signs with $\mathbb{P}[\varepsilon_i=1]=\mathbb{P}[\varepsilon_i=-1]=\frac{1}{2}$, so that $\mathbb{E}[\varepsilon_i]=0$. Compute

$$\mathbb{E}\Big\Vert \sum_{i=1}^n \varepsilon_ix_i\Big\Vert_2^2=\sum_{i=1}^n \mathbb{E}\Vert\varepsilon_ix_i\Vert_2^2+2\sum_{i<j}\mathbb{E}[\varepsilon_i\varepsilon_j]\langle x_i,x_j\rangle=\sum_{i=1}^n\Vert x_i\Vert_2^2=n\\\\ \implies \exists\ \varepsilon_i, \text{ s.t. }\Big\Vert\sum_{i=1}^n \varepsilon_ix_i\Big\Vert_2^2\leq n,\text{ i.e. }\Big\Vert\sum_{i=1}^n \varepsilon_ix_i\Big\Vert_2\leq \sqrt{n},$$

since $\mathbb{E}[\varepsilon_i\varepsilon_j]=\mathbb{E}[\varepsilon_i]\mathbb{E}[\varepsilon_j]=0$ for $i\neq j$ and a random variable cannot always exceed its expectation.
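
The argument above is the probabilistic method; the sketch below (assuming numpy, with an arbitrary random choice of unit vectors) samples random sign patterns and checks that one satisfying the bound shows up quickly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=(n, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)        # n arbitrary unit vectors in R^n (rows)

norms = []
for _ in range(1000):                                # sample random sign patterns
    eps = rng.choice([-1.0, 1.0], size=n)
    norms.append(np.linalg.norm(eps @ x))            # ||sum_i eps_i x_i||_2

print(f"best norm found: {min(norms):.3f}  vs  sqrt(n) = {np.sqrt(n):.3f}")
print(f"fraction of samples within the bound: {np.mean(np.array(norms) <= np.sqrt(n)):.2f}")
# Since E||sum eps_i x_i||_2^2 = n, a positive fraction of sign patterns meets the bound.
```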

Problem 2 (Random vectors have norm $\approx \sqrt{n}$)

Let $X$ be a standard normal random vector in $\mathbb{R}^n$, where $n\geq C_1$.

(a)

Check that

$$\mathbb{E}\Vert X\Vert_2^2 = n \text{ and }\text{Var}(\Vert X\Vert_2^2) = 2n.$$

$$\mathbb{E}\Vert X\Vert_2^2=\sum_{i=1}^n\mathbb{E}X_i^2=n.$$

$$\text{Var}(\Vert X\Vert_2^2)=\mathbb{E}\Vert X\Vert_2^4-(\mathbb{E}\Vert X\Vert_2^2)^2=\mathbb{E}\Big[\Big(\sum_{i=1}^n X_i^2\Big)^2\Big]-n^2\\\\ =\mathbb{E}\Big[\sum_{i=1}^n X_i^4+2\sum_{i<j}X_i^2X_j^2\Big]-n^2=3n+2\cdot\frac{n(n-1)}{2}-n^2=2n,$$

using $\mathbb{E}X_i^4=3$ and $\mathbb{E}X_i^2X_j^2=\mathbb{E}X_i^2\,\mathbb{E}X_j^2=1$ for $i\neq j$ by independence.
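
A quick Monte Carlo confirmation of both moments, assuming numpy; the dimension and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 100, 200_000
X = rng.normal(size=(N, n))          # N samples of a standard normal vector in R^n
sq_norms = np.sum(X ** 2, axis=1)    # ||X||_2^2 for each sample

print(f"mean of ||X||^2:     {sq_norms.mean():.1f}   (expect n  = {n})")
print(f"variance of ||X||^2: {sq_norms.var():.1f}   (expect 2n = {2 * n})")
```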

(b)

Conclude that

$$|\Vert X\Vert_2^2 - n|\leq C_2 \sqrt{n}\text{ with probability at least }0.99.$$

By Chebyshev's inequality,

$$\mathbb{P}[|\Vert X\Vert_2^2-n|\leq C_2\sqrt{n}]=1-\mathbb{P}[|\Vert X\Vert_2^2-n|> C_2\sqrt{n}]\geq 1-\frac{\text{Var}(\Vert X\Vert_2^2)}{C_2^2n}=1-\frac{2}{C_2^2}.$$

Let $C_2=\sqrt{200}$, so that $1-\frac{2}{C_2^2}=0.99$.

(c)

Deduce that

$$\frac{1}{2}\sqrt{n}\leq \Vert X\Vert_2\leq 2\sqrt{n}\text{ with probability at least }0.99.$$

Using the conclusion from (b),

$$\sqrt{n-C_2\sqrt{n}}\leq \Vert X\Vert_2\leq \sqrt{n+C_2\sqrt{n}} \text{ with probability at least }0.99.$$

Upper bound: $\sqrt{n+C_2\sqrt{n}}\leq 2\sqrt{n}\iff C_2\sqrt{n}\leq 3n\iff n\geq \left(\frac{C_2}{3}\right)^2$.

Lower bound: $\sqrt{n-C_2\sqrt{n}}\geq \frac{1}{2}\sqrt{n}\iff C_2\sqrt{n}\leq \frac{3}{4}n\iff n\geq \left(\frac{4C_2}{3}\right)^2$. Both conditions hold once $n\geq C_1:=\left(\frac{4C_2}{3}\right)^2$.

(d)

Prove the tighter bound:

$$|\Vert X\Vert_2 - \sqrt{n}|\leq C_3\text{ with probability at least }0.99.$$

On the event of (b),

$$|\Vert X\Vert_2-\sqrt{n}|=\frac{|\Vert X\Vert_2^2-n|}{\Vert X\Vert_2+\sqrt{n}}\leq \frac{C_2\sqrt{n}}{\sqrt{n}}=C_2,$$

so the claim holds with $C_3=C_2$, with probability at least $0.99$.
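
The point of (d) is that the fluctuation of $\Vert X\Vert_2$ around $\sqrt{n}$ stays of constant order as $n$ grows; a short simulation (assuming numpy, with arbitrary dimensions and sample sizes) makes this visible.

```python
import numpy as np

rng = np.random.default_rng(4)
for n in (100, 400, 1_600):
    X = rng.normal(size=(10_000, n))
    dev = np.abs(np.linalg.norm(X, axis=1) - np.sqrt(n))   # | ||X||_2 - sqrt(n) |
    print(f"n={n:5d}  99th percentile of the deviation: {np.quantile(dev, 0.99):.3f}")
# The 99th percentile stays around a small constant (roughly 1.8), independent of n.
```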

Problem 3 (Random vectors are almost orthogonal)

Let $X$ and $Y$ be independent standard normal random vectors in $\mathbb{R}^n$, where $n\geq C_3$.

(a)

Check that

$$\mathbb{E}\langle X, Y\rangle^2 = n.$$

$$\mathbb{E}\langle X,Y\rangle^2=\mathbb{E}\Big[\Big(\sum_{i=1}^n X_iY_i\Big)^2\Big]=\mathbb{E}\Big[\sum_{i=1}^n X_i^2Y_i^2+2\sum_{i<j}X_iY_iX_jY_j\Big]=n,$$

since $\mathbb{E}X_i^2Y_i^2=\mathbb{E}X_i^2\,\mathbb{E}Y_i^2=1$ and the cross terms have zero mean by independence.

(b)

Deduce that

$$\langle X, Y\rangle^2\leq C_4n\text{ with probability at least }0.99.$$

By Markov's inequality,

$$\mathbb{P}[\langle X,Y\rangle^2\leq C_4n]=1-\mathbb{P}[\langle X,Y\rangle^2> C_4n]\geq 1-\frac{\mathbb{E}\langle X,Y\rangle^2}{C_4n}=1-\frac{1}{C_4}.$$

Let $C_4=100$, so that $1-\frac{1}{C_4}=0.99$.

(c)

Denote by $\theta$ the angle between the vectors $X$ and $Y$. Prove that

$$|\theta-\frac{\pi}{2}|\leq \frac{C_5}{\sqrt{n}}\text{ with probability at least }0.97.$$

Use Problem 3(b) and Problem 2(c).

$$\mathbb{P}\Big[\cos^2(\theta)=\frac{\langle X,Y\rangle^2}{\Vert X\Vert_2^2\Vert Y\Vert_2^2}\leq \frac{C_4n}{(\frac{\sqrt{n}}{2})^2(\frac{\sqrt{n}}{2})^2}=\frac{16C_4}{n}\Big]\geq 1-3\times0.01=0.97\\\\ \implies \mathbb{P}\Big[|\cos(\theta)|\leq \frac{4\sqrt{C_4}}{\sqrt{n}}\Big]\geq 0.97,$$

by a union bound over the three events $\langle X,Y\rangle^2\leq C_4n$, $\Vert X\Vert_2\geq\frac{1}{2}\sqrt{n}$ and $\Vert Y\Vert_2\geq\frac{1}{2}\sqrt{n}$, each of which fails with probability at most $0.01$.

Now bound $|\theta-\pi/2|$.

$$|\cos(\theta)|=\big|\sin\big(\tfrac{\pi}{2}-\theta\big)\big|\geq \frac{2}{\pi}\big|\tfrac{\pi}{2}-\theta\big|\quad\text{(since }|\sin x|\geq \tfrac{2}{\pi}|x|\text{ for }|x|\leq\tfrac{\pi}{2}\text{)}\\\\ \implies \big|\tfrac{\pi}{2}-\theta\big|\leq \frac{\pi}{2}|\cos(\theta)|\leq \frac{2\pi\sqrt{C_4}}{\sqrt{n}}=:\frac{C_5}{\sqrt{n}}.$$
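
A numerical illustration (assuming numpy) of how the angle between independent Gaussian vectors concentrates around $\pi/2$ at the rate $1/\sqrt{n}$; the dimensions and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
for n in (100, 400, 1_600):
    X = rng.normal(size=(10_000, n))
    Y = rng.normal(size=(10_000, n))
    cos = np.sum(X * Y, axis=1) / (np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1))
    theta = np.arccos(cos)                           # angle between X and Y, in [0, pi]
    dev = np.sqrt(n) * np.abs(theta - np.pi / 2)     # rescaled deviation from pi/2
    print(f"n={n:5d}  99th percentile of sqrt(n)*|theta - pi/2|: {np.quantile(dev, 0.99):.2f}")
# The rescaled deviation stays O(1), i.e. |theta - pi/2| = O(1/sqrt(n)).
```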

Homework 3

Problem 1 (Small ball probabilities)

Let $X_1,\cdots, X_N$ be non-negative independent random variables. Assume that the PDF (probability density function) of each $X_i$ is uniformly bounded by $1$.

(a)

Check that each $X_i$ satisfies

$$\mathbb{P}\lbrace X_i\leq \varepsilon\rbrace\leq \varepsilon\text{ for all }\varepsilon > 0.$$

$$\mathbb{P}\lbrace X_i\leq \varepsilon\rbrace=\int_0^{\varepsilon} f_{X_i}(x){\rm d}x\leq \int_0^{\varepsilon} {\rm d}x=\varepsilon.$$

(b)

Show that the MGF (moment generating function) of each $X_i$ satisfies

$$\mathbb{E}\exp(-tX_i)\leq \frac{1}{t}\text{ for all }t > 0.$$

$$\mathbb{E}\exp(-tX_i)=\int_0^{\infty} \exp(-tx)f_{X_i}(x){\rm d}x\leq \int_0^{\infty}\exp(-tx){\rm d}x=\frac{1}{t}.$$

(c)

Deduce that averaging increases the strength of (a) dramatically. Namely, show that

$$\mathbb{P}\lbrace \frac{1}{N}\sum_{i=1}^N X_i\leq \varepsilon\rbrace\leq (C\varepsilon)^N\text{ for all } \varepsilon > 0.$$

Use the exponential moment (Chernoff) method: for any $t>0$, by Markov's inequality, independence, and part (b),

$$\mathbb{P}\lbrace \sum_{i=1}^N X_i\leq N\varepsilon\rbrace=\mathbb{P}\lbrace e^{-t\sum_{i=1}^N X_i}\geq e^{-tN\varepsilon}\rbrace\leq e^{tN\varepsilon}\,\mathbb{E}[e^{-t\sum_{i=1}^N X_i}]=e^{tN\varepsilon}\prod_{i=1}^N \mathbb{E}[e^{-tX_i}]\leq t^{-N}e^{tN\varepsilon}.$$

Let $f(t)=t^{-N}e^{tN\varepsilon}$ and minimize over $t>0$:

$$\frac{\mathrm{d}\ln f(t)}{\mathrm{d}t}=-\frac{N}{t}+N\varepsilon=0\implies t=\frac{1}{\varepsilon}\implies \mathbb{P}\lbrace \frac{1}{N}\sum_{i=1}^N X_i\leq \varepsilon\rbrace\leq (e\varepsilon)^N,$$

so the claim holds with $C=e$.
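
A Monte Carlo check (assuming numpy) that the small-ball probability of the average is dramatically smaller than $\varepsilon$ and below $(e\varepsilon)^N$; here each $X_i$ is taken uniform on $[0,1]$, whose density is bounded by $1$, and the parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
N, eps, trials = 5, 0.1, 2_000_000
X = rng.uniform(0.0, 1.0, size=(trials, N))      # densities bounded by 1
empirical = np.mean(X.mean(axis=1) <= eps)       # P{ (1/N) sum X_i <= eps }

print(f"part (a) bound for a single variable: eps = {eps:.2e}")
print(f"part (c) bound for the average: (e*eps)^N = {(np.e * eps) ** N:.2e}")
print(f"empirical probability for the average:      {empirical:.2e}")
```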

Problem 2 (Boosting randomized algorithms)

Imagine we have an algorithm for solving some decision problem (for example, the algorithm may answer the question: “is there a helicopter in a given image?”). Suppose each time the algorithm runs, it gives the correct answer independently with probability $\frac{1}{2} + \delta$ with some small margin $\delta\in (0, 1/2)$. In other words, the algorithm does just marginally better than a random guess.

To improve the confidence, the following “boosting” procedure is often used. Run the algorithm $N$ times, and take the majority vote. Show that the new algorithm gives the correct answer with probability at least $1 - 2\exp(-c\delta^2 N)$. This is good because the confidence rapidly (exponentially!) approaches $1$ as $N$ grows.

Let $X_i$ be the indicator that the $i$-th run is incorrect, so $\mathbb{P}\lbrace X_i=0\rbrace=\frac{1}{2}+\delta$, $\mathbb{P}\lbrace X_i=1\rbrace=\frac{1}{2}-\delta$, and set $S_N=\sum_{i=1}^N X_i$ with $\mu=\mathbb{E}S_N=N(\frac{1}{2}-\delta)$. The majority vote is correct exactly when $S_N<\frac{N}{2}$.

Use Chernoff's inequality and the Taylor expansion $\ln(1-x)\leq -x-\frac{x^2}{2}$ for $x\in(0,1)$:

$$\mathbb{P}\lbrace S_N\geq \frac{N}{2}\rbrace\leq e^{-\mu}\Big(\frac{2e\mu}{N}\Big)^{N/2}=e^{-N(1/2-\delta)}\big(e(1-2\delta)\big)^{N/2}=e^{N\delta}(1-2\delta)^{N/2}=\big(e^\delta(1-2\delta)^{1/2}\big)^N\\\\ =e^{N(\delta+\frac{1}{2}\ln(1-2\delta))}\leq e^{N(\delta+\frac{1}{2}(-2\delta-2\delta^2))}=e^{-N\delta^2}\\\\ \implies \mathbb{P}\lbrace S_N<\frac{N}{2}\rbrace\geq 1-e^{-N\delta^2}\geq 1-2e^{-\delta^2 N},$$

so the claim holds with $c=1$.
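
A simulation (assuming numpy) of the boosting procedure: the number of incorrect runs is $S_N\sim\text{Binom}(N,\frac{1}{2}-\delta)$, the majority vote is correct when $S_N<N/2$, and the empirical accuracy is compared with the guarantee $1-2e^{-\delta^2N}$; the values of $\delta$ and $N$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
delta, trials = 0.1, 200_000
for N in (101, 401, 1001):                               # odd N avoids ties in the vote
    wrong = rng.binomial(N, 0.5 - delta, size=trials)    # S_N = number of incorrect runs
    accuracy = np.mean(wrong < N / 2)                    # majority vote correct iff S_N < N/2
    bound = 1 - 2 * np.exp(-delta**2 * N)
    print(f"N={N:5d}  empirical accuracy: {accuracy:.4f}   guarantee (c=1): {bound:.4f}")
# The guarantee is conservative, but both columns approach 1 exponentially fast in N.
```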

Problem 3 (Optimality of Chernoff’s inequality)

(a)

Prove the following useful inequalities for binomial coefficients:

$$\left(\frac{n}{m}\right)^m\leq \binom{n}{m}\leq \sum_{k=0}^m \binom{n}{k}\leq \left(\frac{en}{m}\right)^m$$

for all integers $m\in [1, n]$.

The first inequality:

$$\frac{n}{m}\leq \frac{n-i}{m-i} \text{ for all }i=0,\cdots,m-1\\\\ \implies \left(\frac{n}{m}\right)^m\leq \frac{n}{m}\cdot\frac{n-1}{m-1}\cdots\frac{n-m+1}{1}=\binom{n}{m}.$$

The second inequality is immediate, since $\binom{n}{m}$ is one of the (non-negative) terms in the sum.

The third inequality:

$$\sum_{k=0}^m \binom{n}{k}\leq \left(\frac{en}{m}\right)^m\iff \sum_{k=0}^m \binom{n}{k}\left(\frac{m}{n}\right)^m\leq e^m=\sum_{k=0}^{\infty} \frac{m^k}{k!}.$$

It therefore suffices to show, for each $0\leq k\leq m$,

$$\binom{n}{k}\left(\frac{m}{n}\right)^m\leq \frac{m^k}{k!}\iff \frac{n!\,m^m}{(n-k)!\,n^m}\leq m^{k}.$$

Indeed, since $\frac{m}{n}\leq 1$ and $k\leq m$,

$$\frac{n!\,m^m}{(n-k)!\,n^m}\leq \frac{n!\,m^k}{(n-k)!\,n^k}=\prod_{i=0}^{k-1}\frac{(n-i)m}{n}\leq m^k,$$

because each factor satisfies $\frac{(n-i)m}{n}\leq m$. Summing over $k=0,\cdots,m$ gives

$$\sum_{k=0}^m \binom{n}{k}\left(\frac{m}{n}\right)^m\leq \sum_{k=0}^{\infty} \frac{m^k}{k!}=e^m\implies \sum_{k=0}^m \binom{n}{k}\leq \left(\frac{en}{m}\right)^m.$$
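
A brute-force check of the three inequalities over a small range, using only Python's standard library; the range $n\leq 60$ is an arbitrary choice.

```python
from math import comb, e

# Check (n/m)^m <= C(n,m) <= sum_{k=0}^{m} C(n,k) <= (e*n/m)^m for 1 <= m <= n <= 60.
for n in range(1, 61):
    for m in range(1, n + 1):
        lower = (n / m) ** m
        middle = comb(n, m)
        partial_sum = sum(comb(n, k) for k in range(m + 1))
        upper = (e * n / m) ** m
        assert lower <= middle <= partial_sum <= upper, (n, m)

print("all three inequalities hold for 1 <= m <= n <= 60")
```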

(b)

Show that Chernoff's inequality is almost optimal. Namely, if $S_N$ is a binomial random variable with mean $\mu$, that is $S_N\sim \text{Binom}(N, \mu/N)$, show that

$$\mathbb{P}\lbrace S_N\geq t\rbrace\geq e^{-\mu}(\mu/t)^t$$

for any integer $t\in \lbrace 1,\cdots , N\rbrace$ such that $t\geq \mu$.

Use Problem 3 (a).

$$\mathbb{P}\lbrace S_N\geq t\rbrace=\sum_{i=t}^N \binom{N}{i}(\mu/N)^i(1-\mu/N)^{N-i}\geq \sum_{i=t}^N \Big(\frac{N}{i}\Big)^i(\mu/N)^i(1-\mu/N)^{N-i}\\\\ =\sum_{i=t}^N (\mu/i)^i(1-\mu/N)^{N-i}\geq (\mu/t)^t(1-\mu/N)^{N-t}\geq(\mu/t)^t\left(\frac{N-\mu}{N}\right)^{N-\mu},$$

where the first inequality is part (a), the second keeps only the $i=t$ term, and the third uses $t\geq\mu$. It remains to check that $\left(\frac{N-\mu}{N}\right)^{N-\mu}\geq e^{-\mu}$, which is equivalent to

$$\left(\frac{N-\mu}{N}\right)^N\geq \left(\frac{N-\mu}{eN}\right)^{\mu}\iff N(\ln (N-\mu)-\ln N)\geq \mu(\ln (N-\mu)-1-\ln N)\\\\ \iff (N-\mu)\ln(N-\mu)+\mu \ln N+\mu\geq N\ln N.$$

Let $f(\mu)=(N-\mu)\ln(N-\mu)+\mu\ln N+\mu$, then

$$f'(\mu)=-1-\ln(N-\mu)+\ln N+1=\ln N-\ln(N-\mu)\geq 0\\\\ \implies f(\mu)\geq f(0)=N\ln N.$$

Q.E.D.
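
A small exact computation (Python standard library only) comparing the true binomial tail with the lower bound just proved and with the Chernoff upper bound $e^{-\mu}(e\mu/t)^t$; the parameters $N=100$, $\mu=5$ are arbitrary.

```python
from math import comb, exp, e

N, mu = 100, 5.0
p = mu / N
for t in (5, 10, 20, 40):
    tail = sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(t, N + 1))
    lower = exp(-mu) * (mu / t) ** t
    upper = exp(-mu) * (e * mu / t) ** t
    print(f"t={t:3d}  lower = {lower:.3e}   P(S_N >= t) = {tail:.3e}   upper = {upper:.3e}")
# The exact tail is sandwiched between the lower bound of (b) and the Chernoff
# upper bound, which differ only by a factor e^t.
```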

Problem 4 (The lower tail in Chernoff’s inequality)

Let $X_i$ be independent Bernoulli random variables with parameters $p_i$. Consider their sum $S_N = \sum_{i=1}^N X_i$ and denote its mean by $\mu = \mathbb{E} S_N$. Then, for any $t <\mu$, we have

$$\mathbb{P}\lbrace S_N\leq t\rbrace\leq e^{-\mu}\left(\frac{e\mu}{t}\right)^t.$$

Use the exponential moment (MGF) method together with Markov's inequality: for any $\lambda>0$,

$$\mathbb{P}\lbrace S_N\leq t\rbrace=\mathbb{P}\lbrace \exp(-\lambda S_N)\geq \exp(-\lambda t)\rbrace\leq \exp(\lambda t)\mathbb{E}\exp(-\lambda S_N)\\\\ =\exp(\lambda t)\prod_{i=1}^N \mathbb{E}\exp(-\lambda X_i)=\exp(\lambda t)\prod_{i=1}^N (1-p_i+p_ie^{-\lambda})\\\\ =\exp(\lambda t)\prod_{i=1}^N (p_i(e^{-\lambda}-1)+1)\leq \exp(\lambda t)\prod_{i=1}^N \exp(p_i(e^{-\lambda}-1))\\\\ =\exp(\lambda t)\exp(\mu(e^{-\lambda}-1))=\exp(\lambda t+\mu e^{-\lambda}-\mu),$$

where the second inequality uses $1+x\leq e^x$.

Let $f(\lambda)=\lambda t+\mu e^{-\lambda}$, then

$$f'(\lambda)=t-\mu e^{-\lambda}=0\implies \lambda=\ln(\mu/t)>0\implies \min_{\lambda>0} f(\lambda)=f(\ln(\mu/t))=t\ln(\mu/t)+t,$$

where $\ln(\mu/t)>0$ since $t<\mu$.

Therefore,

$$\mathbb{P}\lbrace S_N\leq t\rbrace\leq \exp\big(\min_{\lambda>0}f(\lambda)-\mu\big)=\exp(t\ln(\mu/t)+t-\mu)=e^{-\mu}\left(\frac{e\mu}{t}\right)^t.$$
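
A quick Monte Carlo check of the lower-tail bound (assuming numpy) with heterogeneous parameters $p_i$; the number of variables, the range of the $p_i$, and the choices of $t$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
p = rng.uniform(0.3, 0.7, size=40)                        # Bernoulli parameters p_i
mu = p.sum()
S = (rng.random((200_000, p.size)) < p).sum(axis=1)       # samples of S_N = sum_i X_i

for t in (int(0.5 * mu), int(0.7 * mu), int(0.9 * mu)):   # several integer t < mu
    empirical = np.mean(S <= t)
    bound = np.exp(-mu) * (np.e * mu / t) ** t
    print(f"t={t:3d}  P(S_N <= t) = {empirical:.3e}   bound = {bound:.3e}")
# The bound dominates the empirical lower-tail probability, as proved above.
```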