Official website.

Homework 1

Problem 1 (Expectation as the optimal estimator)

Let $X$ be a random variable with finite expectation. Show that the function $f(a) = \mathbb{E}(X - a)^2$ is minimized at $a = \mathbb{E} X$.

$$\mathbb{E}(X-a)^2=\mathbb{E}X^2-2a\,\mathbb{E}X+a^2=(a-\mathbb{E}X)^2+\mathbb{E}X^2-(\mathbb{E}X)^2,$$

which is a quadratic in $a$ minimized at $a=\mathbb{E}X$.
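
A quick numerical sanity check, not part of the proof (assumptions: numpy is available, and the exponential distribution below is an arbitrary choice): estimate $f(a)$ on a grid and confirm the minimizer sits near $\mathbb{E}X$.

```python
# Minimal sketch: estimate f(a) = E(X - a)^2 on a grid of a-values and check
# that the argmin is close to E[X]. Any distribution with finite mean works.
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=100_000)

a_grid = np.linspace(0.0, 5.0, 501)
f_vals = np.array([np.mean((X - a) ** 2) for a in a_grid])

print("argmin of f over the grid:", a_grid[np.argmin(f_vals)])
print("empirical E[X]:           ", X.mean())
```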

Problem 2 (Expectation of random vectors)

Let $X$ and $Y$ be random vectors in $\mathbb{R}^n$.

(a)

(Linearity) Prove that for any constants $a, b\in \mathbb{R}$, we have

$$\mathbb{E}(aX + bY) = a \mathbb{E}(X) + b \mathbb{E}(Y).$$

Applying linearity of expectation in each coordinate,

$$\mathbb{E}(aX+bY)=\big(a\mathbb{E}(X_i)+b\mathbb{E}(Y_i)\big)_{i=1}^n=a\mathbb{E}(X)+b\mathbb{E}(Y).$$

(b)

(Multiplicativity) Prove that if $X$ and $Y$ are independent, then

$$\mathbb{E}\langle X, Y\rangle = \langle \mathbb{E} X, \mathbb{E} Y\rangle.$$

This generalizes the identity $\mathbb{E}(XY) = (\mathbb{E} X)(\mathbb{E} Y)$ for independent random variables.

Using independence of $X_i$ and $Y_i$ in the middle step,

$$\mathbb{E}\langle X, Y\rangle=\mathbb{E}\Big(\sum_{i=1}^n X_iY_i\Big)=\sum_{i=1}^n \mathbb{E}X_i\,\mathbb{E}Y_i=\langle \mathbb{E}X,\mathbb{E}Y\rangle.$$

Problem 3 (Expectation as optimal estimator: random vectors)

Let $X$ be a random vector with finite expectation. Show that the function $f(a) = \mathbb{E}\Vert X - a\Vert_2^2$ is minimized at $a = \mathbb{E} X$.

$$\mathbb{E}\Vert X-a\Vert^2_2=\mathbb{E}\langle X-a,X-a\rangle=\sum_{i=1}^n \mathbb{E}(X_i-a_i)^2,$$

and by Problem 1 each summand is minimized at $a_i=\mathbb{E}X_i$, so $f$ is minimized at $a=\mathbb{E}X$.

Problem 4 (A variance-like identity)

For a random vector in $\mathbb{R}^n$, define $V(X) = \mathbb{E}\Vert X - \mathbb{E} X\Vert_2^2$.

(a)

Prove that

$$V(X) = \mathbb{E}\Vert X\Vert_2^2 -\Vert \mathbb{E} X\Vert_2^2.$$

$$\begin{align*} V(X)&=\mathbb{E}\Vert X-\mathbb{E}X\Vert^2_2=\mathbb{E}\langle X-\mathbb{E}X,X-\mathbb{E}X\rangle=\mathbb{E}\sum_{i=1}^n (X_i-\mathbb{E}X_i)^2\\ &=\sum_{i=1}^n \big(\mathbb{E}X_i^2-(\mathbb{E}X_i)^2\big)=\mathbb{E}\Vert X\Vert^2_2-\Vert\mathbb{E}X\Vert_2^2. \end{align*}$$

(b)

Prove that

$$V(X) = \frac{1}{2}\mathbb{E}\Vert X - X_0\Vert_2^2,$$

where $X_0$ is an independent copy of $X$.

By independence, $\mathbb{E}(X_iX_{0,i})=\mathbb{E}X_i\,\mathbb{E}X_{0,i}=(\mathbb{E}X_i)^2$, so

$$\frac{1}{2}\mathbb{E}\Vert X-X_0\Vert_2^2=\frac{1}{2}\sum_{i=1}^n\mathbb{E}(X_i-X_{0,i})^2=\sum_{i=1}^n\big(\mathbb{E}X_i^2-(\mathbb{E}X_i)^2\big)=\mathbb{E}\Vert X\Vert^2_2-\Vert\mathbb{E}X\Vert_2^2=V(X),$$

where the last equality uses part (a).
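
A numerical sanity check of both identities (assumptions: numpy; the gamma distribution below is an arbitrary choice):

```python
# Minimal sketch: compare V(X) = E||X - EX||_2^2 with (1/2) E||X - X0||_2^2
# for an independent copy X0, using Monte Carlo samples.
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 200_000
X = rng.gamma(shape=2.0, scale=1.0, size=(m, n))    # m samples of a vector in R^n
X0 = rng.gamma(shape=2.0, scale=1.0, size=(m, n))   # independent copy

V = np.mean(np.sum((X - X.mean(axis=0)) ** 2, axis=1))
half = 0.5 * np.mean(np.sum((X - X0) ** 2, axis=1))
print("V(X)              :", V)
print("0.5 E||X - X0||^2 :", half)   # should agree up to sampling error
```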

Problem 5 (Additivity of variance)

(a)

Check that if $X$ and $Y$ are independent random vectors in $\mathbb{R}^n$ with zero means, then

$$\mathbb{E}\Vert X+Y\Vert_2^2=\mathbb{E}\Vert X\Vert_2^2+\mathbb{E}\Vert Y\Vert_2^2.$$

$$\mathbb{E}\Vert X+Y\Vert^2_2=\sum_{i=1}^n \mathbb{E}(X_i+Y_i)^2=\sum_{i=1}^n \big(\mathbb{E}X_i^2+2\,\mathbb{E}X_i\,\mathbb{E}Y_i+\mathbb{E}Y_i^2\big)=\mathbb{E}\Vert X\Vert_2^2+\mathbb{E}\Vert Y\Vert_2^2,$$

since independence gives $\mathbb{E}(X_iY_i)=\mathbb{E}X_i\,\mathbb{E}Y_i$ and the zero-mean assumption makes these cross terms vanish.

(b)

Prove that if random vectors $X_1,\cdots, X_k$ in $\mathbb{R}^n$ are independent, then

$$V(X_1 +\cdots + X_k) = V(X_1) +\cdots + V(X_k),$$

where $V(\cdot)$ is defined as $V(X) = \mathbb{E}\Vert X - \mathbb{E} X\Vert_2^2$.

$$\begin{align*} V(X_1+X_2+\cdots+X_k)&=\mathbb{E}\Big\Vert \sum_{j=1}^k X_j\Big\Vert_2^2-\Big\Vert \mathbb{E}\sum_{j=1}^k X_j\Big\Vert_2^2\\ &=\mathbb{E}\sum_{i=1}^n \Big(\sum_{j=1}^k X_{j,i}\Big)^2-\sum_{i=1}^n \Big(\sum_{j=1}^k \mathbb{E}X_{j,i}\Big)^2\\ &=\sum_{i=1}^n \Big(\sum_{j=1}^k\mathbb{E}X_{j,i}^2+2\sum_{j=1}^k\sum_{l=j+1}^k \mathbb{E}X_{j,i}\,\mathbb{E}X_{l,i}-\sum_{j=1}^k (\mathbb{E}X_{j,i})^2-2\sum_{j=1}^k\sum_{l=j+1}^k \mathbb{E}X_{j,i}\,\mathbb{E}X_{l,i}\Big)\\ &=\sum_{j=1}^k \sum_{i=1}^n \big(\mathbb{E}X_{j,i}^2-(\mathbb{E}X_{j,i})^2\big)=\sum_{j=1}^k V(X_j), \end{align*}$$

where the first equality uses Problem 4(a) and the cross terms factor by independence.
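
A numerical sanity check of the additivity (assumptions: numpy; the normal distributions below are arbitrary choices):

```python
# Minimal sketch: verify V(X1 + ... + Xk) = V(X1) + ... + V(Xk) for independent
# random vectors by Monte Carlo.
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 4, 3, 200_000

def V(samples):
    # empirical V(X) = E||X - EX||_2^2 from an (m, n) array of samples
    return np.mean(np.sum((samples - samples.mean(axis=0)) ** 2, axis=1))

Xs = [rng.normal(loc=j, scale=j + 1, size=(m, n)) for j in range(k)]
print("V(sum)    :", V(sum(Xs)))
print("sum of V's:", sum(V(X) for X in Xs))
```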

Homework 2

Problem 1 (Balancing vectors)

Let $x_1,\cdots, x_n$ be an arbitrary set of unit vectors in $\mathbb{R}^n$. Prove that there exist $\varepsilon_1,\cdots, \varepsilon_n\in \lbrace -1, 1\rbrace$ such that

$$\Vert \varepsilon_1x_1+\cdots+\varepsilon_nx_n\Vert_2\leq \sqrt{n}.$$

Choose $\varepsilon_1,\cdots,\varepsilon_n$ i.i.d. with $\mathbb{P}[\varepsilon_i=1]=\mathbb{P}[\varepsilon_i=-1]=\frac{1}{2}$, so that $\mathbb{E}[\varepsilon_i]=0$. Since $\mathbb{E}[\varepsilon_i\varepsilon_j]=0$ for $i\neq j$, the cross terms vanish and

$$\mathbb{E}\Big\Vert \sum_{i=1}^n \varepsilon_ix_i\Big\Vert_2^2=\sum_{i=1}^n \mathbb{E}\Vert\varepsilon_ix_i\Vert_2^2=\sum_{i=1}^n \Vert x_i\Vert_2^2=n.$$

Hence there exists a choice of signs $\varepsilon_i$ with $\Vert\sum_{i=1}^n \varepsilon_ix_i\Vert_2^2\leq n$, i.e. $\Vert\sum_{i=1}^n \varepsilon_ix_i\Vert_2\leq \sqrt{n}$.
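
A small numerical illustration of the argument (assumptions: numpy; the unit vectors are generated at random, and resampling signs a few times is only a heuristic search, not part of the proof):

```python
# Minimal sketch: for random unit vectors x_1,...,x_n, random signs satisfy
# E||sum eps_i x_i||_2^2 = n, so a few random sign vectors quickly produce one
# with norm at most sqrt(n).
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.standard_normal((n, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)      # rows are arbitrary unit vectors

best = np.inf
for _ in range(1000):
    eps = rng.choice([-1.0, 1.0], size=n)
    best = min(best, np.linalg.norm(eps @ x))      # || sum_i eps_i x_i ||_2
print("best norm found:", best, "  sqrt(n):", np.sqrt(n))
```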

Problem 2 (Random vectors have norm $\approx \sqrt{n}$)

Let $X$ be a standard normal random vector in $\mathbb{R}^n$, where $n\geq C_1$.

(a)

Check that

$$\mathbb{E}\Vert X\Vert_2^2 = n \quad\text{and}\quad \text{Var}(\Vert X\Vert_2^2) = 2n.$$

$$\mathbb{E}\Vert X\Vert_2^2=\sum_{i=1}^n\mathbb{E}X_i^2=n.$$

Using $\mathbb{E}X_i^4=3$ and $\mathbb{E}X_i^2X_j^2=1$ for $i\neq j$,

$$\text{Var}(\Vert X\Vert_2^2)=\mathbb{E}\Vert X\Vert_2^4-(\mathbb{E}\Vert X\Vert_2^2)^2 =\mathbb{E}\Big[\sum_{i=1}^n X_i^4+2\sum_{i<j}X_i^2X_j^2\Big]-n^2 =3n+n(n-1)-n^2=2n.$$
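
A quick Monte Carlo check of these two moments (assumptions: numpy; $n$ and the sample size are arbitrary):

```python
# Minimal sketch: for a standard normal vector in R^n, estimate the mean and
# variance of ||X||_2^2 and compare with n and 2n.
import numpy as np

rng = np.random.default_rng(4)
n, m = 100, 100_000
sq_norms = np.sum(rng.standard_normal((m, n)) ** 2, axis=1)
print("mean:", sq_norms.mean(), "  (target n  =", n, ")")
print("var :", sq_norms.var(),  "  (target 2n =", 2 * n, ")")
```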

(b)

Conclude that

$$|\Vert X\Vert_2^2 - n|\leq C_2 \sqrt{n}\ \text{ with probability at least }0.99.$$

By Chebyshev's inequality,

$$\mathbb{P}[|\Vert X\Vert_2^2-n|\geq C_2\sqrt{n}]\leq \frac{\text{Var}(\Vert X\Vert_2^2)}{C_2^2 n}=\frac{2}{C_2^2}, \quad\text{so}\quad \mathbb{P}[|\Vert X\Vert_2^2-n|\leq C_2\sqrt{n}]\geq 1-\frac{2}{C_2^2}.$$

Let $C_2=\sqrt{200}$, which makes the probability at least $0.99$.

(c)

Deduce that

$$\frac{1}{2}\sqrt{n}\leq \Vert X\Vert_2\leq 2\sqrt{n}\ \text{ with probability at least }0.99.$$

Using the conclusion from (b), with probability at least $0.99$,

$$\sqrt{n-C_2\sqrt{n}}\leq \Vert X\Vert_2\leq \sqrt{n+C_2\sqrt{n}}.$$

Upper bound: $\sqrt{n+C_2\sqrt{n}}\leq 2\sqrt{n}\iff C_2\sqrt{n}\leq 3n\iff n\geq \big(\frac{C_2}{3}\big)^2$.

Lower bound: $\sqrt{n-C_2\sqrt{n}}\geq \frac{1}{2}\sqrt{n}\iff C_2\sqrt{n}\leq \frac{3}{4}n\iff n\geq \big(\frac{4C_2}{3}\big)^2$.

Both hold once $n\geq \big(\frac{4C_2}{3}\big)^2$, which we may take as $C_1$.

(d)

Prove the tighter bound:

$$|\Vert X\Vert_2 - \sqrt{n}|\leq C_3\ \text{ with probability at least }0.99.$$

On the event from (b),

$$|\Vert X\Vert_2-\sqrt{n}|=\frac{|\Vert X\Vert_2^2-n|}{\Vert X\Vert_2+\sqrt{n}}\leq \frac{C_2\sqrt{n}}{\sqrt{n}}=C_2,$$

so the claim holds with $C_3=C_2$.
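
A Monte Carlo illustration of this constant-width concentration (assumptions: numpy; the constant $\sqrt{200}$ is the crude Chebyshev constant from (b) and is far from tight in practice):

```python
# Minimal sketch: | ||X||_2 - sqrt(n) | stays bounded by an absolute constant as
# n grows. Since ||X||_2^2 is chi-square with n degrees of freedom, we sample it
# directly instead of generating full n-dimensional vectors.
import numpy as np

rng = np.random.default_rng(5)
for n in (100, 1_000, 10_000, 100_000):
    norms = np.sqrt(rng.chisquare(df=n, size=200_000))
    dev = np.abs(norms - np.sqrt(n))
    print(f"n={n:>6}:  99th percentile of | ||X||_2 - sqrt(n) | = {np.quantile(dev, 0.99):.3f}")
```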

Problem 3 (Random vectors are almost orthogonal)

Let $X$ and $Y$ be independent standard normal random vectors in $\mathbb{R}^n$, where $n\geq C_3$.

(a)

Check that

$$\mathbb{E}\langle X, Y\rangle^2 = n.$$

Since the cross terms have zero expectation (by independence and zero means) and $\mathbb{E}X_i^2Y_i^2=\mathbb{E}X_i^2\,\mathbb{E}Y_i^2=1$,

$$\mathbb{E}\langle X,Y\rangle^2=\mathbb{E}\Big[\Big(\sum_{i=1}^n X_iY_i\Big)^2\Big]=\mathbb{E}\Big[\sum_{i=1}^n X_i^2Y_i^2+2\sum_{i<j}X_iY_iX_jY_j\Big]=n.$$

(b)

Deduce that

$$\langle X, Y\rangle^2\leq C_4n\ \text{ with probability at least }0.99.$$

By Markov's inequality,

$$\mathbb{P}[\langle X,Y\rangle^2\leq C_4n]=1-\mathbb{P}[\langle X,Y\rangle^2\geq C_4n]\geq 1-\frac{n}{C_4n}=1-\frac{1}{C_4}.$$

Let $C_4=100$.

(c)

Denote by $\theta$ the angle between the vectors $X$ and $Y$. Prove that

$$|\theta-\frac{\pi}{2}|\leq \frac{C_5}{\sqrt{n}}\ \text{ with probability at least }0.97.$$

Use Problem 3(b) and Problem 2(c). By a union bound, with probability at least $1-3\times 0.01=0.97$ we have $\langle X,Y\rangle^2\leq C_4n$, $\Vert X\Vert_2\geq \frac{\sqrt{n}}{2}$ and $\Vert Y\Vert_2\geq \frac{\sqrt{n}}{2}$ simultaneously, and on this event

$$\cos^2(\theta)=\frac{\langle X,Y\rangle^2}{\Vert X\Vert_2^2\Vert Y\Vert_2^2}\leq \frac{C_4n}{(\frac{\sqrt{n}}{2})^2(\frac{\sqrt{n}}{2})^2}=\frac{16C_4}{n}, \quad\text{i.e.}\quad |\cos(\theta)|\leq \frac{4\sqrt{C_4}}{\sqrt{n}}.$$

Now bound $|\theta-\pi/2|$. Since $\cos(\theta)=\sin(\frac{\pi}{2}-\theta)$ and $|\sin x|\geq \frac{2}{\pi}|x|$ for $|x|\leq \frac{\pi}{2}$,

$$\Big|\frac{\pi}{2}-\theta\Big|\leq \frac{\pi}{2}|\cos(\theta)|\leq \frac{2\pi\sqrt{C_4}}{\sqrt{n}}=:\frac{C_5}{\sqrt{n}}.$$
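
A Monte Carlo illustration of the near-orthogonality (assumptions: numpy; the rescaling by $\sqrt{n}$ just makes the deviation comparable across dimensions):

```python
# Minimal sketch: the angle between independent standard normal vectors in R^n
# concentrates around pi/2, with deviations of order 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(6)
for n in (100, 1_000, 10_000):
    X = rng.standard_normal((2_000, n))
    Y = rng.standard_normal((2_000, n))
    cos = np.sum(X * Y, axis=1) / (np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1))
    dev = np.abs(np.arccos(cos) - np.pi / 2) * np.sqrt(n)    # rescaled deviation
    print(f"n={n:>5}:  99th percentile of sqrt(n)*|theta - pi/2| = {np.quantile(dev, 0.99):.3f}")
```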

Homework 3

Problem 1 (Small ball probabilities)

Let $X_1,\cdots, X_N$ be non-negative independent random variables. Assume that the PDF (probability density function) of each $X_i$ is uniformly bounded by $1$.

(a)

Check that each $X_i$ satisfies

$$\mathbb{P}\lbrace X_i\leq \varepsilon\rbrace\leq \varepsilon\ \text{ for all }\varepsilon > 0.$$

$$\mathbb{P}\lbrace X_i\leq \varepsilon\rbrace=\int_0^{\varepsilon} f_{X_i}(x)\,{\rm d}x\leq \int_0^{\varepsilon} {\rm d}x=\varepsilon.$$

(b)

Show that the MGF (moment generating function) of each $X_i$ satisfies

$$\mathbb{E}\exp(-tX_i)\leq \frac{1}{t}\ \text{ for all }t > 0.$$

Since the density $f_{X_i}$ is bounded by $1$ and supported on $[0,\infty)$,

$$\mathbb{E}\exp(-tX_i)=\int_0^{\infty} \exp(-tx)f_{X_i}(x)\,{\rm d}x\leq \int_0^{\infty}\exp(-tx)\,{\rm d}x=\frac{1}{t}.$$

(c)

Deduce that averaging increases the strength of (a) dramatically. Namely, show that

$$\mathbb{P}\Big\lbrace \frac{1}{N}\sum_{i=1}^N X_i\leq \varepsilon\Big\rbrace\leq (C\varepsilon)^N\ \text{ for all } \varepsilon > 0.$$

Use the exponential moment (Chernoff) method: for any $t>0$, Markov's inequality applied to $e^{-t\sum_i X_i}$ gives

$$\mathbb{P}\Big\lbrace \sum_{i=1}^N X_i\leq N\varepsilon\Big\rbrace \leq e^{tN\varepsilon}\,\mathbb{E}\big[e^{-t\sum_{i=1}^N X_i}\big] = e^{tN\varepsilon}\prod_{i=1}^N \mathbb{E}\big[e^{-tX_i}\big] \leq t^{-N}e^{tN\varepsilon},$$

using independence and part (b). Let $f(t)=t^{-N}e^{tN\varepsilon}$ and optimize over $t$:

$$\frac{\mathrm{d}\ln f(t)}{\mathrm{d}t}=-\frac{N}{t}+N\varepsilon=0\implies t=\frac{1}{\varepsilon} \implies \mathbb{P}\Big\lbrace \frac{1}{N}\sum_{i=1}^N X_i\leq \varepsilon\Big\rbrace\leq (e\varepsilon)^N,$$

so the claim holds with $C=e$.
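
A Monte Carlo sanity check with Uniform$[0,1]$ variables, whose density is bounded by $1$ (assumptions: numpy; $N$, $\varepsilon$ and the sample size are arbitrary):

```python
# Minimal sketch: the small-ball probability of the average of N Uniform[0,1]
# variables is far below the bound (e*eps)^N.
import numpy as np

rng = np.random.default_rng(7)
N, eps, m = 10, 0.2, 1_000_000
means = rng.uniform(0.0, 1.0, size=(m, N)).mean(axis=1)
print("empirical P(average <= eps):", np.mean(means <= eps))
print("bound (e*eps)^N:            ", (np.e * eps) ** N)
```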

Problem 2 (Boosting randomized algorithms)

Imagine we have an algorithm for solving some decision problem (for example, the algorithm may answer the question: "is there a helicopter in a given image?"). Suppose each time the algorithm runs, it gives the correct answer independently with probability $\frac{1}{2} + \delta$ with some small margin $\delta\in (0, 1/2)$. In other words, the algorithm does just marginally better than a random guess.

To improve the confidence, the following "boosting" procedure is often used. Run the algorithm $N$ times, and take the majority vote. Show that the new algorithm gives the correct answer with probability at least $1 - 2\exp(-c\delta^2 N)$. This is good because the confidence rapidly (exponentially!) approaches $1$ as $N$ grows.

Let $X_i$ be the indicator that the $i$-th run is wrong, so $\mathbb{P}\lbrace X_i=0\rbrace=\frac{1}{2}+\delta$, $\mathbb{P}\lbrace X_i=1\rbrace=\frac{1}{2}-\delta$, and set $S_N=\sum_{i=1}^N X_i$ with $\mu=\mathbb{E}S_N=N(\frac{1}{2}-\delta)$. The majority vote can only be wrong if $S_N\geq \frac{N}{2}$.

Use Chernoff's inequality and the Taylor expansion $\ln(1-2\delta)\leq -2\delta-2\delta^2$:

$$\begin{align*} \mathbb{P}\Big\lbrace S_N\geq \frac{N}{2}\Big\rbrace&\leq e^{-\mu}\Big(\frac{2e\mu}{N}\Big)^{N/2}=e^{-N(1/2-\delta)}\big(e(1-2\delta)\big)^{N/2}=e^{N\delta}(1-2\delta)^{N/2}=\big(e^\delta(1-2\delta)^{1/2}\big)^N\\ &=e^{N(\delta+\frac{1}{2}\ln(1-2\delta))}\leq e^{N(\delta+\frac{1}{2}(-2\delta-2\delta^2))}=e^{-N\delta^2}. \end{align*}$$

Hence the majority vote is correct with probability at least $1-e^{-N\delta^2}\geq 1-2e^{-c\delta^2 N}$.
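
A Monte Carlo check of the boosting bound (assumptions: numpy; odd $N$ avoids ties, and $\delta$ is an arbitrary choice):

```python
# Minimal sketch: the failure probability of an N-run majority vote over a weak
# classifier (correct w.p. 1/2 + delta) decays roughly like exp(-N*delta^2).
import numpy as np

rng = np.random.default_rng(8)
delta, m = 0.05, 200_000
for N in (101, 1001, 2001):
    wrong = rng.binomial(N, 0.5 - delta, size=m)     # number of wrong runs
    fail = np.mean(wrong >= N / 2)                   # majority vote fails
    print(f"N={N:>4}:  empirical failure prob = {fail:.5f},"
          f"  exp(-N*delta^2) = {np.exp(-N * delta**2):.5f}")
```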

Problem 3 (Optimality of Chernoff’s inequality)

(a)

Prove the following useful inequalities for binomial coefficients:

$$\left(\frac{n}{m}\right)^m\leq \binom{n}{m}\leq \sum_{k=0}^m \binom{n}{k}\leq \left(\frac{en}{m}\right)^m$$

for all integers $m\in [1, n]$.

The first inequality:

$$\frac{n}{m}\leq \frac{n-i}{m-i}\ \text{ for all }i=0,\cdots,m-1 \implies \left(\frac{n}{m}\right)^m\leq \frac{n}{m}\cdot\frac{n-1}{m-1}\cdots\frac{n-m+1}{1}=\binom{n}{m}.$$

The second inequality is immediate, since every term in the sum is nonnegative.

The third inequality: since $\left(\frac{en}{m}\right)^m=e^m\left(\frac{n}{m}\right)^m$ and $e^m=\sum_{k=0}^{\infty}\frac{m^k}{k!}$,

$$\sum_{k=0}^m \binom{n}{k}\leq \left(\frac{en}{m}\right)^m\iff \sum_{k=0}^m \binom{n}{k}\left(\frac{m}{n}\right)^m\leq \sum_{k=0}^{\infty} \frac{m^k}{k!},$$

so it suffices to prove the termwise bound

$$\binom{n}{k}\left(\frac{m}{n}\right)^m\leq \frac{m^k}{k!}\iff \frac{n!\,m^m}{(n-k)!\,n^m}\leq m^{k}\quad\text{for }0\leq k\leq m.$$

Indeed, since $\frac{m}{n}\leq 1$ and $\frac{n-i}{n}\leq 1$,

$$\frac{n!\,m^m}{(n-k)!\,n^m}\leq \frac{n!\,m^k}{(n-k)!\,n^k}=m^k\prod_{i=0}^{k-1}\frac{n-i}{n}\leq m^k,$$

which proves the third inequality.

(b)

Show that Chernoff's inequality is almost optimal. Namely, if $S_N$ is a binomial random variable with mean $\mu$, that is $S_N\sim \text{Binom}(N, \mu/N)$, show that

$$\mathbb{P}\lbrace S_N\geq t\rbrace\geq e^{-\mu}(\mu/t)^t$$

for any integer $t\in \lbrace 1,\cdots, N\rbrace$ such that $t\geq \mu$.

Use part (a). Keeping only the $i=t$ term of the sum and applying $\binom{N}{t}\geq (N/t)^t$,

$$\begin{align*} \mathbb{P}\lbrace S_N\geq t\rbrace&=\sum_{i=t}^N \binom{N}{i}(\mu/N)^i(1-\mu/N)^{N-i} \geq \binom{N}{t}(\mu/N)^t(1-\mu/N)^{N-t}\\ &\geq (N/t)^t(\mu/N)^t(1-\mu/N)^{N-t}=(\mu/t)^t(1-\mu/N)^{N-t}\geq (\mu/t)^t\left(\frac{N-\mu}{N}\right)^{N-\mu}, \end{align*}$$

where the last step uses $t\geq\mu$. It remains to show $\left(\frac{N-\mu}{N}\right)^{N-\mu}\geq e^{-\mu}$, which is equivalent to

$$\left(\frac{N-\mu}{N}\right)^N\geq \left(\frac{N-\mu}{eN}\right)^{\mu} \iff (N-\mu)\ln(N-\mu)+\mu \ln N+\mu\geq N\ln N.$$

Let $f(\mu)=(N-\mu)\ln(N-\mu)+\mu\ln N+\mu$, then

$$f'(\mu)=-1-\ln(N-\mu)+\ln N+1=\ln N-\ln(N-\mu)\geq 0 \implies f(\mu)\geq f(0)=N\ln N.$$

Q.E.D.
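
A numerical comparison of the exact binomial tail with this lower bound (assumptions: scipy is available for the exact tail; $N$, $\mu$ and the values of $t$ are arbitrary choices):

```python
# Minimal sketch: the exact tail P(S_N >= t) of Binom(N, mu/N) stays above the
# lower bound e^{-mu} * (mu/t)^t for t >= mu.
import numpy as np
from scipy.stats import binom

N, mu = 1000, 5.0
for t in (5, 10, 20, 40):
    exact = binom.sf(t - 1, N, mu / N)            # P(S_N >= t)
    lower = np.exp(-mu) * (mu / t) ** t
    print(f"t={t:>3}:  exact tail = {exact:.3e},  lower bound = {lower:.3e}")
```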

Problem 4 (The lower tail in Chernoff’s inequality)

Let $X_i$ be independent Bernoulli random variables with parameters $p_i$. Consider their sum $S_N = \sum_{i=1}^N X_i$ and denote its mean by $\mu = \mathbb{E} S_N$. Then, for any $t <\mu$, we have

$$\mathbb{P}\lbrace S_N\leq t\rbrace\leq e^{-\mu}\Big(\frac{e\mu}{t}\Big)^t.$$

Use the exponential moment (MGF) method together with Markov's inequality: for any $\lambda>0$,

$$\begin{align*} \mathbb{P}\lbrace S_N\leq t\rbrace&=\mathbb{P}\lbrace \exp(-\lambda S_N)\geq \exp(-\lambda t)\rbrace\leq \exp(\lambda t)\,\mathbb{E}\exp(-\lambda S_N) =\exp(\lambda t)\prod_{i=1}^N \mathbb{E}\exp(-\lambda X_i)\\ &=\exp(\lambda t)\prod_{i=1}^N \big(1+p_i(e^{-\lambda}-1)\big)\leq \exp(\lambda t)\prod_{i=1}^N \exp\big(p_i(e^{-\lambda}-1)\big) =\exp(\lambda t+\mu e^{-\lambda}-\mu). \end{align*}$$

Let $f(\lambda)=\lambda t+\mu e^{-\lambda}$, then

$$f'(\lambda)=t-\mu e^{-\lambda}=0\implies \lambda=\ln(\mu/t)>0\implies \min_{\lambda>0} f(\lambda)=f(\ln(\mu/t))=t\ln(\mu/t)+t.$$

Therefore,

$$\mathbb{P}\lbrace S_N\leq t\rbrace\leq \exp\Big(\min_{\lambda>0}f(\lambda)-\mu\Big)=\exp(t\ln(\mu/t)+t-\mu)=e^{-\mu}\left(\frac{e\mu}{t}\right)^t.$$
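
A numerical check of the lower-tail bound in the equal-probability case (assumptions: scipy for the exact binomial CDF; $N$, $p$ and the values of $t$ are arbitrary):

```python
# Minimal sketch: for S_N ~ Binom(N, p) with mean mu = N*p, the exact lower tail
# P(S_N <= t) stays below e^{-mu} * (e*mu/t)^t for t < mu.
import numpy as np
from scipy.stats import binom

N, p = 500, 0.1
mu = N * p
for t in (10, 25, 40):                            # all below mu = 50
    exact = binom.cdf(t, N, p)                    # P(S_N <= t)
    bound = np.exp(-mu) * (np.e * mu / t) ** t
    print(f"t={t:>3}:  exact = {exact:.3e},  Chernoff bound = {bound:.3e}")
```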

Homework 4

Problem 1 (Irregularity of sparse random graphs)

Consider a random graph $G(N, p)$ whose expected degree satisfies $d = (N - 1)p < c \log N$. Show that with probability at least $0.9$, there exists at least one vertex with degree at least $10d$.

The difficulty is that the degrees $d_i$ are not independent of each other. For an upper bound this dependence can be ignored at the cost of a union bound, but a lower bound on the probability genuinely needs some independence. The core idea is therefore to restrict attention to a set of vertices whose degrees are (conditionally) independent.

Let $V'$ be a set of vertices with no edges between any two of them. Since $d = (N - 1)p < c\log N$, we have $p < \frac{c\log N}{N-1}$. Thus,

$$\mathbb{P}\lbrace \exists\ u,v\in V':(u,v)\in G\rbrace\leq p|V'|^2< \frac{c\log N}{N-1}|V'|^2.$$

Therefore, taking $|V'| = N^{1/3}$ makes this probability tend to $0$, so with high probability there are no edges between any two vertices of $V'$.

At this point, we calculate

$$\mathbb{P}\lbrace \exists\ i\in V':d_i=10d\rbrace=1-\mathbb{P}\lbrace \forall\ i\in V':d_i\neq 10d\rbrace=1-\mathbb{P}\lbrace d_i\neq 10d\rbrace^{|V'|},$$

where we used that, conditionally on there being no edges inside $V'$, the degrees $\lbrace d_i\rbrace_{i\in V'}$ are determined by disjoint sets of edges to $V\setminus V'$ and are therefore independent.

Moreover, using the Poisson approximation for $d_i$ (mean $d$) together with Stirling's formula,

$$\begin{align*} \mathbb{P}\lbrace d_i=10d\rbrace&\approx\frac{1}{\sqrt{2\pi\cdot 10d}}e^{-d}\left(\frac{ed}{10d}\right)^{10d}=\exp\left(-d-10d\log(10d)+10d+10d\log d-\frac{1}{2}\log(20\pi d)\right)\\ &=\exp\left(9d-10d\log 10-C_1-\frac{1}{2}\log d\right)=\exp\left(-C_2d-\frac{1}{2}\log d-C_1\right)\\ &>\exp\left(-C_2c\log N-\frac{1}{2}\log c-\frac{1}{2}\log\log N-C_1\right)\geq N^{-c'}, \end{align*}$$

where $c'$ is a constant proportional to $c$. Substituting this into the previous expression gives

$$1-\mathbb{P}\lbrace d_i\neq 10d\rbrace^{|V'|}\geq 1-\big(1-N^{-c'}\big)^{N^{1/3}}\to 1,$$

provided $c$ is small enough that $c'<\frac{1}{3}$. This completes the proof.
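
A small-scale simulation (assumptions: numpy; $N$ and $d$ are chosen small so the experiment runs quickly, so this only illustrates the phenomenon rather than the asymptotic statement):

```python
# Minimal sketch: in G(N, p) with small expected degree d = (N-1)p, the maximum
# degree exceeds 10*d in most realizations.
import numpy as np

rng = np.random.default_rng(10)
N, d, trials, hits = 2_000, 0.2, 200, 0
p = d / (N - 1)
for _ in range(trials):
    upper = np.triu(rng.random((N, N)) < p, k=1)     # upper-triangular adjacency
    degrees = upper.sum(axis=0) + upper.sum(axis=1)  # degree = row + column sums
    hits += degrees.max() >= 10 * d
print("fraction of graphs with max degree >= 10d:", hits / trials)
```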

Problem 2 (No too high degrees)

(a)

Check that the classical form of Chernoff’s inequality (Theorem 2.3.1 in the book) implies the following simple and useful bound:

$$\mathbb{P}\lbrace S\geq t\rbrace \leq e^{-t}\ \text{ for any }t\geq e^2\mu,$$

where $S$ is any binomial random variable with mean $\mu$.

From Chernoff’s inequality, we have

$$\mathbb{P}\lbrace S\geq t\rbrace \leq e^{-\mu}\Big(\frac{e\mu}{t}\Big)^t\ \text{ for any }t>\mu.$$

If $t\geq e^2\mu$, then since $e^{-\mu}\leq 1$ and $\frac{e\mu}{t}\leq e^{-1}$,

$$e^{-\mu}\Big(\frac{e\mu}{t}\Big)^t\leq \Big(\frac{e\mu}{e^2\mu}\Big)^t=e^{-t}.$$

(b)

Consider a random graph $G(N, p)$ whose expected degree satisfies $d = (N - 1)p\leq \log N$. Show that with probability at least $0.9$, the following holds: the degrees of all vertices are bounded by $C\log N$.

Problem 3 (Discrepancy)

Put $N$ random points into the unit square $[0, 1]^2$ independently and according to the uniform distribution. Show that with probability at least $0.99$, this set of points is good in the following sense.

For any axis-aligned rectangle $I\subset [0, 1]^2$ we have

$$\lambda_I N-C\sqrt{\lambda_IN\log N}\leq N_I\leq \lambda_IN+C\sqrt{\lambda_IN\log N}$$

as long as the left hand side is nonnegative and $N\geq C_1$.

Here $N_I$ denotes the number of points in the rectangle $I$, $\lambda_I$ denotes the area of $I$, and $C$ is an absolute constant.
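
No solution is written up for this problem; the following is only a numerical illustration of the claimed discrepancy bound over a random sample of rectangles (assumptions: numpy; the constant $C=3$ and the number of test rectangles are arbitrary choices, and checking random rectangles is much weaker than checking all of them):

```python
# Minimal sketch: for N uniform points in [0,1]^2 and randomly chosen axis-aligned
# rectangles I, the count N_I stays within C*sqrt(lambda_I*N*log N) of lambda_I*N.
import numpy as np

rng = np.random.default_rng(11)
N, C = 100_000, 3.0
pts = rng.uniform(size=(N, 2))

worst = 0.0
for _ in range(1_000):
    (x1, x2), (y1, y2) = np.sort(rng.uniform(size=2)), np.sort(rng.uniform(size=2))
    lam = (x2 - x1) * (y2 - y1)                      # area of the rectangle
    inside = np.sum((pts[:, 0] >= x1) & (pts[:, 0] <= x2) &
                    (pts[:, 1] >= y1) & (pts[:, 1] <= y2))
    slack = C * np.sqrt(max(lam * N, 1.0) * np.log(N))   # floor avoids division by ~0
    worst = max(worst, abs(inside - lam * N) / slack)
print("max |N_I - lambda_I*N| / (C*sqrt(lambda_I*N*log N)) over tested rectangles:", worst)
```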