Official website.

Homework 1

Problem 1 (Expectation as the optimal estimator)

Let $X$ be a random variable with finite expectation. Show that the function $f(a) = \mathbb{E}(X - a)^2$ is minimized at $a = \mathbb{E} X$.

$$\mathbb{E}(X-a)^2=\mathbb{E}X^2-2a\mathbb{E}X+a^2=\text{Var}(X)+(a-\mathbb{E}X)^2,$$

which is minimized exactly at $a=\mathbb{E}X$.
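
As a quick numerical sanity check (not part of the proof), the following sketch estimates $f(a)=\mathbb{E}(X-a)^2$ by Monte Carlo and confirms the minimizer sits near $\mathbb{E}X$; the distribution, sample size, and grid are arbitrary choices, and numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=200_000)   # arbitrary distribution with E[X] = 2

a_grid = np.linspace(0.0, 4.0, 401)
f_vals = [np.mean((X - a) ** 2) for a in a_grid]   # Monte Carlo estimate of f(a)

a_star = a_grid[np.argmin(f_vals)]
print(f"empirical minimizer a* = {a_star:.3f}, sample mean = {X.mean():.3f}")
# Both values should be close to E[X] = 2.
```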

Problem 2 (Expectation of random vectors)

Let $X$ and $Y$ be random vectors in $\mathbb{R}^n$.

(a)

(Linearity) Prove that for any constants $a, b\in \mathbb{R}$, we have

$$\mathbb{E}(aX + bY) = a \mathbb{E}(X) + b \mathbb{E}(Y).$$

$$\mathbb{E}(aX+bY)=(\mathbb{E}(aX_i+bY_i))_{i=1}^n=(a\mathbb{E}(X_i)+b\mathbb{E}(Y_i))_{i=1}^n=a\mathbb{E}(X)+b\mathbb{E}(Y),$$

since the expectation of a random vector is taken coordinatewise and scalar expectation is linear.

(b)

(Multiplicativity) Prove that if $X$ and $Y$ are independent, then

$$\mathbb{E}\langle X, Y\rangle = \langle \mathbb{E} X, \mathbb{E} Y\rangle.$$

This generalizes the identity $\mathbb{E}(XY) = (\mathbb{E} X)(\mathbb{E} Y)$ for independent random variables.

$$\mathbb{E}\langle X, Y\rangle=\mathbb{E}\Big(\sum_{i=1}^n X_iY_i\Big)=\sum_{i=1}^n \mathbb{E}X_i\mathbb{E}Y_i=\langle \mathbb{E}X,\mathbb{E}Y\rangle.$$

Problem 3 (Expectation as optimal estimator: random vectors)

Let $X$ be a random vector with finite expectation. Show that the function $f(a) = \mathbb{E}\Vert X - a\Vert_2^2$ is minimized at $a = \mathbb{E} X$.

$$\mathbb{E}\Vert X-a\Vert^2_2=\mathbb{E}\langle X-a,X-a\rangle=\sum_{i=1}^n \mathbb{E}(X_i-a_i)^2,$$

and by Problem 1 each term is minimized at $a_i=\mathbb{E}X_i$, so $f$ is minimized at $a=\mathbb{E}X$.

Problem 4 (A variance-like identity)

For a random vector in $\mathbb{R}^n$, define $V(X) = \mathbb{E}\Vert X - \mathbb{E} X\Vert_2^2$.

(a)

Prove that

$$V(X) = \mathbb{E}\Vert X\Vert_2^2 -\Vert \mathbb{E} X\Vert_2^2.$$

$$\begin{align*} V(X)&=\mathbb{E}\Vert X-\mathbb{E}X\Vert^2_2=\mathbb{E}\langle X-\mathbb{E}X,X-\mathbb{E}X\rangle=\mathbb{E}\sum_{i=1}^n (X_i-\mathbb{E}X_i)^2\\\\ &=\sum_{i=1}^n \big(\mathbb{E}X_i^2-(\mathbb{E}X_i)^2\big)=\mathbb{E}\Vert X\Vert^2_2-\Vert\mathbb{E}X\Vert_2^2. \end{align*}$$

(b)

Prove that

$$V(X) = \frac{1}{2}\mathbb{E}\Vert X - X_0\Vert_2^2,$$

where $X_0$ is an independent copy of $X$.

$$\frac{1}{2}\mathbb{E}\Vert X-X_0\Vert_2^2=\frac{1}{2}\sum_{i=1}^n\mathbb{E}(X_i-X_{0,i})^2=\sum_{i=1}^n\big(\mathbb{E}X_i^2-\mathbb{E}(X_iX_{0,i})\big)=\sum_{i=1}^n\big(\mathbb{E}X_i^2-(\mathbb{E}X_i)^2\big)=\mathbb{E}\Vert X\Vert^2_2-\Vert\mathbb{E}X\Vert_2^2=V(X),$$

using that $X_{0,i}$ has the same distribution as $X_i$, that $\mathbb{E}(X_iX_{0,i})=(\mathbb{E}X_i)^2$ by independence, and part (a) for the last equality.
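
A minimal numerical check of both identities of this problem, assuming numpy; the distribution of $X$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 5, 500_000
mu = 1.0 + np.arange(n)                                   # true mean vector E[X]
X  = rng.exponential(1.0, size=(N, n)) + np.arange(n)     # arbitrary random vector in R^5
X0 = rng.exponential(1.0, size=(N, n)) + np.arange(n)     # independent copy of X

V      = np.mean(np.sum((X - mu) ** 2, axis=1))                # E||X - EX||^2
part_a = np.mean(np.sum(X ** 2, axis=1)) - np.sum(mu ** 2)     # E||X||^2 - ||EX||^2
part_b = 0.5 * np.mean(np.sum((X - X0) ** 2, axis=1))          # (1/2) E||X - X0||^2

print(V, part_a, part_b)   # the three values should agree up to Monte Carlo error
```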

Problem 5 (Additivity of variance)

(a)

Check that if $X$ and $Y$ are independent random vectors in $\mathbb{R}^n$ with zero means, then

$$\mathbb{E}\Vert X+Y\Vert_2^2=\mathbb{E}\Vert X\Vert_2^2+\mathbb{E}\Vert Y\Vert_2^2.$$

$$\mathbb{E}\Vert X+Y\Vert^2_2=\sum_{i=1}^n \mathbb{E}(X_i+Y_i)^2=\sum_{i=1}^n \big(\mathbb{E}X_i^2+2\mathbb{E}X_i\mathbb{E}Y_i+\mathbb{E}Y_i^2\big)=\mathbb{E}\Vert X\Vert_2^2+\mathbb{E}\Vert Y\Vert_2^2,$$

since $\mathbb{E}(X_iY_i)=\mathbb{E}X_i\mathbb{E}Y_i$ by independence and the cross terms vanish by the zero-mean assumption.

(b)

Prove that if random vectors $X_1,\cdots, X_k$ in $\mathbb{R}^n$ are independent, then

$$V(X_1 +\cdots + X_k) = V(X_1) +\cdots + V(X_k),$$

where $V(\cdot)$ is defined as $V(X) = \mathbb{E}\Vert X - \mathbb{E} X\Vert_2^2$.

$$\begin{align*} V(X_1+X_2+\cdots+X_k)&=\mathbb{E}\Big\Vert \sum_{j=1}^k X_j\Big\Vert_2^2-\Big\Vert \mathbb{E}\sum_{j=1}^k X_j\Big\Vert_2^2\\\\ &=\mathbb{E}\sum_{i=1}^n \Big(\sum_{j=1}^k X_{j,i}\Big)^2-\sum_{i=1}^n \Big(\sum_{j=1}^k \mathbb{E}X_{j,i}\Big)^2\\\\ &=\sum_{i=1}^n \Big(\sum_{j=1}^k\mathbb{E}X_{j,i}^2+2\sum_{j=1}^k\sum_{l=j+1}^k \mathbb{E}X_{j,i}\mathbb{E}X_{l,i}-\sum_{j=1}^k (\mathbb{E}X_{j,i})^2-2\sum_{j=1}^k\sum_{l=j+1}^k \mathbb{E}X_{j,i}\mathbb{E}X_{l,i}\Big)\\\\ &=\sum_{j=1}^k \sum_{i=1}^n \big(\mathbb{E}X_{j,i}^2-(\mathbb{E}X_{j,i})^2\big)=\sum_{j=1}^k V(X_j), \end{align*}$$

where independence was used to write the cross terms as $\mathbb{E}(X_{j,i}X_{l,i})=\mathbb{E}X_{j,i}\mathbb{E}X_{l,i}$ for $j\neq l$.

Homework 2

Problem 1 (Balancing vectors)

Let $x_1,\cdots, x_n$ be an arbitrary set of unit vectors in $\mathbb{R}^n$. Prove that there exist $\varepsilon_1,\cdots, \varepsilon_n\in \lbrace -1, 1\rbrace$ such that

$$\Vert \varepsilon_1x_1+\cdots+\varepsilon_nx_n\Vert_2\leq \sqrt{n}.$$

Choose $\varepsilon_1,\cdots,\varepsilon_n$ to be independent random signs with $\mathbb{P}[\varepsilon_i=1]=\mathbb{P}[\varepsilon_i=-1]=\frac{1}{2}$, so that $\mathbb{E}[\varepsilon_i]=0$. Compute

$$\mathbb{E}\Big\Vert \sum_{i=1}^n \varepsilon_ix_i\Big\Vert_2^2=\sum_{i=1}^n \mathbb{E}\Vert\varepsilon_ix_i\Vert_2^2+2\sum_{i<j}\mathbb{E}[\varepsilon_i\varepsilon_j]\langle x_i,x_j\rangle=\sum_{i=1}^n\Vert x_i\Vert_2^2=n\\\\ \implies \exists\ \varepsilon_i, \text{ s.t. }\Big\Vert\sum_{i=1}^n \varepsilon_ix_i\Big\Vert_2^2\leq n,\text{ i.e. }\Big\Vert\sum_{i=1}^n \varepsilon_ix_i\Big\Vert_2\leq \sqrt{n},$$

since $\mathbb{E}[\varepsilon_i\varepsilon_j]=\mathbb{E}[\varepsilon_i]\mathbb{E}[\varepsilon_j]=0$ for $i\neq j$ and a random variable cannot always exceed its expectation.
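
The argument above is the probabilistic method; the sketch below (assuming numpy, with an arbitrary random choice of unit vectors) samples random sign patterns and checks that one satisfying the bound shows up quickly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=(n, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)        # n arbitrary unit vectors in R^n (rows)

norms = []
for _ in range(1000):                                # sample random sign patterns
    eps = rng.choice([-1.0, 1.0], size=n)
    norms.append(np.linalg.norm(eps @ x))            # ||sum_i eps_i x_i||_2

print(f"best norm found: {min(norms):.3f}  vs  sqrt(n) = {np.sqrt(n):.3f}")
print(f"fraction of samples within the bound: {np.mean(np.array(norms) <= np.sqrt(n)):.2f}")
# Since E||sum eps_i x_i||_2^2 = n, a positive fraction of sign patterns meets the bound.
```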

Problem 2 (Random vectors have norm $\approx \sqrt{n}$)

Let $X$ be a standard normal random vector in $\mathbb{R}^n$, where $n\geq C_1$.

(a)

Check that

$$\mathbb{E}\Vert X\Vert_2^2 = n \text{ and }\text{Var}(\Vert X\Vert_2^2) = 2n.$$

$$\mathbb{E}\Vert X\Vert_2^2=\sum_{i=1}^n\mathbb{E}X_i^2=n.$$

$$\text{Var}(\Vert X\Vert_2^2)=\mathbb{E}\Vert X\Vert_2^4-(\mathbb{E}\Vert X\Vert_2^2)^2=\mathbb{E}\Big[\Big(\sum_{i=1}^n X_i^2\Big)^2\Big]-n^2\\\\ =\mathbb{E}\Big[\sum_{i=1}^n X_i^4+2\sum_{i<j}X_i^2X_j^2\Big]-n^2=3n+2\cdot\frac{n(n-1)}{2}-n^2=2n,$$

using $\mathbb{E}X_i^4=3$ and $\mathbb{E}X_i^2X_j^2=\mathbb{E}X_i^2\,\mathbb{E}X_j^2=1$ for $i\neq j$ by independence.
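
A quick Monte Carlo confirmation of both moments, assuming numpy; the dimension and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 100, 200_000
X = rng.normal(size=(N, n))          # N samples of a standard normal vector in R^n
sq_norms = np.sum(X ** 2, axis=1)    # ||X||_2^2 for each sample

print(f"mean of ||X||^2:     {sq_norms.mean():.1f}   (expect n  = {n})")
print(f"variance of ||X||^2: {sq_norms.var():.1f}   (expect 2n = {2 * n})")
```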

(b)

Conclude that

$$|\Vert X\Vert_2^2 - n|\leq C_2 \sqrt{n}\text{ with probability at least }0.99.$$

By Chebyshev's inequality,

$$\mathbb{P}[|\Vert X\Vert_2^2-n|\leq C_2\sqrt{n}]=1-\mathbb{P}[|\Vert X\Vert_2^2-n|> C_2\sqrt{n}]\geq 1-\frac{\text{Var}(\Vert X\Vert_2^2)}{C_2^2n}=1-\frac{2}{C_2^2}.$$

Let $C_2=\sqrt{200}$, so that $1-\frac{2}{C_2^2}=0.99$.

(c)

Deduce that

$$\frac{1}{2}\sqrt{n}\leq \Vert X\Vert_2\leq 2\sqrt{n}\text{ with probability at least }0.99.$$

Using the conclusion from (b),

$$\sqrt{n-C_2\sqrt{n}}\leq \Vert X\Vert_2\leq \sqrt{n+C_2\sqrt{n}} \text{ with probability at least }0.99.$$

Upper bound: $\sqrt{n+C_2\sqrt{n}}\leq 2\sqrt{n}\iff C_2\sqrt{n}\leq 3n\iff n\geq \left(\frac{C_2}{3}\right)^2$.

Lower bound: $\sqrt{n-C_2\sqrt{n}}\geq \frac{1}{2}\sqrt{n}\iff C_2\sqrt{n}\leq \frac{3}{4}n\iff n\geq \left(\frac{4C_2}{3}\right)^2$. Both conditions hold once $n\geq C_1:=\left(\frac{4C_2}{3}\right)^2$.

(d)

Prove the tighter bound:

$$|\Vert X\Vert_2 - \sqrt{n}|\leq C_3\text{ with probability at least }0.99.$$

On the event of (b),

$$|\Vert X\Vert_2-\sqrt{n}|=\frac{|\Vert X\Vert_2^2-n|}{\Vert X\Vert_2+\sqrt{n}}\leq \frac{C_2\sqrt{n}}{\sqrt{n}}=C_2,$$

so the claim holds with $C_3=C_2$, with probability at least $0.99$.
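
The point of (d) is that the fluctuation of $\Vert X\Vert_2$ around $\sqrt{n}$ stays of constant order as $n$ grows; a short simulation (assuming numpy, with arbitrary dimensions and sample sizes) makes this visible.

```python
import numpy as np

rng = np.random.default_rng(4)
for n in (100, 400, 1_600):
    X = rng.normal(size=(10_000, n))
    dev = np.abs(np.linalg.norm(X, axis=1) - np.sqrt(n))   # | ||X||_2 - sqrt(n) |
    print(f"n={n:5d}  99th percentile of the deviation: {np.quantile(dev, 0.99):.3f}")
# The 99th percentile stays around a small constant (roughly 1.8), independent of n.
```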

Problem 3 (Random vectors are almost orthogonal)

Let $X$ and $Y$ be independent standard normal random vectors in $\mathbb{R}^n$, where $n\geq C_3$.

(a)

Check that

$$\mathbb{E}\langle X, Y\rangle^2 = n.$$

$$\mathbb{E}\langle X,Y\rangle^2=\mathbb{E}\Big[\Big(\sum_{i=1}^n X_iY_i\Big)^2\Big]=\mathbb{E}\Big[\sum_{i=1}^n X_i^2Y_i^2+2\sum_{i<j}X_iY_iX_jY_j\Big]=n,$$

since $\mathbb{E}X_i^2Y_i^2=\mathbb{E}X_i^2\,\mathbb{E}Y_i^2=1$ and the cross terms have zero mean by independence.

(b)

Deduce that

$$\langle X, Y\rangle^2\leq C_4n\text{ with probability at least }0.99.$$

By Markov's inequality,

$$\mathbb{P}[\langle X,Y\rangle^2\leq C_4n]=1-\mathbb{P}[\langle X,Y\rangle^2> C_4n]\geq 1-\frac{\mathbb{E}\langle X,Y\rangle^2}{C_4n}=1-\frac{1}{C_4}.$$

Let $C_4=100$, so that $1-\frac{1}{C_4}=0.99$.

(c)

Denote by $\theta$ the angle between the vectors $X$ and $Y$. Prove that

$$|\theta-\frac{\pi}{2}|\leq \frac{C_5}{\sqrt{n}}\text{ with probability at least }0.97.$$

Use Problem 3(b) and Problem 2(c).

$$\mathbb{P}\Big[\cos^2(\theta)=\frac{\langle X,Y\rangle^2}{\Vert X\Vert_2^2\Vert Y\Vert_2^2}\leq \frac{C_4n}{(\frac{\sqrt{n}}{2})^2(\frac{\sqrt{n}}{2})^2}=\frac{16C_4}{n}\Big]\geq 1-3\times0.01=0.97\\\\ \implies \mathbb{P}\Big[|\cos(\theta)|\leq \frac{4\sqrt{C_4}}{\sqrt{n}}\Big]\geq 0.97,$$

by a union bound over the three events $\langle X,Y\rangle^2\leq C_4n$, $\Vert X\Vert_2\geq\frac{1}{2}\sqrt{n}$ and $\Vert Y\Vert_2\geq\frac{1}{2}\sqrt{n}$, each of which fails with probability at most $0.01$.

Now bound $|\theta-\pi/2|$.

$$|\cos(\theta)|=\big|\sin\big(\tfrac{\pi}{2}-\theta\big)\big|\geq \frac{2}{\pi}\big|\tfrac{\pi}{2}-\theta\big|\quad\text{(since }|\sin x|\geq \tfrac{2}{\pi}|x|\text{ for }|x|\leq\tfrac{\pi}{2}\text{)}\\\\ \implies \big|\tfrac{\pi}{2}-\theta\big|\leq \frac{\pi}{2}|\cos(\theta)|\leq \frac{2\pi\sqrt{C_4}}{\sqrt{n}}=:\frac{C_5}{\sqrt{n}}.$$
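
A numerical illustration (assuming numpy) of how the angle between independent Gaussian vectors concentrates around $\pi/2$ at the rate $1/\sqrt{n}$; the dimensions and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
for n in (100, 400, 1_600):
    X = rng.normal(size=(10_000, n))
    Y = rng.normal(size=(10_000, n))
    cos = np.sum(X * Y, axis=1) / (np.linalg.norm(X, axis=1) * np.linalg.norm(Y, axis=1))
    theta = np.arccos(cos)                           # angle between X and Y, in [0, pi]
    dev = np.sqrt(n) * np.abs(theta - np.pi / 2)     # rescaled deviation from pi/2
    print(f"n={n:5d}  99th percentile of sqrt(n)*|theta - pi/2|: {np.quantile(dev, 0.99):.2f}")
# The rescaled deviation stays O(1), i.e. |theta - pi/2| = O(1/sqrt(n)).
```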

Homework 3

Problem 1 (Small ball probabilities)

Let $X_1,\cdots, X_N$ be non-negative independent random variables. Assume that the PDF (probability density function) of each $X_i$ is uniformly bounded by $1$.

(a)

Check that each $X_i$ satisfies

$$\mathbb{P}\lbrace X_i\leq \varepsilon\rbrace\leq \varepsilon\text{ for all }\varepsilon > 0.$$

$$\mathbb{P}\lbrace X_i\leq \varepsilon\rbrace=\int_0^{\varepsilon} f_{X_i}(x){\rm d}x\leq \int_0^{\varepsilon} {\rm d}x=\varepsilon.$$

(b)

Show that the MGF (moment generating function) of each $X_i$ satisfies

$$\mathbb{E}\exp(-tX_i)\leq \frac{1}{t}\text{ for all }t > 0.$$

$$\mathbb{E}\exp(-tX_i)=\int_0^{\infty} \exp(-tx)f_{X_i}(x){\rm d}x\leq \int_0^{\infty}\exp(-tx){\rm d}x=\frac{1}{t}.$$

(c)

Deduce that averaging increases the strength of (a) dramatically. Namely, show that

$$\mathbb{P}\lbrace \frac{1}{N}\sum_{i=1}^N X_i\leq \varepsilon\rbrace\leq (C\varepsilon)^N\text{ for all } \varepsilon > 0.$$

Use the exponential moment (Chernoff) method: for any $t>0$, by Markov's inequality, independence, and part (b),

$$\mathbb{P}\lbrace \sum_{i=1}^N X_i\leq N\varepsilon\rbrace=\mathbb{P}\lbrace e^{-t\sum_{i=1}^N X_i}\geq e^{-tN\varepsilon}\rbrace\leq e^{tN\varepsilon}\,\mathbb{E}[e^{-t\sum_{i=1}^N X_i}]=e^{tN\varepsilon}\prod_{i=1}^N \mathbb{E}[e^{-tX_i}]\leq t^{-N}e^{tN\varepsilon}.$$

Let $f(t)=t^{-N}e^{tN\varepsilon}$ and minimize over $t>0$:

$$\frac{\mathrm{d}\ln f(t)}{\mathrm{d}t}=-\frac{N}{t}+N\varepsilon=0\implies t=\frac{1}{\varepsilon}\implies \mathbb{P}\lbrace \frac{1}{N}\sum_{i=1}^N X_i\leq \varepsilon\rbrace\leq (e\varepsilon)^N,$$

so the claim holds with $C=e$.
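
A Monte Carlo check (assuming numpy) that the small-ball probability of the average is dramatically smaller than $\varepsilon$ and below $(e\varepsilon)^N$; here each $X_i$ is taken uniform on $[0,1]$, whose density is bounded by $1$, and the parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
N, eps, trials = 5, 0.1, 2_000_000
X = rng.uniform(0.0, 1.0, size=(trials, N))      # densities bounded by 1
empirical = np.mean(X.mean(axis=1) <= eps)       # P{ (1/N) sum X_i <= eps }

print(f"part (a) bound for a single variable: eps = {eps:.2e}")
print(f"part (c) bound for the average: (e*eps)^N = {(np.e * eps) ** N:.2e}")
print(f"empirical probability for the average:      {empirical:.2e}")
```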

Problem 2 (Boosting randomized algorithms)

Imagine we have an algorithm for solving some decision problem (for example, the algorithm may answer the question: “is there a helicopter in a given image?”). Suppose each time the algorithm runs, it gives the correct answer independently with probability $\frac{1}{2} + \delta$ with some small margin $\delta\in (0, 1/2)$. In other words, the algorithm does just marginally better than a random guess.

To improve the confidence, the following “boosting” procedure is often used. Run the algorithm $N$ times, and take the majority vote. Show that the new algorithm gives the correct answer with probability at least $1 - 2\exp(-c\delta^2 N)$. This is good because the confidence rapidly (exponentially!) approaches $1$ as $N$ grows.

Let $X_i$ be the indicator that the $i$-th run is incorrect, so $\mathbb{P}\lbrace X_i=0\rbrace=\frac{1}{2}+\delta$, $\mathbb{P}\lbrace X_i=1\rbrace=\frac{1}{2}-\delta$, and set $S_N=\sum_{i=1}^N X_i$ with $\mu=\mathbb{E}S_N=N(\frac{1}{2}-\delta)$. The majority vote is correct exactly when $S_N<\frac{N}{2}$.

Use Chernoff's inequality and the Taylor expansion $\ln(1-x)\leq -x-\frac{x^2}{2}$ for $x\in(0,1)$:

$$\mathbb{P}\lbrace S_N\geq \frac{N}{2}\rbrace\leq e^{-\mu}\Big(\frac{2e\mu}{N}\Big)^{N/2}=e^{-N(1/2-\delta)}\big(e(1-2\delta)\big)^{N/2}=e^{N\delta}(1-2\delta)^{N/2}=\big(e^\delta(1-2\delta)^{1/2}\big)^N\\\\ =e^{N(\delta+\frac{1}{2}\ln(1-2\delta))}\leq e^{N(\delta+\frac{1}{2}(-2\delta-2\delta^2))}=e^{-N\delta^2}\\\\ \implies \mathbb{P}\lbrace S_N<\frac{N}{2}\rbrace\geq 1-e^{-N\delta^2}\geq 1-2e^{-\delta^2 N},$$

so the claim holds with $c=1$.
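
A simulation (assuming numpy) of the boosting procedure: the number of incorrect runs is $S_N\sim\text{Binom}(N,\frac{1}{2}-\delta)$, the majority vote is correct when $S_N<N/2$, and the empirical accuracy is compared with the guarantee $1-2e^{-\delta^2N}$; the values of $\delta$ and $N$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)
delta, trials = 0.1, 200_000
for N in (101, 401, 1001):                               # odd N avoids ties in the vote
    wrong = rng.binomial(N, 0.5 - delta, size=trials)    # S_N = number of incorrect runs
    accuracy = np.mean(wrong < N / 2)                    # majority vote correct iff S_N < N/2
    bound = 1 - 2 * np.exp(-delta**2 * N)
    print(f"N={N:5d}  empirical accuracy: {accuracy:.4f}   guarantee (c=1): {bound:.4f}")
# The guarantee is conservative, but both columns approach 1 exponentially fast in N.
```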

Problem 3 (Optimality of Chernoff’s inequality)

(a)

Prove the following useful inequalities for binomial coefficients:

$$\left(\frac{n}{m}\right)^m\leq \binom{n}{m}\leq \sum_{k=0}^m \binom{n}{k}\leq \left(\frac{en}{m}\right)^m$$

for all integers $m\in [1, n]$.

The first inequality:

$$\frac{n}{m}\leq \frac{n-i}{m-i} \text{ for all }i=0,\cdots,m-1\\\\ \implies \left(\frac{n}{m}\right)^m\leq \frac{n}{m}\cdot\frac{n-1}{m-1}\cdots\frac{n-m+1}{1}=\binom{n}{m}.$$

The second inequality is immediate, since $\binom{n}{m}$ is one of the (non-negative) terms in the sum.

The third inequality:

$$\sum_{k=0}^m \binom{n}{k}\leq \left(\frac{en}{m}\right)^m\iff \sum_{k=0}^m \binom{n}{k}\left(\frac{m}{n}\right)^m\leq e^m=\sum_{k=0}^{\infty} \frac{m^k}{k!}.$$

It therefore suffices to show, for each $0\leq k\leq m$,

$$\binom{n}{k}\left(\frac{m}{n}\right)^m\leq \frac{m^k}{k!}\iff \frac{n!\,m^m}{(n-k)!\,n^m}\leq m^{k}.$$

Indeed, since $\frac{m}{n}\leq 1$ and $k\leq m$,

$$\frac{n!\,m^m}{(n-k)!\,n^m}\leq \frac{n!\,m^k}{(n-k)!\,n^k}=\prod_{i=0}^{k-1}\frac{(n-i)m}{n}\leq m^k,$$

because each factor satisfies $\frac{(n-i)m}{n}\leq m$. Summing over $k=0,\cdots,m$ gives

$$\sum_{k=0}^m \binom{n}{k}\left(\frac{m}{n}\right)^m\leq \sum_{k=0}^{\infty} \frac{m^k}{k!}=e^m\implies \sum_{k=0}^m \binom{n}{k}\leq \left(\frac{en}{m}\right)^m.$$
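
A brute-force check of the three inequalities over a small range, using only Python's standard library; the range $n\leq 60$ is an arbitrary choice.

```python
from math import comb, e

# Check (n/m)^m <= C(n,m) <= sum_{k=0}^{m} C(n,k) <= (e*n/m)^m for 1 <= m <= n <= 60.
for n in range(1, 61):
    for m in range(1, n + 1):
        lower = (n / m) ** m
        middle = comb(n, m)
        partial_sum = sum(comb(n, k) for k in range(m + 1))
        upper = (e * n / m) ** m
        assert lower <= middle <= partial_sum <= upper, (n, m)

print("all three inequalities hold for 1 <= m <= n <= 60")
```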

(b)

Show that Chernoff's inequality is almost optimal. Namely, if $S_N$ is a binomial random variable with mean $\mu$, that is $S_N\sim \text{Binom}(N, \mu/N)$, show that

$$\mathbb{P}\lbrace S_N\geq t\rbrace\geq e^{-\mu}(\mu/t)^t$$

for any integer $t\in \lbrace 1,\cdots , N\rbrace$ such that $t\geq \mu$.

Use Problem 3 (a).

$$\mathbb{P}\lbrace S_N\geq t\rbrace=\sum_{i=t}^N \binom{N}{i}(\mu/N)^i(1-\mu/N)^{N-i}\geq \sum_{i=t}^N \Big(\frac{N}{i}\Big)^i(\mu/N)^i(1-\mu/N)^{N-i}\\\\ =\sum_{i=t}^N (\mu/i)^i(1-\mu/N)^{N-i}\geq (\mu/t)^t(1-\mu/N)^{N-t}\geq(\mu/t)^t\left(\frac{N-\mu}{N}\right)^{N-\mu},$$

where the first inequality is part (a), the second keeps only the $i=t$ term, and the third uses $t\geq\mu$. It remains to check that $\left(\frac{N-\mu}{N}\right)^{N-\mu}\geq e^{-\mu}$, which is equivalent to

$$\left(\frac{N-\mu}{N}\right)^N\geq \left(\frac{N-\mu}{eN}\right)^{\mu}\iff N(\ln (N-\mu)-\ln N)\geq \mu(\ln (N-\mu)-1-\ln N)\\\\ \iff (N-\mu)\ln(N-\mu)+\mu \ln N+\mu\geq N\ln N.$$

Let $f(\mu)=(N-\mu)\ln(N-\mu)+\mu\ln N+\mu$, then

$$f'(\mu)=-1-\ln(N-\mu)+\ln N+1=\ln N-\ln(N-\mu)\geq 0\\\\ \implies f(\mu)\geq f(0)=N\ln N.$$

Q.E.D.
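
A small exact computation (Python standard library only) comparing the true binomial tail with the lower bound just proved and with the Chernoff upper bound $e^{-\mu}(e\mu/t)^t$; the parameters $N=100$, $\mu=5$ are arbitrary.

```python
from math import comb, exp, e

N, mu = 100, 5.0
p = mu / N
for t in (5, 10, 20, 40):
    tail = sum(comb(N, i) * p**i * (1 - p)**(N - i) for i in range(t, N + 1))
    lower = exp(-mu) * (mu / t) ** t
    upper = exp(-mu) * (e * mu / t) ** t
    print(f"t={t:3d}  lower = {lower:.3e}   P(S_N >= t) = {tail:.3e}   upper = {upper:.3e}")
# The exact tail is sandwiched between the lower bound of (b) and the Chernoff
# upper bound, which differ only by a factor e^t.
```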

Problem 4 (The lower tail in Chernoff’s inequality)

Let $X_i$ be independent Bernoulli random variables with parameters $p_i$. Consider their sum $S_N = \sum_{i=1}^N X_i$ and denote its mean by $\mu = \mathbb{E} S_N$. Then, for any $t <\mu$, we have

$$\mathbb{P}\lbrace S_N\leq t\rbrace\leq e^{-\mu}\left(\frac{e\mu}{t}\right)^t.$$

Use the exponential moment (MGF) method together with Markov's inequality: for any $\lambda>0$,

$$\mathbb{P}\lbrace S_N\leq t\rbrace=\mathbb{P}\lbrace \exp(-\lambda S_N)\geq \exp(-\lambda t)\rbrace\leq \exp(\lambda t)\mathbb{E}\exp(-\lambda S_N)\\\\ =\exp(\lambda t)\prod_{i=1}^N \mathbb{E}\exp(-\lambda X_i)=\exp(\lambda t)\prod_{i=1}^N (1-p_i+p_ie^{-\lambda})\\\\ =\exp(\lambda t)\prod_{i=1}^N (p_i(e^{-\lambda}-1)+1)\leq \exp(\lambda t)\prod_{i=1}^N \exp(p_i(e^{-\lambda}-1))\\\\ =\exp(\lambda t)\exp(\mu(e^{-\lambda}-1))=\exp(\lambda t+\mu e^{-\lambda}-\mu),$$

where the second inequality uses $1+x\leq e^x$.

Let $f(\lambda)=\lambda t+\mu e^{-\lambda}$, then

$$f'(\lambda)=t-\mu e^{-\lambda}=0\implies \lambda=\ln(\mu/t)>0\implies \min_{\lambda>0} f(\lambda)=f(\ln(\mu/t))=t\ln(\mu/t)+t,$$

where $\ln(\mu/t)>0$ since $t<\mu$.

Therefore,

$$\mathbb{P}\lbrace S_N\leq t\rbrace\leq \exp\big(\min_{\lambda>0}f(\lambda)-\mu\big)=\exp(t\ln(\mu/t)+t-\mu)=e^{-\mu}\left(\frac{e\mu}{t}\right)^t.$$
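
A quick Monte Carlo check of the lower-tail bound (assuming numpy) with heterogeneous parameters $p_i$; the number of variables, the range of the $p_i$, and the choices of $t$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
p = rng.uniform(0.3, 0.7, size=40)                        # Bernoulli parameters p_i
mu = p.sum()
S = (rng.random((200_000, p.size)) < p).sum(axis=1)       # samples of S_N = sum_i X_i

for t in (int(0.5 * mu), int(0.7 * mu), int(0.9 * mu)):   # several integer t < mu
    empirical = np.mean(S <= t)
    bound = np.exp(-mu) * (np.e * mu / t) ** t
    print(f"t={t:3d}  P(S_N <= t) = {empirical:.3e}   bound = {bound:.3e}")
# The bound dominates the empirical lower-tail probability, as proved above.
```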