# Bayesian Statistics

Wednesday, 4th August, 2021

EXAMINATION FOR THE DEGREES OF M.A., M.SCI. AND B.SC. (SCIENCE)

**Bayesian Statistics Level M (Summer)**

Course code: STATS 5100

This paper consists of 5 pages and contains 8 questions. Candidates should attempt all questions.

| Question | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Marks | 2 | 4 | 20 | 10 | 20 | 10 | 6 | 8 | 80 |

The following material is made available to you: statistical tables and a formula sheet.

2 hours are allowed for this exam under normal conditions; 4 hours are allowed for the online exam.

1. Suppose there are two parameters in a model which are strongly correlated in the posterior distribution. Explain what implications this has for a Gibbs sampler sampling these two parameters (you may use a sketch to illustrate your explanation if you wish). [2 MARKS]

2. A random sample of $n = 50$ values is drawn from the normal distribution with unknown mean $\theta$ and known variance $\sigma^2 = 4$. The average of the sampled values is $\bar{y} = 22$. Suppose your prior distribution for $\theta$ is normal with mean 25 and standard deviation 3. Calculate the 95% central posterior interval for $\theta$. [4 MARKS]

3. Suppose $y_i$, $i = 1, \ldots, n$ arise from $n$ independent and identically distributed random variables that follow a distribution with probability mass function

   $$p(y_i \mid \theta) = \frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\,\Gamma(r)}\,(1 - \theta)^r \theta^{y_i},$$

   where $y_i = 0, 1, 2, \ldots$ and $r > 0$ is a known integer. The aim here is to estimate the parameter $\theta$ from the data $y = (y_1, \ldots, y_n)$. Note that the expectation of the distribution is $r\theta/(1 - \theta)$.

   (a) Derive the posterior distribution of $\theta$ if a Beta distribution $\mathrm{Be}(\alpha, \beta)$ is used as the prior, where $\alpha, \beta > 0$. [3 MARKS]

   (b) An intuitive way to interpret the information contained in the Beta prior for $\theta$ is that it is equivalent to the information in an observation from a distribution of the same form as that above, but possibly with a different value of $r$. What is the value of this notional observation and what is the corresponding value of $r$? Specify the values using $\alpha$ and $\beta$. [2 MARKS]

   (c) Is the Beta distribution a conjugate prior? Explain. [2 MARKS]

   (d) Derive Jeffreys' prior for $\theta$. [5 MARKS]

   (e) Is the Jeffreys' prior proper? Explain. [2 MARKS]

   (f) Is it fine to use the Jeffreys' prior above for inference? Explain. [2 MARKS]

   (g) Derive the posterior predictive distribution $p(\tilde{y} \mid y)$ for a new observation $\tilde{y}$, given that the Beta prior $\mathrm{Be}(\alpha, \beta)$ is used. [4 MARKS]

4. Across the top national European football divisions, each year a few teams will complete a season undefeated. In 2021, 3 teams did this. You decide to model the number of teams that manage this in a particular year, $y_{\text{year}}$, as a Poisson random variable with parameter $\lambda_{\text{year}}$: $y_{\text{year}} \mid \lambda_{\text{year}} \sim \mathrm{Poi}(\lambda_{\text{year}})$, independently across years. A $\mathrm{Ga}(\alpha, \beta)$ prior is placed on $\lambda_{\text{year}}$.

   (a) Obtain the posterior distribution of $\lambda_{2021} \mid y_{2021}$. [2 MARKS]

   (b) To assign values for $\alpha$ and $\beta$ in the prior, you decide to take an Empirical Bayes approach using data from previous years. The equivalent counts in the years 2015–2020 were 2, 0, 1, 1, 2 and 1 respectively. Show that the maximum likelihood estimator $\hat{\lambda}_{\text{year}}$ of $\lambda_{\text{year}}$ is $y_{\text{year}}$. Calculate the mean and variance of the $\hat{\lambda}$s across the years 2015–2020 and, by equating them to the mean and variance of the $\mathrm{Ga}(\alpha, \beta)$ distribution, solve for $\alpha$ and $\beta$. [5 MARKS]

   (c) Using the estimated $\alpha$ and $\beta$ values, and the result from part (a), calculate the posterior mean of $\lambda_{2021}$. [3 MARKS]

5. Consider the following hierarchical model for binary data:

   $$\begin{aligned} y_{ij} &\sim \mathrm{Bernoulli}(\lambda_{ij}) \\ \mathrm{logit}(\lambda_{ij}) &= e_i \\ e_i &\sim N(\mu, \sigma^2) \\ \mu &\sim N(0, \sigma_0^2) \\ \sigma^2 &\sim \text{Inv-Gamma}(\alpha, \beta), \end{aligned}$$

   where $i = 1, \ldots, n$, $j = 1, \ldots, m$ and $\sigma_0, \alpha, \beta > 0$ are known values. The probability density of $\sigma^2 \sim \text{Inv-Gamma}(\alpha, \beta)$ is

   $$p(\sigma^2) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,\sigma^{-2(\alpha + 1)} \exp\left(-\frac{\beta}{\sigma^2}\right), \quad \sigma^2 > 0.$$

   (a) Are $\mu$ and $\sigma^2$ a priori dependent? [1 MARK]

   (b) Write down the joint posterior density $p(\mu, \sigma^2, e_1, \ldots, e_n \mid y)$. There is no need to calculate the normalising constant. [2 MARKS]

   (c) Derive the full conditional distributions that are needed for conducting Gibbs sampling to sample from the joint posterior distribution. Simplify the results to known distributions if possible. [6 MARKS]

   (d) Are the samples generated by Gibbs sampling independent? Explain. [2 MARKS]

   (e) Why do we need to discard some initial samples as burn-in? [2 MARKS]

   (f) Write down WinBUGS model code to run Gibbs sampling for the model. [4 MARKS]

   (g) To initialise the model, which nodes does one need to provide initial values for? [1 MARK]

   (h) Add some code to the WinBUGS model code above to estimate the probability $p(e_1 > 0 \mid y)$, and state how to get the estimated probability from the output. [2 MARKS]

6. Suppose $\theta > 0$ is the parameter of interest of a given model, and $a > 0$ is the decided value for $\theta$. Let $p(\theta \mid y)$ denote the posterior distribution of $\theta$. Note that $E[f(x) \mid y] = \int f(x)\,p(x \mid y)\,dx$.

   (a) Show that the Bayes action obtained by minimising the expected loss under the loss function

   $$L_1(\theta, a) = \frac{a}{\theta} - \log\frac{a}{\theta} - 1$$

   is $a^\pi = 1 \big/ E\!\left[\tfrac{1}{\theta} \,\middle|\, y\right]$. [4 MARKS]

   (b) Show that the Bayes action obtained by minimising the expected loss under the loss function

   $$L_2(\theta, a) = \left(\frac{a - \theta}{a}\right)^2$$

   is $a^\pi = \dfrac{E[\theta^2 \mid y]}{E[\theta \mid y]}$. [4 MARKS]

   (c) Suppose $y = (0.77, 2.68, 0.85, 0.44, 1.10)$ consists of $n = 5$ independent observations from an exponential distribution $\mathrm{Exp}(\theta)$. A Gamma prior $\mathrm{Ga}(0.5, 1)$ is chosen for $\theta$. Calculate $a^\pi$ under the loss function $L_2$ in part (b). There is no need to derive the posterior distribution of $\theta$ in detail here. [2 MARKS]

7. A set of four heating elements is put through testing for 500 hours. Two fail, at 320 and 210 hours respectively. Assume the lifetime of these heating elements follows an Exponential distribution, with a Gamma(4, 2500) prior for the rate of failure (in days). Find the probability that a fifth heating element has an expected lifetime longer than 750 hours.
[6 MARKS]

8. Consider a Poisson model for the number of phone calls arriving in a call centre in a 10-minute period, for $n$ call centres. For call centre $i$ with count $y_i$, we include the explanatory variable $x_i$, which indicates the number of operators working in the call centre. The likelihood is

   $$p(y \mid \lambda, x) = \prod_{i=1}^{n} \frac{(x_i \lambda)^{y_i}}{y_i!}\,e^{-x_i \lambda} \propto \lambda^{\sum_i y_i}\,e^{-\lambda \sum_i x_i}.$$

   We wish to consider a mixture of two Gamma distributions as the prior for the rate parameter $\lambda$:

   $$p(\lambda \mid \alpha_1, \beta_1, \alpha_2, \beta_2) = \rho\,\mathrm{Gamma}(\alpha_1, \beta_1) + (1 - \rho)\,\mathrm{Gamma}(\alpha_2, \beta_2).$$

   Show that the resulting posterior is also a mixture of two Gamma distributions by deriving the posterior mixing weights and parameters. Make sure that the posterior is properly normalised. [8 MARKS]

END OF QUESTION PAPER.
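For revision, the numerical parts above can be sanity-checked with a short script. This is a hedged sketch, not a model solution: it assumes the standard conjugate normal update for Question 2, takes the "variance" in Question 4(b) with divisor $n$ (the method-of-moments convention; using $n-1$ gives different values), and for Question 7 treats the two surviving elements as right-censored at 500 hours with the Gamma(4, 2500) prior applied on the hours scale.

```python
import math

# --- Question 2: conjugate normal update with known variance ---
n, sigma2, ybar = 50, 4.0, 22.0   # data summaries from the question
m0, s0sq = 25.0, 9.0              # prior N(25, 3^2)
post_var = 1.0 / (1.0 / s0sq + n / sigma2)
post_mean = post_var * (m0 / s0sq + n * ybar / sigma2)
z = 1.959963984540054             # 97.5% standard-normal quantile
lo = post_mean - z * math.sqrt(post_var)
hi = post_mean + z * math.sqrt(post_var)

# --- Question 4(b)-(c): moment-matching for the Ga(alpha, beta) prior ---
counts = [2, 0, 1, 1, 2, 1]       # undefeated teams, 2015-2020
mean_c = sum(counts) / len(counts)
var_c = sum((c - mean_c) ** 2 for c in counts) / len(counts)
beta_hat = mean_c / var_c         # Gamma: mean = a/b, variance = a/b^2
alpha_hat = mean_c * beta_hat
post_mean_2021 = (alpha_hat + 3) / (beta_hat + 1)  # y_2021 = 3

# --- Question 7: P(1/lambda > 750 | data) = P(lambda < 1/750 | data) ---
# Assumption: 2 failures (320, 210 h) and 2 right-censored at 500 h,
# so posterior shape = 4 + 2 and posterior rate = 2500 + total exposure.
a_post = 4 + 2
b_post = 2500 + 320 + 210 + 500 + 500
x = b_post / 750.0
# Gamma CDF with integer shape a: 1 - exp(-x) * sum_{k<a} x^k / k!
p_long_life = 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k)
                                       for k in range(a_post))

print(f"Q2 interval: ({lo:.3f}, {hi:.3f})")
print(f"Q4 alpha, beta, posterior mean: {alpha_hat:.3f}, {beta_hat:.3f}, {post_mean_2021:.3f}")
print(f"Q7 probability: {p_long_life:.3f}")
```

Under these assumptions the Question 2 interval is roughly $(21.47,\ 22.58)$; the moment-matched values in Question 4 come out as $\alpha = 49/17$ and $\beta = 42/17$.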
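Question 5(f) asks for WinBUGS code; as a language-neutral illustration of what the sampler derived in 5(c) actually does, here is a minimal Metropolis-within-Gibbs sketch in Python. The data, hyperparameters, and proposal scale are invented for illustration only: $\mu$ and $\sigma^2$ are drawn from their conjugate full conditionals, while each $e_i$ is updated with a random-walk Metropolis step, since its full conditional (normal prior times Bernoulli-logit likelihood) is not a standard distribution.

```python
import math
import random

random.seed(1)

# Hypothetical data: n groups of m Bernoulli outcomes (assumed values).
n, m = 8, 20
true_e = [random.gauss(0.5, 1.0) for _ in range(n)]
y = [[1 if random.random() < 1 / (1 + math.exp(-e)) else 0
      for _ in range(m)] for e in true_e]
s = [sum(row) for row in y]               # per-group success counts

sigma0_sq, alpha, beta = 10.0, 2.0, 2.0   # assumed known hyperparameters

def log_lik_e(e_i, s_i):
    # Bernoulli log-likelihood of group i as a function of its logit e_i
    return s_i * e_i - m * math.log(1 + math.exp(e_i))

mu, sigma_sq = 0.0, 1.0
e = [0.0] * n
draws = []
for it in range(2000):
    # mu | e, sigma^2 ~ Normal (conjugate: N(0, sigma0^2) prior)
    prec = 1 / sigma0_sq + n / sigma_sq
    mu = random.gauss(sum(e) / sigma_sq / prec, math.sqrt(1 / prec))
    # sigma^2 | e, mu ~ Inv-Gamma(alpha + n/2, beta + 0.5*sum((e_i - mu)^2))
    a_post = alpha + n / 2
    b_post = beta + 0.5 * sum((ei - mu) ** 2 for ei in e)
    sigma_sq = b_post / random.gammavariate(a_post, 1.0)  # inverse of a Gamma draw
    # e_i | rest is non-standard -> symmetric random-walk Metropolis step
    for i in range(n):
        prop = e[i] + random.gauss(0, 0.5)
        cur = log_lik_e(e[i], s[i]) - (e[i] - mu) ** 2 / (2 * sigma_sq)
        new = log_lik_e(prop, s[i]) - (prop - mu) ** 2 / (2 * sigma_sq)
        if math.log(random.random()) < new - cur:
            e[i] = prop
    if it >= 500:                          # discard burn-in (cf. Question 5(e))
        draws.append(mu)

print(f"posterior mean of mu from {len(draws)} retained draws: "
      f"{sum(draws) / len(draws):.2f}")
```

The burn-in discard and the serial dependence of successive draws are exactly the points raised in parts (d) and (e) of Question 5.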
