This homework should be turned in as a .pdf of the output along with an .Rmd file via uploading under assignments on Sakai. Be sure your .Rmd code is fully reproducible (if not, points may be deducted). Remember the 11 point or larger font size requirement!

1. OLS Estimation

  1. Using the scalar formulation of the ANOVA model \(y_{ij} \sim N\left(\mu_j,\sigma^2 \right)\), show that \(\widehat{\mu}_{OLS}=(\overline{y}_1,\cdots,\overline{y}_J)\), where \(\overline{y}_j\) is the sample mean in group \(j\).

  2. Using the scalar formulation of the ANOVA model \(y_{ij} \sim N\left(\mu+\alpha_j,\sigma^2 \right)\) with the constraint \(\sum_j \alpha_j=0\), assume \(n_j=n\) and show

    1. \(\widehat{\mu}=\overline{y}_{\cdot \cdot}\), where \(\overline{y}_{\cdot \cdot}\) is the grand mean over all observations

    2. \(\widehat{\mu}=\frac{1}{J}\sum_j \widehat{\mu}_j\)

    3. \(\widehat{\alpha}_j=\widehat{\mu}_j-\widehat{\mu}=\overline{y}_{j}-\overline{y}_{\cdot \cdot}\)

  3. Write the model in (a) in matrix form assuming you have 3 groups with 3 observations each and show that \((X'X)^{-1}X'y\) yields the same estimates as in the scalar formulation. Note the parameter vector corresponding to this model should contain the three mean parameters \((\mu_1, \mu_2, \mu_3)\).

2. Game of Thrones

The handling of female characters in the American series Game of Thrones, and indeed whether it is feminist or mysogynistic, has been hotly debated. The dataset gotscreen.RData contains information on the number of seconds of screentime for members of each gender in each episode of seven seasons of Game of Thrones. Using descriptive statistics as well as a two-way ANOVA model with interaction, explore whether an actorโ€™s screentime differs by gender (male, female, or unspecified) and whether there are any differences in potential gender effects across the 7 seasons of the show. Your answer should include the following details.

  1. Exploratory data analysis of screentime, including average screentime per gender as well as total screentime for all actors of a given gender.
  2. A clear specification of the model using an equation, including clear specification of any modeling assumptions.
  3. A clearly-labeled table providing point and interval estimates for each parameter in the linear predictor of your model.
  4. Clear specification of any hypothesis tests or other inferential techniques used to evaluate the questions posed above.
  5. Evidence of adequacy of model fit and evaluation of suitability of any assumptions.
  6. Clear description of results in language accessible to the average fan of the show, including graphical displays as appropriate. Comment on any insights that may differ between exploratory data analysis and analysis using the ANOVA model, along with reasons why these insights may differ.

3. Contrasts

Suppose you fit an ANOVA model to responses collected from J=3 groups. Consider the following two ANOVA model parameterizations.

\[\begin{eqnarray} y_{ij}&=&\mu_j + \varepsilon_{ij} ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ (1) \\ y_{ij}&=& \mu + \alpha_1I(j=1)+\alpha_2I(j=2)+\varepsilon_{ij} ~~~~ (2) \end{eqnarray}\]
  1. Find the linear combinations of parameters in model (2) that are equivalent to \(\mu_1-\mu_2\), \(\mu_1-\mu_3\), and \(\mu_2-\mu_3\) in model (1).

  2. Show that the estimates of these mean differences are identical regardless of the coding scheme (1) or (2) used, either theoretically or by analyzing the Game of Thrones data using a one-way ANOVA model with gender as the only predictor.

4. (Required for 610, extra credit for 410.)

Prove or disprove. Suppose \(X\) and \(Y\) each follow univariate normal distributions and that \(\text{Cov}(X,Y)=0\). Then \(X\) is independent of \(Y\).