class: center, middle, inverse, title-slide # ANOVA ##
### Andrew Farina ### 2020-06-22 --- layout: true background-image: url(img/hexlogo.png) background-size: 65px 75px background-position: 0.5% 1% --- class: inverse, center, middle .left[ # Understanding the Data * ### Loading the Data * ### Summary Statistics * ### Visualizing the Data ] --- ## Data: *stalker* #### Install the `{pl462}` data package and load the tidyverse: ```r devtools::install_github("A-Farina/pl462", ref = "main") library(tidyverse) # For data manipulation library(ggpubr) # For result output library(rstatix) # Tidy friendly simple models ``` -- #### From `{pl462}`, load the stalker dataset: .pull-left100[ ```r data(stalker, package = "pl462") dat <- stalker glimpse(dat) ``` ``` ## Rows: 78 ## Columns: 3 ## $ Therapy <fct> Cruel to be Kind Therapy, Cruel to be Kind Therapy, Cruel to … ## $ AmtTher <fct> 6 Weeks, 6 Weeks, 6 Weeks, 6 Weeks, 6 Weeks, 6 Weeks, 6 Weeks… ## $ Stalker <dbl> 47, 50, 51, 52, 53, 57, 57, 60, 63, 66, 68, 72, 72, 35, 34, 3… ``` ] --- ## Data: Categorical Counts ```r dat %>% group_by(Therapy, AmtTher) %>% select(-Stalker) %>% table() ``` ``` ## AmtTher ## Therapy 6 Weeks 12 Weeks ## Cruel to be Kind Therapy 13 13 ## Psychodyshamic Therapy 13 13 ## None 13 13 ``` --- ## Data: Continuous Distributions ```r dat %>% ggplot(aes(x = Stalker, fill = AmtTher)) + geom_density() + facet_grid(Therapy ~ .) + labs(title = "Stalking Hour Distributions", subtitle = "According to Therapy Amount and Type", fill = "Therapy Amount") ``` <!-- --> --- ## Data: Summary Statistics ```r dat %>% group_by(Therapy, AmtTher) %>% get_summary_stats(Stalker, type = "mean_sd") ``` ``` ## # A tibble: 6 x 6 ## Therapy AmtTher variable n mean sd ## <fct> <fct> <chr> <dbl> <dbl> <dbl> ## 1 Cruel to be Kind Therapy 6 Weeks Stalker 13 59.1 8.5 ## 2 Cruel to be Kind Therapy 12 Weeks Stalker 13 37.8 14.2 ## 3 Psychodyshamic Therapy 6 Weeks Stalker 13 60.1 5.01 ## 4 Psychodyshamic Therapy 12 Weeks Stalker 13 42.9 14.9 ## 5 None 6 Weeks Stalker 13 76.3 17.1 ## 6 None 12 Weeks Stalker 13 66.2 20.6 ``` --- ## Data: Visualize the Data ```r ggboxplot(dat, x = "AmtTher", y = "Stalker", color = "Therapy") ``` <!-- --> --- # Research Question: Dr. Field is interested in seeing the <span style='color: red;'>most effective</span> therapeutic intervention for stalkers. In addition, in an effort to cut back on the number of weeks he spends treating stalkers, he is trying to discover the <span style='color: red;'>minimum amount of time</span> patients need to be in treatment to show improvement. In order to test his hypotheses, he recruits 78 convicted stalkers to participate in either a 6 week or 12 week therapy program. His post intervention measure of success is the <span style='color: blue;'>number of hours a week spent stalking</span>. --- class: inverse, center, middle .left[ # Assumption Testing in R * ### Outliers * ### Normality of Residuals * ### Normality by Group * ### Homogeneity of Variance ] --- ## Outliers ```r dat %>% group_by(Therapy, AmtTher) %>% identify_outliers(Stalker) ``` ``` ## # A tibble: 2 x 5 ## Therapy AmtTher Stalker is.outlier is.extreme ## <fct> <fct> <dbl> <lgl> <lgl> ## 1 Cruel to be Kind Therapy 12 Weeks 12 TRUE FALSE ## 2 Psychodyshamic Therapy 12 Weeks 80 TRUE TRUE ``` .footnote[ One extreme outlier exists in these data. This can be due to: 1) data entry errors, measurement errors, or unusual values. You can include the outlier in the analysis anyway if you do not believe the result will be substantially affected. This can be evaluated by comparing the result of the ANOVA test with and without the outlier. ] --- ## Normality of Residuals- QQ Plot ```r model <- lm(Stalker ~ Therapy * AmtTher, data = dat) ggqqplot(residuals(model)) ``` <img src="index_files/figure-html/residuals-1.png" style="display: block; margin: auto;" /> .footnote[ The QQ plot compares the theoretical distribution (normal) with these data. All of the points fall roughly along the reference line so we can assume normality of these data. ] --- ## Normality of Residuals- Shapiro-Wilks Test ```r shapiro_test(residuals(model)) ``` ``` ## # A tibble: 1 x 3 ## variable statistic p.value ## <chr> <dbl> <dbl> ## 1 residuals(model) 0.983 0.364 ``` .footnote[ The P-Value from the Shapiro-Wilk’s test describes these data as not being significantly different than normal. ] --- ## Normality by group- Shapiro-Wilks Test ```r dat %>% group_by(AmtTher) %>% shapiro_test(Stalker) ``` ``` ## # A tibble: 2 x 4 ## AmtTher variable statistic p ## <fct> <chr> <dbl> <dbl> ## 1 6 Weeks Stalker 0.890 0.00115 ## 2 12 Weeks Stalker 0.922 0.00974 ``` .footnote[ The Shapiro-Wilk’s method is widely recommended for normality tests. It is based on the correlation between the data and the corresponding normal scores. ] --- ## Normal by group- QQ Plot .pull-left[ ```r ggqqplot(dat, "Stalker", ggtheme = theme_bw()) + facet_grid(Therapy~AmtTher) ``` ] .pull-right[ <!-- --> ] .footnote[ In order to determine normality graphically we can use the output of a normal Q-Q Plot. If the data are normally distributed then the data points will be close to the diagonal line. ] --- ## Homogeneity of Variance ```r dat %>% levene_test(Stalker ~ Therapy * AmtTher) ``` ``` ## # A tibble: 1 x 4 ## df1 df2 statistic p ## <int> <int> <dbl> <dbl> ## 1 5 72 3.50 0.00687 ``` .footnote[ Levene's test is significant (p<.05). Therefore, we cannot assume the homogeneity of variances in the different groups. <span style='color: red;'>R output will be slightly different from SPSS because R uses a median centered approach and SPSS uses a mean centered approach.</span>. To mirror SPSS output, add: `center = mean` to the function. ] --- class: inverse, center, middle # Two-way ANOVA using R --- ## Tests of Between-Subject Effects ```r (stalker_aov <- dat %>% anova_test(Stalker ~ Therapy * AmtTher)) ``` ``` ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 Therapy 2 72 19.244 2.02e-07 * 0.348 ## 2 AmtTher 1 72 24.722 4.34e-06 * 0.256 ## 3 Therapy:AmtTher 2 72 0.988 3.77e-01 0.027 ``` .footnote[ We are interested in the Therapy, Amt_Therapy and Therapy x Amt_Therapy rows of the table. We must first look at the Therapy x AmtTher interaction. We can see from the Sig. column that we do not have a statistically significant interaction. We can see from the above table that there was a significant difference in stalker hours between Therapy Types (P < .001) and between therapy amounts (P < .001). ] --- ## Multiple Comparisons: Therapy Type ```r dat %>% pairwise_t_test(Stalker ~ Therapy, p.adjust.method = 'bonferroni') %>% select(-`.y.`, -n1, -n2) %>% knitr::kable("html") ``` <table> <thead> <tr> <th style="text-align:left;"> group1 </th> <th style="text-align:left;"> group2 </th> <th style="text-align:right;"> p </th> <th style="text-align:left;"> p.signif </th> <th style="text-align:right;"> p.adj </th> <th style="text-align:left;"> p.adj.signif </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Cruel to be Kind Therapy </td> <td style="text-align:left;"> Psychodyshamic Therapy </td> <td style="text-align:right;"> 0.508000 </td> <td style="text-align:left;"> ns </td> <td style="text-align:right;"> 1.00e+00 </td> <td style="text-align:left;"> ns </td> </tr> <tr> <td style="text-align:left;"> Cruel to be Kind Therapy </td> <td style="text-align:left;"> None </td> <td style="text-align:right;"> 0.000004 </td> <td style="text-align:left;"> **** </td> <td style="text-align:right;"> 1.18e-05 </td> <td style="text-align:left;"> **** </td> </tr> <tr> <td style="text-align:left;"> Psychodyshamic Therapy </td> <td style="text-align:left;"> None </td> <td style="text-align:right;"> 0.000048 </td> <td style="text-align:left;"> **** </td> <td style="text-align:right;"> 1.44e-04 </td> <td style="text-align:left;"> *** </td> </tr> </tbody> </table> .footnote[ We are interested in the differences between (1) Cruel to be Kind Therapy and Psychodyshamic Therapy, (2) Cruel to be Kind Therapy and None, and (3) Psychodyshamic Therapy and None. From the results we can see that there is a significant difference between Cruel to be Kind Therapy and none and Psychodyshamic Therapy and none, but not Cruel to be Kind and Psychodyshamic Therapies. ] --- ## Multiple Comparisons: Therapy Amount ```r dat %>% pairwise_t_test(Stalker ~ AmtTher, p.adjust.method = 'bonferroni') %>% select(-`.y.`, -n1, -n2) %>% knitr::kable("html") ``` <table> <thead> <tr> <th style="text-align:left;"> group1 </th> <th style="text-align:left;"> group2 </th> <th style="text-align:right;"> p </th> <th style="text-align:left;"> p.signif </th> <th style="text-align:right;"> p.adj </th> <th style="text-align:left;"> p.adj.signif </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 6 Weeks </td> <td style="text-align:left;"> 12 Weeks </td> <td style="text-align:right;"> 0.000107 </td> <td style="text-align:left;"> *** </td> <td style="text-align:right;"> 0.000107 </td> <td style="text-align:left;"> *** </td> </tr> </tbody> </table> .footnote[ We are interested in the differences between 6 Week and 12 Week therapy amounts. From the results we can see that there is a significant difference between 6 Week and 12 Week therapy amounts. ] --- ## Visualizing The Data ```r stalker %>% group_by(Therapy, AmtTher) %>% summarize(mean = mean(Stalker)) %>% ggplot(aes(x = AmtTher, y = mean, group = Therapy)) + geom_line(aes(color = Therapy)) + geom_point(aes(color = Therapy)) + labs(title = "Estimated Marginal Means of Time Spent Stalking (hours per week)", x = "Amount of Therapy", y = "Estimated Marginal Means") ``` .center[ <!-- --> ] --- ## Visualizing The Data The following plot is not of sufficient quality to present in your reports but provides a good graphical illustration of your results. In addition, we can get an idea of whether there is an interaction effect by inspecting whether the lines are parallel or not. From this plot we can see how our results from the previous table might make sense. Remember that if the lines are not parallel then there is the possibility of an interaction taking place. <br><br> .center[ <!-- --> ] --- ## Simple Main Effects If we observed a statistically significant interaction, we would then test the simple main effects by grouping one of the variables (AmtTher) and conducting one-way ANOVAs between each level of Therapy. In the R code below, we will group the data by AmtTher and analyze the simple main effects of Therapy on hours of stalking. Here is the code we would use: ```r model <- lm(Stalker ~ Therapy * AmtTher, data = dat) dat %>% group_by(AmtTher) %>% anova_test(Stalker ~ Therapy, error = model) ``` ``` ## # A tibble: 2 x 8 ## AmtTher Effect DFn DFd F p `p<.05` ges ## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> ## 1 6 Weeks Therapy 2 72 5.89 0.004 * 0.141 ## 2 12 Weeks Therapy 2 72 14.3 0.00000572 * 0.285 ``` --- ## References - ### These analyses and [xaringan slides](https://github.com/yihui/xaringan) were produced using only R code - ### The Code was derived from a datanovia tutorial on ANOVA Analyses. Additional explainations and information can be found in this [datanovia tutorial](https://www.datanovia.com/en/lessons/anova-in-r/)