Design and Analysis of Experiments Using R

class: center, middle, inverse, title-slide

.title[
# Design and Analysis of Experiments Using R
]
.subtitle[
## Latin Square Design
]
.author[
### Olga Lyashevska
]
.date[
### 2022-11-09
]

---

# This week: Latin Square Design
<img src="figs/roadmap-rcbf.png" style="width: 90%" />
---
# Block factors

One block factor:

- RCB - Randomised Complete Block (all treatments are assigned exactly once within each block) and the like: 
  - GCB - Generalised Complete Block     
  - RCBF - Randomised Complete Block Factorial Design

Two block factors:

- LSD - Latin Square Design (2 blocking variables)

---
# Comparison

In Latin square each treatment level occurs only once in each row and once in each column (similar to sudoku!)

---
# Latin Square Design

Randomised Complete Block design allow us to block on a single source of variation in the responses.

There are experimental situations with more than one source of extraneous variation.

In these situations we use Latin Square Design.

It require that the numbers of levels of both blocking factors be the same as (or a multiple of) the number of treatments.

---
# Statistical Model

The model for an LSD is written:

`$$y_{ijk}= \mu+\alpha_{i}+\tau_{j}+\beta_{k}+\epsilon_{ijk}$$`

where 
`$y_{ijk}$` is the observation in the i-th row ad k-th column

`$\mu$` is the overall mean

`$\alpha_i$` the i-th row blocking factor,

`$\tau_j$` is the j-th treatment effect,

`$\beta_{k}$` is the k-th column blocking factor

`$\epsilon_{ijk}$` is the random error.

---
# Analysis of variance

The analysis of variance consists of partitioning the total sum of squares of the N observations into components for rows, columns, treatments, and error, for example,

`$$SS_{T} = SS_{Rows} + SS_{Columns} + SS_{Treatments} + SS_{E}$$`

---
# Common uses

One of the common uses for a Latin Square arises when a sequence of
treatments is given to a subject over several time periods.

We need to block on subjects, because each subject tends to respond differently.

We also need to block on time period, because there may be consistent differences over time due
to growth, aging, disease progression, or other factors.

---
# Bioequivalence of drug delivery

Consider the blood concentration of a drug after the drug has been administered. The concentration will typically start at zero, increase to some maximum level as the drug gets into the bloodstream, and then decrease back to zero as the drug is metabolized or excreted.

These time-concentration curves may differ if the drug is delivered in a different form, say a tablet versus a capsule.

Bioequivalence studies seek to determine if different drug delivery systems have similar biological effects.

---
# Bioequivalence of drug delivery

.font90[
[Source: Advances in therapy. 2017 Sep; Volume 34(9). Page: 2071-2082.](https://doi.org/10.1007/s12325-017-0594-8) 
]
---
# Example

We wish to compare 3 methods for delivering a drug: 
- _a solution_ 
- _a tablet_
- _a capsule_

The response will be the area under the time-concentration curve. We anticipate large subject to subject differences, so we block on subject.

---
# Example (cont.)

There are 3 subjects, and each subject will be given the drug 3 times, once with each of the 3 methods.

Because the body may adapt to the drug in some way, each drug will be used once in the first period, once in the second period, and once in the third period.

We anticipate large differences, so we block on period.

---
# Example (cont.)

Area under the curve for administering a drug via:

- A: solution 
- B: tablet  
- C: capsule

Table entries are treatments and responses.

This is a 3x3 design.

---
# Spreadsheet representation

Download <a href="../datasets/bioequiv.csv">data</a>: right click --> Save link as
---
# R example

```r
# clean working space
rm(list = ls(all = TRUE))
# read in dataset
bioequiv <- read.csv("../datasets/bioequiv.csv")
# explore
str(bioequiv)
```

```
## 'data.frame':	9 obs. of  4 variables:
##  $ Period  : int  1 1 1 2 2 2 3 3 3
##  $ Subject : int  1 2 3 1 2 3 1 2 3
##  $ Method  : chr  "A" "C" "B" "C" ...
##  $ Response: int  1799 2075 1396 1846 1156 868 2147 1777 2291
```

---
# Code variables as factors

```r
# change to factors
bioequiv$Period <- as.factor(bioequiv$Period)
bioequiv$Subject <- as.factor(bioequiv$Subject)
bioequiv$Method <- as.factor(bioequiv$Method)

str(bioequiv)
```

```
## 'data.frame':	9 obs. of  4 variables:
##  $ Period  : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3
##  $ Subject : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3
##  $ Method  : Factor w/ 3 levels "A","B","C": 1 3 2 3 2 1 2 1 3
##  $ Response: int  1799 2075 1396 1846 1156 868 2147 1777 2291
```

---
# Analysis of variance

```r
# analysis of variance
mod <- lm(Response~Period+Subject+Method, bioequiv)
anova(mod)
```

```
## Analysis of Variance Table
## 
## Response: Response
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## Period     2 928006  464003 103.231 0.009594 **
## Subject    2 261115  130557  29.047 0.033282 * 
## Method     2 608891  304445  67.733 0.014549 * 
## Residuals  2   8990    4495                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---
# Analysis of variance

```r
# the ame result can be achieved via
summary(aov(Response~Period+Subject+Method, bioequiv))
```

```
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Period       2 928006  464003  103.23 0.00959 **
## Subject      2 261115  130557   29.05 0.03328 * 
## Method       2 608891  304445   67.73 0.01455 * 
## Residuals    2   8990    4495                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
---
# Significant differences

The ANOVA table suggest that there is a significant variation in `Response` due to `Period`, `Subject` and `Method`.

It does not hovewer identify the specific pairs of `Methods` which differ.

Questions:
- Do all methods of drug administration differ? 
- Some of them?

To find significant differences, you must use post hoc tests.

---
# Comparison of means: LSD test

LSD: Fisher's Least Significant Difference

It requires a significant F-test for the equality of all k means before
individual paired differences may be tested.

This test helps to identify the populations whose means are statistically different.

The basic idea of the test is to compare the populations taken in pairs.

It is then used to proceed in a one-way or two-way analysis of variance.

---
# Comparison of means: LSD test

```r
# install.packages("agricolae")
library(agricolae)
# help(package="agricolae")
test1 <- LSD.test(y = lm(Response~Period+Subject+Method, bioequiv), 
                  trt = "Method",
                  group=FALSE)
```
---
# Comparison of means: LSD test

```r
test1$comparison
```

```
##       difference pvalue signif.       LCL       UCL
## A - B   -85.0000 0.2607         -320.5292  150.5292
## A - C  -589.3333 0.0085      ** -824.8625 -353.8041
## B - C  -504.3333 0.0116       * -739.8625 -268.8041
```
A difference between pair of means A - C and B - C is significant, whereas a diference between A - B is not significant.

Hence, we conclude that (A) solution and (B) tablet result in the same bioequivalence.

---
# Comparison of means: TukeyHSD
Tukey HSD: Honestly Significant Difference Test

Corrects for the Type I error (incorrectly rejecting a true null hypothesis).

Note, that LSD test does not correct.

Both test are used only for comparing of unstructured treatments (not appropriate if treatments correspond to k levels of a quantitative factor).

---
# Comparison of means: TukeyHSD

.font90[

```r
test2 <- TukeyHSD(aov(Response~Period+Subject+Method, bioequiv), "Method", ordered=TRUE)
test2
```

```
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##     factor levels have been ordered
## 
## Fit: aov(formula = Response ~ Period + Subject + Method, data = bioequiv)
## 
## $Method
##         diff       lwr      upr     p adj
## B-A  85.0000 -237.4625 407.4625 0.4303453
## C-A 589.3333  266.8708 911.7959 0.0155165
## C-B 504.3333  181.8708 826.7959 0.0210673
```
Same conclusions as with LSD test. Note, that p-values are higher. 
This test is more conservative.
]
---
# Plot Tukey HSD

```r
plot(test2)
```

![](week7_files/figure-html/unnamed-chunk-8-1.png)

---
# Post-hoc tests assumptions

1. The observations being tested are independent within and among the groups.
1. The groups associated with each mean in the test are normally distributed.
1. There is equal within-group variance across the groups associated with each mean in the test (homogeneity of variance).

---
# Type I and Type II error
A statistically significant result cannot prove that a research hypothesis is correct (as this implies 100% certainty). Because a p-value is based on probabilities, there is always a chance of making an incorrect conclusion regarding accepting or rejecting the null hypothesis (H0).

Anytime we make a decision using statistics there are four possible outcomes, with two representing correct decisions and two representing errors.

---
# Type I and Type II error

[Read this paper: Ind Psychiatry J. 2009 Jul-Dec; 18(2): 127–131.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2996198/?report=classic)

---
# Create Latin Square Design

```r
library(agricolae)
treatments<-c(LETTERS[1:5])
design <- design.lsd(treatments, seed = 23)$book
levels(design$row) <- paste("Day", rep(1:5))
levels(design$col) <- paste("Lab", rep(1:5))
head(design)
```

```
##   plots   row   col treatments
## 1   101 Day 1 Lab 1          A
## 2   102 Day 1 Lab 2          D
## 3   103 Day 1 Lab 3          B
## 4   104 Day 1 Lab 4          C
## 5   105 Day 1 Lab 5          E
## 6   201 Day 2 Lab 1          D
```
---
# Subsetting dataframes

Lets remove columns `plots`

Delete column by name:

```r
design1 <- subset(design, select=-c(plots))
```

```r
drop <-c("plots")
design2 <- design[,!(names(design) %in% drop)]
# see ?"%in%"
```
Drop column by column index numbers

```r
design3 <-design[-c(1)]
```

---
# Subsetting dataframes

We can also choose to keep columns instead of dropping.

```r
design4 <- design[c("row", "col", "treatments")]
design5 <- design[c(2:4)]
```

```r
design6 <- subset(design, select=c(row, col, treatments))
```
Verify that all options give identical result.
---
# Replication

A design based on a single Latin Square has equal number of columns and rows, which may not always be adequate for an analysis of the treatment effects.

One way of increasing the number of observations is to stack squares one above the other, or next to each other.

---
# Conclusion

Latin Square design helps to increase the precision in detecting differences among treatments by adjusting for variability in experimental units in two ways (rows and columns).

However, the restriction on Latin Square designs is that the number of levels of the row blocking factor, the number of levels of the column blocking factor, and the number of levels of the treatment factor all have to be equal.

This restriction may be impractical in some situations.