class: center, middle, inverse, title-slide .title[ # Design and Analysis of Experiments Using R ] .subtitle[ ## Latin Square Design ] .author[ ### Olga Lyashevska ] .date[ ### 2022-11-09 ] --- # This week: Latin Square Design <img src="figs/roadmap-rcbf.png" style="width: 90%" /> --- # Block factors One block factor: - RCB - Randomised Complete Block (all treatments are assigned exactly once within each block) and the like: - GCB - Generalised Complete Block - RCBF - Randomised Complete Block Factorial Design Two block factors: - LSD - Latin Square Design (2 blocking variables) --- # Comparison <img src="figs/Screenshot_20210120_122217.png" style="width: 100%" /> In Latin square each treatment level occurs only once in each row and once in each column (similar to sudoku!) --- # Latin Square Design Randomised Complete Block design allow us to block on a single source of variation in the responses. There are experimental situations with more than one source of extraneous variation. In these situations we use Latin Square Design. It require that the numbers of levels of both blocking factors be the same as (or a multiple of) the number of treatments. --- # Statistical Model The model for an LSD is written: `$$y_{ijk}= \mu+\alpha_{i}+\tau_{j}+\beta_{k}+\epsilon_{ijk}$$` where `\(y_{ijk}\)` is the observation in the i-th row ad k-th column `\(\mu\)` is the overall mean `\(\alpha_i\)` the i-th row blocking factor, `\(\tau_j\)` is the j-th treatment effect, `\(\beta_{k}\)` is the k-th column blocking factor `\(\epsilon_{ijk}\)` is the random error. --- # Analysis of variance The analysis of variance consists of partitioning the total sum of squares of the N observations into components for rows, columns, treatments, and error, for example, `$$SS_{T} = SS_{Rows} + SS_{Columns} + SS_{Treatments} + SS_{E}$$` --- # Common uses One of the common uses for a Latin Square arises when a sequence of treatments is given to a subject over several time periods. We need to block on subjects, because each subject tends to respond differently. We also need to block on time period, because there may be consistent differences over time due to growth, aging, disease progression, or other factors. --- # Bioequivalence of drug delivery Consider the blood concentration of a drug after the drug has been administered. The concentration will typically start at zero, increase to some maximum level as the drug gets into the bloodstream, and then decrease back to zero as the drug is metabolized or excreted. These time-concentration curves may differ if the drug is delivered in a different form, say a tablet versus a capsule. Bioequivalence studies seek to determine if different drug delivery systems have similar biological effects. --- # Bioequivalence of drug delivery <img src="figs/bioequivalence.png" style="width: 80%" /> .font90[ [Source: Advances in therapy. 2017 Sep; Volume 34(9). Page: 2071-2082.](https://doi.org/10.1007/s12325-017-0594-8) ] --- # Example We wish to compare 3 methods for delivering a drug: - _a solution_ - _a tablet_ - _a capsule_ The response will be the area under the time-concentration curve. We anticipate large subject to subject differences, so we block on subject. --- # Example (cont.) There are 3 subjects, and each subject will be given the drug 3 times, once with each of the 3 methods. Because the body may adapt to the drug in some way, each drug will be used once in the first period, once in the second period, and once in the third period. We anticipate large differences, so we block on period. --- # Example (cont.) Area under the curve for administering a drug via: - A: solution - B: tablet - C: capsule Table entries are treatments and responses. <img src="figs/latin-square-design.png" style="width: 80%" /> This is a 3x3 design. --- # Spreadsheet representation <img src="figs/Screenshot_20210119_205108.png" style="width: 60%" /> Download <a href="../datasets/bioequiv.csv">data</a>: right click --> Save link as --- # R example ```r # clean working space rm(list = ls(all = TRUE)) # read in dataset bioequiv <- read.csv("../datasets/bioequiv.csv") # explore str(bioequiv) ``` ``` ## 'data.frame': 9 obs. of 4 variables: ## $ Period : int 1 1 1 2 2 2 3 3 3 ## $ Subject : int 1 2 3 1 2 3 1 2 3 ## $ Method : chr "A" "C" "B" "C" ... ## $ Response: int 1799 2075 1396 1846 1156 868 2147 1777 2291 ``` --- # Code variables as factors ```r # change to factors bioequiv$Period <- as.factor(bioequiv$Period) bioequiv$Subject <- as.factor(bioequiv$Subject) bioequiv$Method <- as.factor(bioequiv$Method) str(bioequiv) ``` ``` ## 'data.frame': 9 obs. of 4 variables: ## $ Period : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3 ## $ Subject : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 ## $ Method : Factor w/ 3 levels "A","B","C": 1 3 2 3 2 1 2 1 3 ## $ Response: int 1799 2075 1396 1846 1156 868 2147 1777 2291 ``` --- # Analysis of variance ```r # analysis of variance mod <- lm(Response~Period+Subject+Method, bioequiv) anova(mod) ``` ``` ## Analysis of Variance Table ## ## Response: Response ## Df Sum Sq Mean Sq F value Pr(>F) ## Period 2 928006 464003 103.231 0.009594 ** ## Subject 2 261115 130557 29.047 0.033282 * ## Method 2 608891 304445 67.733 0.014549 * ## Residuals 2 8990 4495 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Analysis of variance ```r # the ame result can be achieved via summary(aov(Response~Period+Subject+Method, bioequiv)) ``` ``` ## Df Sum Sq Mean Sq F value Pr(>F) ## Period 2 928006 464003 103.23 0.00959 ** ## Subject 2 261115 130557 29.05 0.03328 * ## Method 2 608891 304445 67.73 0.01455 * ## Residuals 2 8990 4495 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Significant differences The ANOVA table suggest that there is a significant variation in `Response` due to `Period`, `Subject` and `Method`. It does not hovewer identify the specific pairs of `Methods` which differ. Questions: - Do all methods of drug administration differ? - Some of them? To find significant differences, you must use post hoc tests. --- # Comparison of means: LSD test LSD: Fisher's Least Significant Difference It requires a significant F-test for the equality of all k means before individual paired differences may be tested. This test helps to identify the populations whose means are statistically different. The basic idea of the test is to compare the populations taken in pairs. It is then used to proceed in a one-way or two-way analysis of variance. --- # Comparison of means: LSD test ```r # install.packages("agricolae") library(agricolae) # help(package="agricolae") test1 <- LSD.test(y = lm(Response~Period+Subject+Method, bioequiv), trt = "Method", group=FALSE) ``` --- # Comparison of means: LSD test ```r test1$comparison ``` ``` ## difference pvalue signif. LCL UCL ## A - B -85.0000 0.2607 -320.5292 150.5292 ## A - C -589.3333 0.0085 ** -824.8625 -353.8041 ## B - C -504.3333 0.0116 * -739.8625 -268.8041 ``` A difference between pair of means A - C and B - C is significant, whereas a diference between A - B is not significant. Hence, we conclude that (A) solution and (B) tablet result in the same bioequivalence. --- # Comparison of means: TukeyHSD Tukey HSD: Honestly Significant Difference Test Corrects for the Type I error (incorrectly rejecting a true null hypothesis). Note, that LSD test does not correct. Both test are used only for comparing of unstructured treatments (not appropriate if treatments correspond to k levels of a quantitative factor). --- # Comparison of means: TukeyHSD .font90[ ```r test2 <- TukeyHSD(aov(Response~Period+Subject+Method, bioequiv), "Method", ordered=TRUE) test2 ``` ``` ## Tukey multiple comparisons of means ## 95% family-wise confidence level ## factor levels have been ordered ## ## Fit: aov(formula = Response ~ Period + Subject + Method, data = bioequiv) ## ## $Method ## diff lwr upr p adj ## B-A 85.0000 -237.4625 407.4625 0.4303453 ## C-A 589.3333 266.8708 911.7959 0.0155165 ## C-B 504.3333 181.8708 826.7959 0.0210673 ``` Same conclusions as with LSD test. Note, that p-values are higher. This test is more conservative. ] --- # Plot Tukey HSD ```r plot(test2) ``` ![](week7_files/figure-html/unnamed-chunk-8-1.png)<!-- --> --- # Post-hoc tests assumptions 1. The observations being tested are independent within and among the groups. 1. The groups associated with each mean in the test are normally distributed. 1. There is equal within-group variance across the groups associated with each mean in the test (homogeneity of variance). --- # Type I and Type II error A statistically significant result cannot prove that a research hypothesis is correct (as this implies 100% certainty). Because a p-value is based on probabilities, there is always a chance of making an incorrect conclusion regarding accepting or rejecting the null hypothesis (H0). Anytime we make a decision using statistics there are four possible outcomes, with two representing correct decisions and two representing errors. --- # Type I and Type II error <img src="figs/Screenshot_20210120_112118.png" style="width: 80%" /> [Read this paper: Ind Psychiatry J. 2009 Jul-Dec; 18(2): 127–131.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2996198/?report=classic) --- # Create Latin Square Design ```r library(agricolae) treatments<-c(LETTERS[1:5]) design <- design.lsd(treatments, seed = 23)$book levels(design$row) <- paste("Day", rep(1:5)) levels(design$col) <- paste("Lab", rep(1:5)) head(design) ``` ``` ## plots row col treatments ## 1 101 Day 1 Lab 1 A ## 2 102 Day 1 Lab 2 D ## 3 103 Day 1 Lab 3 B ## 4 104 Day 1 Lab 4 C ## 5 105 Day 1 Lab 5 E ## 6 201 Day 2 Lab 1 D ``` --- # Subsetting dataframes Lets remove columns `plots` Delete column by name: ```r design1 <- subset(design, select=-c(plots)) ``` ```r drop <-c("plots") design2 <- design[,!(names(design) %in% drop)] # see ?"%in%" ``` Drop column by column index numbers ```r design3 <-design[-c(1)] ``` --- # Subsetting dataframes We can also choose to keep columns instead of dropping. ```r design4 <- design[c("row", "col", "treatments")] design5 <- design[c(2:4)] ``` ```r design6 <- subset(design, select=c(row, col, treatments)) ``` Verify that all options give identical result. --- # Replication A design based on a single Latin Square has equal number of columns and rows, which may not always be adequate for an analysis of the treatment effects. One way of increasing the number of observations is to stack squares one above the other, or next to each other. <img src="figs/Screenshot_20210120_222350.png" style="width: 80%" /> --- # Conclusion Latin Square design helps to increase the precision in detecting differences among treatments by adjusting for variability in experimental units in two ways (rows and columns). However, the restriction on Latin Square designs is that the number of levels of the row blocking factor, the number of levels of the column blocking factor, and the number of levels of the treatment factor all have to be equal. This restriction may be impractical in some situations. <!-- --- --> <!-- # Cotton Spinning Example --> <!-- This experiment was reported in the November 1953 issue of the journal Applied Statistics by Robert Peake, of the British Cotton Industry Research Association. --> <!-- In the cotton-spinning process, a strand of cotton (known as roving) thicker than the final thread is produced. Roving is twisted and as the degree of twist increases, so does the strength of the cotton, but unfortunately, so does the production time and hence, the cost. --> <!-- The twist is introduced by means of a rotary guide called a flyer. --> <!-- --- --> <!-- # Cotton Spinning Example --> <!-- One of the purposes of the experiment was to investigate the way in which different degrees of twist (measured in turns per inch) affected the breakage rate of the roving (strand of cotton). --> <!-- <img src="figs/Screenshot_20210120_220512.png" style="width: 90%" /> --> <!-- This design implies that that same machine should be used everyday. --> <!-- The experimenters could not guarantee that the same six machines would be available for the same six days, and this led them to select a RCB design. --> <!-- --- --> <!-- #R Code --> <!-- ```{r} --> <!-- cotton.data = read.table("../datasets/cotton.spinning.txt", header=T) --> <!-- head(cotton.data, 3) --> <!-- ``` --> <!-- Download <a href="../datasets/cotton.spinning.txt">data</a>: right click --> <!-- Save link as --> <!-- --- --> <!-- # Treatment factors --> <!-- There are two treatment factors, namely `Flyer` (type of flyer) and `Twist` (degree of twist). --> <!-- The first treatment factor, flyer, has two levels, “ordinary” and “special.” coded as 1 and 2, respectively. --> <!-- The levels of the second treatment factor, twist, had to be chosen within a feasible range. A pilot --> <!-- experiment was run to determine this range, and four non equally spaced levels were selected, --> <!-- 1.63, 1.69, 1.78, and 1.90 turns per inch. --> <!-- --- --> <!-- # Create factor variables --> <!-- ```{r} --> <!-- cotton.data = within(cotton.data, --> <!-- {fBlock = factor(Block); fTrtmt = factor(Trtmt); --> <!-- fFlyer = factor(Flyer); fTwist = factor(Twist) }) --> <!-- ``` --> <!-- Function [`within`](http://rfunction.com/archives/2182#:~:text=Perform%20R%20expressions%20using%20the,object%20with%20these%20revised%20contents.) evaluates an expression in a dataframe. --> <!-- --- --> <!-- # RCB design --> <!-- ```{r comment=""} --> <!-- model1 = aov(Break ~ fBlock + fTrtmt, data=cotton.data) --> <!-- summary(model1) --> <!-- # this is the same --> <!-- # model1 = lm(Break ~ fBlock + fTrtmt, data=cotton.data) --> <!-- # anova(model1) --> <!-- ``` --> <!-- --- --> <!-- # Interaction Plots --> <!-- ```{r} --> <!-- with(cotton.data, interaction.plot(fBlock, fTrtmt, Break)) --> <!-- ``` --> <!-- --- --> <!-- # Comparison of means: TukeyHSD --> <!-- ```{r message=FALSE, comment=""} --> <!-- res <-TukeyHSD(model1, "fBlock", ordered=TRUE) --> <!-- res --> <!-- ``` --> <!-- --- --> <!-- # Plot: TukeyHSD --> <!-- ```{r} --> <!-- plot(TukeyHSD(model1, "fBlock", ordered=TRUE)) --> <!-- ``` --> <!-- --- --> <!-- # Test for interaction --> <!-- ```{r} --> <!-- library(asbio) --> <!-- with(cotton.data, tukey.add.test(Break, fBlock, fTrtmt)) --> <!-- ``` --> <!-- No interaction between blocking variables. -->