class: center, middle, inverse, title-slide .title[ # Design and Analysis of Experiments Using R ] .subtitle[ ## Randomised Block Design ] .author[ ### Olga Lyashevska ] .date[ ### 2022-10-24 ] --- # Last week: Factorial Design <img src="figs/roadmap-crfd.png" style="width: 100%" /> --- # Factorial Design - More than one treatment factor; - All possible treatment combinations randomized to experimental units; - Experimental units are homogeneous; Benefits: - Interaction or joint effects of the factors can be detected; - Interactions can occur when the effect of one factor is different depending on the level of another factor or on a combination of levels of other factors. --- # This week: Randomised Block Design <img src="figs/roadmap-rcbf.png" style="width: 90%" /> --- - RCB - Randomised Complete Block All treatments are assigned exactly once within each block -- - GCB - Generalised Complete Block All treatments are assigned more than once within each block -- - RCBF - Randomised Complete Block Factorial Design Multiple factors -- - LSD - Latin Square Design 2 blocking variables --> next lecture --- #Complete Block Design - experimental units are not homogeneous; -- - heterogeneous experimental units are grouped into homogeneous subgroups or blocks prior to assigning them to treatment levels; -- - the variation between blocks is removed from the error sums of squares; -- - the power or sensitivity for detecting treatment effects is increased; --- # Introduction To eliminate as much of the natural variation as possible and increase the sensitivity of experiments, it is advisable to choose the experimental units for a study to be as homogeneous as possible. -- This would reduce the variance, `\(\sigma^{2}\)`, of the experimental error and increase the power for detecting treatment factor effects. -- But also most experimenters would like the conclusions of their work to have wide applicability. -- To achieve both goals blocking can be used. --- # Blocking - A group of heterogeneous experimental units is used so that the conclusions can be more general; -- - These heterogeneous experimental units are grouped into homogeneous subgroups before they are randomly assigned to treatment factor levels; -- - Randomly assigning treatment factor levels to experimental units within the smaller homogeneous subgroups of experimental units (blocks), has the same effect as using only homogeneous units; -- - Yet it allows the conclusions to be generalised to the entire class of heterogeneous experimental units used in the study. --- # Blocking - physical properties - location - time - etc In the most basic form there are no replicates within a block. The analysis of a randomised complete block design is straightforward. Block factor is treated as “another” factor in the model. --- # Example A student wants to investigate grandma tales of methods for extending the life of cut flowers. The treatment factor was the liquid to fill the vase. The levels were: 1. Tap water 2. Tap water with one spoonful of sugar added 3. Tap water with one cup of carbonated water 4. Tap water with one cup of 7-up --- # Example continued The experimental units were single flowers and the response was the time in days until the flower wilted. The student wanted the conclusions of his study to apply to many types of flowers, so a Randomised Complete Block design was used. -- The blocks were: 1. Rose 2. Carnation 3. Daisy 4. Tulip --- # R code A vector of factor levels is created to represent the four treatments. ```r set.seed(346) f <- factor(c(1,2,3,4)) flnum <- rep(f,4) ``` Create a random ordering of the treatment levels four times. A different random ordering is required for each of the four blocks or types of flowers. ```r b1t <- sample(f,4) b2t <- sample(f,4) b3t <- sample(f,4) b4t <- sample(f,4) ``` --- # R code Stack vectors together ```r trt<-c(b1t, b2t, b3t, b4t) ``` ```r block <- factor(rep(c("carnation", "daisy", "rose", "tulip"), each=4)) ``` --- # R code Create a dataframe and add a random response. ```r design<-data.frame(Block = block, FlowerNumber = flnum, Treatment=trt) # design$Response <- sample(7:21, nrow(design), replace=TRUE) write.csv(design, file="outputs/FlowerExperiment.csv", row.names=FALSE) ``` This code will produce a different randomized list each time it is run. --- # [agricolae](https://cran.r-project.org/web/packages/agricolae/index.html) package This package offers a broad functionality in the design of experiments. ```r # install.packages("agricolae") # make package accessible to the current R session # help(package="agricolae") library(agricolae) ?design.rcbd ``` --- ```r treatment<-c(1,2,3,4) outdesign <- design.rcbd(treatment, 4, seed = 11) rcb <- outdesign$book levels(rcb$block) <- c("carnation", "daisy", "rose", "tulip") head(rcb) ``` ``` plots block treatment 1 101 carnation 4 2 102 carnation 2 3 103 carnation 1 4 104 carnation 3 5 201 daisy 1 6 202 daisy 2 ``` --- # Formal specification The analysis of a randomized complete block design is straightforward. We treat the block factor as “another” factor in our model `$$Y_{ij}= \mu+\alpha_{i}+\beta_{j}+\epsilon_{ij}$$` Where `\(y_{ij}\)` is the response for the `\(j\)`th experimental unit to the `\(i\)`th level of the treatment factor; `\(\alpha_{j}\)`'s are the treatment effects and `\(\beta_{j}\)` are the block effects. This is an additive model, there is no interaction between block and treatment. --- # Formal specification The error term `\(\epsilon_{ij}\)` has the usual assumptions: `$$\epsilon_{ij}\stackrel{iid}{\sim}N(0, \sigma^{2})$$` Assumptions need to be checked with residuls plots. --- # Michelson example We now look at data from the 1879 experiment of Michelson to measure the speed of light. There are five experiments (column Expt); each has 20 runs (column Run) and Speed is the recorded speed of light, in km/sec ```r library(MASS) data("michelson") summary(michelson) ``` ``` ## Speed Run Expt ## Min. : 620.0 1 : 5 1:20 ## 1st Qu.: 807.5 2 : 5 2:20 ## Median : 850.0 3 : 5 3:20 ## Mean : 852.4 4 : 5 4:20 ## 3rd Qu.: 892.5 5 : 5 5:20 ## Max. :1070.0 6 : 5 ## (Other):70 ``` --- # Boxplots ```r plot(michelson$Expt, michelson$Speed, main="Speed of Light Data", xlab="Experiment No.") ``` <!-- --> --- # Group mean ```r with(michelson,tapply(Speed, Expt, mean)) ``` ``` 1 2 3 4 5 909.0 856.0 845.0 820.5 831.5 ``` ```r with(michelson,tapply(Speed, Run, mean)) ``` ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 894 842 866 900 818 790 794 854 884 856 898 894 858 800 834 828 888 856 844 850 ``` --- # Random Block Design Analyse as a randomized block design, with `Run` and `Expt` as factors. ```r fm <- aov(Speed ~ Run + Expt, data=michelson) summary(fm) ``` ``` Df Sum Sq Mean Sq F value Pr(>F) Run 19 113344 5965 1.105 0.36321 Expt 4 94514 23629 4.378 0.00307 ** Residuals 76 410166 5397 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Dropping a factor Fit the sub-model omitting `Run` ```r fm0 <- update(fm, .~. - Run) summary(fm0) ``` ``` Df Sum Sq Mean Sq F value Pr(>F) Expt 4 94514 23628 4.288 0.00311 ** Residuals 95 523510 5511 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` `\(H_{0}: \mu_{1}=\mu_{2}=\mu_{i}\)` Reject the hypothesis `\(H_{0}\)` if the `Pr(>F)` value on the `aov` output is less than the chosen value of `\(\alpha\)`. --- # Model Comparison with anova() Compare models using a formal analysis of variance ```r anova(fm0, fm) ``` ``` Analysis of Variance Table Model 1: Speed ~ Expt Model 2: Speed ~ Run + Expt Res.Df RSS Df Sum of Sq F Pr(>F) 1 95 523510 2 76 410166 19 113344 1.1053 0.3632 ``` This means that adding the `Run` to the model did not significantly improved fit over the model 1. --- # Interaction Plots Variability between Experiments ```r with(michelson, interaction.plot(Run, Expt, Speed)) ``` <!-- --> --- # Comparison of means: TukeyHSD Create a set of confidence intervals on the differences between the means of the levels of a factor. ```r TukeyHSD(fm, "Expt", ordered=TRUE) ``` ``` Tukey multiple comparisons of means 95% family-wise confidence level factor levels have been ordered Fit: aov(formula = Speed ~ Run + Expt, data = michelson) $Expt diff lwr upr p adj 5-4 11.0 -53.9162681 75.91627 0.9895190 3-4 24.5 -40.4162681 89.41627 0.8288663 2-4 35.5 -29.4162681 100.41627 0.5477057 1-4 88.5 23.5837319 153.41627 0.0025342 3-5 13.5 -51.4162681 78.41627 0.9774799 2-5 24.5 -40.4162681 89.41627 0.8288663 1-5 77.5 12.5837319 142.41627 0.0111963 2-3 11.0 -53.9162681 75.91627 0.9895190 1-3 64.0 -0.9162681 128.91627 0.0552229 1-2 53.0 -11.9162681 117.91627 0.1622609 ``` --- # Plot: TukeyHSD ```r plot(TukeyHSD(fm, "Expt", ordered=TRUE)) ``` <!-- --> --- # Test for additivity (interaction) The key assumption is that there is no interaction between block and treatment. ```r # install.packages("asbio") library(asbio) ``` ``` Loading required package: tcltk ``` ```r with(michelson, tukey.add.test(Speed, Expt, Run)) ``` ``` Tukey's one df test for additivity F = 2.0760303 Denom df = 75 p-value = 0.1537909 ``` The Null hypothesis is that there is no interaction term. p-value > 0.05 --> No interaction --- # Cotton Spinning Example This experiment was reported in the November 1953 issue of the journal Applied Statistics by Robert Peake, of the British Cotton Industry Research Association. -- At a stage of the cotton-spinning process, a strand of cotton (roving) thicker than the final thread is produced. Roving is twisted, as the degree of twist increases, so does the strength of the cotton, but, so does the production time and hence, the cost. The twist is introduced by means of a rotary guide called a “flyer.” -- One of the purposes of the experiment was to investigate the way in which different degrees of twist (measured in turns per inch) affected the breakage rate of the roving (strand of cotton). --- #R Code ```r cotton.data = read.table("../datasets/cotton.spinning.txt", header=T) head(cotton.data, 3) ``` ``` ## Block Trtmt Flyer Twist Break ## 1 1 12 1 1.69 6.0 ## 2 2 12 1 1.69 9.7 ## 3 3 12 1 1.69 7.4 ``` --- # Treatment factors There are two treatment factors, namely `Flyer` (type of flyer) and `Twist` (degree of twist). The first treatment factor, flyer, has two levels, “ordinary” and “special”, coded as 1 and 2, respectively. The levels of the second treatment factor, twist, had to be chosen within a feasible range. A pilot experiment was run to determine this range, and four non equally spaced levels were selected, 1.63, 1.69, 1.78, and 1.90 turns per inch. --- # Create factor variables ```r cotton.data = within(cotton.data, {fBlock = factor(Block); fTrtmt = factor(Trtmt); fFlyer = factor(Flyer); fTwist = factor(Twist) }) ``` Function [`within`](http://rfunction.com/archives/2182#:~:text=Perform%20R%20expressions%20using%20the,object%20with%20these%20revised%20contents.) evaluates an expression in a dataframe. --- # Analysis for randomized complete block design ```r model1 = aov(Break ~ fBlock + fTrtmt, data=cotton.data) summary(model1) ``` ``` Df Sum Sq Mean Sq F value Pr(>F) fBlock 12 177.2 14.76 2.890 0.00329 ** fTrtmt 5 231.0 46.21 9.047 1.89e-06 *** Residuals 60 306.4 5.11 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ```r # this is the same # model1 = lm(Break ~ fBlock + fTrtmt, data=cotton.data) # anova(model1) ``` --- # Interaction Plots ```r with(cotton.data, interaction.plot(fBlock, fTrtmt, Break)) ``` <!-- --> --- # Comparison of means: TukeyHSD ```r res <-TukeyHSD(model1, "fBlock", ordered=TRUE) res ``` ``` Tukey multiple comparisons of means 95% family-wise confidence level factor levels have been ordered Fit: aov(formula = Break ~ fBlock + fTrtmt, data = cotton.data) $fBlock diff lwr upr p adj 2-1 2.13333333 -2.3674829 6.634150 0.9113533 8-1 2.20000000 -2.3008163 6.700816 0.8923965 3-1 2.35000000 -2.1508163 6.850816 0.8413588 4-1 3.21666667 -1.2841496 7.717483 0.4153721 6-1 3.25000000 -1.2508163 7.750816 0.3991041 9-1 3.28333333 -1.2174829 7.784150 0.3831245 10-1 3.81666667 -0.6841496 8.317483 0.1773045 7-1 4.20000000 -0.3008163 8.700816 0.0902464 5-1 4.88333333 0.3825171 9.384150 0.0220869 13-1 4.95000000 0.4491837 9.450816 0.0190259 11-1 5.41666667 0.9158504 9.917483 0.0063739 12-1 5.48333333 0.9825171 9.984150 0.0054170 8-2 0.06666667 -4.4341496 4.567483 1.0000000 3-2 0.21666667 -4.2841496 4.717483 1.0000000 4-2 1.08333333 -3.4174829 5.584150 0.9997423 6-2 1.11666667 -3.3841496 5.617483 0.9996491 9-2 1.15000000 -3.3508163 5.650816 0.9995280 10-2 1.68333333 -2.8174829 6.184150 0.9843871 7-2 2.06666667 -2.4341496 6.567483 0.9280093 5-2 2.75000000 -1.7508163 7.250816 0.6582203 13-2 2.81666667 -1.6841496 7.317483 0.6234730 11-2 3.28333333 -1.2174829 7.784150 0.3831245 12-2 3.35000000 -1.1508163 7.850816 0.3521077 3-8 0.15000000 -4.3508163 4.650816 1.0000000 4-8 1.01666667 -3.4841496 5.517483 0.9998663 6-8 1.05000000 -3.4508163 5.550816 0.9998131 9-8 1.08333333 -3.4174829 5.584150 0.9997423 10-8 1.61666667 -2.8841496 6.117483 0.9888296 7-8 2.00000000 -2.5008163 6.500816 0.9424231 5-8 2.68333333 -1.8174829 7.184150 0.6922187 13-8 2.75000000 -1.7508163 7.250816 0.6582203 11-8 3.21666667 -1.2841496 7.717483 0.4153721 12-8 3.28333333 -1.2174829 7.784150 0.3831245 4-3 0.86666667 -3.6341496 5.367483 0.9999756 6-3 0.90000000 -3.6008163 5.400816 0.9999633 9-3 0.93333333 -3.5674829 5.434150 0.9999458 10-3 1.46666667 -3.0341496 5.967483 0.9952079 7-3 1.85000000 -2.6508163 6.350816 0.9672482 5-3 2.53333333 -1.9674829 7.034150 0.7644820 13-3 2.60000000 -1.9008163 7.100816 0.7332185 11-3 3.06666667 -1.4341496 7.567483 0.4915220 12-3 3.13333333 -1.3674829 7.634150 0.4571568 6-4 0.03333333 -4.4674829 4.534150 1.0000000 9-4 0.06666667 -4.4341496 4.567483 1.0000000 10-4 0.60000000 -3.9008163 5.100816 0.9999996 7-4 0.98333333 -3.5174829 5.484150 0.9999057 5-4 1.66666667 -2.8341496 6.167483 0.9856091 13-4 1.73333333 -2.7674829 6.234150 0.9802298 11-4 2.20000000 -2.3008163 6.700816 0.8923965 12-4 2.26666667 -2.2341496 6.767483 0.8711284 9-6 0.03333333 -4.4674829 4.534150 1.0000000 10-6 0.56666667 -3.9341496 5.067483 0.9999998 7-6 0.95000000 -3.5508163 5.450816 0.9999345 5-6 1.63333333 -2.8674829 6.134150 0.9878272 13-6 1.70000000 -2.8008163 6.200816 0.9830854 11-6 2.16666667 -2.3341496 6.667483 0.9021643 12-6 2.23333333 -2.2674829 6.734150 0.8820500 10-9 0.53333333 -3.9674829 5.034150 0.9999999 7-9 0.91666667 -3.5841496 5.417483 0.9999553 5-9 1.60000000 -2.9008163 6.100816 0.9897652 13-9 1.66666667 -2.8341496 6.167483 0.9856091 11-9 2.13333333 -2.3674829 6.634150 0.9113533 12-9 2.20000000 -2.3008163 6.700816 0.8923965 7-10 0.38333333 -4.1174829 4.884150 1.0000000 5-10 1.06666667 -3.4341496 5.567483 0.9997802 13-10 1.13333333 -3.3674829 5.634150 0.9995924 11-10 1.60000000 -2.9008163 6.100816 0.9897652 12-10 1.66666667 -2.8341496 6.167483 0.9856091 5-7 0.68333333 -3.8174829 5.184150 0.9999982 13-7 0.75000000 -3.7508163 5.250816 0.9999950 11-7 1.21666667 -3.2841496 5.717483 0.9991742 12-7 1.28333333 -3.2174829 5.784150 0.9986138 13-5 0.06666667 -4.4341496 4.567483 1.0000000 11-5 0.53333333 -3.9674829 5.034150 0.9999999 12-5 0.60000000 -3.9008163 5.100816 0.9999996 11-13 0.46666667 -4.0341496 4.967483 1.0000000 12-13 0.53333333 -3.9674829 5.034150 0.9999999 12-11 0.06666667 -4.4341496 4.567483 1.0000000 ``` --- # Plot: TukeyHSD ```r plot(TukeyHSD(model1, "fBlock", ordered=TRUE)) ``` <!-- --> --- # Test for interaction ```r with(cotton.data, tukey.add.test(Break, fBlock, fTrtmt)) ``` ``` ## ## Tukey's one df test for additivity ## F = 0.0022901 Denom df = 59 p-value = 0.9619934 ``` No interaction between blocking variables.