class: center, middle, inverse, title-slide # Design and Analysis of Experiments Using R ## Split Plot Design ### Olga Lyashevska ### 2021-11-25 --- # Factorial and fractional factorial - Combinations of factor levels can be randomly assigned to the experimental units; - Randomisation of treatment combinations to experimental units guarantees the validity of the conclusions; - Sometimes the levels of one or more factors are more difficult to change; - Complete randomization of factor-level combinations to experimental units could make the experiment much more time consuming or perhaps impossible to conduct. - Frequent in process improvement studies where the process consists of several steps, and some factors in an experiment relate to one process step while others relate to others. --- # What is a Split Plot Design? - A split plot design is a special case of a factorial treatment structure. - It is used when some factors are harder (or more expensive) to vary than others. - Consists of two experiments with *different experimental units* and *different errors*. - Originated in agricultural experiments where certain factors require large experimental units, whereas other factors can be easily applied to smaller plots of land. --- # Origin The name split-plot comes from the original application in agricultural experiments, where the levels of some factors (called **whole-plot** factors) could only be varied between plots of land, while the levels of other factors could be varied within a plot (**split-plot** factors). The randomisation is such that the **hard-to-vary** factors are not changed as frequently as **easy-to-vary** factors. This makes it more convenient to conduct the list of experiments. --- # Example I: Irrigation and corn variety Consider the following factorial problem: - 3 different irrigation levels - 4 different corn varieties - Response: biomass - Available resources: 6 plots of land --- # Example I: Irrigation and corn variety By definition we can not vary the irrigation on a small scale. .center[] --- # Example I: Irrigation and corn variety - We are “forced” to use “large” experimental units for the irrigation level factor. - Assume that we can use a specific irrigation level on each of the 6 plots. --- # Example I: Irrigation and corn variety - Randomly assign each irrigation level to 2 of the plots (whole plots). - In each plot randomly assign the 4 different corn varieties (split plots). <img src="figs/Screenshot_20210202_124146.png" style="width: 90%" /> - Two independent randomizations are being performed! - Irrigation level is the whole-plot factor and corn variety the split-plot factor. --- # Example I: Irrigation and corn variety - Whole plots (plots of land) are the experimental units for the whole-plot factor (irrigation level). - Split plots (subplots of land) are the experimental units for the split-plot factor. - In this settings whole plots act as blocks. - Basically, we are performing two different experiments in one: - each experiment has its own randomisation - each experiment has its own experimental unit --- # Example I: Irrigation and corn variety Split plot is *mixed model* with fixed and random effects. Fixed effects are variables that we expect will have an effect on the response variable: explanatory variables in a standard linear regression. In our case, we are interested in the effect of corn variety on biomass. So corn variety is a fixed effect and biomass is the response (dependent) variable. Random effects are usually grouping factors for which we are trying to control. We might not be specifically interested in their impact on the response variable, but we know that they might be influencing the patterns we see. However, this division is subjective. --- # Example II: Paper Manufacturer A paper manufacturer is interested in 3 different pulp preparation methods and 4 different cooking temperatures for the pulp and wishes to study the effect of these two factors on the tensile strength of the paper. Each replicate of a factorial experiment requires 12 observations, and the experimenter has decided to run 3 replicates. This will require a total of 36 runs. --- # Example II: Paper Manufacturer The experimenter decides to conduct the experiment as follows: - A batch of pulp is produced by one of the three methods under study. - Then this batch is divided into four samples, and each sample is cooked at one of the four temperatures. - Then a second batch of pulp is made up using another of the three methods. - This second batch is also divided into four samples that are tested at the four temperatures. - The process is then repeated, until all three replicates (36 runs) of the experiment are obtained. --- # Example II: Paper Manufacturer <img src="figs/Screenshot_20210203_115908.png" style="width: 100%" /> This experiments resembles a Factorial Experiments with 3 levels of pulp preparation method (Factor A) and 4 levels of Temperature (Factor B). - randomly select a treatment combination (a preparation method and a temperature) and obtain an observation - repeat until all 36 observations have been taken. --- # Example II: Paper Manufacturer However, the experimenter did not collect the data this way. Instead he made up a batch of pulp and obtained observations for all four temperatures from that batch. A completely randomized factorial experiment would require 36 batches of pulp, which is completely unrealistic. The split-plot design requires only 9 batches total - a considerable experimental efficiency! --- # Example II: Paper Manufacturer - In this split-plot design there are 9 whole plots, and the preparation methods are called the whole plot or main treatments. - Each whole plot is divided into four parts called subplots (or split-plots), and one temperature is assigned to each. Temperature is called the subplot treatment. <img src="figs/Screenshot_20210203_121437.png" style="width: 90%" /> --- # Example II: Model `$$y_{ijk} = \mu + \tau_i + \beta_{j} + (\tau\beta)_{ij} + \gamma_k + (\beta\gamma)_{jk} + \epsilon_{ijk}$$` where: `\(i = 1, 2, . . . , r\)`, `\(j = 1, 2, . . . , a\)`, `\(k = 1, 2, . . . , b\)` whole-plot: - `\(\tau_{i}\)`: replicates - `\(\beta_j\)`: main treatments (factor A) - `\((\tau\beta)_{ij}\)`: whole-plot error (replicates x A) split-plot: - `\(\gamma_k\)`: treatment (factor B) - `\((\beta\gamma)_{jk}\)`: replicates x B interactions - `\(\epsilon_{ijk}\)`: split-plot error. --- # Example II: Paper Manufacturer This results in two different error structures for the experiment. Split-plot error is typically less than the whole-plot error because the split-plots are generally more homogeneous than the whole plots. --- # Example III: Oats Dataset `oats` from R-package `MASS` Description: The yield of oats from a split-plot field trial using three varieties and four levels of manurial (nitrogen) treatment. The experiment was laid out in 6 blocks of 3 main plots, each split into 4 sub-plots. The varieties were applied to the main plots and the manurial treatments to the sub-plots. Overview of data: - 6 different blocks (B) - 3 different varieties (V) - 4 different nitrogen treatments (N) - Response (Y) - yields --- # Example III: Oats ```r library(MASS) data(oats) summary(oats) ``` ``` ## B V N Y ## I :12 Golden.rain:24 0.0cwt:18 Min. : 53.0 ## II :12 Marvellous :24 0.2cwt:18 1st Qu.: 86.0 ## III:12 Victory :24 0.4cwt:18 Median :102.5 ## IV :12 0.6cwt:18 Mean :104.0 ## V :12 3rd Qu.:121.2 ## VI :12 Max. :174.0 ``` --- # Example III: Oats <img src="figs/Screenshot_20210202_130439.png" style="width: 90%" /> --- # Example III: Oats - This is a more complicated design as before as we have an additional block factor. - A whole-plot is given by a plot of land in a block. - The whole-plot factor is variety. - A block design (RCB) was used at the whole-plot level. - A split plot is given by a subplot of land. - The split-plot factor is given by nitrogen treatment. --- # Example III: Model - We have an RCB for the whole-plot factor. - The experimental unit on the whole-plot level is given by the combination of block and variety. <img src="figs/Screenshot_20210202_131059.png" style="width: 90%" /> --- # Example III: Oats In R we use the [`lme4::lmer`](https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf) function with an extra random effect (error) per combination of block and variety. ```r # ?lme4::lmer library(lme4) fit.lme<-lmer(Y~B+V*N+(1|B:V), data=oats) anova(fit.lme) ``` ``` ## Analysis of Variance Table ## npar Sum Sq Mean Sq F value ## B 5 4675.0 935.0 5.2800 ## V 2 526.1 263.0 1.4853 ## N 3 20020.5 6673.5 37.6856 ## V:N 6 321.7 53.6 0.3028 ``` --- # Split-plot design in R The effects of wheat variety and rates of a herbicide: - 3 blocks (replicates) - in each block there are two plots (whole-plots) to grow a variety of wheat (A0, A1); - the whole-plot is divided into four split-plots (sub-plots) to apply different rates of a herbicide (B0, B1, B2, B3). How many observations are there in total? --- # Split-plot design in R <img src="figs/Screenshot_20210202_140453.png" style="width: 90%" /> There are 24 observations. --- # Split-plot design in R ```r rm(list=ls()) #clean up memory data <- data.frame(list( observation = 1:24, replicate = rep(c("I","II","III"), each = 8), A = rep(rep(c("A0","A1"), each = 4), times = 3), B = rep(c("B0","B1","B2","B3"), times = 6))) yield <- c(13.8,15.5,21.0,18.9,19.3,22.2,25.3,25.9, 13.5,15.0,22.7,18.3,18.0,24.2,24.8,26.7, 13.2,15.2,22.3,19.6,20.5,25.4,28.4,27.6) data$yield <- yield ``` --- # Split-plot design in R ```r head(data) ``` ``` ## observation replicate A B yield ## 1 1 I A0 B0 13.8 ## 2 2 I A0 B1 15.5 ## 3 3 I A0 B2 21.0 ## 4 4 I A0 B3 18.9 ## 5 5 I A1 B0 19.3 ## 6 6 I A1 B1 22.2 ``` --- # Interaction plot ```r with(data, interaction.plot(x.factor = A, trace.factor = B, response = yield)) ``` <!-- --> --- # Split-plot design in R Naive model. Without knowledge of the design, the analysis below is not accurate. ```r aov.bad <- aov(yield ~ replicate + A*B, data = data) summary(aov.bad) ``` ``` ## Df Sum Sq Mean Sq F value Pr(>F) ## replicate 2 7.87 3.93 4.486 0.0312 * ## A 1 262.02 262.02 298.862 7.68e-11 *** ## B 3 215.26 71.75 81.843 4.08e-09 *** ## A:B 3 18.70 6.23 7.109 0.0039 ** ## Residuals 14 12.27 0.88 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Split-plot design in R ```r aov.good<- aov(yield ~ replicate + A*B + replicate:A, data = data) summary(aov.good) ``` ``` ## Df Sum Sq Mean Sq F value Pr(>F) ## replicate 2 7.87 3.93 6.520 0.01211 * ## A 1 262.02 262.02 434.388 8.61e-11 *** ## B 3 215.26 71.75 118.956 3.43e-09 *** ## A:B 3 18.70 6.23 10.333 0.00121 ** ## replicate:A 2 5.04 2.52 4.174 0.04206 * ## Residuals 12 7.24 0.60 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` <!-- summary(lmer(yield~replicate+ A*B +(1|replicate:A), data=data)) --> --- # Model diagnostic ```r plot(aov.good, 1) ``` <!-- --> --- # Model diagnostic ```r plot(aov.good, 2) ``` <!-- --> --- <!-- # Example IV: Cookies --> <!-- A factorial experiment was devised to determine if there was a way to change the process of making the cookies. --> <!-- The factors that were chosen to be varied were: --> <!-- - A: the amount of shortening in the dough batch (80% of what the recipe called for or 100%), --> <!-- - B: baking temperature (low, high) --> <!-- - C: tray temperature (hot out of the oven, or cooled to room temperature). --> <!-- - Response: the diameter of the baked cookies. --> <!-- --- --> <!-- # Example IV: Cookies --> <!-- The amount of shortening was a hard-to-vary factor because each time it was changed it required making a new batch of cookie dough, while the baking temperature and tray temperature were easy to vary. --> <!-- There are 4 batches of dough, each was enough to make six trays of cookies. --> <!-- --- --> <!-- # Example IV: Design in R --> <!-- ```{r message=FALSE, warning=FALSE} --> <!-- library(FrF2) --> <!-- design <- FrF2(16, 3, --> <!-- WPs = 4, nfac.WP = 1, --> <!-- factor.names = list(short = c("80%", "100%"), --> <!-- bakeT = c("low", "high"), --> <!-- trayT = c("low", "high"))) --> <!-- ``` --> <!-- The arguments 16 and 3 indicate `\(2^{4}\)` runs with 3 factors. --> <!-- - WPs = 4 indicates four whole-plots (4 batches) --> <!-- - nfac.WP = 1 indicates one whole-plot factor (batch). --> <!-- --- --> <!-- # Example IV: Design in R --> <!-- ```{r} --> <!-- head(design) --> <!-- ``` --> <!-- --- --> <!-- # Example IV: Analysis in R --> <!-- For every `level` (whole-plot / split-plot) of the experiment we have to introduce a corresponding random effect (error). --> <!-- When there are both fixed and random effects in the model, due to the --> <!-- way the experiment was conducted, we call it a *mixed model*. --> <!-- The `lmer` function in the R package `lme4` is a better option than `aov` for analysis. --> <!-- By default, `lmer` uses the REML (Restricted Maximum Likelihood) method. --> <!-- --- --> <!-- # Example IV: Download data --> <!-- ```{r message = FALSE} --> <!-- library(daewr) --> <!-- data("splitPdes") --> <!-- # help(splitPdes) --> <!-- head(splitPdes) --> <!-- ``` --> <!-- --- --> <!-- # Example IV: Linear mixed-effect model --> <!-- ```{r} --> <!-- library(lme4) --> <!-- rmodel <- lmer(y ~ 1 + short + bakeT + trayT + --> <!-- short:bakeT + short:trayT + --> <!-- bakeT:trayT + short:bakeT:trayT + --> <!-- (1|short:batch), data = splitPdes) --> <!-- anova(rmodel) --> <!-- ``` --> <!-- --- --> <!-- # Example IV: Linear mixed-effect model --> <!-- The P-values are not given (for a valid reason), however they can be easily calculated using cumulative distribution function `pf` (F value, df1, df2), the p standing for 'probability'. --> <!-- For the shortening effect: --> <!-- ```{r} --> <!-- # upper tail test --> <!-- 1- pf(41.0738,1,2) --> <!-- # Also see here: --> <!-- # help("pvalues", package="lme4") --> <!-- ``` --> <!-- --- --> <!-- # Example IV: Linear mixed-effect model --> <!-- In addition to producing the F-tests on the fixed effect in the model, the --> <!-- `lmer` function provides estimates of the variances of the random effects. --> <!-- ```{r} --> <!-- # summary(rmodel) --> <!-- ``` --> <!-- ``` --> <!-- Random effects: --> <!-- Groups Name Variance Std.Dev. --> <!-- short:batch (Intercept) 0.012332 0.11105 --> <!-- Residual 0.009831 0.09915 --> <!-- Number of obs: 24, groups: short:batch, 4 --> <!-- ``` --> <!-- Here we see that the variability of batches of dough = 0.0123, and that --> <!-- the variability from tray to tray within a batch = 0.00983. --> <!-- --- --> <!-- # Example IV: Conclusions --> <!-- The shortening effect, the bake temperature effect, the tray temperature effect, and the shortening by tray temperature interaction effect are all significant. --> <!-- Since there is a significant interaction between shortening and tray temperature, the effect of shortening depends upon the tray temperature. --> # Summary - Despite its agricultural basis, the split-plot design is useful in many scientific and industrial experiments. - When some factors require large experimental units, whereas other factors require small experimental units. - The hard-to-vary factors form the whole plots, whereas the easy-to-vary factors are run in the split-plots. - Split-plot design is often employed in combination with other designs, such as completely randomised block design, factorial design or fractional factorial.