Author: Samantha Bouwmeester, Tilburg University

During my workshop ‘Power of a randomization test in a single case experimental design’ at the 2nd International Symposium on N=1 Designs in 2021, one of the participants asked whether the concept of power in single-case designs (SCDs) is paradoxical, since the small number of participants typically involved in a single-case study will always lead to a low probability of finding a statistically reliable effect. It was a relevant question, as sample size is clearly a very important factor affecting power in any statistical design. In most research, the sample size is the only factor that can be manipulated *a priori* by the researcher to increase statistical power. In contrast, various factors can be considered in a study using a SCD: the factors that influence statistical power in a SCD include the size of the effect of an intervention, the variability in outcome scores, the within- or between-individual variance, the number of outcomes, and the type I error rate. The number of measurements collected from the same individual is particularly important, as a greater number of measurements increases the probability that a true intervention effect can be observed for an individual participant.

Nonparametric randomization tests are often used to test hypotheses in SCDs, as the assumptions of parametric analyses are not met most of the time. That is, the sample size is too small and the serial measurements from the same participant may not be independent. In randomization tests the number of permutations influences the power, and this number is determined by the number of measurements from the same individual. The statistical power to detect a significant effect with a randomization test at the individual participant level will generally be low. I say ‘generally’ because it might be possible to find a significant effect when the size of the effect is very large and/or the number of measurements taken from the same participant is large and the autocorrelation (i.e. serial dependency between adjacent data points) is small. When autocorrelation is large, it may become more difficult to identify the effect of an intervention due to inflated standard errors.
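To make the link between the number of permutations and power concrete, here is a minimal Python sketch of a one-sided randomization test for a single participant in an AB design. It is an illustration only and not the code used in the shiny app: the function names, the mean-difference test statistic, and the assumption that the start moment was drawn at random from all moments leaving at least `min_phase_length` points in each phase are all mine.

```python
def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def mean_difference(scores, start):
    """Intervention mean minus baseline mean if the intervention starts at index `start`."""
    return average(scores[start:]) - average(scores[:start])


def ab_randomization_test(scores, actual_start, min_phase_length=3):
    """One-sided randomization test for a single-case AB design.

    Assumption: the actual start moment was randomly selected from all
    start indices that leave at least `min_phase_length` points per phase.
    Returns the p-value and the number of admissible start moments.
    """
    candidate_starts = range(min_phase_length, len(scores) - min_phase_length + 1)
    observed = mean_difference(scores, actual_start)
    reference = [mean_difference(scores, s) for s in candidate_starts]
    # Share of admissible start moments whose statistic is at least as large as observed
    p_value = sum(stat >= observed for stat in reference) / len(reference)
    return p_value, len(reference)


# Hypothetical scores: 5 baseline points followed by 5 intervention points
scores = [3, 4, 3, 5, 4, 8, 9, 8, 10, 9]
p_value, n_permutations = ab_randomization_test(scores, actual_start=5)
print(p_value, n_permutations)
```

With 10 measurements and at least 3 points per phase there are only 5 admissible start moments, so the smallest attainable p-value is 1/5 = 0.2. Even the very clear effect in these hypothetical scores cannot reach significance at the 0.05 level, which is why more measurements (and therefore more possible permutations) are needed at the individual level.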

**A Single-Case Design shiny app to explore statistical power in SCDs**

It can be worthwhile to investigate how different parameters influence statistical power in a study using a SCD prior to conducting it. When practical, a researcher may increase the number of baseline and/or intervention measurements to maximize the power at the individual level and/or increase the sample size to have more power at group level. In the Single-Case Design shiny app (an interactive web application built in R), researchers can do visual analysis, draw start moments, calculate sample size, and conduct randomization tests for various SCDs. Researchers can also run simulations to find out the power of a SCD with one or a few participants or to estimate the number of participants that would be required to have sufficient statistical power to draw conclusions about between-subject treatment effectiveness. In the Single-Case Design shiny app, I distinguish several design properties that may affect the power (for a detailed explanation see Bouwmeester & Jongerling, 2020). These properties (which may differ for different kinds of randomization tests and designs) are: the (minimum) number of baseline and intervention measurements, the type of effect size estimate (e.g. Cohen’s d, Tau, Tau-corrected, PND, median difference, etc.), the within-individual variation in scores, the type I error rate, the degree of autocorrelation, whether the participants start the intervention at the same time moment or not, and the percentage of missing observations and outlying scores within a participant. Researchers can easily run the analysis for different scenarios and evaluate how these specific properties affect statistical power while keeping other properties constant. The most influential property is the size of the effect and, unfortunately, this cannot easily be controlled.
The impact of one property can also depend on the values of the others. For example, the requirement that the start moments of the intervention differ for all participants has a more negative effect on statistical power when the total number of measurements within participants is small than when it is large. The goal is to understand how different combinations of design properties affect the statistical power of a proposed SCD and to use this information to optimise the study design.
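The kind of simulation described above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the algorithm implemented in the shiny app: scores follow an AR(1) process with unit-variance innovations, the intervention produces a constant level shift of `effect`, start moments are drawn independently per participant (so they are allowed to coincide), the test statistic is the average mean difference, and the randomization distribution is approximated with a Monte Carlo sample of re-drawn start moments. All function and parameter names are hypothetical.

```python
import random


def average(values):
    return sum(values) / len(values)


def simulate_series(n_points, start, effect, phi, rng):
    """AR(1) noise (unit-variance innovations) plus a level shift of `effect` from `start`."""
    series, error = [], 0.0
    for t in range(n_points):
        error = phi * error + rng.gauss(0.0, 1.0)
        series.append(error + (effect if t >= start else 0.0))
    return series


def group_statistic(data, starts):
    """Average over participants of (intervention mean - baseline mean)."""
    return average([average(y[s:]) - average(y[:s]) for y, s in zip(data, starts)])


def randomization_p_value(data, starts, candidate_starts, n_draws, rng):
    """Monte Carlo randomization test: re-draw every participant's start moment."""
    observed = group_statistic(data, starts)
    hits = 0
    for _ in range(n_draws):
        redrawn = [rng.choice(candidate_starts) for _ in data]
        if group_statistic(data, redrawn) >= observed:
            hits += 1
    # The observed assignment counts as one member of the reference set
    return (hits + 1) / (n_draws + 1)


def estimate_power(n_participants, n_points, effect, phi, alpha=0.05,
                   min_phase=5, n_sims=200, n_draws=199, seed=1):
    """Proportion of simulated data sets in which the randomization test rejects."""
    rng = random.Random(seed)
    candidate_starts = list(range(min_phase, n_points - min_phase + 1))
    rejections = 0
    for _ in range(n_sims):
        starts = [rng.choice(candidate_starts) for _ in range(n_participants)]
        data = [simulate_series(n_points, s, effect, phi, rng) for s in starts]
        p = randomization_p_value(data, starts, candidate_starts, n_draws, rng)
        if p <= alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Vary one property (autocorrelation) while keeping the others constant
    for phi in (0.0, 0.3, 0.6):
        print(phi, estimate_power(n_participants=3, n_points=30, effect=1.0, phi=phi))
```

Changing one argument at a time (for example `phi`, `n_points` or `n_participants`) while holding the rest fixed mirrors the scenario comparisons described above, although the numbers from this sketch should not be expected to match those produced by the app.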

**Illustrative example of using the Single-Case Design shiny app**

A researcher is planning to conduct a concurrent multiple baseline design (MBD) with 6 intervention days of 3 participants/day (i.e. 3 patients per day, one per tier, replicated across 6 intervention days). In a concurrent MBD, participants commence the baseline phase at the same time but the intervention is delivered to each participant at a different time, resulting in a different number of observations in the baseline phase for each participant. In this hypothetical example, the primary outcome is pain measured daily on a visual analogue scale from 0 to 100. Daily pain scores will be collected for 6, 7 or 8 weeks during the baseline phase and every 2 days for 6 months during the intervention phase. The researcher wants to know what statistical power will be obtained with a sample size of 18 participants. The calculation can be based on a range of assumptions (e.g., anticipated effect size based on pilot study data, autocorrelation, and missing data).
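Before entering this design in the app, it helps to translate the schedule into numbers of participants and measurements. The short Python sketch below does this arithmetic; the function name is hypothetical and the 30-day month is my own simplifying assumption, so the intervention count is approximate.

```python
DAYS_PER_WEEK = 7
DAYS_PER_MONTH = 30  # assumption: a month is approximated as 30 days


def design_counts(baseline_weeks=(6, 7, 8), intervention_months=6,
                  intervention_interval_days=2, intervention_days=6,
                  participants_per_day=3):
    """Translate the example schedule into participant and measurement counts."""
    # Daily measurements during baseline, one baseline length per tier
    baseline = [weeks * DAYS_PER_WEEK for weeks in baseline_weeks]
    # One measurement every `intervention_interval_days` days during the intervention
    intervention = intervention_months * DAYS_PER_MONTH // intervention_interval_days
    return {
        "participants": intervention_days * participants_per_day,
        "baseline_measurements": baseline,
        "intervention_measurements": intervention,
        "total_measurements": [b + intervention for b in baseline],
    }


print(design_counts())
```

Under these assumptions the design has 18 participants, with 42, 49 or 56 baseline measurements and roughly 90 intervention measurements per participant.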

On the first screen of the shiny app, the researcher selects Randomization Test (Figure 1).

Power Analysis is selected, and three boxes appear. The researcher selects the type of randomization test, power or sample size and test statistic from the drop-down menus (Figure 2). In this hypothetical example, ‘multiple baseline design’, ‘power’ and ‘Cohen’s d’ are selected.

**Figure 1. First screen of the Single-Case Design shiny app where ‘Randomization Test’ is selected.**

**Figure 2. Once Power Analysis has been selected, three purple boxes appear.**

After completing the previous step, a range of white boxes appear for the number of participants, effect size, number of measurements (including number of measurements in the baseline and intervention phase) and within-individual percentage of outliers and missing data. In this hypothetical example, values for each of these properties were entered (Figure 3). The researcher clicks the Power button to start the simulations. This step can take a while; progress is shown in the plot, which displays the updated power after each simulation (Figure 4).

**Figure 3. Screenshot with hypothetical data entered**

**Figure 4. Output of power for the parameters entered across 100 simulations.**
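The progress plot is essentially a running estimate of power: after each simulation, the proportion of significant results so far is recomputed. The Python sketch below illustrates that idea, together with the Monte Carlo standard error of the final estimate. The function names are hypothetical, and the binomial standard error is a standard approximation that I am adding here, not something reported by the app.

```python
from math import sqrt


def running_power(significant):
    """Cumulative proportion of significant results after each simulation."""
    estimates, total = [], 0
    for i, rejected in enumerate(significant, start=1):
        total += int(rejected)
        estimates.append(total / i)
    return estimates


def monte_carlo_se(power, n_sims):
    """Binomial standard error of a power estimate based on `n_sims` simulations."""
    return sqrt(power * (1.0 - power) / n_sims)


# Hypothetical outcomes of the first four simulations (True = significant)
print(running_power([True, False, True, True]))
print(monte_carlo_se(0.8, 100))
```

For instance, an estimated power of 0.80 based on 100 simulations has a standard error of about 0.04, so early values in the plot can fluctuate considerably before the estimate settles.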

Assistance with using the Single-Case Design shiny app can be obtained from the author (s.architecta@outoftheboxplot.site).

For an extended discussion of the use of randomization tests in single-case designs, see:

Bouwmeester, S., & Jongerling, J. (2020). Power of a randomization test in a single case multiple baseline AB design. *PLoS One*, *15*(2), e0228355, https://doi.org/10.1371/journal.pone.0228355

**About the Author:**

Samantha Bouwmeester is a statistics and methodology researcher with interests in education and mental health. She has a Ph.D. in methodology and statistics in the social sciences, and her current research interests include statistical testing in single-case designs, computerized adaptive testing, research methods and psychometrics. She is currently affiliated with Tilburg University and is the owner of the Out of the Box Plot statistical consultancy agency.
