Swimmer Plot
Introduction
A swimmer plot is a type of data visualization used in clinical trials and studies to represent the duration of treatment and the time to key events for individual patients. It is particularly useful for illustrating the progression of disease and the impact of treatments over time. In a swimmer plot, each patient is represented by a horizontal bar (resembling a lane in a swimming pool), with symbols or colors marking significant events such as the start and end of treatment or response, the occurrence of adverse or other events, measurement patterns, and loss to follow-up or continued follow-up. This makes swimmer plots a valuable tool for giving a clear, comprehensive overview of treatment efficacy and safety within a clinical trial.
The x-axis of the plot can represent the number of cycles or the number of months/days since C0D1. (For the purpose of illustration, you can start the plot at C1D1 instead by adding 1 to each time variable.) In this post, we will plot events at CND1, that is, day 1 of cycle N.
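The shift from C0D1 to C1D1 mentioned above can be sketched as follows. This is a minimal illustration on a made-up data frame (the `toy_res` object and its values are hypothetical, not the data used in this post), assuming dplyr is installed; it adds 1 to every time column in one step.

```r
library(dplyr)

# Hypothetical response data with times counted from C0D1
toy_res <- data.frame(
  SUBJECT        = c("S-01", "S-02"),
  RESPONSE_START = c(1, 2),
  RESPONSE_END   = c(3, 2)
)

# Shift every time variable by 1 so that the axis starts at C1D1
toy_res_c1 <- toy_res %>%
  mutate(across(c(RESPONSE_START, RESPONSE_END), ~ .x + 1))

toy_res_c1
```

The same shift has to be applied to every dataset that feeds the plot (treatment, events, responses), otherwise the layers will no longer line up.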
List of point shapes: https://www.datanovia.com/en/blog/ggplot-point-shapes-best-tips/
Reference and Resources
We recommend using the swimplot package, as it makes editing the legend and adding arrows straightforward. We suggest plotting durations (lines) first, then points.
Colorblind Accessible Palette:
Reference Instructions:
https://cran.r-project.org/web/packages/swimplot/vignettes/Introduction.to.swimplot.html
https://www.khstats.com/blog/trt-timelines/multiple-vars
Library and Data
library(tidyverse)
library(ggplot2)
library(swimplot)
data("arm", "ae", "res")
List of Events
Treatment arm: Cohort 1 and Cohort 2.
AE and SAE: time point at which the event occurred and type of event.
Response: start/end time of response and type of response.
Code
Preparing the dataset
We recommend keeping different variables of interest in separate datasets to avoid confusion. For time-varying covariates such as treatment and response, add an indicator that specifies whether the treatment or response is still ongoing at the last follow-up.
The example data is not perfect and has many missing details. Please carefully check your definitions for responses (i.e., how to define the start and end of a response), continuous treatment, and definitions of other key events.
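As a rough sketch of such an indicator, the snippet below derives a continued-treatment flag on a made-up data frame. The `toy_arm` object and the `ON_TRT_AT_CUTOFF` column are hypothetical; how "still on treatment" is determined depends on your study's definitions. The 1/NA coding follows the convention used for CONTINUED_RESPONSE in this post, where NA means no arrow is drawn.

```r
# Hypothetical treatment data: one row per subject
toy_arm <- data.frame(
  SUBJECT          = c("S-01", "S-02", "S-03"),
  END_TRT          = c(3, 6, 2),
  ON_TRT_AT_CUTOFF = c(FALSE, TRUE, FALSE)  # hypothetical source column
)

# 1 = still on treatment at the last follow-up, NA = treatment has ended
toy_arm$CONTINUED_TREATMENT <- ifelse(toy_arm$ON_TRT_AT_CUTOFF, 1, NA)

toy_arm
```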
# Keep only the columns needed for the treatment timeline
arm = arm %>%
  select(c(SUBJECT, ARM, END_TRT, CONTINUED_TREATMENT))

# Within each subject and cycle, drop "AE" rows when an "SAE" is also recorded
ae = ae %>%
  select(c(SUBJECT, ARM, COURSE_NUM, EVENT)) %>%
  group_by(SUBJECT, ARM, COURSE_NUM) %>%
  filter(!(EVENT == "AE" & "SAE" %in% EVENT)) %>%
  as.data.frame()

res = res %>%
  select(c(SUBJECT, RESPONSE_START, RESPONSE_END, RESPONSE, CONTINUED_RESPONSE)) %>%
  left_join(
    arm %>% select(SUBJECT, END_TRT), by = "SUBJECT"
  ) %>%
  mutate(
    # Flag a response as continued if it lasts until the end of treatment
    CONTINUED_RESPONSE = ifelse(RESPONSE_END == END_TRT & RESPONSE_START != RESPONSE_END, 1, NA),
    # Set the levels explicitly; otherwise the legend order will change
    RESPONSE = factor(RESPONSE, levels = c("PR", "SD", "PD", "NE"))
  )
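To make the grouped filter on the AE data easier to follow, here is a small sketch on made-up data (the `toy_ae` object is hypothetical, and dplyr is assumed to be installed). Within each subject and cycle, an "AE" row is dropped only when an "SAE" row exists in the same group, so each subject-cycle ends up with a single point.

```r
library(dplyr)

# Hypothetical events: S-01 has both an AE and an SAE in cycle 2
toy_ae <- data.frame(
  SUBJECT    = c("S-01", "S-01", "S-01", "S-02"),
  COURSE_NUM = c(2, 2, 3, 2),
  EVENT      = c("AE", "SAE", "AE", "AE")
)

toy_ae_clean <- toy_ae %>%
  group_by(SUBJECT, COURSE_NUM) %>%
  # "SAE" %in% EVENT is evaluated once per group; AE rows are removed
  # only in groups where it is TRUE
  filter(!(EVENT == "AE" & "SAE" %in% EVENT)) %>%
  ungroup() %>%
  as.data.frame()

toy_ae_clean
```

Only the AE for S-01 in cycle 2 is removed; AEs in cycles without an SAE are kept.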
Take a look at the format of the datasets.
head(arm)
## SUBJECT ARM END_TRT CONTINUED_TREATMENT
## 1 STUDY-001 Cohort 1 3 NA
## 2 STUDY-002 Cohort 1 2 NA
## 3 STUDY-003 Cohort 1 2 NA
## 4 STUDY-004 Cohort 1 2 NA
## 5 STUDY-005 Cohort 1 6 NA
## 6 STUDY-006 Cohort 1 1 NA
head(ae)
## SUBJECT ARM COURSE_NUM EVENT
## 1 STUDY-001 Cohort 1 2 AE
## 2 STUDY-002 Cohort 1 2 AE
## 3 STUDY-004 Cohort 1 2 AE
## 4 STUDY-005 Cohort 1 2 AE
## 5 STUDY-007 Cohort 1 2 AE
## 6 STUDY-007 Cohort 1 3 AE
head(res)
## SUBJECT RESPONSE_START RESPONSE_END RESPONSE CONTINUED_RESPONSE END_TRT
## 1 STUDY-001 2 2 SD NA 3
## 2 STUDY-002 2 2 PD NA 2
## 3 STUDY-003 2 2 PD NA 2
## 4 STUDY-004 2 2 PD NA 2
## 5 STUDY-005 2 4 SD NA 6
## 6 STUDY-005 4 6 SD 1 6
Using swimplot package
Part I: Timeline
Plot the duration of treatment for each participant. Note that the horizontal axis is the y-axis, because the bars are drawn with flipped coordinates. The ggplot2:: prefix is needed to call ggplot2 functions only when ggplot2 itself is not attached.
The goal of this section is to present the overall timeline of the study. It doesn’t need to focus solely on treatment arms; it can also reflect patient statuses (e.g., on treatment 1, on treatment 2, off treatment) throughout the course of the study. You’ll need a long-format dataset that includes the event name and end time for each event associated with each subject.
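A long-format status dataset of this kind might look like the sketch below. The `toy_status` object, its column names, and its values are hypothetical, and the example assumes the swimplot package is installed. Each row is one status period for one subject, and END_TIME is the time at which that period ends.

```r
library(swimplot)

# Hypothetical long-format data: one row per status period per subject
toy_status <- data.frame(
  SUBJECT  = c("S-01", "S-01", "S-02", "S-02", "S-02"),
  STATUS   = c("Treatment 1", "Off treatment",
               "Treatment 1", "Treatment 2", "Off treatment"),
  END_TIME = c(3, 5, 2, 6, 7)
)

# One bar per subject, split into segments coloured by status
status_plot <- swimmer_plot(df = toy_status, id = "SUBJECT", end = "END_TIME",
                            name_fill = "STATUS", col = "white")
# status_plot
```

The data are passed as a plain data.frame here, in line with the as.data.frame() call used when preparing the AE data.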
treatment_dur = swimmer_plot(df = arm, id = 'SUBJECT', end = 'END_TRT', name_fill = 'ARM',
                             id_order = 'ARM', col = "white", alpha = 0.75, width = .8) +
  ggplot2::scale_y_continuous(name = "Cycle", breaks = c(0:16)) +
  scale_fill_manual(name = "Treatment", values = c("Cohort 1" = "#fde0dd", "Cohort 2" = "#c51b8a"))
# treatment_dur

Part II: Duration of Response/Events
Add the duration of responses/events as lines on top of the bars; these are distinct from the events plotted in Part I.
Note that responses are typically plotted as points (see Part III) at their start time.
resp_dur_plot = treatment_dur +
  swimmer_lines(df_lines = res, id = 'SUBJECT', start = 'RESPONSE_START',
                end = 'RESPONSE_END', name_col = 'RESPONSE', size = 2) +
  scale_color_manual(name = "RESPONSE", values = c("#225ea8", "#fdcc8a", "#fc8d59",
                                                   "#d7301f"))
# resp_dur_plot
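Because the colors above are matched to the response levels by position, one optional safeguard is to name the color vector so that each level is tied to its color explicitly. The sketch below is only a suggestion; the `resp_levels` and `resp_cols` objects are hypothetical names, and the pairing of colors to levels assumes the level order PR, SD, PD, NE set during data preparation.

```r
# Response levels in the order used when preparing the data
resp_levels <- c("PR", "SD", "PD", "NE")

# Named vector: each response level is tied to its colour by name
resp_cols <- c(PR = "#225ea8", SD = "#fdcc8a", PD = "#fc8d59", NE = "#d7301f")

# Check that every response level has a colour before plotting
stopifnot(setequal(names(resp_cols), resp_levels))

# The named vector can then be passed to the scale, e.g.
# scale_color_manual(name = "RESPONSE", values = resp_cols)
```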

Part III: Event Time Point
Plot AEs.
ae_plot = resp_dur_plot +
  swimmer_points(df_points = ae, id = 'SUBJECT', time = 'COURSE_NUM',
                 name_shape = 'EVENT', size = 2, fill = 'white', col = 'black')
# ae_plot

Part IV: Arrows
Add arrows to mark a treatment or response that is still ongoing at the last follow-up.
final_plot = ae_plot +
  # arrow_positions = c(0.2, 1): the arrow runs from arrow_start + 0.2 to arrow_start + 1
  # (arrow_start is RESPONSE_END, which equals END_TRT for continued responses)
  swimmer_arrows(df_arrows = res, id = 'SUBJECT', arrow_start = 'RESPONSE_END',
                 cont = 'CONTINUED_RESPONSE', name_col = 'RESPONSE', show.legend = FALSE,
                 type = "open", cex = 1, arrow_positions = c(0.2, 1)) +
  # Manual legend entry for the arrow; the x and y positions usually need some trial and error
  annotate("text", x = 2.5, y = 20.2, label = "Continued response", size = 3.25) +
  annotate("text", x = 1.5, y = 20, label = sprintf('\u2192'), size = 8.25) +
  coord_flip(clip = 'off', ylim = c(0, 17)) # ylim = c(minimum treatment cycle, maximum treatment cycle)
# final_plot

Session Info
sessionInfo()
## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] swimplot_1.2.0 lubridate_1.9.0 timechange_0.1.1 forcats_1.0.0
## [5] stringr_1.5.1 dplyr_1.1.3 purrr_1.0.1 readr_2.1.4
## [9] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.5 jsonlite_1.8.7 highr_0.10 compiler_4.4.0
## [5] tidyselect_1.2.1 jquerylib_0.1.4 scales_1.3.0 yaml_2.3.6
## [9] fastmap_1.1.1 R6_2.5.1 generics_0.1.3 knitr_1.46
## [13] bookdown_0.35 munsell_0.5.1 tzdb_0.3.0 bslib_0.5.1
## [17] pillar_1.9.0 rlang_1.1.1 utf8_1.2.3 stringi_1.7.12
## [21] cachem_1.0.8 xfun_0.43 sass_0.4.5 cli_3.6.1
## [25] withr_3.0.0 magrittr_2.0.3 digest_0.6.33 grid_4.4.0
## [29] rstudioapi_0.15.0 hms_1.1.3 lifecycle_1.0.4 vctrs_0.6.3
## [33] evaluate_0.22 glue_1.6.2 blogdown_1.18 fansi_1.0.4
## [37] colorspace_2.1-0 rmarkdown_2.25 tools_4.4.0 pkgconfig_2.0.3
## [41] htmltools_0.5.4