Tips for the table1 R Package and Summary Tables
Introduction
In this post, I will show how I use the table1 R package, which is designed to create publication-ready tables of descriptive statistics and baseline characteristics of a study population. The package offers a user-friendly interface for generating customizable tables that can include categorical and continuous variables, stratified analyses, and statistical tests for group differences. I will also share some tips on using the package.
- Beyond the examples provided here, if you need further customization, such as merging two tables with different sample sizes or running other statistical tests, you may need to write your own code in base R.
- More methods and render functions may be added later… hopefully!
Before we start
The package author has written an excellent tutorial; please take a look to get a feel for the package: https://benjaminrich.github.io/table1/vignettes/table1-examples.html
I will use the same dataset as the package author in this post for illustration.
I recommend preparing the data for table1 in a separate dataset. For instance, name the raw dataset dat, the table1-specific dataset table1_dat, and the analysis dataset analysis_dat. This separation helps when you want to apply different reference levels, variable labels, etc., for different purposes. In the table1-specific dataset:
- Set variables as factors, with levels ordered as they will appear in the table.
- Use the expss::apply_labels function to add labels to the variables (columns).
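A minimal base-R sketch of this separation, using a small hypothetical dat in place of real data. It orders the factor levels for the table copy and attaches "label" and "units" attributes directly; as far as I know these are the attributes that table1's label() and units() read, but treat that as an assumption (expss::apply_labels is not used here). The analysis copy gets a different, hypothetical reference level.

```r
# Hypothetical raw data standing in for `dat`
dat <- data.frame(
  treat = c("Treated", "Placebo", "Treated", "Placebo"),
  age   = c(34, 51, 47, 29),
  stringsAsFactors = FALSE
)

# table1-specific copy: factor levels in the order they should appear as columns
table1_dat <- dat
table1_dat$treat <- factor(table1_dat$treat, levels = c("Placebo", "Treated"))
# Label and units stored as plain attributes (assumed to be what table1 reads)
attr(table1_dat$age, "label") <- "Age"
attr(table1_dat$age, "units") <- "years"

# Analysis copy: a different reference level for modelling (hypothetical choice)
analysis_dat <- dat
analysis_dat$treat <- relevel(factor(analysis_dat$treat), ref = "Treated")

levels(table1_dat$treat)    # "Placebo" "Treated"
levels(analysis_dat$treat)  # "Treated" "Placebo"
```

The raw dat stays untouched, so each downstream copy can be rebuilt from it with its own conventions.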
Sample Datasets
```r
library(table1)   # table1(), units(), t1kable(), t1flex()
library(dplyr)    # %>% and mutate()

## Data from package
library(boot)
melanoma2 <- melanoma
# Factor the basic variables that
# we're interested in
melanoma2$status <-
  factor(melanoma2$status,
         levels=c(2,1,3),
         labels=c("Alive", # Reference
                  "Melanoma death",
                  "Non-melanoma death"))

## Simulated data
f <- function(x, n, ...) factor(sample(x, n, replace=TRUE, ...), levels=x)
set.seed(427)
n <- 146
dat <- data.frame(id=1:n)
dat$treat <- f(c("Placebo", "Treated"), n, prob=c(1, 2)) # 2:1 randomization
dat$age <- sample(18:65, n, replace=TRUE)
dat$sex <- f(c("Female", "Male"), n, prob=c(.6, .4)) # 60% female
dat$wt <- round(exp(rnorm(n, log(70), 0.23)), 1)
# Add some missing data
dat$wt[sample.int(n, 5)] <- NA
dat = dat %>%
  mutate(treat = as.factor(treat),
         sex = factor(sex, levels = c("Female", "Male")))

# table1-specific copy with variable labels and units
table1_dat = expss::apply_labels(dat,
                                 age="Age",
                                 sex="Sex",
                                 wt="Weight",
                                 treat="Treatment Group")
units(table1_dat$age) <- "years"
units(table1_dat$wt) <- "kg"
```
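The 2:1 randomization in f() relies on sample() treating prob as relative weights that need not sum to one, so prob = c(1, 2) gives P(Treated) = 2/3. A quick check with a large, hypothetical number of draws:

```r
set.seed(1)
# prob = c(1, 2) is normalized to c(1/3, 2/3) by sample()
draws <- sample(c("Placebo", "Treated"), 1e5, replace = TRUE, prob = c(1, 2))
mean(draws == "Treated")  # close to 2/3
```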
Render for Descriptive Statistics
Each render option takes a function that specifies how the summary statistics of a variable are calculated and formatted; the p-value functions further below follow the same pattern but are passed through extra.col.
The render.missing=NULL option removes the “Missing” rows from the table. Note, however, that the percentages do not change and will therefore not add up to 100% if there are missing values.
Render of Continuous Variables
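- Shows mean (SD), min - max, and median [Q1, Q3]. N is computed but hidden: the first element of the returned vector goes on the variable-label row and is left blank here. See the comments in the code to display N or a Missing row instead.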
```r
render.cont <- function(x, ...) {
  N     <- sum(!is.na(x))                 # non-missing count
  nmiss <- sum(is.na(x))                  # missing count
  miss  <- (nmiss / length(x)) * 100      # missing percentage
  MEAN  <- mean(x, na.rm = TRUE)
  SD    <- sd(x, na.rm = TRUE)
  MIN   <- min(x, na.rm = TRUE)
  MAX   <- max(x, na.rm = TRUE)
  MED   <- median(x, na.rm = TRUE)
  Q1    <- quantile(x, 0.25, na.rm = TRUE, names = FALSE)
  Q3    <- quantile(x, 0.75, na.rm = TRUE, names = FALSE)
  out <- c(
    # The first element goes on the variable-label row. Replace " " with
    # paste0("(N=", N, ")") to show the non-missing N there.
    "N" = " ",
    "Mean (SD)"       = sprintf("%.1f (%.1f)", MEAN, SD),
    "Min - Max"       = sprintf("%.1f - %.1f", MIN, MAX),
    "Median [Q1, Q3]" = sprintf("%.1f [%.1f, %.1f]", MED, Q1, Q3)
  )
  # Uncomment to add a Missing row; set render.missing = NULL in table1()
  # so that missing values are not reported twice.
  # out <- c(out, "Missing" = sprintf("%.0f (%.1f%%)", nmiss, miss))
  out
}
```
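Because a render function is an ordinary R function returning a named character vector, it can be checked outside table1 before building a whole table. The sketch below is a hypothetical variant, render_cont_digits, with a digits argument for the number of decimals (my own addition, not a table1 option); calling it on a small vector shows exactly which rows it produces.

```r
# Hypothetical variant of render.cont with a configurable number of decimals
render_cont_digits <- function(x, digits = 1, ...) {
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(" ",  # first element sits on the variable-label row
    "Mean (SD)"       = paste0(fmt(mean(x, na.rm = TRUE)), " (",
                               fmt(sd(x, na.rm = TRUE)), ")"),
    "Min - Max"       = paste0(fmt(min(x, na.rm = TRUE)), " - ",
                               fmt(max(x, na.rm = TRUE))),
    "Median [Q1, Q3]" = paste0(fmt(q[2]), " [", fmt(q[1]), ", ", fmt(q[3]), "]"))
}

render_cont_digits(c(1, 2, 3, 4, NA), digits = 2)
```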
Render of Categorical Variables
- Shows the count and column percentage of each level (N is computed but hidden, as above).
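```r
render.cat <- function(x, ...) {
  N <- sum(!is.na(x))  # non-missing count
  # stats.default() (from table1) returns one entry per level with FREQ and PCT
  FREQ_PCT <- sapply(stats.default(x),
                     function(y) with(y, sprintf("%d (%0.1f %%)", FREQ, PCT)))
  # The first element goes on the variable-label row. Replace " " with
  # paste0("(N=", N, ")") to show the non-missing N there.
  c("N" = " ", FREQ_PCT)
}
```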
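render.cat depends on stats.default() from table1. To check the arithmetic without the package, here is a hypothetical base-R stand-in, render_cat_base. It assumes, following the render.missing note above, that the percentage keeps missing values in the denominator, so with one NA out of four values the level percentages sum to 75%.

```r
# Hypothetical base-R stand-in for render.cat (no table1 dependency)
render_cat_base <- function(x, ...) {
  freq <- table(x)                           # counts per level; NA is not counted
  pct  <- 100 * as.vector(freq) / length(x)  # NA kept in the denominator
  rows <- sprintf("%d (%0.1f %%)", as.vector(freq), pct)
  names(rows) <- names(freq)
  c(" ", rows)  # first element sits on the variable-label row
}

render_cat_base(factor(c("Female", "Male", "Female", NA)))
```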
Render for Statistical Tests
Two-Sample Parametric Tests
- Calculates p-values from parametric tests comparing two independent groups: a two-sample t-test for numeric variables and a chi-square test for categorical variables.
```r
pvalue_para <- function(x, ...) {
  # Construct vectors of data y, and groups (strata) g
  y <- unlist(x)
  g <- factor(rep(seq_along(x), times = sapply(x, length)))
  if (is.numeric(y)) {
    # For numeric variables, perform a two-sample t-test (Welch by default)
    p <- t.test(y ~ g)$p.value
  } else {
    # Drop unused levels so that only observed categories are tested
    tab <- table(factor(y), g)
    # For categorical variables, perform a chi-squared test of independence,
    # skipping the test if only one level has non-missing counts
    p <- if (nrow(tab) >= 2) chisq.test(tab)$p.value else NA
  }
  # Leave the cell blank if no test was run
  if (is.na(p)) return(c("", ""))
  # Format the p-value, using an HTML entity for the less-than sign.
  # The initial empty string places the output on the line below the variable label.
  c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
}
```
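To see what the test functions work with: as I understand it, table1 passes an extra.col function a list x with one vector per stratum, and the first two lines flatten it into values y and stratum indices g. A toy list with hypothetical weights:

```r
# Hypothetical per-stratum data, shaped like the list table1 passes in
x <- list(Placebo = c(61, 70, 58), Treated = c(72, 66, 80, 75))
y <- unlist(x)                                          # all values in one vector
g <- factor(rep(seq_along(x), times = sapply(x, length)))  # stratum index per value
table(g)               # 3 observations in stratum 1, 4 in stratum 2
t.test(y ~ g)$p.value  # Welch two-sample t-test, as in pvalue_para
```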
Two-Sample Non-Parametric Tests
- Calculates p-values from non-parametric tests comparing two independent groups: a Wilcoxon rank-sum test for numeric variables and Fisher’s exact test for categorical variables.
```r
pvalue_nonpara <- function(x, ...) {
  # Construct vectors of data y, and groups (strata) g
  y <- unlist(x)
  g <- factor(rep(seq_along(x), times = sapply(x, length)))
  if (is.numeric(y)) {
    # For numeric variables, perform a Wilcoxon rank-sum test
    p <- wilcox.test(y ~ g)$p.value
  } else {
    # Drop unused levels so that only observed categories are tested
    tab <- table(factor(y), g)
    # For categorical variables, perform Fisher's exact test,
    # skipping the test if only one level has non-missing counts
    p <- if (nrow(tab) >= 2) fisher.test(tab)$p.value else NA
  }
  # Leave the cell blank if no test was run
  if (is.na(p)) return(c("", ""))
  # Format the p-value, using an HTML entity for the less-than sign.
  # The initial empty string places the output on the line below the variable label.
  c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
}
```
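One subtlety behind the "only one level has non-missing counts" check: table() keeps unused factor levels as all-zero rows, so counting the rows of table(y, g) counts defined levels, not observed ones. Wrapping y in factor() drops the unused levels first. A small illustration with hypothetical data:

```r
# Hypothetical factor with an unused level, as can happen after subsetting
y <- factor(c("Female", "Male", "Female", "Female", "Male", "Male"),
            levels = c("Female", "Male", "Other"))
g <- factor(c(1, 1, 1, 2, 2, 2))

nrow(table(y, g))          # 3: "Other" is kept as an all-zero row
nrow(table(factor(y), g))  # 2: factor() drops the unused level
fisher.test(table(factor(y), g))$p.value
```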
Multi-Sample Parametric Tests
- Calculates p-values from parametric tests comparing more than two groups: a one-way ANOVA for numeric variables (differences in means) and a chi-square test for categorical variables (whether the proportions are equal across groups).
```r
pvalueANOVA <- function(x, ...) {
  # Construct vectors of data y, and groups (strata) g
  y <- unlist(x)
  g <- factor(rep(seq_along(x), times = sapply(x, length)))
  if (is.numeric(y)) {
    # For numeric variables, perform a one-way ANOVA and extract Pr(>F)
    ano <- aov(y ~ g)
    p <- summary(ano)[[1]][["Pr(>F)"]][1]
  } else {
    # Drop unused levels so that only observed categories are tested
    tab <- table(factor(y), g)
    # For categorical variables, perform a chi-squared test of independence,
    # skipping the test if only one level has non-missing counts
    p <- if (nrow(tab) >= 2) chisq.test(tab)$p.value else NA
  }
  # Leave the cell blank if no test was run
  if (is.na(p)) return(c("", ""))
  # Format the p-value, using an HTML entity for the less-than sign.
  # The initial empty string places the output on the line below the variable label.
  c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
}
```
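The p-value pvalueANOVA extracts is the Pr(>F) entry of the one-way ANOVA table. As a sanity check, it should match oneway.test() with var.equal = TRUE, which runs the same F-test. A sketch with hypothetical simulated groups:

```r
set.seed(42)
# Three hypothetical groups of unequal size
y <- c(rnorm(10, 50, 5), rnorm(8, 52, 5), rnorm(6, 54, 5))
g <- factor(rep(1:3, times = c(10, 8, 6)))

p_aov    <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]       # as in pvalueANOVA
p_oneway <- oneway.test(y ~ g, var.equal = TRUE)$p.value  # same F-test
all.equal(p_aov, p_oneway)
```

If the equal-variance assumption is doubtful, oneway.test() with its default var.equal = FALSE (Welch's ANOVA) is a drop-in alternative for the numeric branch.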
Multi-Sample Non-Parametric Tests
- Calculates p-values from non-parametric tests comparing more than two groups: a Kruskal-Wallis test for numeric variables (differences in distributions rather than means) and Fisher’s exact test for categorical variables (whether the proportions are equal across groups).
```r
pvalueKW <- function(x, ...) {
  # Construct vectors of data y, and groups (strata) g
  y <- unlist(x)
  g <- factor(rep(seq_along(x), times = sapply(x, length)))
  if (is.numeric(y)) {
    # For numeric variables, perform a Kruskal-Wallis test
    p <- kruskal.test(y ~ g)$p.value
  } else {
    # Drop unused levels so that only observed categories are tested
    tab <- table(factor(y), g)
    # For categorical variables, perform Fisher's exact test of independence,
    # skipping the test if only one level has non-missing counts
    p <- if (nrow(tab) >= 2) fisher.test(tab)$p.value else NA
  }
  # Leave the cell blank if no test was run
  if (is.na(p)) return(c("", ""))
  # Format the p-value, using an HTML entity for the less-than sign.
  # The initial empty string places the output on the line below the variable label.
  c("", sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001)))
}
```
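For larger r x c tables, such as a categorical variable split across three status groups, fisher.test() can stop with an error asking for a larger workspace. Besides raising its workspace argument, one option is a Monte Carlo p-value via simulate.p.value = TRUE; note that this swaps the exact p-value for a simulated one, which varies slightly with the seed and the number of replicates B. A sketch with a hypothetical table:

```r
# Hypothetical 3 x 3 table of counts (rows: categories, columns: groups)
tab <- matrix(c(40, 25, 10,
                30, 20,  6,
                 5,  4,  2), nrow = 3, byrow = TRUE)
set.seed(2024)
# Monte Carlo p-value based on B simulated tables with the same margins
p_sim <- fisher.test(tab, simulate.p.value = TRUE, B = 10000)$p.value
p_sim
```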
Sample table1 Code with Overall
```r
table1_ex1 = table1(~ age + sex + wt | treat, data=table1_dat,
                    render.continuous=render.cont,   # optional, your continuous variable render
                    render.categorical=render.cat,   # optional, your categorical variable render
                    # render.missing=NULL,           # uncomment to drop the Missing rows
                    overall = "Modify Overall Column Name Here"
)
table1_ex1
```
| | Placebo (N=52) | Treated (N=94) | Modify Overall Column Name Here (N=146) |
|---|---|---|---|
| Age (years) | | | |
| Mean (SD) | 39.2 (14.2) | 40.1 (13.3) | 39.8 (13.6) |
| Min - Max | 18.0 - 65.0 | 18.0 - 65.0 | 18.0 - 65.0 |
| Median [Q1, Q3] | 37.5 [26.8, 50.5] | 39.5 [30.0, 50.0] | 39.0 [28.2, 50.0] |
| Sex | | | |
| Female | 34 (65.4 %) | 53 (56.4 %) | 87 (59.6 %) |
| Male | 18 (34.6 %) | 41 (43.6 %) | 59 (40.4 %) |
| Weight (kg) | | | |
| Mean (SD) | 68.1 (16.3) | 68.3 (16.7) | 68.2 (16.5) |
| Min - Max | 37.5 - 116.3 | 40.0 - 118.8 | 37.5 - 118.8 |
| Median [Q1, Q3] | 66.7 [57.2, 77.0] | 64.9 [57.4, 75.9] | 66.2 [57.3, 76.4] |
| Missing | 2 (3.8%) | 3 (3.2%) | 5 (3.4%) |
Sample table1 Code with Customized render and p-value Calculation
⚠️**To conduct tests, the ‘overall’ option must be set to FALSE.**⚠️
```r
table1_ex2 = table1(~ age + sex + wt | treat, data=table1_dat,
                    render.continuous=render.cont,   # optional, your continuous variable render
                    render.categorical=render.cat,   # optional, your categorical variable render
                    overall = FALSE,
                    extra.col=list(`Parametric P-value`=pvalue_para,
                                   `Non-Parametric P-value`=pvalue_nonpara)
)
table1_ex2
```
| | Placebo (N=52) | Treated (N=94) | Parametric P-value | Non-Parametric P-value |
|---|---|---|---|---|
| Age (years) | | | | |
| Mean (SD) | 39.2 (14.2) | 40.1 (13.3) | 0.719 | 0.66 |
| Min - Max | 18.0 - 65.0 | 18.0 - 65.0 | | |
| Median [Q1, Q3] | 37.5 [26.8, 50.5] | 39.5 [30.0, 50.0] | | |
| Sex | | | | |
| Female | 34 (65.4 %) | 53 (56.4 %) | 0.376 | 0.379 |
| Male | 18 (34.6 %) | 41 (43.6 %) | | |
| Weight (kg) | | | | |
| Mean (SD) | 68.1 (16.3) | 68.3 (16.7) | 0.944 | 0.936 |
| Min - Max | 37.5 - 116.3 | 40.0 - 118.8 | | |
| Median [Q1, Q3] | 66.7 [57.2, 77.0] | 64.9 [57.4, 75.9] | | |
| Missing | 2 (3.8%) | 3 (3.2%) | | |
For more than two groups, such as the three status categories in melanoma2, use the ANOVA and Kruskal-Wallis functions instead:
```r
table1_ex3 = table1(~ factor(sex) + age + factor(ulcer) + thickness | status, data=melanoma2,
                    render.continuous=render.cont,   # optional, your continuous variable render
                    render.categorical=render.cat,   # optional, your categorical variable render
                    overall = FALSE,
                    extra.col=list(`Parametric P-value`=pvalueANOVA,
                                   `Non-Parametric P-value`=pvalueKW)
)
table1_ex3
```
| | Alive (N=134) | Melanoma death (N=57) | Non-melanoma death (N=14) | Parametric P-value | Non-Parametric P-value |
|---|---|---|---|---|---|
| factor(sex) | | | | | |
| 0 | 91 (67.9 %) | 28 (49.1 %) | 7 (50.0 %) | 0.0335 | 0.0325 |
| 1 | 43 (32.1 %) | 29 (50.9 %) | 7 (50.0 %) | | |
| age | | | | | |
| Mean (SD) | 50.0 (15.9) | 55.1 (17.9) | 65.3 (10.9) | 0.0016 | 0.00148 |
| Min - Max | 4.0 - 84.0 | 14.0 - 95.0 | 49.0 - 86.0 | | |
| Median [Q1, Q3] | 52.0 [40.0, 61.8] | 56.0 [44.0, 68.0] | 65.0 [57.0, 71.8] | | |
| factor(ulcer) | | | | | |
| 0 | 92 (68.7 %) | 16 (28.1 %) | 7 (50.0 %) | <0.001 | <0.001 |
| 1 | 42 (31.3 %) | 41 (71.9 %) | 7 (50.0 %) | | |
| thickness | | | | | |
| Mean (SD) | 2.2 (2.3) | 4.3 (3.6) | 3.7 (3.6) | <0.001 | <0.001 |
| Min - Max | 0.1 - 12.9 | 0.3 - 17.4 | 0.2 - 12.6 | | |
| Median [Q1, Q3] | 1.4 [0.8, 2.9] | 3.5 [2.2, 4.8] | 2.3 [1.3, 5.8] | | |
Knit to PDF or HTML
A table1 object can be converted to a kable or a flextable with the built-in t1kable() and t1flex() functions of the table1 package. Both formats make it easy to create tables for reports and publications.
I convert my table1 object to a flextable when generating a PDF report. If the knit output is HTML, all three formats (table1, kable, flextable) work equally well.
kable and kableExtra vignettes: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
flextable book: https://ardata-fr.github.io/flextable-book/
Table1 to Kable
```r
t1kable(your_table1_object)
```
Table1 to FlexTable
```r
t1flex(your_table1_object)

# Example with some flextable styling
library(flextable)  # line_spacing() and bg()

table1_ex1 %>%
  t1flex() %>%
  line_spacing(space = 0, part = "body") %>%
  # White background so the table displays properly in 'Dark' mode;
  # this step is unnecessary if you're generating your own report.
  bg(bg = "white", part = "all")
```
| | Placebo | Treated | Modify Overall Column Name Here |
|---|---|---|---|
| Age (years) | | | |
| Mean (SD) | 39.2 (14.2) | 40.1 (13.3) | 39.8 (13.6) |
| Min - Max | 18.0 - 65.0 | 18.0 - 65.0 | 18.0 - 65.0 |
| Median [Q1, Q3] | 37.5 [26.8, 50.5] | 39.5 [30.0, 50.0] | 39.0 [28.2, 50.0] |
| Sex | | | |
| Female | 34 (65.4 %) | 53 (56.4 %) | 87 (59.6 %) |
| Male | 18 (34.6 %) | 41 (43.6 %) | 59 (40.4 %) |
| Weight (kg) | | | |
| Mean (SD) | 68.1 (16.3) | 68.3 (16.7) | 68.2 (16.5) |
| Min - Max | 37.5 - 116.3 | 40.0 - 118.8 | 37.5 - 118.8 |
| Median [Q1, Q3] | 66.7 [57.2, 77.0] | 64.9 [57.4, 75.9] | 66.2 [57.3, 76.4] |
| Missing | 2 (3.8%) | 3 (3.2%) | 5 (3.4%) |