| Title: | Tools for Designing, Simulating, and Analyzing Implementation Rollout Trials |
|---|---|
| Description: | Provides a unified framework for designing, simulating, and analyzing implementation rollout trials, including stepped wedge, sequential rollout, head-to-head, multi-condition, and rollout implementation optimization designs. The package enables users to flexibly specify rollout schedules, incorporate site-level and nested data structures, generate outcomes under rich hierarchical models, and evaluate analytic strategies through simulation-based power analysis. By separating data generation from model fitting, the tools support assessment of bias, Type I error, and robustness to model misspecification. The workflow integrates with standard mixed-effects modeling approaches and the tidyverse ecosystem, offering transparent and reproducible tools for implementation scientists and applied statisticians. |
| Authors: | Ian Cero [aut, cre] (ORCID: <https://orcid.org/0000-0002-2862-0450>), C. Hendricks Brown [aut] (ORCID: <https://orcid.org/0000-0002-0294-2419>) |
| Maintainer: | Ian Cero <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-12 09:10:02 UTC |
| Source: | https://github.com/iancero/rollout |
Generates a binary outcome by summing effects, computing probabilities via the logistic function, and drawing binary outcomes.
add_binary_outcome( data, linear_col = "y_linear", prob_col = "y_prob", binary_col = "y_bin" )add_binary_outcome( data, linear_col = "y_linear", prob_col = "y_prob", binary_col = "y_bin" )
data |
A data frame containing effect columns prefixed with |
linear_col |
Name of the column to store the summed linear predictor (default |
prob_col |
Name of the column to store probabilities (default |
binary_col |
Name of the column to store binary outcomes (default |
A tibble with added linear predictor, probability, and binary outcome columns.
df <- tibble::tibble(.beta = 0.5, .u = rnorm(5), .error = rnorm(5)) add_binary_outcome(df)df <- tibble::tibble(.beta = 0.5, .u = rnorm(5), .error = rnorm(5)) add_binary_outcome(df)
Adds a residual error term (column .error) to the data frame, drawn from a normal distribution with specified variance.
add_error(.data, variance = 1)add_error(.data, variance = 1)
.data |
A data frame to which the error term will be added. |
variance |
Numeric; variance of the residual error (default |
A tibble with an added .error column.
df <- tibble::tibble(x = 1:5) add_error(df, variance = 2)df <- tibble::tibble(x = 1:5) add_error(df, variance = 2)
Adds a fixed effect column (prefixed with ".") to the design data frame for simulation purposes.
add_fixed_effect(design_df, ...)add_fixed_effect(design_df, ...)
design_df |
A data frame containing the rollout design and any parameters. |
... |
A single named expression specifying the fixed effect to add (e.g., |
A tibble with the added fixed effect column.
df <- tibble::tibble(x = rnorm(5)) add_fixed_effect(df, beta = 0.5 * x)df <- tibble::tibble(x = rnorm(5)) add_fixed_effect(df, beta = 0.5 * x)
Generates a linear outcome variable by summing all columns that start with "." (representing fixed, random, and error effects).
add_linear_outcome(data, output_col = "y_linear")add_linear_outcome(data, output_col = "y_linear")
data |
A data frame containing effect columns prefixed with |
output_col |
Name of the column to store the linear outcome (default |
A tibble with the added linear outcome column.
df <- tibble::tibble(.beta = 0.5, .u = rnorm(5), .error = rnorm(5)) add_linear_outcome(df)df <- tibble::tibble(.beta = 0.5, .u = rnorm(5), .error = rnorm(5)) add_linear_outcome(df)
Adds combinations of specified parameter values to a data frame for simulation by expanding over all combinations.
add_parameter(df, ...)add_parameter(df, ...)
df |
A data frame to expand. |
... |
Named vectors specifying parameter values to expand, provided as |
A tibble with added parameter columns for each combination of values.
df <- tibble::tibble(site = "A", condition = "control") add_parameter(df, beta = c(0, 0.5), sigma = c(1, 2))df <- tibble::tibble(site = "A", condition = "control") add_parameter(df, beta = c(0, 0.5), sigma = c(1, 2))
Generates a Poisson-distributed count outcome by summing effects, exponentiating to obtain rates, and drawing counts.
add_poisson_outcome( data, linear_col = "y_linear", rate_col = "y_rate", count_col = "y_count" )add_poisson_outcome( data, linear_col = "y_linear", rate_col = "y_rate", count_col = "y_count" )
data |
A data frame containing effect columns prefixed with |
linear_col |
Name of the column to store the summed linear predictor (default |
rate_col |
Name of the column to store Poisson rates (default |
count_col |
Name of the column to store Poisson counts (default |
A tibble with added linear predictor, rate, and count columns.
df <- tibble::tibble(.beta = 0.5, .u = rnorm(5), .error = rnorm(5)) add_poisson_outcome(df)df <- tibble::tibble(.beta = 0.5, .u = rnorm(5), .error = rnorm(5)) add_poisson_outcome(df)
Adds a random effect column (prefixed with ".") to the design data frame, with optional grouping for nested random effects.
add_random_effect(design_df, ..., .nesting = NULL)add_random_effect(design_df, ..., .nesting = NULL)
design_df |
A data frame containing the rollout design and any parameters. |
... |
A single named expression specifying the random effect to add (e.g., |
.nesting |
Optional character vector specifying grouping columns for nested random effects (default |
A tibble with the added random effect column.
df <- tibble::tibble(site = rep(1:2, each = 3)) add_random_effect(df, u = rnorm(1, 0, 1), .nesting = "site")df <- tibble::tibble(site = rep(1:2, each = 3)) add_random_effect(df, u = rnorm(1, 0, 1), .nesting = "site")
Computes the proportion of x values falling within term-specific intervals within each group,
typically inside evaluate_model_results() for simulation evaluation pipelines.
eval_between(x, term = NULL, na.rm = FALSE)eval_between(x, term = NULL, na.rm = FALSE)
x |
A numeric vector of estimates or statistics. |
term |
A named list of numeric vectors of length 2, giving the lower and upper bounds for each term.
For example, |
na.rm |
Logical; whether to remove missing values when computing the proportion.
Defaults to |
This function is designed to be used inside dplyr::summarise() within a grouped
tidyverse pipeline, typically after grouping by term.
If term is provided, the current grouping must include a term variable matching
the names in term. If a term in the group is not found in the provided term mapping,
the function will return NA with a warning.
A numeric scalar representing the proportion of x within the term-specific interval within the current group.
library(dplyr) library(purrr) library(broom.mixed) sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( prop_between = eval_between( estimate, term = list("wt" = c(-1, 0)) ) )library(dplyr) library(purrr) library(broom.mixed) sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( prop_between = eval_between( estimate, term = list("wt" = c(-1, 0)) ) )
Computes the mean bias (difference between estimated values and true values)
within each group, typically inside evaluate_model_results() for simulation evaluation pipelines.
eval_bias(x, term = NULL, na.rm = FALSE, warnings = TRUE)eval_bias(x, term = NULL, na.rm = FALSE, warnings = TRUE)
x |
A numeric vector of estimates (e.g., from a model term). |
term |
A named numeric vector providing the true value for each term.
For example, |
na.rm |
Logical; whether to remove missing values when computing the mean bias.
Defaults to |
warnings |
Should warnings be returned? |
This function is designed to be used inside dplyr::summarise() within a grouped
tidyverse pipeline, typically after grouping by term. It computes the mean of
x minus the true value for the corresponding term.
If term is provided, the current grouping must include a term variable matching
the names in term. If a term in the group is not found in the provided term mapping,
the function will return NA with a warning.
A numeric scalar representing the mean bias within the current group.
library(dplyr) library(purrr) library(broom.mixed) # Simulate and fit models sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() # Compute bias relative to true value (hypothetical slope = -5) sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( bias = eval_bias( estimate, term = c("wt" = -5) ) ) # Compute bias relative to zero for all terms sim_models |> group_by(term) |> evaluate_model_results( bias = eval_bias(estimate) )library(dplyr) library(purrr) library(broom.mixed) # Simulate and fit models sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() # Compute bias relative to true value (hypothetical slope = -5) sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( bias = eval_bias( estimate, term = c("wt" = -5) ) ) # Compute bias relative to zero for all terms sim_models |> group_by(term) |> evaluate_model_results( bias = eval_bias(estimate) )
Computes the proportion of x values exceeding term-specific thresholds within each group,
typically inside evaluate_model_results() for simulation evaluation pipelines.
eval_greater_than(x, term = NULL, na.rm = FALSE)eval_greater_than(x, term = NULL, na.rm = FALSE)
x |
A numeric vector of estimates or statistics. |
term |
A named numeric vector providing the threshold for each term.
For example, |
na.rm |
Logical; whether to remove missing values when computing the proportion.
Defaults to |
This function is designed to be used inside dplyr::summarise() within a grouped
tidyverse pipeline, typically after grouping by term.
If term is provided, the current grouping must include a term variable matching
the names in term. If a term in the group is not found in the provided term mapping,
the function will return NA with a warning.
A numeric scalar representing the proportion of x exceeding the term-specific threshold within the current group.
library(dplyr) library(purrr) library(broom.mixed) sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( prop_above_0 = eval_greater_than( estimate, term = c("wt" = 0) ) )library(dplyr) library(purrr) library(broom.mixed) sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( prop_above_0 = eval_greater_than( estimate, term = c("wt" = 0) ) )
Computes the proportion of x values falling below term-specific thresholds within each group,
typically inside evaluate_model_results() for simulation evaluation pipelines.
eval_less_than(x, term = NULL, na.rm = FALSE)eval_less_than(x, term = NULL, na.rm = FALSE)
x |
A numeric vector of estimates or statistics. |
term |
A named numeric vector providing the threshold for each term.
For example, |
na.rm |
Logical; whether to remove missing values when computing the proportion.
Defaults to |
This function is designed to be used inside dplyr::summarise() within a grouped
tidyverse pipeline, typically after grouping by term.
If term is provided, the current grouping must include a term variable matching
the names in term. If a term in the group is not found in the provided term mapping,
the function will return NA with a warning.
A numeric scalar representing the proportion of x below the term-specific threshold within the current group.
library(dplyr) library(purrr) library(broom.mixed) sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( prop_below_0 = eval_less_than( estimate, term = c("wt" = 0) ) )library(dplyr) library(purrr) library(broom.mixed) sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( prop_below_0 = eval_less_than( estimate, term = c("wt" = 0) ) )
Computes the specified quantile of x within each group, typically inside
evaluate_model_results() for simulation evaluation pipelines.
eval_quantile(x, term = NULL, na.rm = FALSE)eval_quantile(x, term = NULL, na.rm = FALSE)
x |
A numeric vector of estimates or statistics. |
term |
A named numeric vector with quantile probabilities for each term.
For example, |
na.rm |
Logical; whether to remove missing values when computing the quantile.
Defaults to |
This function is designed to be used inside dplyr::summarise() within a grouped
tidyverse pipeline, typically after grouping by term.
If term is provided, the current grouping must include a term variable matching
the names in term. If a term in the group is not found in the provided term mapping,
the function will return NA with a warning.
A numeric scalar representing the observed quantile of x within the current group.
library(dplyr) library(purrr) library(broom.mixed) sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( lower_quantile = eval_quantile( estimate, term = c("wt" = 0.05) ), upper_quantile = eval_quantile( estimate, term = c("wt" = 0.95) ) )library(dplyr) library(purrr) library(broom.mixed) sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( lower_quantile = eval_quantile( estimate, term = c("wt" = 0.05) ), upper_quantile = eval_quantile( estimate, term = c("wt" = 0.95) ) )
Computes summary statistics (e.g., power, custom summaries) across a set of
extracted model results, typically from extract_model_results(), to facilitate
simulation evaluation and reporting.
evaluate_model_results( results, alpha = 0.05, ..., .summarise_standard_broom = FALSE, broom_cols = c("estimate", "std.error", "statistic", "df", "p.value") )evaluate_model_results( results, alpha = 0.05, ..., .summarise_standard_broom = FALSE, broom_cols = c("estimate", "std.error", "statistic", "df", "p.value") )
results |
A data frame of extracted model results, typically including columns
like |
alpha |
Significance level used to compute power. Defaults to |
... |
Additional summary expressions to compute within |
.summarise_standard_broom |
Logical; if |
broom_cols |
Character vector of standard |
A summarised data frame containing:
n_models: the number of models summarised.
power: the proportion of p-values less than alpha (NA if all p-values are NA).
Additional columns corresponding to custom summaries provided in ....
Mean and SD summaries of broom columns if .summarise_standard_broom = TRUE.
library(dplyr) library(purrr) library(broom.mixed) # Simulate and fit models sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() # Evaluate power and mean estimate for the slope sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( alpha = 0.05, mean_estimate = mean(estimate, na.rm = TRUE), sd_estimate = sd(estimate, na.rm = TRUE) ) # Evaluate with .summarise_standard_broom = TRUE sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( .summarise_standard_broom = TRUE ) # Evaluate with eval_bias to compute bias relative to the true value # Suppose the true slope of wt is -5 (hypothetical) sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( bias = eval_bias( estimate, term = c("wt" = -5) ) )library(dplyr) library(purrr) library(broom.mixed) # Simulate and fit models sim_models <- tibble( id = 1:50, model = map(1:50, ~ lm(mpg ~ wt, data = mtcars)) ) |> extract_model_results() # Evaluate power and mean estimate for the slope sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( alpha = 0.05, mean_estimate = mean(estimate, na.rm = TRUE), sd_estimate = sd(estimate, na.rm = TRUE) ) # Evaluate with .summarise_standard_broom = TRUE sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( .summarise_standard_broom = TRUE ) # Evaluate with eval_bias to compute bias relative to the true value # Suppose the true slope of wt is -5 (hypothetical) sim_models |> filter(term == "wt") |> group_by(term) |> evaluate_model_results( bias = eval_bias( estimate, term = c("wt" = -5) ) )
Applies a tidying function (default broom.mixed::tidy) to a column of models,
returning a tidy data frame with one row per term per model, suitable for downstream
summarisation and evaluation in simulation studies.
extract_model_results( models, model_col = "model", tidy_fun = broom.mixed::tidy, .term = NULL )extract_model_results( models, model_col = "model", tidy_fun = broom.mixed::tidy, .term = NULL )
models |
A data frame containing a column of fitted model objects. |
model_col |
Unquoted column name containing the models. Default is |
tidy_fun |
A tidying function to apply to each model. Default is |
.term |
Optional string specifying a term to filter after tidying (e.g., |
A tidy data frame with the original columns of models joined to the
tidied model results, typically including columns such as term, estimate,
std.error, statistic, and p.value.
library(dplyr) library(purrr) library(broom.mixed) # Simulate and fit models sim_models <- tibble( id = 1:5, model = map(1:5, ~ lm(mpg ~ wt, data = mtcars)) ) # Extract all terms extract_model_results(sim_models) # Extract only the slope term extract_model_results(sim_models, .term = "wt")library(dplyr) library(purrr) library(broom.mixed) # Simulate and fit models sim_models <- tibble( id = 1:5, model = map(1:5, ~ lm(mpg ~ wt, data = mtcars)) ) # Extract all terms extract_model_results(sim_models) # Extract only the slope term extract_model_results(sim_models, .term = "wt")
Applies a user-specified model-fitting function to each element of a list-column
of datasets in .data, fitting models in parallel with a progress bar, and returns
the original data frame with a new model column containing the fitted models.
fit_models( .data, .x, .f, packages = NULL, n_cores = parallel::detectCores() - 1 )fit_models( .data, .x, .f, packages = NULL, n_cores = parallel::detectCores() - 1 )
.data |
A data frame containing a list-column of datasets to which the model function will be applied. |
.x |
Unquoted column name of the list-column containing the datasets. |
.f |
A function or formula to apply to each dataset to fit the desired model
(e.g., |
packages |
A character vector of package names to load on each parallel worker,
if your model-fitting function requires additional packages. Defaults to |
n_cores |
Number of cores to use for parallel processing.
Defaults to |
This function is intended for use in simulation pipelines where multiple datasets
are generated (e.g., via simulate_datasets()), and models need to be fitted to
each dataset efficiently in parallel.
It uses pbapply::pblapply() to provide a progress bar during model fitting,
and parallel::makeCluster() for multi-core processing.
Packages specified in packages will be loaded on each worker to ensure model-fitting
functions that depend on those packages work correctly in parallel.
The original .data data frame with an additional model column containing
the fitted model objects returned by .f.
library(dplyr) library(purrr) library(lme4) # Create example grouped datasets for mixed models datasets <- tibble( id = 1:5, data = map(1:5, ~ { df <- sleepstudy[sample(nrow(sleepstudy), 50, replace = TRUE), ] df$Subject <- factor(df$Subject) df }) ) # Fit linear mixed models in parallel fitted_models <- fit_models( datasets, .x = data, .f = ~ lme4::lmer(Reaction ~ Days + (Days | Subject), data = .), packages = c("lme4"), n_cores = 1 ) # Inspect the first fitted mixed model summary(fitted_models$model[[1]]) # Tidy the fitted models using extract_model_results() for further evaluation extracted <- extract_model_results(fitted_models) head(extracted) # Summarise estimates for 'Days' across simulated fits extracted |> filter(term == "Days") |> evaluate_model_results( mean_estimate = mean(estimate, na.rm = TRUE), sd_estimate = sd(estimate, na.rm = TRUE) )library(dplyr) library(purrr) library(lme4) # Create example grouped datasets for mixed models datasets <- tibble( id = 1:5, data = map(1:5, ~ { df <- sleepstudy[sample(nrow(sleepstudy), 50, replace = TRUE), ] df$Subject <- factor(df$Subject) df }) ) # Fit linear mixed models in parallel fitted_models <- fit_models( datasets, .x = data, .f = ~ lme4::lmer(Reaction ~ Days + (Days | Subject), data = .), packages = c("lme4"), n_cores = 1 ) # Inspect the first fitted mixed model summary(fitted_models$model[[1]]) # Tidy the fitted models using extract_model_results() for further evaluation extracted <- extract_model_results(fitted_models) head(extracted) # Summarise estimates for 'Days' across simulated fits extracted |> filter(term == "Days") |> evaluate_model_results( mean_estimate = mean(estimate, na.rm = TRUE), sd_estimate = sd(estimate, na.rm = TRUE) )
Expands a long-format schedule to include a replicate identifier for running multiple simulation replicates efficiently.
initialize_replicates(long_schedule, n)initialize_replicates(long_schedule, n)
long_schedule |
A long-format rollout schedule. |
n |
Integer specifying the number of replicates to generate. |
A tibble with an added sample_id column for replicate indexing.
schedule <- tibble::tibble(site = "A", cohort = 1, chron_time = 0, condition = "control") initialize_replicates(schedule, n = 3)schedule <- tibble::tibble(site = "A", cohort = 1, chron_time = 0, condition = "control") initialize_replicates(schedule, n = 3)
Merges unit-level characteristics or parameters into a long-format rollout schedule and optionally expands rows based on count variables to create multiple units per site.
join_info( long_schedule, unit_info, by = NULL, uncount_vars = NULL, .ids = NULL )join_info( long_schedule, unit_info, by = NULL, uncount_vars = NULL, .ids = NULL )
long_schedule |
A long-format schedule (output from |
unit_info |
A data frame with unit-level information to join. |
by |
Columns used to join |
uncount_vars |
Optional character vector or list of quosures indicating count variables to expand rows. |
.ids |
Optional character vector specifying names of id columns when uncounting, one per |
A tibble with joined and optionally expanded rows to reflect unit counts.
schedule <- tibble::tibble(site = "A", cohort = 1, chron_time = 0, condition = "control") unit_info <- tibble::tibble(site = "A", n_units = 3) join_info(schedule, unit_info, by = "site", uncount_vars = "n_units")schedule <- tibble::tibble(site = "A", cohort = 1, chron_time = 0, condition = "control") unit_info <- tibble::tibble(site = "A", n_units = 3) join_info(schedule, unit_info, by = "site", uncount_vars = "n_units")
Transforms a wide-format rollout schedule into a long-format schedule, extracting chronological time from column names, converting condition columns to factors, and adding local time within each cohort if desired.
pivot_schedule_longer( schedule, time_cols, names_to = "chron_time", names_pattern = ".*(\\d+)", names_transform = as.numeric, values_to = "condition", values_transform = as.factor, cohort_name = "cohort", local_time = TRUE )pivot_schedule_longer( schedule, time_cols, names_to = "chron_time", names_pattern = ".*(\\d+)", names_transform = as.numeric, values_to = "condition", values_transform = as.factor, cohort_name = "cohort", local_time = TRUE )
schedule |
A data frame containing the rollout schedule in wide format. |
time_cols |
Columns containing time-specific condition assignments (tidyselect syntax). |
names_to |
Name of the new column to store extracted chronological time (default |
names_pattern |
Regular expression to extract the numeric time from column names (default |
names_transform |
Function to transform extracted time values (default |
values_to |
Name of the new column to store condition values (default |
values_transform |
Function to transform condition values (default |
cohort_name |
The column indicating cohort membership for local time calculation (default |
local_time |
Logical; if |
A long-format tibble with columns for cohort, condition, chronological time, and optionally local time.
library(dplyr) library(tidyr) schedule <- tibble::tibble( site = c("A", "B"), cohort = c(1, 2), t1 = c("control", "intervention"), t2 = c("intervention", "intervention") ) pivot_schedule_longer(schedule, time_cols = starts_with("t"))library(dplyr) library(tidyr) schedule <- tibble::tibble( site = c("A", "B"), cohort = c(1, 2), t1 = c("control", "intervention"), t2 = c("intervention", "intervention") ) pivot_schedule_longer(schedule, time_cols = starts_with("t"))