| Title: | Simulated Pseudo-Individual Data Meta-Analysis with ABC-SMC |
|---|---|
| Description: | Meta-analysis via ABC-SMC by simulating pseudo-individual data from published group-level summary statistics. Handles binary, continuous, and generic effect-size outcomes within a one-stage mixed-model framework. Supports subgroup analysis. |
| Authors: | Yu Haichuan |
| Maintainer: | Yu Haichuan <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.0 |
| Built: | 2026-05-29 08:10:24 UTC |
| Source: | https://github.com/haichuanyu0703/spima |
A dataset of study-level summary statistics for continuous outcomes (blood pressure) from multiple clinical trials. Contains mean, standard deviation, and sample size per arm, suitable for the continuous module.
bp_cont
A data frame with columns:
Study identifier.
Treatment group indicator (0 = control, 1 = treatment).
Sample size per arm.
Mean blood pressure.
Standard deviation of blood pressure.
Each module exports a distance function that compares simulated summary statistics to the observed summary statistics.
spima_bin_distance(sim_stats, obs_stats)
spima_cont_distance(sim_stats, obs_stats)
sim_stats |
Simulated summary statistics (vector or list). |
obs_stats |
Observed summary statistics (same structure). |
A non-negative scalar distance.
spima_bin_distance(): Binary outcome: (possibly weighted) Euclidean
distance on the log-odds scale.
spima_cont_distance(): Continuous outcome: inverse-variance weighted
Euclidean distance on study-level mean differences.
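The distance described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name `weighted_euclid`, the `weights` argument, and all numeric values are assumptions, and the package may scale or normalise weights differently.

```r
# Hypothetical helper: weighted Euclidean distance between two
# equally long vectors of summary statistics.
weighted_euclid <- function(sim_stats, obs_stats, weights = NULL) {
  stopifnot(length(sim_stats) == length(obs_stats))
  # Without weights, every study contributes equally.
  if (is.null(weights)) weights <- rep(1, length(obs_stats))
  sqrt(sum(weights * (sim_stats - obs_stats)^2))
}

# Made-up per-study log odds ratios.
obs_log_or <- c(s1 = 0.40, s2 = 0.10, s3 = 0.55)
sim_log_or <- c(s1 = 0.35, s2 = 0.20, s3 = 0.50)
w <- c(10, 4, 8)  # illustrative inverse-variance weights

weighted_euclid(sim_log_or, obs_log_or)     # unweighted
weighted_euclid(sim_log_or, obs_log_or, w)  # weighted
```

Weighting by inverse variance lets precisely estimated studies dominate the distance, so a simulated data set is penalised more for missing a large study than a small one.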
Draws a forest plot showing study-level effect estimates with 95% CIs and the SPI-MA pooled posterior estimate.
## S3 method for class 'spima'
forest(
  x,
  log_scale = FALSE,
  study_labels = NULL,
  col = "grey40",
  pooled_col = "#2166AC",
  xlab = NULL,
  ...
)
spima_forest(x, ...)
x |
A |
log_scale |
If |
study_labels |
Optional character vector of study labels. |
col |
Color for study-level points and CIs. |
pooled_col |
Color for the pooled diamond. |
xlab |
X-axis label (auto-detected if NULL). |
... |
Additional arguments passed to |
A dataset of study-level summary statistics for generic (continuous) effect sizes. Contains effect size estimates and their standard errors, suitable for the generic module.
gen_effect
A data frame with columns:
Study identifier.
Effect size estimate.
Standard error of the effect size.
A dataset of study-level summary statistics for binary outcomes (kidney disease) from multiple clinical trials. Contains event counts and sample sizes per arm, suitable for the binary module.
kidney_bin
A data frame with columns:
Study identifier.
Treatment group indicator (0 = control, 1 = treatment).
Sample size per arm.
Number of events per arm.
Evaluate log-prior density for a parameter vector
log_prior_density(theta, prior_obj)
theta |
Named numeric vector of parameters. |
prior_obj |
A |
Log-density value (summed across independent priors).
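Summing log-densities across independent priors can be sketched as follows. The list `log_dens` is a hypothetical stand-in for a prior object, and the half-normal is assumed to be parameterised by location 0 and scale 1; neither is taken from the package source.

```r
# Hypothetical stand-in for a prior object: one log-density per parameter.
log_dens <- list(
  mu  = function(x) dnorm(x, mean = 0, sd = 10, log = TRUE),
  # Half-normal(0, 1): twice the normal density on x >= 0, zero elsewhere.
  tau = function(x) if (x < 0) -Inf else log(2) + dnorm(x, 0, 1, log = TRUE)
)

log_prior_sketch <- function(theta, log_dens) {
  # Independent priors: the joint log-density is the sum of the marginals.
  sum(vapply(names(theta),
             function(nm) log_dens[[nm]](theta[[nm]]),
             numeric(1)))
}

inside  <- log_prior_sketch(c(mu = 0.3, tau = 0.5), log_dens)
outside <- log_prior_sketch(c(mu = 0.3, tau = -0.1), log_dens)  # tau < 0
```

A parameter outside the support of its prior drives the whole sum to `-Inf`, which is a convenient way to reject such proposals.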
Generates a plot showing how the predicted treatment effect (absolute risk
difference or risk ratio) varies across the range of a continuous
covariate, based on interaction estimates from spima_int.
## S3 method for class 'spima_int'
plot(
  x,
  covariate = NULL,
  ci_level = 0.95,
  at = NULL,
  scale = c("absolute", "relative"),
  ...
)
x |
A |
covariate |
Character; name of the covariate to plot. If
|
ci_level |
Confidence level for the uncertainty band (default 0.95). |
at |
Numeric vector of covariate values at which to evaluate the
treatment effect. If |
scale |
|
... |
Additional arguments (ignored). |
The underlying model is either the pseudo-IPD individual-level GLMM (preferred) or the aggregate ecological GLMM (fallback). Uncertainty is propagated by sampling from the multivariate normal approximation of the fixed effects.
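Propagating uncertainty by sampling from a multivariate normal approximation can be sketched in base R. The coefficient names, estimates, and covariance matrix below are invented for illustration; in practice they would come from the fitted model, and the package's own computation may differ.

```r
set.seed(1)

# Made-up fixed effects on the log-odds scale: intercept, treatment,
# covariate, and treatment-by-covariate interaction.
beta_hat <- c(b0 = -1.0, trt = 0.5, x = 0.02, trt_x = 0.03)
V <- diag(c(0.04, 0.03, 0.0004, 0.0005))  # assumed covariance matrix

# Draw from N(beta_hat, V) using the Cholesky factor of V.
n_draw <- 2000
z <- matrix(rnorm(n_draw * length(beta_hat)), nrow = n_draw)
draws <- sweep(z %*% chol(V), 2, beta_hat, "+")
colnames(draws) <- names(beta_hat)

# Absolute risk difference at each covariate value, for every draw.
at <- seq(-10, 10, by = 5)
rd <- sapply(at, function(x) {
  p1 <- plogis(draws[, "b0"] + draws[, "trt"] +
               (draws[, "x"] + draws[, "trt_x"]) * x)
  p0 <- plogis(draws[, "b0"] + draws[, "x"] * x)
  p1 - p0
})

# Pointwise median and 95% band across draws.
band <- data.frame(
  at    = at,
  est   = apply(rd, 2, median),
  lower = apply(rd, 2, quantile, probs = 0.025),
  upper = apply(rd, 2, quantile, probs = 0.975)
)
band
```

Computing the effect for every draw before summarising keeps the band valid on the probability scale, where the transformation from coefficients is nonlinear.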
A ggplot object.
## Not run:
res <- spima_int(data, input_spec)
plot(res, covariate = "X1", scale = "absolute")
## End(Not run)
Define Prior Distributions for ABC-SMC Parameters
prior(mu = "normal(0, 10)", tau = "halfnormal(0, 1)", ...)
mu |
Prior specification for overall effect |
tau |
Prior specification for heterogeneity |
... |
Additional named priors (e.g. |
A list of class spima_prior with elements name,
pars, rfun (random generation), dfun (density),
and default.
prior(mu = "normal(0, 10)", tau = "halfnormal(0, 1)")
Run ABC-SMC Inference
run_abc_smc(prior_obj, sim_fn, distance_fn, obs_stats, ctrl, ...)
prior_obj |
A |
sim_fn |
Simulation function: |
distance_fn |
Distance function: |
obs_stats |
Observed (target) summary statistics. |
ctrl |
An |
... |
Additional arguments passed to |
A list of class spima_abc containing posterior samples,
weights, diagnostics, and generation records.
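For orientation, a generic ABC-SMC loop (population Monte Carlo style) for a single parameter can be sketched in base R. This is a toy under stated assumptions, not the algorithm implemented by run_abc_smc(): the simulator, the fixed tolerance schedule `eps`, the kernel scale rule, and every name below are hypothetical, and the package's adaptive tuning is not reproduced.

```r
set.seed(42)

obs_stat <- 0.4                                   # made-up observed statistic
sim_fn  <- function(mu) mean(rnorm(50, mean = mu, sd = 1))
dist_fn <- function(sim, obs) abs(sim - obs)
r_prior <- function(n) rnorm(n, 0, 10)            # prior: N(0, 10^2)
d_prior <- function(mu) dnorm(mu, 0, 10)

n_part <- 200
eps <- c(1, 0.5, 0.25)                            # decreasing tolerances

theta <- numeric(n_part)
w <- rep(1 / n_part, n_part)

for (g in seq_along(eps)) {
  prev_theta <- theta
  prev_w <- w
  if (g > 1) {
    # Gaussian kernel with variance twice the weighted particle variance.
    centre <- sum(prev_w * prev_theta)
    k_sd <- sqrt(2 * sum(prev_w * (prev_theta - centre)^2))
  }
  for (i in seq_len(n_part)) {
    repeat {
      if (g == 1) {
        cand <- r_prior(1)                        # first generation: prior
      } else {
        parent <- prev_theta[sample.int(n_part, 1, prob = prev_w)]
        cand <- rnorm(1, mean = parent, sd = k_sd)  # resample and perturb
      }
      if (dist_fn(sim_fn(cand), obs_stat) < eps[g]) break
    }
    theta[i] <- cand
  }
  if (g > 1) {
    # Importance weight: prior density over the kernel mixture density.
    kern <- vapply(theta,
                   function(th) sum(prev_w * dnorm(th, prev_theta, k_sd)),
                   numeric(1))
    w <- d_prior(theta) / kern
    w <- w / sum(w)
  }
}

post_mean <- sum(w * theta)
ess <- 1 / sum(w^2)
c(post_mean = post_mean, ess = ess)
```

Each generation tightens the tolerance, so later particles concentrate where simulated statistics sit close to the observed one, while the importance weights correct for proposing from the perturbed previous population instead of the prior.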
Sample from the joint prior
sample_prior(n, prior_obj)
n |
Number of samples. |
prior_obj |
A |
A matrix with n rows and one column per prior.
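Drawing from a joint prior made of independent components can be sketched as below. The list `r_funs` is a hypothetical stand-in for the random-generation functions of a prior object, and the half-normal is assumed to be the absolute value of a standard normal draw.

```r
set.seed(7)

# Hypothetical stand-in for a prior object: one generator per parameter.
r_funs <- list(
  mu  = function(n) rnorm(n, mean = 0, sd = 10),
  tau = function(n) abs(rnorm(n, mean = 0, sd = 1))  # half-normal(0, 1)
)

sample_prior_sketch <- function(n, r_funs) {
  # Independent priors: draw each parameter separately, one column each.
  draws <- vapply(r_funs, function(f) f(n), numeric(n))
  matrix(draws, nrow = n, dimnames = list(NULL, names(r_funs)))
}

theta <- sample_prior_sketch(5, r_funs)
dim(theta)
```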
Control Parameters for ABC-SMC
smc_control(
  n_particles = 2000,
  n_particles_max = 10000,
  n_generations = 10,
  epsilon_init = NULL,
  epsilon_decay = 0.85,
  ess_min = 0.3,
  kernel = "gaussian",
  accept_rate_target = 0.2,
  verbose = TRUE,
  parallel = FALSE,
  n_cores = NULL
)
n_particles |
Number of particles (simulations) per generation. |
n_generations |
Maximum number of SMC generations. |
epsilon_init |
Initial acceptance threshold. If |
epsilon_decay |
Multiplicative factor applied to epsilon each generation (0 < decay < 1). |
ess_min |
Minimum effective-sample-size ratio (relative to
|
kernel |
Perturbation kernel type: |
accept_rate_target |
Target acceptance rate used for adaptive epsilon tuning. |
verbose |
Print progress information? |
parallel |
Logical; if |
n_cores |
Number of CPU cores for parallel execution. If
|
A list of class smc_control.
smc_control(n_particles = 500, n_generations = 8)
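Two of the quantities controlled here, the geometric epsilon schedule and the effective-sample-size ratio, can be illustrated in base R. This sketch assumes a starting threshold of 1 and a purely multiplicative decay; how the package combines decay with adaptive tuning is not shown, and `ess_ratio` is a hypothetical helper.

```r
epsilon_init  <- 1.0   # assumed starting threshold
epsilon_decay <- 0.85
n_generations <- 10

# Threshold in generation g: epsilon_init * epsilon_decay^(g - 1).
eps_schedule <- epsilon_init * epsilon_decay^(seq_len(n_generations) - 1)

# Effective sample size of importance weights, relative to their number.
ess_ratio <- function(w) {
  w <- w / sum(w)
  (1 / sum(w^2)) / length(w)
}

even   <- ess_ratio(rep(1, 100))                    # equal weights
skewed <- ess_ratio(c(rep(1, 10), rep(0.01, 90)))   # a few dominant particles
```

Equal weights give a ratio of 1, whereas a population dominated by a few particles falls well below the default `ess_min` of 0.3.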
The main entry point. Dispatches to the appropriate module based on
outcome_type and runs ABC-SMC for meta-analytic inference.
spima(
  data,
  outcome_type = c("binary", "continuous", "generic"),
  input_spec,
  prior,
  smc_control,
  parallel = FALSE,
  subgroup = NULL,
  family = c("gaussian", "Gamma"),
  ...
)
data |
A data frame of study-level summary statistics; one row per
study (or per study-arm when |
outcome_type |
Outcome type: |
input_spec |
A named list mapping column names to roles. The
required entries depend on
|
prior |
A |
smc_control |
An |
parallel |
Logical; if |
subgroup |
Optional column name for subgroup analysis. When specified, the analysis is run separately for each level of this variable. |
family |
Distributional family for the pseudo-IPD likelihood.
Only used when |
... |
Additional arguments passed to module functions. |
A spima object with components:
call |
The matched call. |
outcome_type |
The outcome type. |
abc_result |
Full ABC-SMC output (generations, posterior, etc.). |
data |
The input data. |
input_spec |
The column mapping. |
## Not run:
# Binary outcome meta-analysis (two arms per study)
data_bin <- data.frame(
  study = rep(1:4, each = 2),  # two consecutive rows (control, treatment) per study
  group = c(0, 1, 0, 1, 0, 1, 0, 1),
  event = c(30, 45, 28, 32, 40, 58, 18, 22),
  n = c(100, 100, 80, 80, 120, 120, 60, 60)
)
res <- spima(data_bin, "binary",
  input_spec = list(study = "study", event = "event", n = "n", group = "group"),
  prior = prior(mu = "normal(0, 10)", tau = "halfnormal(0, 1)"),
  smc_control = smc_control(n_particles = 500, n_generations = 5))
## End(Not run)
Computes per-study log odds ratios from the simulated pseudo-IPD by
constructing 2x2 tables. Also fits a one-stage logistic mixed model
(glmer) for the overall treatment effect estimate.
spima_bin_analyze(pseudo_ipd, input_spec)
pseudo_ipd |
A data frame from |
input_spec |
Column mapping. |
A list with estimates (mixed-model fixed effects),
summary_stats (named vector of per-study log ORs, matching
the format from spima_bin_observed_stats), and
converged (logical).
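The per-study log odds ratio step can be sketched in base R. The pseudo-IPD below is made up, `log_or_by_study` is a hypothetical helper, and the 0.5 continuity correction is an assumption that the package may or may not apply; the mixed-model fit is omitted.

```r
set.seed(3)

# Toy pseudo-IPD: one row per simulated participant.
pseudo_ipd <- data.frame(
  study = rep(c("A", "B"), each = 200),
  group = rep(rep(0:1, each = 100), times = 2)
)
pseudo_ipd$y <- rbinom(nrow(pseudo_ipd), 1,
                       ifelse(pseudo_ipd$group == 1, 0.45, 0.30))

log_or_by_study <- function(d) {
  vapply(split(d, d$study), function(s) {
    # 2x2 cell counts; adding 0.5 to each cell is an assumed continuity
    # correction so that empty cells do not give infinite values.
    a  <- sum(s$y == 1 & s$group == 1) + 0.5  # treatment events
    b  <- sum(s$y == 0 & s$group == 1) + 0.5  # treatment non-events
    c0 <- sum(s$y == 1 & s$group == 0) + 0.5  # control events
    d0 <- sum(s$y == 0 & s$group == 0) + 0.5  # control non-events
    log((a * d0) / (b * c0))
  }, numeric(1))
}

log_or_by_study(pseudo_ipd)
```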
Compute Observed Summary Statistics for Binary Data
spima_bin_observed_stats(data, input_spec)
data |
Original data frame. |
input_spec |
Column mapping. |
Named vector of observed log-ORs (or log-odds) with optional inverse-variance weights as attribute.
For each study, the control-group proportion is taken from observed data, and the treatment-group log-odds are shifted by a study-specific effect drawn from N(mu, tau^2). Individual Bernoulli outcomes are then generated.
spima_bin_simulate(study_spec, params, input_spec)
study_spec |
A data frame (subset for one study) containing the observed counts. |
params |
Named vector |
input_spec |
Column mapping (passed through from |
A data frame with columns study, group, y.
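The simulation step for one study can be sketched as follows. The function name and argument list are hypothetical and do not mirror the package signature; the sketch also assumes the observed control proportion lies strictly between 0 and 1, because `qlogis()` is infinite at the boundaries.

```r
set.seed(11)

simulate_bin_sketch <- function(study_id, n0, events0, n1, mu, tau) {
  p0 <- events0 / n0                      # control proportion from observed data
  theta <- rnorm(1, mean = mu, sd = tau)  # study-specific log-odds shift
  p1 <- plogis(qlogis(p0) + theta)        # treatment-group probability
  data.frame(
    study = study_id,
    group = rep(0:1, times = c(n0, n1)),
    y     = c(rbinom(n0, 1, p0), rbinom(n1, 1, p1))  # Bernoulli outcomes
  )
}

ipd <- simulate_bin_sketch("trial1", n0 = 100, events0 = 30, n1 = 100,
                           mu = 0.5, tau = 0.2)
table(ipd$group, ipd$y)
```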
Validate Binary Outcome Input
spima_bin_validate(data, input_spec)
data |
A data frame with columns for events and sample sizes. |
input_spec |
A named list specifying column mappings, e.g.
|
TRUE invisibly; stops with a message on failure.
Computes per-study mean differences from the simulated pseudo-IPD.
Also attempts a linear mixed model (lmer) for overall estimate.
spima_cont_analyze(pseudo_ipd, input_spec)
pseudo_ipd |
A data frame from |
input_spec |
Column mapping. |
A list with estimates, summary_stats (per-study
mean differences and pooled SDs, matching observed_stats format),
and converged.
Compute Observed Summary Statistics for Continuous Data
spima_cont_observed_stats(data, input_spec)
data |
Original data frame. |
input_spec |
Column mapping. |
A list with means and sds (named vectors).
For each study, individual data are drawn from a normal (or skew-normal) distribution matching the observed mean and SD. The treatment group mean is shifted by a study-specific effect drawn from N(mu, tau^2).
spima_cont_simulate(study_spec, params, input_spec)
study_spec |
A data frame (subset for one study) containing the observed means, SDs, and sample sizes. |
params |
Named vector |
input_spec |
Column mapping (passed through from |
A data frame with columns study, group, y.
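A normal-distribution version of this simulation can be sketched as below. The function name and arguments are hypothetical, and the sketch assumes the treatment mean equals the observed control mean plus the study-specific effect; the skew-normal option is not shown.

```r
set.seed(5)

simulate_cont_sketch <- function(study_id, n0, mean0, sd0, n1, sd1, mu, tau) {
  theta <- rnorm(1, mean = mu, sd = tau)   # study-specific mean difference
  data.frame(
    study = study_id,
    group = rep(0:1, times = c(n0, n1)),
    y     = c(rnorm(n0, mean0, sd0),           # control arm
              rnorm(n1, mean0 + theta, sd1))   # treatment arm, shifted
  )
}

ipd <- simulate_cont_sketch("trial1", n0 = 60, mean0 = 140, sd0 = 12,
                            n1 = 60, sd1 = 11, mu = -5, tau = 2)
md <- diff(tapply(ipd$y, ipd$group, mean))  # simulated mean difference
md
```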
Validate Continuous Outcome Input
spima_cont_validate(data, input_spec)
data |
A data frame with means, SDs, and sample sizes. |
input_spec |
A named list, e.g.
|
TRUE invisibly.
Fits a one-stage Gamma GLMM via glmer(y ~ group + (1 | study),
family = Gamma(link = "log")) on the simulated pseudo-IPD. Also
returns per-study log-Rate Ratio values for use as summary statistics
in the ABC distance computation.
spima_gamma_analyze(pseudo_ipd, input_spec, quick = TRUE)
pseudo_ipd |
A data frame from |
input_spec |
Column mapping. |
quick |
If |
Use quick = TRUE (default) during ABC-SMC sampling where only
the per-study summary statistics are needed for distance computation.
Set quick = FALSE to additionally fit the full GLMM (useful for
external diagnostics).
A list with components:
estimates |
Named vector of fixed effects from the Gamma GLMM (or NULL if quick = TRUE or the model does not converge). |
summary_stats |
Named vector of per-study log-RR values. |
converged |
Logical indicating GLMM convergence (TRUE when quick = TRUE). |
fit |
The glmer fit object (or NULL). |
Weighted Euclidean distance on the per-study log-Rate Ratio vector.
Delegates to spima_generic_distance.
spima_gamma_distance(sim_stats, obs_stats)
sim_stats |
Simulated summary statistics (vector or list). |
obs_stats |
Observed summary statistics (same structure). |
For each study, computes the observed log-Rate Ratio and its delta-method variance for inverse-variance weighting.
spima_gamma_observed_stats(data, input_spec)
data |
Original data frame with arm-level means, SDs, and sample sizes. |
input_spec |
Column mapping. |
A named vector of per-study log-RR values with an attribute
"weights" containing inverse-variance weights.
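The log-Rate Ratio and its delta-method variance can be sketched from arm-level summaries. The data and the helper `log_rr_sketch` are made up; the variance formula is the standard first-order approximation Var(log(mean)) ≈ sd^2 / (n * mean^2) summed over the two arms, which is assumed rather than confirmed to match the package.

```r
# Made-up arm-level summaries (group 0 = control, 1 = treatment).
arms <- data.frame(
  study = rep(c("A", "B"), each = 2),
  group = rep(0:1, times = 2),
  mean  = c(10, 12, 8, 9),
  sd    = c(4, 5, 3, 3),
  n     = c(50, 50, 80, 80)
)

log_rr_sketch <- function(arms) {
  by_study <- split(arms, arms$study)
  # Log ratio of treatment mean to control mean, per study.
  log_rr <- vapply(by_study, function(s) {
    log(s$mean[s$group == 1] / s$mean[s$group == 0])
  }, numeric(1))
  # Delta method: Var(log(mean)) is roughly sd^2 / (n * mean^2) per arm.
  v <- vapply(by_study,
              function(s) sum(s$sd^2 / (s$n * s$mean^2)),
              numeric(1))
  structure(log_rr, weights = 1 / v)  # inverse-variance weights as attribute
}

obs <- log_rr_sketch(arms)
obs
```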
For each study, individual data are drawn from a Gamma distribution
matching the observed mean and SD via the method of moments. The treatment
group mean is scaled by a multiplicative factor exp(theta_i),
where theta_i ~ N(mu, tau^2); this encodes the log-Rate Ratio
treatment effect on the original scale. The shape parameter is held
constant within each study, preserving the variance structure implied
by the Gamma GLM with log link.
spima_gamma_simulate(study_spec, params, input_spec)
study_spec |
A data frame (subset for one study) containing the observed arm-level means, SDs, and sample sizes. |
params |
Named vector |
input_spec |
Column mapping (passed through from |
A data frame with columns study, group, y.
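The method-of-moments Gamma simulation can be sketched in base R. The function name and arguments are hypothetical, and the sketch assumes the common shape is derived from the control arm (shape = mean^2 / sd^2); the package may derive it differently.

```r
set.seed(9)

simulate_gamma_sketch <- function(study_id, n0, mean0, sd0, n1, mu, tau) {
  shape <- mean0^2 / sd0^2                # method of moments (control arm)
  theta <- rnorm(1, mean = mu, sd = tau)  # study-specific log-Rate Ratio
  mean1 <- mean0 * exp(theta)             # multiplicative treatment effect
  data.frame(
    study = study_id,
    group = rep(0:1, times = c(n0, n1)),
    # Same shape in both arms; the rate is set so that mean = shape / rate.
    y     = c(rgamma(n0, shape = shape, rate = shape / mean0),
              rgamma(n1, shape = shape, rate = shape / mean1))
  )
}

ipd <- simulate_gamma_sketch("trial1", n0 = 80, mean0 = 10, sd0 = 4,
                             n1 = 80, mu = 0.2, tau = 0.1)
tapply(ipd$y, ipd$group, mean)
```

Holding the shape fixed means the coefficient of variation is the same in both arms, which is the variance structure a Gamma GLM with log link implies.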
Delegates to spima_cont_validate (same data format: mean, sd, n
per arm) and additionally checks that all means are positive (Gamma
distribution is supported on the positive real line).
spima_gamma_validate(data, input_spec)
data |
A data frame with means, SDs, and sample sizes. |
input_spec |
A named list, e.g.
|
TRUE invisibly.
For the generic module, the "pseudo-IPD" is already the summary statistics (a vector of effect sizes). This function simply passes them through with converged = TRUE.
spima_generic_analyze(pseudo_ipd, input_spec)
pseudo_ipd |
A data frame from |
input_spec |
Column mapping. |
A list with estimates = NULL,
summary_stats (the effect-size vector), and
converged = TRUE.
Weighted Euclidean distance using inverse-variance weights.
spima_generic_distance(sim_stats, obs_stats)
sim_stats |
Simulated summary statistics (vector or list). |
obs_stats |
Observed summary statistics (same structure). |
Compute Observed Summary Statistics for Generic Effect-Size Data
spima_generic_observed_stats(data, input_spec)
data |
Original data frame. |
input_spec |
Column mapping. |
Named vector of effect sizes with "weights" attribute
(inverse-variance: 1/sei^2).
No individual-level data is generated. Instead, study-level effect sizes are drawn from the random-effects model:
theta_i ~ N(mu, tau^2)
y_i* ~ N(theta_i, sei_i^2)
spima_generic_simulate(study_spec, params, input_spec)
study_spec |
A data frame (subset for one study) containing the observed effect size and its standard error. |
params |
Named vector |
input_spec |
Column mapping (passed through from |
The returned vector can be treated as "pseudo-IPD" since it directly represents the summary statistics needed for distance computation.
A named numeric vector of simulated effect sizes (one per study).
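The two-level draw described for the generic module can be sketched directly. The function name, study labels, and standard errors below are illustrative assumptions; the function is vectorised over studies for brevity, whereas the package function receives one study at a time.

```r
set.seed(21)

simulate_generic_sketch <- function(study_ids, sei, mu, tau) {
  k <- length(sei)
  theta <- rnorm(k, mean = mu, sd = tau)     # true study-level effects
  y_sim <- rnorm(k, mean = theta, sd = sei)  # estimates with sampling error
  setNames(y_sim, study_ids)
}

sim <- simulate_generic_sketch(c("A", "B", "C"),
                               sei = c(0.10, 0.20, 0.15),
                               mu = 0.3, tau = 0.1)
sim
```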
Validate Generic Effect-Size Input
spima_generic_validate(data, input_spec)
data |
A data frame with effect-size and SE columns. |
input_spec |
A named list, e.g.
|
TRUE invisibly.
Tests whether continuous covariate(s) modify the treatment effect using aggregate data only. The primary method fits a mixed-effects logistic regression on the aggregate data. A sensitivity analysis generates pseudo-IPD and fits an individual-level model.
spima_int(data, input_spec, rho = 0, ...)
spima_int_validate(data, input_spec)
data |
Data frame, one row per study-arm. |
input_spec |
Named list:
|
rho |
Assumed between-covariate correlation for pseudo-IPD generation. Default 0. |
... |
Additional arguments to |
A spima_int object.