A summary of: Chen, S., Lin, Z., Shen, X., Li, L., & Pan, W. (2023). Inference of causal metabolite networks in the presence of invalid instrumental variables with GWAS summary data. Genetic Epidemiology, 1–15. https://doi.org/10.1002/gepi.22535 (Chen et al., 2023).
Briefly, this paper is about using instrumental variables (SNPs) in causal inference with applications to genome-wide association studies (GWAS).
For a more gradual background, jump down to the “In context” section.
The paper uses structural equation models (SEMs) to infer causal networks among metabolites and other complex traits.
The method uses stepwise selection to identify invalid IVs, and the authors report superior performance on both real and simulated GWAS data.
For one-sample GWAS individual-level data:
The original source code is here: https://github.com/chen-siyi7/one-sample-stepwise-IV-selection/blob/main/one-sample%20stepwise%20IV%20code.R
The function onesample_mvstepIV conducts one-sample stepwise IV.

Input Parameters:
- p: Total number of predictors.
- R: Correlation matrix.
- betaZX: Regression coefficients for predictors.
- betaZY: Regression coefficients for outcomes.
- se_betaZY: Standard error of betaZY.
- n: Sample size.
- gamma_hat: Gamma hat values (prior information).

Main Computations:
1. Compute ZTZ using R as: \(ZTZ = R\)
2. Compute ZTY as the element-wise product of the ZTZ diagonal and betaZY: \(ZTY = \text{diag}(ZTZ) \times \beta_{ZY}\)
3. Compute YTY for each predictor SNP as: \(YTY[SNP] = (n-1) \times ZTZ[SNP,SNP] \times (se_{\beta_{ZY}}^2)[SNP] + ZTY[SNP] \times \beta_{ZY}[SNP]\), excluding NA values.
4. Compute the Bayesian Information Criterion (BIC) for each predictor. For each predictor i:
   - Create test11 with diagonal element i set to 1.
   - Form W1 by combining test11 and gamma_hat.
   - Compute solve.W1 from W1 using: \(\text{solve.W1} = W1^T \times ZTZ \times W1\)
   - Compute beta1 as: \(\beta_1 = \text{solve.W1}^{-1} \times W1^T \times ZTY\)
   - Compute BIC as: \(testbic[i] = n \times \log(YTY - \beta_1^T \times W1^T \times ZTY) + \log(n) \times \sum \text{diag}(test11)\)
5. Select the instrumental variables (IVs) based on BIC:
   - For each step j, select the predictor i with the smallest BIC.
   - Stop when the selected IV are the same.
6. Take the predictors in whichIV and set their diagonal elements in test11 to 1.
7. Compute beta1 as: \(\beta_1 = \text{solve.W1}^{-1} \times W1^T \times ZTY\)
8. Compute Varbeta as: \(\text{Varbeta} = \text{diag}(\text{solve.W1} \times n) \times \sigma_u^2\) where \(\sigma_u^2 = YTY - \beta_1^T \times W1^T \times ZTY\).

Output:
- invalidIV: Indices of invalid IVs.
- beta_est: Estimated beta values.
- beta_se: Standard error of beta estimates.
- K: Number of invalid IVs.

The original source code is here: https://github.com/chen-siyi7/one-sample-stepwise-IV-selection/blob/main/onesample_mvstepIV_ind%20code.R
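Before moving on to the individual-level function, the first four computations of onesample_mvstepIV (rebuilding ZTZ, ZTY and YTY from summary statistics, then scoring each single candidate by BIC) can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code, and it rests on several assumptions that are not stated in the summary above: gamma_hat is treated as a single vector of length p, the all-zero columns of test11 are dropped so that W1 reduces to two columns (the indicator for SNP i and gamma_hat), and the per-SNP YTY values are collapsed to one number with a median. The function names `cross_products` and `bic_single_candidate` and all input numbers are made up for the example.

```python
import math
from statistics import median

def cross_products(R, beta_zy, se_beta_zy, n):
    """Rebuild ZTZ, ZTY and the per-SNP YTY values from summary statistics."""
    p = len(beta_zy)
    ztz = R                                             # ZTZ = R
    zty = [ztz[s][s] * beta_zy[s] for s in range(p)]    # diag(ZTZ) * betaZY
    yty = [(n - 1) * ztz[s][s] * se_beta_zy[s] ** 2 + zty[s] * beta_zy[s]
           for s in range(p)]
    yty = [v for v in yty if not math.isnan(v)]         # exclude NA values
    return ztz, zty, yty

def bic_single_candidate(i, ztz, zty, yty, gamma_hat, n):
    """BIC when SNP i alone is the candidate; W1 is reduced to [e_i, gamma_hat]."""
    p = len(zty)
    # Entries of the 2x2 matrix W1' ZTZ W1 (assumption: zero columns dropped).
    a11 = ztz[i][i]
    a12 = sum(ztz[i][k] * gamma_hat[k] for k in range(p))
    a22 = sum(gamma_hat[a] * ztz[a][b] * gamma_hat[b]
              for a in range(p) for b in range(p))
    # Entries of W1' ZTY.
    b1 = zty[i]
    b2 = sum(gamma_hat[k] * zty[k] for k in range(p))
    # Solve the 2x2 system in closed form to get beta1.
    det = a11 * a22 - a12 * a12
    beta1 = [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]
    explained = beta1[0] * b1 + beta1[1] * b2           # beta1' W1' ZTY
    bic = n * math.log(yty - explained) + math.log(n) * 1  # one nonzero diagonal
    return bic, beta1

# Made-up inputs: three independent SNPs, the third with an extra direct effect.
n = 1000
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
gamma_hat = [0.3, 0.3, 0.3]
beta_zy = [0.15, 0.15, 0.55]        # 0.5 * gamma_hat, plus 0.4 for the third SNP
se_beta_zy = [0.031, 0.031, 0.027]

ztz, zty, yty_per_snp = cross_products(R, beta_zy, se_beta_zy, n)
yty = median(yty_per_snp)           # assumption: the summary does not say how to collapse
results = [bic_single_candidate(i, ztz, zty, yty, gamma_hat, n) for i in range(3)]
bics = [r[0] for r in results]
best = bics.index(min(bics))        # candidate with the smallest BIC
best_beta = results[best][1]        # [direct effect of SNP, effect along gamma_hat]
print(best, best_beta)
```

In this toy input the third SNP gives the smallest BIC, and the two fitted coefficients separate its direct effect from the effect carried by gamma_hat. The later steps (adding further candidates and the stopping rule) are not sketched here.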
The function onesample_mvstepIV_ind performs one-sample stepwise IV for independent SNPs.

Input Parameters:
- Y: Response variable.
- Z: Predictor matrix.
- n: Sample size.
- gamma_hat: Gamma hat values (prior information).

Main Computations:
1. Initialize testbic for the Bayesian Information Criterion.
2. For each predictor i:
   - Create a vector l with length dim(Z)[2] and set the ith element to 1.
   - Construct Z22 such that for each row j, Z22[j,] is Z[j,]*l.
   - Fit a linear regression (lm_stage2) of Y on Z22 and Z*gamma_hat.
   - Compute BIC for this predictor using: \(testbic[i] = n \times \log\left(\frac{\sum(lm\_stage2\$\text{residuals}^2)}{n}\right) + \log(n) \times \sum(l)\)
3. Select the instrumental variables (IVs) based on BIC:
   - For each step j, select the predictor i with the smallest BIC.
   - Update Z22 for the selected predictors and add one predictor at a time.
   - Compute BIC as in step 2.
   - Stop when the selected IV are the same.
4. Identify the invalid IVs, which.invalid, from whichIV and sort them to obtain K.
5. Take the columns K from Z to form Z22.
6. Fit a linear regression (lm_stage2) of Y on Z22 and Z*gamma_hat.
7. Compute betaest as: \(\beta_{est} = \text{summary}(lm\_stage2)\$\text{coef}[,1]\)
8. Compute sigma_u2 as: \(\sigma_u^2 = \frac{\sum(lm\_stage2\$\text{residuals}^2)}{n}\)
9. Compute Varbeta using: \(\text{Varbeta} = \text{diag}(ginv(X^TX)) \times \sigma_u^2\) where \(X = \text{cbind}(Z22, Dhat)\) and \(Dhat = Z*gamma\_hat\).
10. Compute betase as: \(\beta_{se} = \sqrt{\text{Varbeta}}\)

Output:
- beta_est: Estimated beta values.
- beta_se: Standard error of beta estimates.
- invalid IVs: Indices of invalid IVs.
- no. of invalid IV: Number of invalid IVs.

Recap:
In GWAS, associations are generally sought between single nucleotide polymorphisms (SNPs) and a single trait. But GWAS data can also be used to analyze multiple related traits jointly, which can improve power and yield new biological insights. Network analysis of multiple traits, and causal network analysis in particular, is gaining interest because it can clarify how traits relate to one another, as in gene network and protein network analyses. This paper focuses on metabolite network analysis: metabolites take part in many biological processes and often interact with each other in regulatory networks, so inferring these networks gives insight into how metabolites relate within those processes.
In causal networks, traits, including metabolites, proteins, and genes, serve as the nodes. Their causal relationships are represented by directed edges connecting them. SNPs are utilized as instrumental variables (IVs). To model these intricate biological networks, structural equation models (SEMs) have been adopted.
An instrumental variable is associated with the exposure and affects the outcome only through the exposure, with no direct effect of its own. Its role is to isolate the variability in the exposure that is independent of the confounders.
IV analysis uses the variation in the exposure explained by the instrument to estimate the causal effect of the exposure on the outcome.
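A small simulation can make this concrete. The sketch below is an illustration only, with made-up effect sizes and a single SNP as the instrument: an unobserved confounder pushes the ordinary regression slope away from the true effect, while the ratio (Wald) estimator, which uses only the variation in the exposure explained by the SNP, stays close to it. The variable names and the simulated data are assumptions for the example and are not taken from the paper.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(1)
n = 20000
true_effect = 0.3                      # causal effect of exposure X on outcome Y

Z, X, Y = [], [], []
for _ in range(n):
    z = (random.random() < 0.3) + (random.random() < 0.3)   # SNP coded 0/1/2
    u = random.gauss(0, 1)                                   # unobserved confounder
    x = 1.0 * z + u + random.gauss(0, 1)                     # exposure
    y = true_effect * x + u + random.gauss(0, 1)             # outcome
    Z.append(z)
    X.append(x)
    Y.append(y)

ols_estimate = cov(X, Y) / cov(X, X)   # regression slope, biased by the confounder
iv_estimate = cov(Z, Y) / cov(Z, X)    # ratio (Wald) estimator using the SNP

print(f"OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}  truth: {true_effect}")
```

The ratio estimator is the simplest single-instrument case; with several SNPs the same idea is usually expressed as a two-stage regression.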
In the context of GWAS and metabolite network analysis, IV methods are crucial. They help determine causal relationships in complex biological processes, especially when metabolites, which do not function in isolation, interact within metabolite regulatory networks.
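Instrumental variables are a pivotal tool in causal inference, especially in genome-wide association studies (GWAS). When used properly, they can reveal causal relationships in settings with confounding and endogeneity. However, they rest on assumptions: the instrument must be associated with the exposure, be independent of the confounders, and affect the outcome only through the exposure. An instrument that violates these assumptions is invalid and can bias the causal estimate, which is the problem the paper's stepwise selection targets.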
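As a small illustration of what can go wrong when an instrument is invalid, the sketch below computes per-SNP ratio estimates from made-up summary statistics in which the third SNP also has a direct effect on the outcome. The numbers, the variable names and the median-based flagging rule are all assumptions for the example; the flagging rule is a toy check and is not the stepwise BIC selection described in the paper.

```python
from statistics import median

beta_zx = [0.3, 0.3, 0.3]          # SNP-exposure associations (made-up values)
causal_effect = 0.5                # hypothetical true effect of exposure on outcome
direct_effect = [0.0, 0.0, 0.4]    # third SNP also affects the outcome directly

# SNP-outcome association = effect through the exposure + any direct effect.
beta_zy = [causal_effect * g + a for g, a in zip(beta_zx, direct_effect)]

# Ratio estimate from each SNP taken on its own.
ratios = [by / bx for by, bx in zip(beta_zy, beta_zx)]

# Toy check: flag SNPs whose ratio is far from the median ratio.
center = median(ratios)
suspects = [k for k, r in enumerate(ratios) if abs(r - center) > 0.1]

print(ratios, suspects)
```

The two valid SNPs recover the hypothetical effect, while the third gives a much larger ratio because its direct effect is attributed to the exposure.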