Jarred M. Kvamme
Major: Bioinformatics and Computational Biology
Faculty Advisor: Audrey Q. Fu
An Accurate and Efficient Causal Gene Network Inference Method That Handles Many Confounding Variables
Learning complex regulatory relationships among genes remains a challenge in biology. Several methods have been developed to conduct causal inference for molecular phenotypes (e.g., gene expression and DNA methylation), using genetic variants as instrumental variables under the principle of Mendelian randomization (PMR). However, these methods are often limited to mediation and ignore other possible regulatory relationships, or do not adequately account for confounding variables. Here, we introduce a new causal network inference method for trios consisting of a genetic variant (denoted by V) and two associated molecular phenotypes (denoted by X and Y). Our method, Mendelian Randomization Genomic Network (MRGN), improves on existing methods by i) inferring diverse regulatory relationships for a trio; ii) allowing for the inclusion of many confounding variables; and iii) eliminating the need for a large set of independence tests to infer each causal edge.
Specifically, MRGN performs conditional and marginal tests to detect five basic models: mediation (V → X → Y), v-structure (V → X ← Y), conditional independence (X ← V → Y), fully connected (X ← V → Y and X ↔ Y), and the null model (V → X; no relationship between X and Y). We test the performance of MRGN by simulating trios from the five basic models with confounding variables. We investigate four simulation parameters: strength of the effect, noise in the residuals, minor allele frequency, and the number of confounders. We compare our method with two related PMR-based methods, GMAC (Yang et al., 2017) and MRPC (Badsha and Fu, 2019). For each trio, confounders are identified from a candidate pool before causal inference. Across different scenarios, MRGN correctly detects substantially more edges than MRPC (recall 0.944 vs 0.141), although with lower precision (0.832 vs 0.989). MRGN also performs comparably with GMAC in detecting mediation (recall 0.918 vs 0.967; precision 0.789 vs 0.754, respectively).
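To make the five basic models and the role of marginal versus conditional tests concrete, the following is a minimal sketch of how a trio might be simulated, using the four simulation parameters named above. It is not the MRGN implementation or the authors' simulation code. The function names (`simulate_trio`, `partial_corr`), the linear-Gaussian form, the shared effect size for all edges, and the choice of X → Y as the direction of the X ↔ Y edge in the fully connected model are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical labels for the five basic trio models.
MODELS = ("null", "mediation", "v_structure", "cond_indep", "fully_connected")


def simulate_trio(model, n=500, maf=0.3, effect=0.5, noise_sd=1.0,
                  n_conf=2, seed=0):
    """Simulate (V, X, Y, U) from one basic model under a linear-Gaussian
    assumption. U holds confounders that affect both X and Y."""
    rng = np.random.default_rng(seed)
    # Genotype as minor-allele count (0, 1, 2), assuming Hardy-Weinberg.
    V = rng.binomial(2, maf, size=n).astype(float)
    U = rng.normal(size=(n, n_conf))
    conf = U @ np.full(n_conf, effect)  # combined confounder contribution
    ex = rng.normal(scale=noise_sd, size=n)
    ey = rng.normal(scale=noise_sd, size=n)

    if model == "null":                 # V -> X; no X-Y relationship
        X = effect * V + conf + ex
        Y = conf + ey
    elif model == "mediation":          # V -> X -> Y
        X = effect * V + conf + ex
        Y = effect * X + conf + ey
    elif model == "v_structure":        # V -> X <- Y
        Y = conf + ey
        X = effect * V + effect * Y + conf + ex
    elif model == "cond_indep":         # X <- V -> Y
        X = effect * V + conf + ex
        Y = effect * V + conf + ey
    elif model == "fully_connected":    # X <- V -> Y plus X -> Y (assumed direction)
        X = effect * V + conf + ex
        Y = effect * V + effect * X + conf + ey
    else:
        raise ValueError(f"unknown model: {model!r}")
    return V, X, Y, U


def partial_corr(a, b, covariates):
    """Correlation of a and b after regressing both on covariates (n x k)."""
    Z = np.column_stack([np.ones(len(a)), covariates])
    ra = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
    rb = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return float(np.corrcoef(ra, rb)[0, 1])


if __name__ == "__main__":
    V, X, Y, U = simulate_trio("mediation", n=5000, effect=1.0, seed=1)
    print("marginal corr(V, Y):     ", np.corrcoef(V, Y)[0, 1])
    print("partial corr(V, Y | X, U):", partial_corr(V, Y, np.column_stack([X, U])))
```

Under the mediation model in this sketch, V and Y are marginally associated, but the association should largely vanish once X and the confounders U are conditioned on. Conditioning on X alone would generally leave a spurious V-Y association through U, which is one way to see why confounder adjustment matters for the conditional tests.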
Funding: NIH P20GM104420.