Variational Inference for Multiple Outcomes in GWAS

March 7, 2020

NSFC 71501089 high dimention multiple traits GWAS

Introduction

It is common to have multiple outcomes of interest in a single study. Usually, these outcomes are analysed separately, ignoring the correlation between them. In high dimensional setting, one would expect that a carefully designed multivariate approach would be a more efficient alternative to individual analyses of each outcome [Rothman et al.,2010]. For jointly modeling multiple correlated outcomes in GWAS, we have developed a statisitcal method in this paper. Here we provide same demonstrations for our method, which is implemented in the R package vimco.

Assume we have measured \(q\) responses \(y_1, \dots, y_q\) and the same set of \(p\) predictors on each individual \(x_1,\dots, x_p\). Each response has its own regression model: \[y_j = \mathbf X\beta_j + \varepsilon_j, ~ j=1, \dots, p,\] where \(\beta_j\) is the regression vector for the \(j\)th response. We can formulate the multivariate multiple regression model as follows: \[\mathbf Y = \mathbf X \mathbf B + \mathbf E,\] where \(\mathrm E(\mathbf E)=0\), and \(\mathrm {Var}(\mathbf E)=\Sigma\).

Simulate high dimensional data

You can simulate a simple example with the following R code:

AR = function(rho, p) {
    outer(1:p, 1:p, FUN = function(x, y) rho^(abs(x - y))) 
}
library(mvtnorm)
n = 300  # sample size
p = 400  # dimension of predictors
K = 5    # dimension of outcomes
set.seed(20132014)
X   = rmvnorm(n, mean=rep(0, p))
sigma.beta = rep(1, K)

bet = matrix(0, nrow = p, ncol = K)
lambda = 0.15    # proportion of non-zero entries in matrix \beta
eta = rbinom(p, 1, lambda)
alpha = 1
gam = matrix(rbinom(p*K, 1, alpha), ncol=K)
for (k in 1:K){
  bet[, k] = rnorm(p, mean = 0, sd = sigma.beta[k]) * gam[,k] * eta 
}
  
sigma = AR(0.8, K)
lp  = X %*% bet
sigma.e = diag(sqrt(diag(var(lp)))) %*% sigma %*% diag(sqrt(diag(var(lp))))
err = rmvnorm(n, rep(0, K), sigma.e)
Y   = lp + err
print(round(cor(Y),2))
##      [,1] [,2] [,3] [,4] [,5]
## [1,] 1.00 0.39 0.25 0.11 0.20
## [2,] 0.39 1.00 0.35 0.29 0.25
## [3,] 0.25 0.35 1.00 0.44 0.23
## [4,] 0.11 0.29 0.44 1.00 0.42
## [5,] 0.20 0.25 0.23 0.42 1.00

Fit each outcome seperately

library(vimco)
fit_Ind = emInd(X, Y)

Fit all outcomes jointly

# initial values
p = ncol(X)
mu0     = fit_Ind$mu
sigb0   = fit_Ind$sigb
Theta0  = matrix(0, nrow=ncol(Y), ncol=ncol(Y))
diag(Theta0)  =   1/c(fit_Ind$sige)
Lambda0 = rep(1, p)
Alpha0  = fit_Ind$Alpha 
lambda0 = 1
#fit_Mul = emMultiple(X, Y)
fit_Mul = emMultiple(X, Y, mu0, sigb0, Theta0, Lambda0, Alpha0, TRUE)

Compare results

plot(bet, fit_Ind$Beta, xlab = "ture beta", ylab = "estimate", col=2)
points(bet, fit_Mul$Beta, col=3)
abline(0, 1, lwd=2, col=1)
legend("topleft", c("jointly", "seperately"), pch=21, col=3:2)

Fit VIMCO for GWAS data with multiple traits

To handle GWAS data that is in a PLINK format (incudes sim.bed, sim.bim, sim.fam) directly, vimco package provides a function vimco::vimco(), check its help file for more details. Here we provide a simple examples. The following 3 PLINK files are already provided in our packages:

sim.bed
sim.bim
sim.fam

path <- system.file(package = "vimco")
setwd(path)
stringname <- "sim"
tmp <- vimco(stringname, nPheno = 4, fit = TRUE)

Fabs for Nonconvex Loss and Adaptive LASSO

March 17, 2019

NSFC 71501089 Fabs algorithm nonconvex loss adaptive LASSO