Package 'wsprv'

Title: Weighted Selection Probability for Rare Variant Analysis
Description: A weighted selection probability to locate rare variants associated with multiple phenotypes.
Authors: Xianglong Liang [aut, cre], Hokeun Sun [ctb]
Maintainer: Xianglong Liang <[email protected]>
License: GPL-2
Version: 0.1.0
Built: 2025-03-01 03:19:57 UTC

Help Index

A weighted selection probability is developed to locate individual rare variants associated with multiple phenotypes.


Recently, rare variant association studies with multiple phenotypes have drawn considerable attention, because association signals can be boosted when rare variants are related to more than one phenotype. Most existing statistical methods for identifying rare variants associated with multiple phenotypes are based on a group test, in which a gene or a genetic region is tested one at a time. However, these methods are not designed to locate individual rare variants within a gene or a genetic region. We propose a weighted selection probability to locate individual rare variants within a group after a multiple-phenotype group test detects a significant association.


weight_sp(
  x,
  y,
  alpha = 1,
  penalty.factor = NULL,
  standardize = TRUE,
  type.multinomial = c("grouped", "ungrouped"),
  rep = 100,
  rate = 0.05,
  gamma = 0.01
)



x: An $n \times (m+p)$ matrix with $n$ samples, $m$ covariates and $p$ rare variants, where $m$ can be zero (i.e., there are no covariates).


y: An $n \times Q$ phenotype matrix with $n$ samples and $Q$ phenotypes, where $Q > 1$.


alpha: The elastic-net mixing parameter; alpha = 1 gives the lasso and alpha = 0 gives ridge regression. Default value is 1.


penalty.factor: Separate penalty factors that can be applied to each coefficient. A factor can be 0 for some variables, which means no shrinkage, so those variables are always included in the model.


standardize: Logical flag for genotype standardization. Default is TRUE.


type.multinomial: If 'grouped', a group lasso penalty is applied to the multinomial coefficients of a variable, which ensures that they are all in or all out of the model together. Default is 'grouped'.


rep: The number of bootstrap replications. We recommend using 100 or more to compute the weighted selection probability. Default value is 100.


rate: A tuning parameter giving the ratio of the degrees of freedom to the number of rare variants. Default value is 0.05.


gamma: The upper quantile level used to compute the threshold from the selection frequencies of individual variants for each phenotype. Default value is 0.01.
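When x holds m covariates in its first columns followed by p rare variants (the layout used in the examples), penalty.factor needs one entry per column of x, with 0 for covariates that should never be shrunk. The following is a minimal sketch; make_penalty_factor is a hypothetical helper, not part of wsprv, and it assumes covariates come first.

```r
# Hypothetical helper (not part of wsprv): penalty.factor for a design matrix x
# whose first m columns are covariates and last p columns are rare variants.
# Factor 0 = never shrunk (always kept in the model); factor 1 = ordinary penalty.
make_penalty_factor <- function(m, p) {
  stopifnot(m >= 0, p >= 1)
  c(rep(0, m), rep(1, p))
}

pf <- make_penalty_factor(m = 2, p = 100)  # e.g. Age, Gender + 100 variants
length(pf)  # 102, one entry per column of x
```

With m = 0 the helper returns all ones, which corresponds to penalizing every column equally.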


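The penalty function of elastic-net is defined as

$$\lambda \left\{ \frac{1-\alpha}{2} \|\beta\|_2^2 + \alpha \|\beta\|_1 \right\},$$

where $\alpha$ is the mixing proportion between ridge and the lasso, $\beta$ is the vector of regression coefficients, and $\lambda$ is the tuning parameter. The penalty reduces to the lasso penalty when alpha = 1 and to the ridge penalty when alpha = 0.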
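To make the role of alpha concrete, the sketch below evaluates the elastic-net penalty for a small coefficient vector. It assumes the glmnet parameterization $\lambda\{(1-\alpha)\|\beta\|_2^2/2 + \alpha\|\beta\|_1\}$; enet_penalty is a hypothetical helper for illustration only.

```r
# Elastic-net penalty, assuming glmnet's parameterization:
#   lambda * ((1 - alpha) / 2 * ||b||_2^2 + alpha * ||b||_1)
enet_penalty <- function(b, lambda, alpha = 1) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  lambda * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(abs(b)))
}

b <- c(3, -1, 0)
enet_penalty(b, lambda = 0.5, alpha = 1)    # lasso: 0.5 * (3 + 1) = 2
enet_penalty(b, lambda = 0.5, alpha = 0)    # ridge: 0.5 * (9 + 1) / 2 = 2.5
enet_penalty(b, lambda = 0.5, alpha = 0.5)  # mixture: 0.5 * (2.5 + 2) = 2.25
```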

Let $\eta$ be the degrees of freedom, which depends on the tuning parameter $\lambda$. The rate is computed as

$$\text{rate} = \frac{\eta}{p},$$

where $p$ is the number of rare variants. Note that weight_sp constrains $\eta \le n$.
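Inverting the relation above gives the target degrees of freedom as rate × p, capped at n. A small sketch follows; target_eta is a hypothetical helper, and because the source does not say whether $\eta$ is rounded, no rounding is applied here.

```r
# Hypothetical helper: target degrees of freedom implied by `rate`,
# assuming rate = eta / p and the cap eta <= n. No rounding is applied.
target_eta <- function(rate, p, n) {
  min(rate * p, n)
}

target_eta(rate = 0.05, p = 100, n = 400)    # 5
target_eta(rate = 0.05, p = 20000, n = 400)  # 1000 would exceed n, so 400
```

With the default rate = 0.05 and the 100 rare variants used in the examples, this corresponds to roughly 5 degrees of freedom.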

Let $\delta_{\gamma}$ be the threshold on $SF$, given by the upper $\gamma$-th quantile of $SF$, where $SF = \{SF_{11}(\eta), SF_{21}(\eta), \ldots, SF_{pQ}(\eta)\}$ is the set of selection frequencies of the individual rare variants for each phenotype.
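The selection frequencies and the threshold can be illustrated with a base-R sketch. This is not the package's implementation. The helpers select_top, selection_frequencies and sf_threshold are hypothetical; a simple marginal-correlation screen that keeps the top $\eta$ variants is swapped in for the penalized fit so the example needs no extra packages; and the upper $\gamma$-th quantile is assumed to be R's quantile() at probability 1 - gamma (default type 7).

```r
# ASSUMPTION: SF[j, q] is the proportion of bootstrap replications in which
# variant j is selected for phenotype q. The selector below is a stand-in
# (top-eta variants by absolute marginal correlation), not a penalized fit.
select_top <- function(x, y, eta) {
  r <- abs(suppressWarnings(cor(x, y)))  # constant columns give NA
  r[is.na(r)] <- 0
  order(r, decreasing = TRUE)[seq_len(eta)]
}

# p x Q matrix whose entry [j, q] estimates SF_jq(eta)
selection_frequencies <- function(x, y, eta, rep = 100) {
  n <- nrow(x)
  SF <- matrix(0, ncol(x), ncol(y))
  for (b in seq_len(rep)) {
    idx <- sample.int(n, n, replace = TRUE)  # one bootstrap resample of rows
    for (q in seq_len(ncol(y))) {
      sel <- select_top(x[idx, , drop = FALSE], y[idx, q], eta)
      SF[sel, q] <- SF[sel, q] + 1
    }
  }
  SF / rep
}

# delta_gamma: upper gamma quantile of all p * Q selection frequencies
sf_threshold <- function(SF, gamma = 0.01) {
  unname(quantile(as.vector(SF), probs = 1 - gamma))
}

set.seed(2)
n <- 200; p <- 30
x <- matrix(rbinom(n * p, 2, 0.05), n, p)  # toy rare-variant genotypes
y <- cbind(2 * x[, 1] + rnorm(n), 2 * x[, 1] + rnorm(n), rnorm(n))
SF <- selection_frequencies(x, y, eta = 3, rep = 50)
delta <- sf_threshold(SF, gamma = 0.01)
```

Because the stand-in selector always keeps exactly $\eta$ variants, each column of SF sums to $\eta$; with a penalized fit tuned to $\eta$ degrees of freedom this would generally hold only approximately.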



A matrix containing the ordering of the variants from the largest weighted selection probability to the smallest, together with the corresponding weighted selection probabilities.


The degrees of freedom $\eta$ used.


The number of bootstrap replications used.


The tuning parameter rate used.


The upper quantile level gamma used for the selection frequencies of individual rare variants for each phenotype.
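The ordered matrix described in the first returned value can be mimicked from a vector of weighted selection probabilities. In the sketch below, order_wsp and its column names are hypothetical and only illustrate the "largest to smallest" layout; they are not the package's actual output format.

```r
# Hypothetical: order variants by weighted selection probability (largest
# first) and pair each position index with its probability.
order_wsp <- function(wsp) {
  o <- order(wsp, decreasing = TRUE)
  cbind(variant = o, wsp = wsp[o])
}

wsp <- c(V1 = 0.10, V2 = 0.75, V3 = 0.40)
res <- order_wsp(wsp)  # variant column 2, 3, 1; wsp column 0.75, 0.40, 0.10
```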


# Generate simulation data
library(wsprv)
library(mnormt)  # provides rmnorm()
set.seed(1)
n <- 400
p <- 100
q <- 5
MAF <- 0.01
# Hardy-Weinberg genotype probabilities for 0, 1 and 2 copies of the minor allele
geno.prob <- c((1-MAF)^2, 2*(1-MAF)*MAF, MAF^2)
x <- matrix(NA, n, p)
for(i in 1:p) x[,i] <- sample(0:2, n, prob=geno.prob, replace=TRUE)
beta <- c(rep(3.0,10), rep(0,(p-10)))  # the first 10 variants are causal
cova <- matrix(0.75, q, q)             # error correlation 0.75 between phenotypes
diag(cova) <- 1
err.mat <- rmnorm(n, rep(0,q), cova)

# Phenotypes 1 and 2 depend on the causal variants; phenotypes 3 to 5 are noise
y1 <- x %*% beta + err.mat[,1]
y2 <- x %*% beta + err.mat[,2]
y <- cbind(y1, y2, err.mat[,3:5])

# Weighted selection probabilities for individual rare variants without covariates.
# rep = 5 keeps the example fast; rep = 100 or more is recommended but time consuming.
wsp.rv1 <- weight_sp(x, y, rep=5) # continuous phenotypes

# Weighted selection probabilities for individual rare variants with covariates.
# Covariates come first and get penalty factor 0, so they are never shrunk.
cx <- cbind(rnorm(n), sample(0:1, n, replace=TRUE))
x <- cbind(cx, x)
penalty.factor <- c(rep(0,2), rep(1,p))
colnames(x) <- c('Age', 'Gender', paste0('V', 3:102))
wsp.rv2 <- weight_sp(x, y, penalty.factor=penalty.factor, rep=5) # continuous phenotypes