High-Dimensional Inference in Genetical Genomics
Topic: High-Dimensional Inference in Genetical Genomics
Speaker: 林伟 (Wei Lin, University of Pennsylvania)
Time: 2014-02-28 15:00-16:00
Venue: Room 1114, Science Building No. 1 (理科一号楼; Institute of Mathematics event)
We consider two high-dimensional statistical inference problems motivated by genetical genomics applications, where high-throughput data on genetic variants and gene expression levels, as well as clinical traits, are available for joint analysis. In the first problem, we aim to identify and estimate important causal effects of gene expressions on the clinical trait, using genetic variants as instrumental variables. To deal with the high dimensionality and the unknown optimal instruments, we propose a two-stage regularization methodology, which extends the classical two-stage least squares method by exploiting sparsity in both stages. In the second problem, we are concerned with simultaneous dimension reduction and variable selection in the multivariate regression of gene expressions on genetic variants. We introduce a sparse orthogonal factor regression approach to reveal a low-dimensional latent factor structure represented by a sparse singular value decomposition of the regression coefficient matrix. The methodology is formulated as an orthogonality-constrained regularization problem, coupled with an efficient algorithm based on the alternating direction method of multipliers. In both contexts, we investigate theoretical properties of the regularized estimators in the high-dimensional setting where the dimensionality of genetic variants and gene expressions may be comparable to or much larger than the sample size. The practical performance and usefulness of the proposed methods are illustrated by simulation studies and the analysis of mouse and yeast expression quantitative trait loci (eQTL) data sets.
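To make the first problem concrete, here is a minimal sketch of a two-stage lasso: each gene expression is regressed on the genetic variants with an L1 penalty, and the trait is then regressed on the fitted expressions with a second L1 penalty. It is an illustration under stated assumptions, not the speaker's implementation. The abstract does not specify the penalty, the solver, or the tuning, so the lasso penalty, the coordinate-descent solver, the function names (`lasso_cd`, `two_stage_lasso`), the penalty levels, and the simulated data are all hypothetical choices.

```python
import numpy as np

def soft_threshold(a, t):
    # Soft-thresholding operator used by the lasso coordinate update.
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_cd(A, b, lam, n_iter=200):
    # Minimize (1/(2n)) * ||b - A w||^2 + lam * ||w||_1 by cyclic coordinate descent.
    n, d = A.shape
    w = np.zeros(d)
    col_sq = (A ** 2).sum(axis=0) / n
    r = b.astype(float)  # running residual b - A w (a fresh copy of b)
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            r += A[:, j] * w[j]          # remove coordinate j from the fit
            rho = A[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= A[:, j] * w[j]          # add the updated coordinate back
    return w

def two_stage_lasso(Z, X, y, lam1, lam2):
    # Stage 1: sparse regression of each expression (column of X) on the variants Z.
    G = np.column_stack([lasso_cd(Z, X[:, k], lam1) for k in range(X.shape[1])])
    X_hat = Z @ G                        # fitted expressions, driven by Z only
    # Stage 2: sparse regression of the trait on the fitted expressions.
    beta = lasso_cd(X_hat, y, lam2)
    return G, beta

# Simulated data (assumed design): 10 variants, 5 expressions, hidden confounder u.
rng = np.random.default_rng(0)
n, q, p = 300, 10, 5
Z = rng.standard_normal((n, q))
G_true = np.zeros((q, p))
for k in range(p):
    G_true[2 * k: 2 * k + 2, k] = 1.0    # two variants drive each expression
u = rng.standard_normal(n)               # confounds both expressions and trait
X = Z @ G_true + 0.5 * u[:, None] + 0.5 * rng.standard_normal((n, p))
beta_true = np.array([1.5, 0.0, 0.0, -1.0, 0.0])
y = X @ beta_true + u + 0.5 * rng.standard_normal(n)

G_hat, beta_hat = two_stage_lasso(Z, X, y, lam1=0.05, lam2=0.1)
print(np.round(beta_hat, 2))
```

Because the simulated variants are independent of the confounder `u`, the second stage sees only the variant-driven part of the expressions, which is what lets the instrumental-variable estimate avoid the confounding bias that a direct regression of `y` on `X` would carry. In practice the penalty levels would be chosen by cross-validation or a similar criterion rather than fixed by hand.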