Estimation of Isoform Expression in RNA-Seq Data Using a Hierarchical Bayesian Model

Zengmiao Wang1, Jun Wang1, Changjing Wu2 and Minghua Deng1,2,3,§


1. Center for Quantitative Biology, Peking University, Beijing 100871, PR China.

2. LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, PR China.

3. Center for Statistical Science, Peking University, Beijing 100871, PR China.

 

§E-mail: dengmh@math.pku.edu.cn

 

 

Introduction:

 

Estimation of gene or isoform expression is a fundamental step in many transcriptome analysis tasks, such as differential expression analysis, eQTL (or sQTL) studies, and biological network construction. RNA-seq technology enables us to monitor expression on a genome-wide scale at single-base-pair resolution and makes it possible to measure expression accurately at the isoform level. However, challenges remain because of non-uniform read sampling and the presence of various biases in RNA-seq data. In this article, we present a novel hierarchical Bayesian method to estimate isoform expression.
While most existing methods treat gene expression as a by-product, we incorporate it into our model and explicitly describe its relationship with the corresponding isoform expression using a Multinomial distribution. In this way, gene and isoform expression are included in a unified framework, which helps us achieve better performance than other state-of-the-art algorithms for isoform expression estimation. The effectiveness of the proposed method is demonstrated using both simulated data with known ground truth and two real RNA-seq datasets from the MAQC project.
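
As a toy illustration of how a Multinomial distribution can tie a gene-level quantity to its isoforms, the sketch below splits a gene's total count among its isoforms according to a vector of proportions. This is not the model's actual specification: the function name `sample_isoform_counts`, the proportions, and the gene count are all hypothetical, and the example uses Python's standard library rather than the R implementation.

```python
import random
from collections import Counter


def sample_isoform_counts(gene_count, proportions, rng):
    """Draw isoform counts from Multinomial(gene_count, proportions).

    Hypothetical helper for illustration only: each of the gene_count
    units is assigned independently to one isoform with the given
    probabilities, which is exactly a multinomial draw.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    draws = rng.choices(range(len(proportions)), weights=proportions, k=gene_count)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in range(len(proportions))]


rng = random.Random(0)               # fixed seed so the run is repeatable
proportions = [0.6, 0.3, 0.1]        # assumed isoform fractions of one gene
gene_count = 1000                    # assumed gene-level total

counts = sample_isoform_counts(gene_count, proportions, rng)

# The isoform counts always add up to the gene-level total, which is the
# sense in which gene and isoform quantities live in one framework here.
estimated = [c / gene_count for c in counts]
print(counts, estimated)
```

The point of the sketch is only the constraint it makes visible: the isoform-level values are not free parameters but a partition of the gene-level value, so information about one level constrains the other.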

 

Source code:

The algorithm is implemented in R, and a compressed file containing the source code and test data is available here.

 

*********************************************************************************************************************

Last Update: 06/06/2015

For questions, comments, or suggestions, please contact wangzengmiao@pku.edu.cn or dengmh@math.pku.edu.cn.