PDEGEM: Modeling non-uniform read distribution in RNA-seq data
Yuchao Xia1,*, Fugui Wang1,*, Minping Qian1,2, Zhaohui Qin3 and Minghua Deng1,2,4,§
1. Center for Quantitative Biology, Peking University, Beijing 100871, PR China.
2. LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, PR China.
3. Department of Biostatistics and Bioinformatics, Emory University, Atlanta, USA.
4. Center for Statistical Science, Peking University, Beijing 100871, PR China.
*These authors contributed equally to this work.
§E-mail: dengmh@math.pku.edu.cn
Introduction:
In this article, we borrow the idea of the Positional Dependent Nearest Neighborhood (PDNN) model, originally developed for analyzing microarray data, to model the non-uniformity of read distribution in RNA-seq data. We propose a robust nonlinear regression model named PDEGEM (Positional Dependent Energy Guided Expression Model) to estimate the abundance of transcripts. PDEGEM fits the data better than mseq in all three real datasets we tested. We also find that the expression measures obtained using PDEGEM show higher correlation with those obtained from alternative assays for quantifying gene and isoform expression.
Based on these results, we believe that PDEGEM can improve the accuracy of modeling and estimating transcript abundance and isoform expression in RNA-seq data. Additionally, although the stacking energies and positional weights of PDEGEM depend to some extent on the sequencing platform and species, they share some common trends, which indicates that PDEGEM could partly reflect the mechanism of DNA binding between the template strand and the newly synthesized read.
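To make the terms "stacking energy" and "positional weight" more concrete, the following R sketch shows how a PDNN-style energy term could enter a read-count model. It is an illustration only and is not taken from the PDEGEM source code: the log-linear form, the function names (window_energy, expected_count), and all numeric values are assumptions made for this example. Please refer to the PDEGEM manual for the model actually used.

```r
# Illustrative only: a PDNN-style positional energy term, NOT the PDEGEM code.
# All names and numeric values below are hypothetical.

bases  <- c("A", "C", "G", "T")
dinucs <- as.vector(outer(bases, bases, paste0))  # the 16 dinucleotides

# Hypothetical stacking energies, one per dinucleotide (arbitrary values).
stacking_energy <- setNames(seq(-0.4, 0.35, length.out = 16), dinucs)

# Energy of a sequence window: for each pair of adjacent bases, multiply the
# positional weight by the stacking energy of that dinucleotide, then sum.
window_energy <- function(window, weights, energy) {
  n <- nchar(window)
  stopifnot(length(weights) == n - 1)
  pairs <- substring(window, 1:(n - 1), 2:n)
  sum(weights * energy[pairs])
}

# Expected read count at a position under an ASSUMED log-linear form:
# expression level of the transcript times exp(window energy).
expected_count <- function(expression, window, weights, energy) {
  expression * exp(window_energy(window, weights, energy))
}

w <- rep(0.1, 5)  # hypothetical positional weights for a 6-base window
expected_count(10, "ACGTAC", w, stacking_energy)
```

In this sketch, setting all positional weights to zero recovers a uniform read distribution, in which every position of a transcript has the same expected count.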
Source code:
The algorithm is implemented in R, and the source code is available here.
Pre-installations:
1. Install R. (R is available from http://www.r-project.org/)
2. Install Bioconductor package mseq in R. (Bioconductor is available from http://www.bioconductor.org/)
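The second step can be checked from within R. The sketch below tests whether mseq is already installed and, if not, prints a typical installation command. The helper name has_package is hypothetical, and the command assumes a recent R version in which Bioconductor packages are installed through BiocManager; whether mseq is available depends on your R and Bioconductor versions.

```r
# Check whether a package can be loaded, without attaching it.
# (has_package is a hypothetical helper name used for this example.)
has_package <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (!has_package("mseq")) {
  message("mseq is not installed. With BiocManager it can typically be installed via:")
  message('  install.packages("BiocManager"); BiocManager::install("mseq")')
}
```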
Usage:
PDEGEM is an R program that can be applied to estimate transcript abundance from RNA-seq data. Please see the PDEGEM manual for details.
*********************************************************************************************************************
Last Update: 03/25/2014
For questions, comments, or suggestions, please contact xiayuchao@pku.edu.cn