CNVhac: an integrated CNV calling package based on hybridization and amplification rate correction for Affymetrix SNP arrays

Quan Wang1, Peichao Peng2, Minping Qian1,2, Lin Wan3,4,* and Minghua Deng1,2,5,*

1. Center for Theoretical Biology, Peking University, Beijing 100871, PR China.
2. LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, PR China.
3. Molecular and Computational Biology Program, University of Southern California, Los Angeles, California.
4. National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, PR China.
5. Center for Statistical Science, Peking University, Beijing 100871, PR China.
*To whom correspondence should be addressed.


Introduction:

Copy number variation (CNV) is essential to understanding the pathology of many complex diseases at the gene level. Affymetrix SNP arrays have been widely used for CNV studies, which depend heavily on accurate copy number (CN) estimation. However, CN estimates may be biased by the sample dependency of parameter estimation and by genomic waves in intensities, which arise from sequence-dependent hybridization rates and amplification efficiencies. Many available software packages pay little attention to these effects. We therefore developed a new CNV detection method based on hybridization and amplification rate correction (CNVhac). Sample-independent parameters, trained using the physicochemical law of hybridization, are used to estimate the allelic concentrations (ACs) of target sequences. The CN at a given site is then estimated as the ratio of the AC to the corresponding average AC from a reference sample set. Results on public HapMap data show that CNVhac effectively adjusts for biases arising from sample batches and genomic waves. Finally, a hidden Markov model (HMM) segmentation step is applied to detect CNV events. Results on two public data sets show that CNVhac significantly improves precision and achieves a lower false positive rate than other algorithms.
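
The last two steps described above (ratio-based CN estimation followed by HMM segmentation) can be illustrated with a minimal Python sketch. This is not the CNVhac implementation, which is written in R and C++. The function names, the scaling of the ratio by a diploid baseline of 2, the three-state Gaussian-emission HMM, and the values of sigma and stay_prob are all assumptions made for illustration only.

```python
import math


def estimate_cn(sample_ac, reference_acs, baseline_cn=2.0):
    """Ratio of a sample's AC to the mean reference AC at one site.

    Scaling by baseline_cn (diploid = 2) is an assumption of this sketch.
    """
    mean_ref = sum(reference_acs) / len(reference_acs)
    return baseline_cn * sample_ac / mean_ref


def viterbi_segment(cn_values, states=(1.0, 2.0, 3.0), sigma=0.3, stay_prob=0.98):
    """Most likely CN state per site under a simple Gaussian-emission HMM.

    states, sigma and stay_prob are hypothetical illustration values.
    """
    n_states = len(states)
    switch_prob = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [
        [math.log(stay_prob if i == j else switch_prob) for j in range(n_states)]
        for i in range(n_states)
    ]

    def log_emit(x, mu):
        # Gaussian log-density up to a constant shared by all states.
        return -0.5 * ((x - mu) / sigma) ** 2

    # Initialisation with a uniform prior over states.
    score = [math.log(1.0 / n_states) + log_emit(cn_values[0], mu) for mu in states]
    back = []

    # Forward pass: keep the best predecessor for every state at every site.
    for x in cn_values[1:]:
        prev = score
        pointers, score = [], []
        for j, mu in enumerate(states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            score.append(prev[best_i] + log_trans[best_i][j] + log_emit(x, mu))
        back.append(pointers)

    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


if __name__ == "__main__":
    # One site: sample AC of 150 against three reference samples with AC 100.
    print(estimate_cn(150.0, [100.0, 100.0, 100.0]))
    # Noisy per-site CN estimates containing a short run of copy number gain.
    print(viterbi_segment([2.0, 2.1, 1.9, 3.0, 3.1, 2.9, 3.0, 2.0, 2.05, 1.95]))
```

In this sketch the high stay_prob makes state changes costly, so an isolated noisy site is absorbed into its neighbours while a sustained run of elevated values is reported as a CNV segment.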

 

Source code:

The algorithm is implemented in R and C++, and the source code is available. Please see the CNVhac manual for details.

 

Pre-installations:

1. Install R. (R is available from http://www.r-project.org/)
2. Install Bioconductor package "affxparser" in R. (Bioconductor is available from http://www.bioconductor.org/)

 

Usage:

CNVhac is a Linux-based program that supports two array types: Affymetrix SNP 5.0 and SNP 6.0. To estimate the copy number (CN), CNVhac needs a collection of normal samples as a reference (see our paper for details). In a case-control design, the control samples are a good choice of reference. For assays without control samples, another collection of normal samples hybridized to the same chip type can be used as the reference. For data without control samples, we use the HapMap data from the SNP 5.0 and SNP 6.0 platforms as the respective references. We provide four algorithms: SNP 5.0 arrays with control samples, SNP 5.0 arrays without control samples, SNP 6.0 arrays with control samples, and SNP 6.0 arrays without control samples. Please see the CNVhac manual for details.


*********************************************************************************************************************

Last Update: 07/28/2014

For questions, comments, suggestions, or bug reports, please contact dengmh@math.pku.edu.cn.