Introduction
BICseq2 is an algorithm developed for the normalization of
high-throughput sequencing (HTS) data and detection of copy
number variations (CNVs) in the genome. BICseq2 can be used
for detecting CNVs with or without a control genome. There
are two main components of the algorithm:
- BICseq2-norm is for normalizing potential biases in the
  sequencing data.
- BICseq2-seg is for detecting CNVs based on the normalized
  data given by BICseq2-norm.
The general pipeline using BICseq2 for CNV detection is as
follows.
- Only a case genome is sequenced and no control genome is
  available.
  - Get the uniquely mapped reads from the bam file. (You
    may use our modified samtools provided here.)
  - Use BICseq2-norm to remove the biases in the data.
  - Use BICseq2-seg to detect CNVs based on the normalized
    data.
- Both the case genome and control genome are available (in
  a cancer study, the case genome is the tumor genome, and
  the control genome can be the matched normal genome).
  - Get the uniquely mapped reads from the case and control
    genome bam files, respectively.
  - Normalize the case and control genomes individually
    using BICseq2-norm.
  - Detect CNVs in the case genome based on the normalized
    data of the case genome and the control genome.
BICseq2-norm usage
Before using BICseq2-norm, you have to first compile the C
code. To compile, you may simply type
make clean
make
After the compilation, you can use the perl program
BICseq2-norm.pl for normalization.
Usage:
BICseq2-norm.pl [options] <configFile> <output>
Options:
--help
-l=<int>: read length
-s=<int>: fragment size
-p=<float>: a subsample percentage (Default 0.0002).
-b=<int>: bin the expected and observed read counts into <int> bp bins (Default 100).
--gc_bin: if specified, report the GC-content in the bins
--NoMapBin: if specified, do NOT bin the reads according to the mappability
--bin_only: only bin the reads without normalization
--fig=<string>: plot the read count VS GC figure in the specified file (in PDF format)
--title=<string>: title of the figure
--tmp=<string>: the temp directory
<configFile> specifies the location of the configuration
file that has the necessary information for normalization.
See below for the format of the configuration file.
<output> is the file that stores the parameter estimates of
the generalized additive model (GAM). Most users will not
need this file.
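As an illustration of the usage line above, the following sketch assembles a BICseq2-norm.pl command from the documented options. The read length (50), fragment size (300), temp directory and file names are hypothetical values chosen for the example, not defaults of the tool. The command is printed rather than executed so that it can be checked first.

```shell
# Hypothetical values for illustration only; replace with your own.
READ_LEN=50                 # assumed read length (-l)
FRAG_SIZE=300               # assumed fragment size (-s)
CONFIG=norm.config.txt      # assumed configuration file name
OUTPUT=norm.params.txt      # assumed output file (GAM parameter estimates)

# Print the command line instead of running it.
norm_command() {
    printf 'BICseq2-norm.pl -l=%s -s=%s --tmp=tmp %s %s\n' \
        "$READ_LEN" "$FRAG_SIZE" "$CONFIG" "$OUTPUT"
}

norm_command
```

Once the printed line looks right, it can be pasted into the terminal from the directory where BICseq2-norm was compiled.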
The <configFile> has the following format:
chromName | faFile  | MapFile                 | readPosFile | binFileNorm
chr1      | chr1.fa | hg18.CRC.50mer.chr1.txt | chr1.seq    | chr1.norm.bin
chr2      | chr2.fa | hg18.CRC.50mer.chr2.txt | chr2.seq    | chr2.norm.bin
In the <configFile>, the columns should be tab-delimited.
The first row of this file is assumed to be the header and
will be skipped by BICseq2-norm.
The 1st column (chromName) is the chromosome name.
The 2nd column (faFile) is the reference sequence of this
chromosome (Human hg18 and hg19 are available for
download.).
The 3rd column (MapFile) is the mappability file of this
chromosome (Human hg18 (50bp) and hg19 (50bp and 75bp) are
available for download).
The 4th column (readPosFile) is the file that stores all
the mapping positions of all reads that uniquely mapped to
this chromosome.
The 5th column (binFileNorm) is the file that stores the
normalized data. The data will be binned with the bin size
as specified by the option -b.
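The configuration file can be generated rather than typed by hand. The sketch below writes a tab-delimited file with the five columns described above; the function name write_norm_config is hypothetical, and the file-name patterns simply follow the example rows (hg18, 50-mer mappability), so they will need adjusting to your own file layout.

```shell
# Write a tab-delimited BICseq2-norm configuration file.
# Usage: write_norm_config <outFile> <chrom> [<chrom> ...]
# File-name patterns are assumptions copied from the example rows.
write_norm_config() {
    out=$1
    shift
    # Header row; BICseq2-norm treats the first row as a header.
    printf 'chromName\tfaFile\tMapFile\treadPosFile\tbinFileNorm\n' > "$out"
    for chrom in "$@"; do
        printf '%s\t%s.fa\thg18.CRC.50mer.%s.txt\t%s.seq\t%s.norm.bin\n' \
            "$chrom" "$chrom" "$chrom" "$chrom" "$chrom" >> "$out"
    done
}

write_norm_config norm.config.txt chr1 chr2
cat norm.config.txt
```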
BICseq2-seg usage
Similar to BICseq2-norm, you can first compile BICseq2-seg
with
make clean
make
After compilation, you can detect CNVs with the perl program
BICseq2-seg.pl.
Usage:
BICseq2-seg.pl [options] <configFile> <output>
Options:
--lambda=<float>: the (positive) penalty used for BICseq2
--tmp=<string>: the temp directory
--help: print this message
--fig=<string>: plot the CNV profile in a PNG file
--title=<string>: the title of the figure
--nrm: do not remove likely germline CNVs (with a matched normal) or segments with bad mappability (without a matched normal)
--bootstrap: perform bootstrap test to assign confidence (only for one sample case)
--noscale: do not automatically adjust the lambda parameter according to the noise level in the data
--strict: if specified, use a more stringent method to adjust the lambda parameter
--control: the data has a control genome
--detail: if specified, print the detailed segmentation result (for multiSample only)
As with the original BIC-seq algorithm, the --lambda
parameter is the main parameter used for tuning the
smoothness of the CNV profile. The larger the value, the
fewer segments the final profile will have. The default
value is 2.
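Because --lambda controls how many segments are reported, it can help to run the segmentation at a few values and compare the results. The sketch below only prints one BICseq2-seg.pl command per value; the lambda grid (1, 2, 4) and the file names are hypothetical choices for illustration, with 2 being the documented default.

```shell
# Print one BICseq2-seg.pl command per lambda value (nothing is executed).
# The grid and the file names are assumptions for illustration.
lambda_sweep() {
    for lambda in 1 2 4; do
        printf 'BICseq2-seg.pl --lambda=%s seg.config.txt cnv.lambda%s.txt\n' \
            "$lambda" "$lambda"
    done
}

lambda_sweep
```

Larger values in the grid should give output files with fewer segments.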
<configFile> stores the necessary information for
BICseq2-seg to detect CNVs.
<output> stores the final CNV detection results.
<configFile> has the following format (tab-delimited; first
row treated as header).
- If there is no control, the format is
chromName | binFileNorm
chr1      | chr1.norm.bin
chr2      | chr2.norm.bin
The 1st column (chromName) is just the chromosome name.
The 2nd column (binFileNorm) is the normalized bin
file as obtained from BICseq2-norm.
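A minimal sketch of generating this two-column file, assuming the normalized bin files are named <chrom>.norm.bin as in the example rows. The function name write_seg_config is hypothetical.

```shell
# Write a tab-delimited BICseq2-seg configuration file (no control genome).
# Usage: write_seg_config <outFile> <chrom> [<chrom> ...]
# Assumes the bin files are named <chrom>.norm.bin, as in the example rows.
write_seg_config() {
    out=$1
    shift
    # Header row; the first row is treated as a header.
    printf 'chromName\tbinFileNorm\n' > "$out"
    for chrom in "$@"; do
        printf '%s\t%s.norm.bin\n' "$chrom" "$chrom" >> "$out"
    done
}

write_seg_config seg.config.txt chr1 chr2
cat seg.config.txt
```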
- If there is a control, the format is
chromName | binFileNorm.Case  | binFileNorm.Control
chr1      | CaseChr1.norm.bin | ControlChr1.norm.bin
chr2      | CaseChr2.norm.bin | ControlChr2.norm.bin
The 2nd column (binFileNorm.Case) is the normalized bin file
of the case genome as obtained from BICseq2-norm.
The 3rd column (binFileNorm.Control) is the normalized
bin file of the control genome as obtained from
BICseq2-norm.
Note: If you have a control, you must specify --control to
let BICseq2 know that the data is a case/control study.
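For the case/control setting, a sketch along the same lines writes the three-column file with one case and one control bin file per chromosome, then prints a BICseq2-seg.pl command that includes --control. The function names, the CaseChrN/ControlChrN naming pattern and the output file names are assumptions for illustration.

```shell
# Write a tab-delimited BICseq2-seg configuration file for a case/control study.
# Usage: write_seg_config_control <outFile> <chrom> [<chrom> ...]
# Assumes bin files named CaseChr<N>.norm.bin and ControlChr<N>.norm.bin.
write_seg_config_control() {
    out=$1
    shift
    printf 'chromName\tbinFileNorm.Case\tbinFileNorm.Control\n' > "$out"
    for chrom in "$@"; do
        suffix=${chrom#chr}    # "chr2" -> "2"
        printf '%s\tCaseChr%s.norm.bin\tControlChr%s.norm.bin\n' \
            "$chrom" "$suffix" "$suffix" >> "$out"
    done
}

# Print (do not run) the matching command; --control is required here.
seg_control_command() {
    printf 'BICseq2-seg.pl --control seg.control.config.txt cnv.control.txt\n'
}

write_seg_config_control seg.control.config.txt chr1 chr2
cat seg.control.config.txt
seg_control_command
```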
How to cite BIC-seq2:
Xi, R.*, Lee, S., Xia, Y., Kim, T. and Park, P.* (2016) Copy
number analysis of whole-genome data using BIC-seq2 and its
application to detection of cancer susceptibility variants,
Nucleic Acids Research, 44(13):6274-86.
Xi, R., Hadjipanayis, A.G., Luquette, L.J., Kim, T.M.,
Lee, E., Zhang, J.H., Johnson, M.D., Muzny, D.M.,
Wheeler, D.A., Kucherlapati, R., and Park, P.*
(2011). Copy number alteration detection in
sequencing data using the Bayesian information
criterion, Proceedings of the National Academy of
Sciences, USA, 108(46):E1128-36.
Frequently Asked Questions.
Please see this document.