6. CpG_distrb_region.py

6.1. Description

This program calculates the distribution of CpG over user-specified genomic regions.

Notes

  • A maximum of ten BED files (each defining a different set of genomic regions) can be analyzed together.

  • The order of the BED files matters: it is treated as the priority order. Overlapping genomic regions are kept in the BED file with the highest priority and removed from BED files of lower priority. For example, if the user provides three BED files via “-b promoters.bed,enhancers.bed,intergenic.bed” and an enhancer region overlaps a promoter, the overlapping part is removed from “enhancers.bed”.

  • BED files can be plain text or compressed with ‘gzip’ or ‘bzip2’.
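
The priority rule in the notes above can be illustrated with a minimal Python sketch. It is not the program's actual implementation: the function names `subtract` and `apply_priority` are hypothetical, regions are assumed to be (chrom, start, end) tuples in 0-based, half-open BED coordinates, and overlaps within a single BED file are left untouched.

```python
def subtract(interval, blockers):
    """Return the parts of (start, end) not covered by any blocker interval."""
    start, end = interval
    pieces = []
    cursor = start
    for b_start, b_end in sorted(blockers):
        if b_end <= cursor:
            continue            # blocker lies entirely before the uncovered part
        if b_start >= end:
            break               # sorted by start, so no later blocker can overlap
        if b_start > cursor:
            pieces.append((cursor, b_start))
        cursor = max(cursor, b_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


def apply_priority(region_sets):
    """region_sets: lists of (chrom, start, end), highest priority first."""
    claimed = {}                # chrom -> intervals taken by higher-priority sets
    result = []
    for regions in region_sets:
        kept = []
        for chrom, start, end in regions:
            for s, e in subtract((start, end), claimed.get(chrom, [])):
                kept.append((chrom, s, e))
        # Regions of this set block lower-priority sets only, not each other.
        for chrom, start, end in regions:
            claimed.setdefault(chrom, []).append((start, end))
        result.append(kept)
    return result


# Made-up coordinates, for illustration only.
promoters = [("chr1", 100, 200)]
enhancers = [("chr1", 150, 300)]
intergenic = [("chr1", 250, 400), ("chr2", 0, 50)]
trimmed = apply_priority([promoters, enhancers, intergenic])
```

In this toy input the enhancer is trimmed to chr1:200-300 because its first 50 bases overlap the promoter, and the chr1 intergenic region is trimmed to chr1:300-400.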

6.2. Options

--version

show program’s version number and exit

-h, --help

show this help message and exit

-i CPG_FILE, --cpg=CPG_FILE

BED file specifying the C positions. This BED file should have at least three columns (Chrom, ChromStart, ChromEnd). Note: the first base in a chromosome is numbered 0. This file can be a regular text file or a compressed file (.gz, .bz2).
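
As a rough illustration of the expected input, the sketch below reads such a file in Python. The helper names `open_bed` and `read_positions` are hypothetical and are not part of the program; choosing the decompressor from the file extension and skipping header-like lines are assumptions of this sketch.

```python
import bz2
import gzip


def open_bed(path):
    """Open a plain, gzip-compressed, or bzip2-compressed BED file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def read_positions(path):
    """Yield (chrom, start, end) from the first three columns of each record."""
    with open_bed(path) as handle:
        for line in handle:
            # Skip blank lines and common header lines (an assumption here).
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split()
            yield fields[0], int(fields[1]), int(fields[2])
```

Because BED coordinates are 0-based and half-open, the first base of chr1 is written as `chr1 0 1`, and a C at 1-based position 10469 is written as `chr1 10468 10469`.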

-b BED_FILES, --bed=BED_FILES

Comma-separated list of BED files specifying the genomic regions. The order of the files sets their priority.
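
A small sketch of how such a value might be split and checked against the ten-file limit mentioned in the notes. The function `parse_bed_list` and the constant `MAX_BED_FILES` are hypothetical names used only for illustration; the program's own argument handling may differ.

```python
MAX_BED_FILES = 10  # limit taken from the notes in the description


def parse_bed_list(value):
    """Split a comma-separated -b value into file names, keeping their order."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    if not names:
        raise ValueError("no BED files given")
    if len(names) > MAX_BED_FILES:
        raise ValueError(
            "at most %d BED files are allowed, got %d" % (MAX_BED_FILES, len(names))
        )
    return names


bed_files = parse_bed_list("promoters.bed,enhancers.bed,intergenic.bed")
```

The returned list keeps the order given on the command line, which matters because that order defines the priority.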

-o OUT_FILE, --output=OUT_FILE

The prefix of the output files.

6.3. Input files (examples)

6.4. Command

# check the distribution of 850K probes in 4 genomic regions (CpG islands, Promoters,
# Bivalent promoters, and Heterochromatin regions)

$ CpG_distrb_region.py -i 850K_probe.hg19.bed3.gz \
  -b hg19_H3K4me3.bed4,hg19_CGI.bed4,hg19_H3K27ac_with_H3K4me1.bed4,hg19_H3K27me3.bed4 \
  -o regionDist

6.5. Output files

  • regionDist.tsv

  • regionDist.r

  • regionDist.pdf

Figure: regionDist.png