Overview ========= CpGtools package provides a number of Python programs to annotate, QC, visualize, and analyze DNA methylation data generated from Illumina `HumanMethylation450 BeadChip (450K) `_ / `MethylationEPIC BeadChip (850K) `_ array or `RRBS / WGBS `_. These programs can be divided into three groups: - CpG position analysis modules - CpG signal analysis modules - Differential CpG analysis modules CpG position analysis modules ----------------------------- These modules are primarily used to analyze CpG's genomic locations. +------------------------------+-------------------------------------------------------------------+ |Name |Description | +------------------------------+-------------------------------------------------------------------+ |CpG_anno_probe.py |add comprehensive annotation information to each 450K/850K probe | | |ID. | +------------------------------+-------------------------------------------------------------------+ |CpG_anno_position.py |add comprehensive annotation information to each CpG based on its | | |genomic coordinate. | +------------------------------+-------------------------------------------------------------------+ |CpG_aggregation.py |Aggregate proportion values of a list of CpGs that located in give | | |genomic regions. | +------------------------------+-------------------------------------------------------------------+ |CpG_distrb_chrom.py |Calculates the distribution of CpG over chromosomes. | +------------------------------+-------------------------------------------------------------------+ |CpG_distrb_gene_centered.py |Calculates the distribution of CpG over gene-centered genomic | | |regions including 'Coding exons', 'UTR exons', 'Introns', | | |'Upstream intergenic regions', and 'Downstream intergenic regions'.| +------------------------------+-------------------------------------------------------------------+ |CpG_distrb_region.py |Calculates the distribution of CpG over user-specified genomic | | |regions (such as promoters, enhancers). | +------------------------------+-------------------------------------------------------------------+ |CpG_logo.py |Generates DNA motif logo for a given set of CpGs (to visualize | | |the genomic context of these CpGs). | +------------------------------+-------------------------------------------------------------------+ |CpG_to_gene.py |Assigns CpGs to their putative target genes. Follows the "Basel | | |plus extension" rules used by the `GREAT `_ algorithm. | +------------------------------+-------------------------------------------------------------------+ CpG signal analysis modules ---------------------------- These modules are primarily used to analyze CpG's DNA methylation beta values +------------------------------+-------------------------------------------------------------------+ |Name |Description | +------------------------------+-------------------------------------------------------------------+ |beta_PCA.py |Performs `PCA `_ for samples. | +------------------------------+-------------------------------------------------------------------+ |beta_jitter_plot.py |Generates jitter plot and bean plot for each sample (column). | +------------------------------+-------------------------------------------------------------------+ |beta_m_conversion.py |Converts Beta-value into M-value or vice versa. | +------------------------------+-------------------------------------------------------------------+ |beta_profile_gene_centered.py |Calculates the methylation profile (i.e., average beta value) for | | |gene-centered genomic regions. | +------------------------------+-------------------------------------------------------------------+ |beta_profile_region.py |Calculates the methylation profile (i.e., average beta value) for | | |user specified genomic regions. | +------------------------------+-------------------------------------------------------------------+ |beta_stacked_barplot.py |Creates stacked barplot for each sample. The stacked barplot | | |showing the proportions of CpGs whose beta values are falling into | | |these 4 ranges: [0.00, 0.25], (0.25, 0.50], (0.50, 0.75], and | | |(0.75, 1.00]. | +------------------------------+-------------------------------------------------------------------+ |beta_stats.py |Gives basic information of CpGs located in genomic regions. These | | |information include "Number of CpGs", "Min methylation level", | | |"Max methylation level", "Mean methylation level across all CpGs", | | |"Median methylation level across all CpGs" and "Standard deviation"| +------------------------------+-------------------------------------------------------------------+ |beta_topN.py |This program picks the N most variable CpGs from the input file. | | |The resulting file can be used for PCA/t-SNE or clustering | | |analysis. | +------------------------------+-------------------------------------------------------------------+ |beta_trichotmize.py |This program uses `Bayesian Gaussian Mixture model `_ to trichotmize beta values into three status: | | |"Un-methylated","Semi-methylated", "Full-methylated", and | | |"unassigned" | +------------------------------+-------------------------------------------------------------------+ |beta_tSNE.py |This program performs `t-SNE (t-Distributed Stochastic Neighbor | | |Embedding ) `_ analysis. | +------------------------------+-------------------------------------------------------------------+ Differential CpG analysis modules ---------------------------------- These modules are primarily used to identify CpGs that are differentially methylated between groups +------------------------------+-------------------------------------------------------------------+ |Name |Description | +------------------------------+-------------------------------------------------------------------+ |dmc_Bayes.py |Different from statistical testing, this program tries to estimates| | |"how different the means between the two groups are" using Bayesian| | |approach. An `MCMC `_ is used to estimate the "means", "difference of | | |means", "95% HDI (highest posterior density interval)", and the | | |posterior probability that the HDI does NOT include "0". It is | | |similar to John Kruschke's `BEST <(http://www.indiana.edu/~kruschke| | |/BEST/)>`_ algorithm. | +------------------------------+-------------------------------------------------------------------+ |dmc_bb.py |This program performs differential CpG analysis using the `beta | | |binomial model `_ based on methylation proportions (in the form of | | |"c,n", where "c" indicates "number of reads with methylated C" and | | |"n" indicates "Number of total reads". | +------------------------------+-------------------------------------------------------------------+ |dmc_fisher.py |This program performs differential CpG analysis using Fisher's | | |Exact Test. It only applies to two sample comparison with no | | |replicates. | +------------------------------+-------------------------------------------------------------------+ |dmc_glm.py |This program performs differential CpG analysis using the linear | | |regression model. | +------------------------------+-------------------------------------------------------------------+ |dmc_logit.py |This program performs differential CpG analysis using the logistic | | |regression model. | +------------------------------+-------------------------------------------------------------------+ |dmc_nonparametric.py |This program performs differential CpG analysis using `Mann-Whitney| | |test `_ | | |for two group comparison, and the `Kruskal-Wallis H-test `_ for multiple groups comparison. | +------------------------------+-------------------------------------------------------------------+ |dmc_ttest.py |This program performs differential CpG analysis using the T test | | |for two group comparison, and `ANOVA `_ for multiple groups comparison. | +------------------------------+-------------------------------------------------------------------+