17. beta_topN.py

17.1. Description

This program selects the N rows (CpGs) with the largest standard deviation across samples from the input file. The resulting file can be used for clustering and principal component analysis (PCA).

Example of input

CpG_ID  Sample_01       Sample_02       Sample_03       Sample_04
cg_001  0.831035        0.878022        0.794427        0.880911
cg_002  0.249544        0.209949        0.234294        0.236680
cg_003  0.845065        0.843957        0.840184        0.824286
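
The ranking described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the actual implementation of beta_topN.py: the helper names `read_beta_table` and `top_n_by_stdev` are hypothetical, the sample standard deviation (n - 1 denominator) is an assumption, and the sketch assumes the table has no missing values.

```python
import csv
import io
import statistics


def read_beta_table(handle):
    """Parse a tab-separated beta table into (header, [(cpg_id, values), ...])."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # first row: "CpG_ID" followed by sample IDs
    rows = [(rec[0], [float(v) for v in rec[1:]]) for rec in reader if rec]
    return header, rows


def top_n_by_stdev(rows, n=1000):
    """Return the n rows with the largest standard deviation across samples."""
    # Assumption: sample standard deviation; the estimator used by the
    # real program is not stated in this section.
    ranked = sorted(rows, key=lambda r: statistics.stdev(r[1]), reverse=True)
    return ranked[:n]


# The example input shown above, written with explicit tab separators.
example = (
    "CpG_ID\tSample_01\tSample_02\tSample_03\tSample_04\n"
    "cg_001\t0.831035\t0.878022\t0.794427\t0.880911\n"
    "cg_002\t0.249544\t0.209949\t0.234294\t0.236680\n"
    "cg_003\t0.845065\t0.843957\t0.840184\t0.824286\n"
)

header, rows = read_beta_table(io.StringIO(example))
top2 = top_n_by_stdev(rows, n=2)
print([cpg_id for cpg_id, _ in top2])  # ['cg_001', 'cg_002']
```

In this small example cg_003 is dropped because its beta values vary least across the four samples.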

17.2. Options

--version: show program’s version number and exit
-h, --help: show this help message and exit
-i INPUT_FILE, --input_file=INPUT_FILE: Tab-separated data frame file containing beta values with the 1st row containing sample IDs and the 1st column containing CpG IDs.
-c CPG_COUNT, --count=CPG_COUNT: Number of most variable CpGs (ranked by standard deviation) to keep. Default: 1000.
-o OUT_FILE, --output=OUT_FILE: The prefix of the output file.

17.3. Input files (examples)

test_05_TwoGroup.tsv.gz

17.4. Command

$ beta_topN.py -i test_05_TwoGroup.tsv.gz -c 500 -o test_05_TwoGroup

17.5. Output files

test_05_TwoGroup.sortedStdev.tsv
test_05_TwoGroup.sortedStdev.topN.tsv