17. beta_topN.py
17.1. Description
This program selects the top N rows (CpGs) from the input file, ranked by the standard deviation of their beta values across samples. The resulting file can be used for clustering and PCA.
Example of input:
CpG_ID Sample_01 Sample_02 Sample_03 Sample_04
cg_001 0.831035 0.878022 0.794427 0.880911
cg_002 0.249544 0.209949 0.234294 0.236680
cg_003 0.845065 0.843957 0.840184 0.824286
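The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation of beta_topN.py: the function name `top_n_by_stdev` is hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the documentation does not say which estimator the program uses.

```python
import csv
import io
import statistics


def top_n_by_stdev(tsv_text, n):
    """Return (header, rows) holding the n rows with the largest standard deviation.

    Hypothetical helper for illustration only; assumes a tab-separated table
    whose first row holds sample IDs and whose first column holds CpG IDs.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    scored = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[1:]]
        # Assumption: sample standard deviation (n - 1 in the denominator).
        scored.append((statistics.stdev(values), row))
    # Most variable rows first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return header, [row for _, row in scored[:n]]


# The example input shown above, written with tab separators.
example = (
    "CpG_ID\tSample_01\tSample_02\tSample_03\tSample_04\n"
    "cg_001\t0.831035\t0.878022\t0.794427\t0.880911\n"
    "cg_002\t0.249544\t0.209949\t0.234294\t0.236680\n"
    "cg_003\t0.845065\t0.843957\t0.840184\t0.824286\n"
)

header, top_rows = top_n_by_stdev(example, 2)
top_ids = [row[0] for row in top_rows]
print(top_ids)
```

With the three example rows, cg_001 varies the most across samples and cg_003 the least, so asking for the top 2 keeps cg_001 and cg_002.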
17.2. Options
- --version
Show the program's version number and exit.
- -h, --help
Show this help message and exit.
- -i INPUT_FILE, --input_file=INPUT_FILE
Tab-separated data frame file containing beta values. The first row contains sample IDs and the first column contains CpG IDs.
- -c CPG_COUNT, --count=CPG_COUNT
Number of most variable CpGs (ranked by standard deviation) to keep. Default: 1000.
- -o OUT_FILE, --output=OUT_FILE
Prefix of the output files.
17.3. Input files (examples)
- test_05_TwoGroup.tsv.gz
17.4. Command
$ beta_topN.py -i test_05_TwoGroup.tsv.gz -c 500 -o test_05_TwoGroup
17.5. Output files
- test_05_TwoGroup.sortedStdev.tsv
- test_05_TwoGroup.sortedStdev.topN.tsv