Sample operations

yame index generates the index file for the .cx input. This is helpful when .cx contains multiple samples.

Example of merging a feature file with multiple .cm files

Here’s an example showing how to merge multiple histone modification ChIP-Seq peak bed files into one merged .cm feature file using yame index. First, obtain a .tsv file describing the index and path for each of the bed file like controlfiles.tsv below:

268     GSM648494       human_hm/268_sort_peaks.narrowPeak.bed
269     GSM648495       human_hm/269_sort_peaks.narrowPeak.bed
272     GSM575295       human_hm/272_b_sort_peaks.broadPeak.bed
273     GSM575280       human_hm/273_sort_peaks.narrowPeak.bed
274     GSM575296       human_hm/274_b_sort_peaks.broadPeak.bed
275     GSM575281       human_hm/275_sort_peaks.narrowPeak.bed
367     GSM575223       human_hm/367_sort_peaks.narrowPeak.bed
368     GSM575222       human_hm/368_sort_peaks.narrowPeak.bed
382     GSM610328       human_hm/382_sort_peaks.narrowPeak.bed

Then, make individual .cm files for each feature, see yame pack.

cat controlfiles.tsv | parallel --colsep '\t' -j 72 'id={1};path={3}; sortbed $path | bedtools intersect -a cpg_nocontig.bed.gz -b - -sorted -c | cut -f4 | yame pack -f b - $id.cm'

Then, run the following commands and tailor the threshold for quality control.

awk '{print ""$1".cm", $2";"$4;}' controlfiles.tsv | while read fn anno; do yame summary $fn; done > qc.txt

awk '$1!~/QFile/ && $6>5000' qc.txt | awk 'NR==FNR{a[$1]=1;}NR!=FNR&&($1".cm" in a){print $0;}' - controlfiles.tsv | awk '{print ""$1".cm", $2";"$4;}' | sort -k2,2 | while read fn anno; do cat $fn >> merged.cm; yame index -1 $anno merged.cm; done

yame split splits the samples when provided with -s sample name list.

For more help with split, run yame split in the terminal or check out the split help page.

yame chunk and yame chunkchar breakdown .cx file into smaller and more manageable parts.

For more help with chunk, run yame chunk in the terminal or check out the chunk help page.