Methylation data pack and unpack

yame pack

yame papck provides the functionality of packaging different inputs into .cx file for easier downstream analysis.
Note: please make sure that the input file match the dimension and order of the reference bed file. You can use bedtools intersect to match with the reference bed file.

yame pack has the following format specification -f

For binary data: 0. 1 byte for 8 binary CpGs 1. Value (1 byte) + Run-Length Encoding (RLE) (2 bytes)

For state data: 2. State text + Index RLE (Best for chromatin states)

For sequencing data: 3. MU RLE + Ladder byte (Input: 2-column text, M and U)

For fraction data: 4. Fraction / NA-RLE (32 bytes)

For differential meth data: 5. 2-bits + NA-RLE (Input: only 0, 1, 2 values) 6. 2-bits boolean for S (set) and U (universe)

For referene coordinates: 7. Compressed BED format for CGs

Example of generating a feature file using yame pack

Here’s an example showing how to generate the chrommHMM full stack hg38 annotation feature file. First, we have an input bed file hg38_genome_100_segments.bed.gz that might look like this:

chr1    10000   10400   2_GapArtf2
chr1    10400   10600   27_Acet1
chr1    10600   10800   38_EnhWk4
chr1    10800   12800   1_GapArtf1
chr1    12800   13000   38_EnhWk4
chr1    13000   13200   37_EnhWk3
chr1    13200   14800   6_Quies3
chr1    14800   15200   72_TxWk2
chr1    15200   16000   6_Quies3
chr1    16000   16200   83_TxEx3
chr1    16200   16400   82_TxEx2

We can run the following commands in the terminal, where we use bedtools intersect with the reference CpG .cr file mm10/hg38 to obtain the feature each CpG belongs to.

yame unpack cpg_nocontig.cr | gzip > cpg_nocontig.bed.gz

zcat hg38_genome_100_segments.bed.gz | sortbed | bedtools intersect -a cpg_nocontig.bed.gz -b - -loj -sorted | bedtools groupby -g 1-3 -c 7 -o first -i - | cut -f4 | yame pack -f s - ChromHMMfullStack.cm

The output .cm feature file can be used to run enrichment and obtain aggregated methylation levels over different features, see enrichment. For merging multiple .cm files, see yame index.

For more help with pack, run yame pack in the terminal or check out the pack help page.

yame unpack

yame unpack provides the decoding functionality from yame pack.

Example usage:

yame unpack -a -f 1 yourfile.cg

This command will unpack the .cg files for all the samples it contains, and output the fraction of methylation/coverage with coverage >= 1.

For more help with unpack, run yame unpack in the terminal or check out the unpack help page.