2. Summarization of Packed .cx Files
yame summary and yame info provide quick, high-level summaries of .cx files created via yame pack. These tools help users inspect sample names, structural parameters, and feature-level statistics such as enrichment, methylation levels, and coverage.
To run a simple summary:
yame info yourfile.cg
yame summary yourfile.cg
Both single-sample and multi-sample .cx files are supported.
Overview of What yame summary Computes
For each query sample, and for each mask when a mask file is supplied, YAME reports:
- N_univ — total number of sites considered (universe)
- N_query — number of sites present or methylated in the query
- N_mask — number of sites in the mask
- N_overlap — intersection of query and mask
- Log2OddsRatio — an enrichment score
- Beta — average methylation or proportion (format dependent)
- Depth — approximate sequencing depth (only for formats with M/U counts)
Mask files may contain multiple masks; each is evaluated independently.
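To make the quantities above concrete, here is a minimal Python sketch that computes them on toy data shaped like format 3 (per-site M/U counts) against a single binary mask. It is an illustration of the definitions in this section, not YAME's implementation: the function name `summarize_format3`, the toy vectors, and the choice to average per-site M/(M+U) over covered mask sites are all assumptions.

```python
# Illustrative only: toy per-site counts, not read from a real .cx file.
M = [1, 0, 0, 2, 0, 1]      # methylated read counts per site
U = [0, 0, 1, 2, 0, 0]      # unmethylated read counts per site
mask = [1, 1, 0, 1, 0, 0]   # 1 = site belongs to the mask feature


def summarize_format3(M, U, mask):
    """Compute summary-style counts for one query and one mask (hypothetical helper)."""
    n_univ = len(M)                                   # every site is in the universe
    covered = [m + u > 0 for m, u in zip(M, U)]       # sites with nonzero M+U
    n_query = sum(covered)
    n_mask = sum(mask)
    both = [i for i in range(n_univ) if mask[i] and covered[i]]
    n_overlap = len(both)
    # Assumed reading of Beta: mean of per-site M/(M+U) over covered mask sites.
    beta = (sum(M[i] / (M[i] + U[i]) for i in both) / n_overlap
            if n_overlap else float("nan"))
    # Depth: mean M+U over all mask sites, covered or not.
    depth = (sum(M[i] + U[i] for i in range(n_univ) if mask[i]) / n_mask
             if n_mask else float("nan"))
    return {"N_univ": n_univ, "N_query": n_query, "N_mask": n_mask,
            "N_overlap": n_overlap, "Beta": beta, "Depth": depth}


result = summarize_format3(M, U, mask)
print(result)
```

With these toy vectors, four of six sites are covered, three sites are in the mask, and two sites are both covered and in the mask.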
Supported Input Formats
The query .cx file may be in any of the following internal YAME formats:
| Format | Meaning | How Summary Interprets It |
|---|---|---|
| 0 / 1 | Binary vectors | Presence/absence |
| 2 | State labels | Multi-class state-specific summary |
| 3 | Methylation (M/U counts) | Beta, depth, overlap |
| 6 | Binary with universe bit | Sparse methylation (e.g., single cell) |
Masks (-m) may also be in formats 0, 1, 2, or 6.
Format 7 (BED-like coordinates) is ignored by summary.
Using Mask Feature Files
Masks allow aggregation of methylation or presence across functional features, e.g.:
- Fixed-size windows
- Chromatin states
- CpG islands, gene promoters
- Custom regions
Each mask is processed as a separate feature, producing a separate row in the summary output.
If no mask is provided, YAME produces a global summary of the entire dataset.
Example: Summarization with a Window Mask File
Single-cell DNA methylomes are sparse, so window bins (e.g., 100 kb) are commonly used.
Download the mask file:
wget https://raw.githubusercontent.com/zhou-lab/KYCGKB_mm10/main/Win100k.20220228.cm
Run:
yame summary -m Win100k.20220228.cm single_cell.cg
Example output:
QFile Query MFile Mask N_univ N_query N_mask N_overlap Log2OddsRatio Beta Depth
single_cell.cg Sample1 Win100k.20220228.cm chr1:30 21867837 1861715 589 48 -0.07 0.688 0.1
single_cell.cg Sample1 Win100k.20220228.cm chr1:31 21867837 1861715 574 36 -0.48 0.917 0.1
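For downstream analysis, the output above can be loaded into Python records. The sketch below is one way to do it, assuming the columns are whitespace-delimited and that no field (for example a sample name) contains spaces; `parse_summary` and `to_float` are hypothetical helper names, not part of YAME.

```python
# The two example rows shown above, embedded so the sketch is self-contained.
SAMPLE = """\
QFile Query MFile Mask N_univ N_query N_mask N_overlap Log2OddsRatio Beta Depth
single_cell.cg Sample1 Win100k.20220228.cm chr1:30 21867837 1861715 589 48 -0.07 0.688 0.1
single_cell.cg Sample1 Win100k.20220228.cm chr1:31 21867837 1861715 574 36 -0.48 0.917 0.1
"""

INT_COLS = ("N_univ", "N_query", "N_mask", "N_overlap")
FLOAT_COLS = ("Log2OddsRatio", "Beta", "Depth")


def to_float(value):
    """Convert a field to float; anything unparseable becomes NaN."""
    try:
        return float(value)
    except ValueError:
        return float("nan")


def parse_summary(text):
    """Parse summary-style text (with header line) into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        rec = dict(zip(header, line.split()))
        for col in INT_COLS:
            rec[col] = int(rec[col])
        for col in FLOAT_COLS:
            rec[col] = to_float(rec[col])
        rows.append(rec)
    return rows


rows = parse_summary(SAMPLE)
for rec in rows:
    print(rec["Mask"], rec["N_overlap"], rec["Beta"])
```

If the header is suppressed with -H, the column names would need to be supplied separately.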
Explanation of Output Columns
1. QFile
The query .cx file.
2. Query
Sample name from the sample index. If missing, YAME assigns numerical IDs.
3. MFile
The mask file used (or "global" when no mask is provided).
4. Mask
The name of each mask:
- State label for format 2
- Sample name for binary masks
- Key names if -T is used
5. N_univ
Number of valid universe sites, depending on format:
- Format 6 uses only universe-bit sites
- Other formats use total vector length
6. N_query
Number of “present” positions in the query:
- Formats 0/1: value = 1
- Format 3: sites with nonzero M+U
- Format 6: sites with both the universe bit and the set bit
7. N_mask
Number of positions included in the mask.
8. N_overlap
Intersection of query and mask.
9. Log2OddsRatio
Enrichment score computed from the 2×2 contingency table of query × mask membership.
10. Beta
Average methylation or binary fraction inside the mask:
- Format 3: mean M/(M+U)
- Binary formats: fraction of “1”
- Format 6: fraction of SET sites within universe
11. Depth
Average sequencing depth (M+U) across mask sites (format 3 only).
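The Log2OddsRatio column can be checked by hand from the four count columns. The sketch below builds the 2×2 query × mask table and takes the log2 of the plain odds ratio; applied to the two example rows shown earlier, it reproduces the reported values after rounding. The function name `log2_odds_ratio` is hypothetical, and returning NaN when a cell is empty is an assumption of this sketch: how YAME handles zero cells (for example, with pseudocounts) is not stated here.

```python
import math


def log2_odds_ratio(n_univ, n_query, n_mask, n_overlap):
    """log2 odds ratio of the 2x2 query-by-mask table (illustrative sketch)."""
    a = n_overlap                                  # in query, in mask
    b = n_query - n_overlap                        # in query, not in mask
    c = n_mask - n_overlap                         # not in query, in mask
    d = n_univ - n_query - n_mask + n_overlap      # in neither
    if min(a, b, c, d) <= 0:
        # Assumption: undefined when any cell is empty.
        return float("nan")
    return math.log2((a * d) / (b * c))


# Counts taken from the example output rows for chr1:30 and chr1:31.
lor_30 = log2_odds_ratio(21867837, 1861715, 589, 48)
lor_31 = log2_odds_ratio(21867837, 1861715, 574, 36)
print(f"{lor_30:.2f}")  # -0.07
print(f"{lor_31:.2f}")  # -0.48
```

A negative value means the mask contains fewer query sites than expected from the query's overall frequency in the universe; a positive value means enrichment.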
Special Behaviors
Multiple Masks
If the mask file contains multiple samples, each is processed independently.
State Masks (Format 2)
Each state produces a distinct summary row.
Universe Subsetting (-u)
Applies an additional universe mask to both query and features.
Memory Mode (-M)
Loads all masks into RAM to minimize disk access.
Header Suppression (-H)
Removes the header line for scripting convenience.
Command Reference
Usage: yame summary [options] <query.cx>
Options:
-m FILE   Mask feature (.cx) file. May contain multiple masks.
-M        Load all masks into memory.
-u FILE   Optional universe .cx file.
-H        Suppress header output.
-q NAME   Name to use when query file is '-'.
-F        Use full file paths in output.
-T        Always show section names (format 2).
-s FILE   Sample list overriding the query index.
-h        Display help.
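When yame summary is called from a script, the options above can be assembled programmatically. The following is a minimal sketch that uses only flags listed in this reference; `build_summary_cmd` and `run_summary` are hypothetical helpers, and the sketch assumes that `yame` is on the PATH and writes its table to standard output.

```python
import subprocess


def build_summary_cmd(query, mask=None, universe=None, header=True,
                      masks_in_memory=False, full_paths=False):
    """Build the argv list for `yame summary` (options first, query last)."""
    cmd = ["yame", "summary"]
    if mask:
        cmd += ["-m", mask]          # mask feature file
    if universe:
        cmd += ["-u", universe]      # optional universe file
    if masks_in_memory:
        cmd.append("-M")             # load all masks into memory
    if not header:
        cmd.append("-H")             # suppress the header line
    if full_paths:
        cmd.append("-F")             # full file paths in output
    cmd.append(query)
    return cmd


def run_summary(query, **kwargs):
    """Run the command and return its standard output as text."""
    result = subprocess.run(build_summary_cmd(query, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


cmd = build_summary_cmd("single_cell.cg", mask="Win100k.20220228.cm", header=False)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.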
Additional Documentation
See also: