4. Subsetting Rows in a Packed `.cx` File

YAME provides several tools for extracting subsets of rows (genomic sites) or breaking a .cx file into smaller pieces. The main tools covered here are:

yame rowsub — extract specific CpG rows or ranges
yame chunk — divide a .cx file into size-based chunks
yame chunkchar — divide large text files into size-based chunks

This section describes how to use each tool and the most common workflows.

4.1 Subset CpG Rows with `yame rowsub`

yame rowsub allows you to extract specific genomic rows from a .cx file.
This is useful when:

You want methylation values only at specific CpG sites
You want to subset a contiguous genomic block
You want to filter using a binary mask (fmt0)
You want to extract rows by genomic coordinate rather than integer index

The output is written to stdout, allowing direct piping or saving to file.

Basic Usage

yame rowsub [options] <in.cx> > subset.cx

The command supports three major ways to select rows:

A. Select rows by integer index (`-l`)

If you have a list of row indices (1-based), e.g.:

Then run:

yame rowsub -l row_ids.txt yourfile.cx > subset.cx

No sorting is required; YAME internally maintains the given order.

B. Select rows by genomic coordinate labels (`-L` + `-R`)

This is the most common method for extracting specific CpG sites.

You will need:

A row coordinate file (.cr) containing genome coordinates for each .cx row YAME provides:
- mm10 row coordinates
- hg38 row coordinates
A list of site names in the format chr_beg1, one per line, e.g.:

chr16_18300002
chr16_18300046
chr16_18300140
chr16_18300162
chr16_18300172

Then run:

yame rowsub -R cpg_nocontig.cr -L CpG_sites.tsv yourfile.cx > subset.cx

YAME internally:

Loads the .cr coordinate file
Maps each chrX_pos string to the correct row index
Extracts only those rows
Outputs a smaller .cx

You can also include coordinates as the first dataset in the output with:

yame rowsub -1 -R cpg_nocontig.cr -L CpG_sites.tsv yourfile.cx > subset.cx

C. Select rows using a binary mask (`-m`)

Mask file must be format 0 or 1, representing a vector of 0/1 flags.

Example:

yame rowsub -m mask.cx yourfile.cx > subset.cx

Any row where the mask equals 1 is retained.

This acts similarly to feature masking in yame summary.

D. Select a contiguous block (`-B` or `-I`)

Select a row range (0-based):

yame rowsub -B 1000_2000 yourfile.cx > subset.cx

Extracts rows 1000–1999.

If only one number is given:

yame rowsub -B 1000 yourfile.cx

Outputs a single row.

Select a block by block index (`-I`)

Useful for chunked batch processing:

yame rowsub -I 5_1000000 yourfile.cx

This extracts:

Block index = 5
Block size = 1,000,000 rows
Row range = 5,000,000 to 5,999,999

If block size is omitted, default = 1,000,000.

Summary: Ways to Select Rows

Method	Option	When to Use
Integer row list	`-l <file>`	You know row indices
Genomic coords	`-L <file> -R <row.cr>`	You know genome coordinates
Binary mask	`-m <mask.cx>`	Filtering by precomputed selection
Range of rows	`-B <beg_end>`	Simple slicing by index
Block slicing	`-I <block_blockSize>`	Batch processing
Include coords in output	`-1`	To append `.cr` for clarity

4.1.1 `rowsub` Common Examples

Extract a single genomic region

yame rowsub -B 500000_510000 input.cx > region.cx

Extract CpGs in a BED region

(using row coordinate file + interval expansion)

awk '{print $1"_"$2+1}' region.bed > coords.txt
yame rowsub -R genome.cr -L coords.txt input.cx > subset.cx

Apply a mask file

yame rowsub -m mymask.cx input.cx > masked.cx

4.2 Chunking `.cx` Files with `yame chunk`

yame chunk splits a packed .cx file into multiple smaller .cx files, each containing a fixed number of rows.

This is essential for:

Distributed computing
Parallel model training
Memory-efficient processing
Splitting extremely large .cx files into manageable pieces

Basic Usage

yame chunk -s <chunkSize> input.cx output_dir/

If output_dir is not provided, a directory named:

input.cx_chunks/

is automatically created.

Example

Split into chunks of 500,000 rows each:

yame chunk -s 500000 input.cx chunks/

This produces:

chunks/0.cx
chunks/1.cx
chunks/2.cx
...

Each chunk file maintains the same number of samples as the original. All samples are split identically row-wise.

4.3 Chunking Text Files with `yame chunkchar`

yame chunkchar works like chunk, but for plain text files rather than .cx files. This is useful for splitting:

BED files
FASTA headers
List files
Any long line-based text file

Basic Usage

yame chunkchar -s <chunkSize> input.txt

By default, output is written to:

input.txt_chunks/

Example

Split a large text file into 1M-line chunks:

yame chunkchar -s 1000000 sites.txt

Outputs:

sites.txt_chunks/0.txt
sites.txt_chunks/1.txt
sites.txt_chunks/2.txt
...

Each file contains up to chunkSize lines.

4.4 Help and Developer References

For additional details:

Run with -h

yame rowsub -h
yame chunk -h
yame chunkchar -h

4.5 Summary Table

Command	Input	Output	Purpose
`rowsub`	`.cx`	`.cx` to stdout	Fine-grained row selection
`chunk`	`.cx`	multiple `.cx`	Split methylation matrix into fixed-size parts
`chunkchar`	text	multiple `.txt`	Split large text files

4. Subsetting Rows in a Packed .cx File

4.1 Subset CpG Rows with yame rowsub

Basic Usage

A. Select rows by integer index (-l)

B. Select rows by genomic coordinate labels (-L + -R)

C. Select rows using a binary mask (-m)

D. Select a contiguous block (-B or -I)

Select a row range (0-based):

Select a block by block index (-I)

Summary: Ways to Select Rows

4.1.1 rowsub Common Examples

Extract a single genomic region

Extract CpGs in a BED region

Apply a mask file

4.2 Chunking .cx Files with yame chunk

Basic Usage

Example

4.3 Chunking Text Files with yame chunkchar

Basic Usage

Example

4.4 Help and Developer References

4.5 Summary Table

4. Subsetting Rows in a Packed `.cx` File

4.1 Subset CpG Rows with `yame rowsub`

A. Select rows by integer index (`-l`)

B. Select rows by genomic coordinate labels (`-L` + `-R`)

C. Select rows using a binary mask (`-m`)

D. Select a contiguous block (`-B` or `-I`)

Select a block by block index (`-I`)

4.1.1 `rowsub` Common Examples

4.2 Chunking `.cx` Files with `yame chunk`

4.3 Chunking Text Files with `yame chunkchar`