Modules

This section documents the Python modules in GRiD.

Command-Line Interface

grid

GRiD - Genomic Repeat inference from Depth

A modular pipeline for estimating copy number variants in the LPA gene’s KIV-2 VNTR region.

grid [OPTIONS] COMMAND [ARGS]...

Options

-v, --version: Show the GRiD version

compute-dipcn

Compute diploid copy numbers for LPA KIV-2 repeats.

Uses neighbor-based normalization to compute the diploid copy number (dipCN) for each exon type: 1B_KIV3, 1B_notKIV3, 1B, and 1A.

grid compute-dipcn [OPTIONS]

Options

-c, --count-file <count_file>: Required. Realignment output file with read counts

-n, --neighbor-file <neighbor_file>: Required. Output file from find-neighbors (can be gzipped)

-o, --output-prefix <output_prefix>: Required. Output prefix for diploid CN files

-N, --n-neighbors <n_neighbors>: Number of top neighbors to use (default: 200)

count-reads

Count properly paired reads in a specified LPA VNTR region for all CRAMs.

grid count-reads [OPTIONS]

Options

-C, --cram-dir <cram_dir>: Required. Directory with CRAM files

-o, --output-file <output_file>: Required. Output TSV file

-r, --ref-fasta <ref_fasta>: Required. Reference genome FASTA

-c, --chrom <chrom>: Required. Chromosome (e.g., chr6)

-s, --start <start>: Required. Start position

-e, --end <end>: Required. End position

--config <config>: Required. Path to YAML config file

-t, --threads <threads>: Number of threads for parallel processing (default: 1)
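
For scripted runs, the options above can be assembled into an argument list from Python. This is a minimal sketch: the helper name `build_count_reads_cmd`, the file paths, and the coordinates are hypothetical placeholders, and only the flags documented above are used. The command is printed, not executed.

```python
import shlex


def build_count_reads_cmd(cram_dir, output_file, ref_fasta, chrom,
                          start, end, config, threads=1):
    """Assemble the argument list for `grid count-reads`.

    Hypothetical helper: it only maps its parameters onto the
    long-form flags listed in the documentation above.
    """
    return [
        "grid", "count-reads",
        "--cram-dir", str(cram_dir),
        "--output-file", str(output_file),
        "--ref-fasta", str(ref_fasta),
        "--chrom", str(chrom),
        "--start", str(start),
        "--end", str(end),
        "--config", str(config),
        "--threads", str(threads),
    ]


# Placeholder paths and coordinates, chosen only for illustration.
cmd = build_count_reads_cmd(
    cram_dir="crams",
    output_file="counts.tsv",
    ref_fasta="reference.fa",
    chrom="chr6",
    start=1000,
    end=2000,
    config="config.yaml",
    threads=4,
)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming `grid` is installed and on the `PATH`.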

crai

Ensure a CRAI index exists for a CRAM file.

Args:
    cram: Path to the CRAM file.
    reference: Path to the reference genome FASTA.

Returns:
    Path to the CRAI index file.

grid crai [OPTIONS]

Options

-c, --cram <cram>: Required. Input CRAM file

-r, --reference <reference>: Required. Reference genome FASTA

estimate-kiv

Estimate LPA KIV-2 copy numbers from exon1A and exon1B diploid copy numbers.

Formula:
    diploid_estimate = 34.9 × exon1A + 5.2 × exon1B - 1
    haploid_estimate = diploid_estimate / 2

grid estimate-kiv [OPTIONS]

Options

-a, --exon1a <exon1a>: Required. Exon1A diploid CN file (*.exon1A.dipCN.txt)

-b, --exon1b <exon1b>: Required. Exon1B diploid CN file (*.exon1B.dipCN.txt)

-o, --output <output>: Required. Output file path for KIV-2 copy number estimates

-f, --format <format>: Output file format (default: 'tsv'; options: tsv | txt | csv)
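
The estimate-kiv formula can be written out as a small function. This is a sketch of the arithmetic only: the function name `estimate_kiv2` and the input values are illustrative assumptions, not GRiD's own code or real sample data.

```python
from typing import Tuple


def estimate_kiv2(exon1a_dipcn: float, exon1b_dipcn: float) -> Tuple[float, float]:
    """Apply the documented formula to one sample.

    Returns (diploid_estimate, haploid_estimate).
    """
    diploid_estimate = 34.9 * exon1a_dipcn + 5.2 * exon1b_dipcn - 1
    haploid_estimate = diploid_estimate / 2
    return diploid_estimate, haploid_estimate


# Illustrative inputs only; real values come from the *.dipCN.txt files.
diploid, haploid = estimate_kiv2(exon1a_dipcn=1.0, exon1b_dipcn=2.0)
print(f"diploid={diploid:.2f} haploid={haploid:.2f}")
```

With these inputs the diploid estimate is 34.9 + 10.4 - 1 = 44.3, and the haploid estimate is half of that.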

extract-reference

Extract FASTA sequences from a reference genome based on BED regions.

grid extract-reference [OPTIONS]

Options

-r, --reference-fa <reference_fa>: Required. Path to reference genome FASTA (e.g., hs37d5.fa)

-b, --bed-file <bed_file>: Required. BED file defining regions to extract

-o, --output-dir <output_dir>: Required. Output directory for FASTA file

-f, --output-prefix <output_prefix>: Prefix for output FASTA file (default: 'ref_lpa')

find-neighbors

Find nearest neighbors among individuals using normalized z-depths.

grid find-neighbors [OPTIONS]

Options

-i, --input-file <input_file>: Required. Input normalized z-depth file (.gz) from normalize-mosdepth

-o, --output-file <output_file>: Required. Output file (will be gzipped and converted to .zMax{zmax}.txt.gz)

-z, --zmax <zmax>: Maximum z-score clipping threshold (default: 2.0)

-n, --n-neighbors <n_neighbors>: Number of nearest neighbors to find (default: 500)

--sigma2-max <sigma2_max>: Maximum variance threshold for filtering regions (default: 1000.0)
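
To make the roles of `--zmax` and `--n-neighbors` concrete, the sketch below clips z-depths and ranks samples by distance. It is not GRiD's implementation: symmetric clipping to [-zmax, zmax], Euclidean distance, the function names, and the toy data are all assumptions made for illustration, and the `--sigma2-max` filter is not modeled.

```python
import math


def clip(z, zmax):
    """Clamp a z-score to the range [-zmax, zmax] (assumed symmetric)."""
    return max(-zmax, min(zmax, z))


def nearest_neighbors(zdepths, zmax=2.0, n_neighbors=500):
    """Rank the other samples by distance for each sample.

    zdepths maps a sample name to its list of per-region z-depths.
    Euclidean distance is an assumption, not a documented choice.
    """
    clipped = {s: [clip(z, zmax) for z in zs] for s, zs in zdepths.items()}
    neighbors = {}
    for sample, zs in clipped.items():
        dists = []
        for other, other_zs in clipped.items():
            if other == sample:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(zs, other_zs)))
            dists.append((d, other))
        dists.sort()
        neighbors[sample] = [name for _, name in dists[:n_neighbors]]
    return neighbors


# Toy data: sample C has outlier depths that are clipped to +/-2.0.
toy = {"A": [0.0, 0.1], "B": [0.1, 0.0], "C": [5.0, -5.0]}
result = nearest_neighbors(toy, zmax=2.0, n_neighbors=1)
print(result)
```

Clipping keeps a few extreme regions from dominating the distance, which is the usual motivation for a threshold of this kind.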

gcloud-copy

Copy CRAM files from a Google Cloud bucket to a local directory.

grid gcloud-copy [OPTIONS]

Options

-b, --bucket-path <bucket_path>: Required. Google Cloud Storage bucket path (e.g., gs://your-bucket/path/)

-C, --cram-dir <cram_dir>: Local directory to copy CRAM files into

lpa-realign

Realign reads in the LPA KIV-2 region for all CRAMs in a directory.

grid lpa-realign [OPTIONS]

Options

-C, --cram-dir <cram_dir>: Required. Directory with CRAM files

-o, --output-file <output_file>: Required. Output TSV file

-r, --ref-fasta <ref_fasta>: Required. Reference genome FASTA

-f, --lpa-ref-fasta <lpa_ref_fasta>: Required. LPA KIV-2 reference FASTA for realignment

-P, --positions-file <positions_file>: Required. Hardcoded positions file for LPA KIV-2

-g, --genome-build <genome_build>: Genome build (default: 'hg38'; options: hg19 | hg37 | hg38)

-c, --chrom <chrom>: Required. Chromosome (e.g., chr6)

-s, --start <start>: Required. Start position (0-based)

-e, --end <end>: Required. End position (0-based)

-t, --threads <threads>: Number of threads for parallel processing (default: 1)

mosdepth

Run mosdepth on all CRAMs in a directory and extract coverage for the LPA VNTR region.

grid mosdepth [OPTIONS]

Options

-C, --cram-dir <cram_dir>: Required. Directory with CRAM files

-o, --output-file <output_file>: Required. Output TSV file

-r, --ref-fasta <ref_fasta>: Required. Reference genome FASTA

-c, --chrom <chrom>: Required. Chromosome (e.g., chr6)

-s, --start <start>: Required. Start position

-e, --end <end>: Required. End position

-n, --region-name <region_name>: Required. Name of the region for mosdepth output (default: LPA_VNTR)

-w, --work-dir <work_dir>: Working directory for intermediate files

--by <by>: Bin size for mosdepth --by (default: 1000)

--fast, --no-fast: Enable or disable mosdepth --fast-mode

-t, --threads <threads>: Number of threads for parallel processing (default: 1)
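
The `--by` and `--fast/--no-fast` options correspond to mosdepth's own `--by` and `--fast-mode` flags. The sketch below shows how a single mosdepth call could be assembled for one CRAM. How GRiD actually invokes mosdepth is not documented here, so the helper name `build_mosdepth_cmd`, the file names, and the choice of flags are assumptions; the command is built but not run.

```python
def build_mosdepth_cmd(cram, prefix, ref_fasta, chrom,
                       by=1000, fast=True, threads=1):
    """Build a mosdepth argument list for one CRAM file.

    Hypothetical helper. mosdepth takes its options first, then the
    output prefix, then the BAM/CRAM path.
    """
    cmd = [
        "mosdepth",
        "--threads", str(threads),
        "--chrom", chrom,
        "--by", str(by),         # window size in bases
        "--fasta", ref_fasta,    # reference needed to decode CRAM
    ]
    if fast:
        cmd.append("--fast-mode")
    cmd += [prefix, cram]
    return cmd


# Placeholder file names for illustration only.
mosdepth_cmd = build_mosdepth_cmd(
    cram="sample1.cram",
    prefix="sample1",
    ref_fasta="reference.fa",
    chrom="chr6",
)
print(" ".join(mosdepth_cmd))
```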

normalize-mosdepth

Normalize mosdepth coverage across all samples in a directory.

grid normalize-mosdepth [OPTIONS]

Options

-M, --mosdepth-dir <mosdepth_dir>: Required. Directory with mosdepth files

-o, --output-file <output_file>: Required. Normalized output file (.gz)

-r, --repeat-mask <repeat_mask>: Required. Repeat mask BED file

-c, --chrom <chrom>: Required. Chromosome (e.g., chr6)

-s, --start <start>: Required. Start coordinate

-e, --end <end>: Required. End coordinate

--min-depth <min_depth>: Minimum depth to keep (default: 20)

--max-depth <max_depth>: Maximum depth to keep (default: 100)

--top-frac <top_frac>: Top fraction of high-variance regions to keep (default: 0.1)

-t, --threads <threads>: Number of threads for parallel processing (default: 1)

subset

Subset a CRAM file to a specific genomic region.

Args:
    cram: Input CRAM file.
    region: Genomic region in ‘chr:start-end’ format.
    chrom: Chromosome (if region not provided).
    start: Start position (if region not provided).
    end: End position (if region not provided).
    output: Output CRAM file path.
    reference: Reference genome FASTA.

Returns:
    Path to the subset CRAM file.

grid subset [OPTIONS]

Options

-c, --cram <cram>: Required. Input CRAM file

-r, --region <region>: Genomic region (e.g., ‘chr6:160000000-160100000’)

-C, --chrom <chrom>: Chromosome (e.g., chr6)

-s, --start <start>: Start position (e.g., 160000000)

-e, --end <end>: End position (e.g., 160100000)

-o, --output <output>: Required. Output subset CRAM file

-R, --reference <reference>: Required. Reference genome FASTA
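
The region can be given either as one `chr:start-end` string or as separate chromosome, start, and end values. The sketch below shows one way to reconcile the two forms. The function names and the validation rules are assumptions for illustration, not GRiD's actual logic.

```python
def parse_region(region):
    """Split a 'chr:start-end' string into (chrom, start, end)."""
    chrom, _, span = region.rpartition(":")
    start_s, _, end_s = span.partition("-")
    if not chrom or not start_s or not end_s:
        raise ValueError(f"expected 'chr:start-end', got {region!r}")
    start, end = int(start_s), int(end_s)
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return chrom, start, end


def resolve_region(region=None, chrom=None, start=None, end=None):
    """Return a 'chr:start-end' string from either input form.

    Prefers `region` when given; otherwise requires all three of
    chrom, start, and end.
    """
    if region is not None:
        chrom, start, end = parse_region(region)
    elif chrom is None or start is None or end is None:
        raise ValueError("provide region, or chrom with start and end")
    return f"{chrom}:{start}-{end}"


# The example coordinates are the ones used in the option help above.
parsed = parse_region("chr6:160000000-160100000")
resolved = resolve_region(chrom="chr6", start=160000000, end=160100000)
print(parsed, resolved)
```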

Module Documentation