Modules

This section documents the Python modules in GRiD.

Command-Line Interface

grid

GRiD - Genomic Repeat inference from Depth

A modular pipeline for estimating copy number variants in the LPA gene’s KIV-2 VNTR region.

grid [OPTIONS] COMMAND [ARGS]...

Options

-v, --version

Show the GRiD version

compute-dipcn

Compute diploid copy numbers for LPA KIV-2 repeats.

Uses neighbor-based normalization to compute dipCN for each exon type: 1B_KIV3, 1B_notKIV3, 1B, and 1A.

grid compute-dipcn [OPTIONS]

Options

-c, --count-file <count_file>

Required Read-count output file from the realignment step (lpa-realign)

-n, --neighbor-file <neighbor_file>

Required Output file from find-neighbors (may be gzipped)

-o, --output-prefix <output_prefix>

Required Output prefix for diploid CN files

-N, --n-neighbors <n_neighbors>

Number of top neighbors to use

Default:

200
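compute-dipcn is described only as using neighbor-based normalization. As a rough illustration of that idea, and not GRiD's actual calculation, the sketch below divides one sample's read count for a single exon type by the mean count of its N nearest neighbors, mirroring -N/--n-neighbors. The helper name, the dictionary inputs, the use of a plain mean, and the omission of any rescaling onto a diploid scale are all assumptions.

```python
from statistics import mean


def neighbor_normalized_ratio(counts, neighbors, n_neighbors=200):
    """Hypothetical helper: sample count / mean count of its top-N neighbors.

    counts:    sample ID -> read count for one exon type (e.g. 1A)
    neighbors: sample ID -> neighbor IDs ordered nearest first
    """
    ratios = {}
    for sample, count in counts.items():
        # Drop the sample itself and neighbors without a count, then take the N nearest.
        top = [n for n in neighbors.get(sample, []) if n != sample and n in counts]
        top = top[:n_neighbors]
        if not top:
            continue  # no usable neighbors: skip instead of dividing by zero
        baseline = mean(counts[n] for n in top)
        if baseline > 0:
            ratios[sample] = count / baseline
    return ratios
```

Averaging over many neighbors presumably damps noise from any single neighbor, while restricting the baseline to the nearest ones keeps it comparable to the sample's own coverage profile.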

count-reads

Count properly paired reads in a specified LPA VNTR region for all CRAMs.

grid count-reads [OPTIONS]

Options

-C, --cram-dir <cram_dir>

Required Directory with CRAM files

-o, --output-file <output_file>

Required Output TSV file

-r, --ref-fasta <ref_fasta>

Required Reference genome FASTA

-c, --chrom <chrom>

Required Chromosome (e.g., chr6)

-s, --start <start>

Required Start position

-e, --end <end>

Required End position

--config <config>

Required Path to YAML config file

-t, --threads <threads>

Number of threads for parallel processing (default: 1)
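In SAM records, bit 0x2 of the FLAG field marks a read mapped in a proper pair. The sketch below counts such reads from SAM text lines (for example, output of samtools view). It assumes a read belongs to the region when its 1-based POS lies in [start, end] and that secondary (0x100) and supplementary (0x800) records are skipped; both rules and the helper name are illustrative assumptions, and count-reads may instead count every read overlapping the region.

```python
PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_proper_pairs(sam_lines, chrom, start, end):
    """Hypothetical helper: count properly paired primary reads with POS in [start, end]."""
    n = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        # SAM columns 2-4: FLAG, RNAME, POS (1-based leftmost position).
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        if rname != chrom or not (start <= pos <= end):
            continue
        if not flag & PROPER_PAIR:
            continue
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, using its primary record only
        n += 1
    return n
```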

crai

Ensure CRAI index exists for a CRAM file.

Args:

cram: Path to the CRAM file.
reference: Path to the reference genome FASTA.

Returns:

Path to the CRAI index file.

grid crai [OPTIONS]

Options

-c, --cram <cram>

Required Input CRAM file

-r, --reference <reference>

Required Reference genome FASTA
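The check-then-create behavior described for crai can be sketched as below. This is an assumption-laden illustration: it expects samtools on PATH, relies on samtools index writing <file>.cram.crai next to a CRAM by default, and does not pass the reference FASTA that the CLI accepts. The helper name is hypothetical.

```python
import subprocess
from pathlib import Path


def ensure_crai(cram):
    """Hypothetical helper: return <cram>.crai, building it with samtools if missing."""
    crai = Path(cram + ".crai")
    if not crai.exists():
        # samtools index names the index <input>.crai for CRAM input by default.
        subprocess.run(["samtools", "index", cram], check=True)
    return crai
```

Checking for an existing index first means repeated pipeline runs skip the indexing cost.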

estimate-kiv

Estimate LPA KIV2 copy numbers from exon1A and exon1B diploid copy numbers.

Formula:

diploid_estimate = 34.9 × exon1A + 5.2 × exon1B - 1
haploid_estimate = diploid_estimate / 2
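The formula maps directly onto code. Only the coefficients come from the formula; the function names and the per-sample dictionaries (sample ID to dipCN value) are hypothetical inputs chosen for illustration.

```python
def estimate_kiv2(exon1a, exon1b):
    """Apply the estimate-kiv formula to one sample's exon1A and exon1B dipCN values."""
    diploid = 34.9 * exon1a + 5.2 * exon1b - 1
    haploid = diploid / 2
    return diploid, haploid


def estimate_all(dipcn_1a, dipcn_1b):
    """Hypothetical helper: estimates for samples present in both dipCN inputs."""
    shared = sorted(dipcn_1a.keys() & dipcn_1b.keys())
    return {s: estimate_kiv2(dipcn_1a[s], dipcn_1b[s]) for s in shared}
```

For example, exon1A = 1.0 and exon1B = 2.0 give a diploid estimate of 34.9 + 10.4 - 1 = 44.3 and a haploid estimate of 22.15.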

grid estimate-kiv [OPTIONS]

Options

-a, --exon1a <exon1a>

Required Exon1A diploid CN file (*.exon1A.dipCN.txt)

-b, --exon1b <exon1b>

Required Exon1B diploid CN file (*.exon1B.dipCN.txt)

-o, --output <output>

Required Output file path for KIV2 copy number estimates

-f, --format <format>

Output file format

Default:

'tsv'

Options:

tsv | txt | csv

extract-reference

Extract FASTA sequences from a reference genome based on BED regions.

grid extract-reference [OPTIONS]

Options

-r, --reference-fa <reference_fa>

Required Path to reference genome FASTA (e.g., hs37d5.fa)

-b, --bed-file <bed_file>

Required BED file defining regions to extract

-o, --output-dir <output_dir>

Required Output directory for the FASTA file

-f, --output-prefix <output_prefix>

Prefix for the output FASTA file

Default:

'ref_lpa'
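Because BED intervals are 0-based and half-open, extracting them is a direct Python slice. The sketch below works on an in-memory dict of chromosome sequences (a hypothetical input; a real run against a genome such as hs37d5.fa would read an indexed FASTA instead) and labels records in the 1-based, inclusive chrom:start-end style used by samtools faidx. The header style and helper names are assumptions, not GRiD's documented output.

```python
def extract_bed_regions(genome, bed_lines):
    """Hypothetical helper: slice BED intervals out of chromosome sequences."""
    records = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split("\t")[:3]
        start, end = int(start), int(end)
        # BED [start, end) maps to 1-based inclusive start+1..end in the header.
        records.append((f"{chrom}:{start + 1}-{end}", genome[chrom][start:end]))
    return records


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping at `width` columns."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```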

find-neighbors

Find nearest neighbors among individuals using normalized z-depths.

grid find-neighbors [OPTIONS]

Options

-i, --input-file <input_file>

Required Input normalized z-depth file (.gz) produced by normalize-mosdepth

-o, --output-file <output_file>

Required Output file path (the output is gzipped and its name is rewritten to end in .zMax{zmax}.txt.gz)

-z, --zmax <zmax>

Maximum z-score clipping threshold

Default:

2.0

-n, --n-neighbors <n_neighbors>

Number of nearest neighbors to find

Default:

500

--sigma2-max <sigma2_max>

Maximum variance threshold for filtering regions

Default:

1000.0
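A minimal sketch of the neighbor search, under stated assumptions: each sample's z-depth profile is clipped to [-zmax, zmax] (mirroring -z/--zmax), the other samples are ranked by Euclidean distance, and the n nearest are kept (mirroring -n/--n-neighbors). The distance metric, helper names and in-memory input are assumptions, and the --sigma2-max region filter is not modeled. Clipping limits how much a few extreme bins can dominate a distance.

```python
import math


def clip(z, zmax):
    """Clamp one z-score to [-zmax, zmax]."""
    return max(-zmax, min(zmax, z))


def nearest_neighbors(zdepths, zmax=2.0, n_neighbors=500):
    """Hypothetical helper: for each sample, other samples ordered by distance.

    zdepths: sample ID -> list of z-depths over the same regions.
    """
    clipped = {s: [clip(z, zmax) for z in zs] for s, zs in zdepths.items()}
    result = {}
    for s, zs in clipped.items():
        # Brute force: O(samples^2) distance computations.
        dists = sorted((math.dist(zs, other), o) for o, other in clipped.items() if o != s)
        result[s] = [o for _, o in dists[:n_neighbors]]
    return result
```

The quadratic loop is fine for small examples; large cohorts would likely need a vectorized or index-based search.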

gcloud-copy

Copy CRAM files from a Google Cloud bucket to a local directory.

grid gcloud-copy [OPTIONS]

Options

-b, --bucket-path <bucket_path>

Required Google Cloud Storage bucket path (e.g., gs://your-bucket/path/)

-C, --cram-dir <cram_dir>

Local directory to copy CRAM files into
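One way to perform such a copy is the gcloud storage cp command, which accepts wildcards in gs:// URLs. Whether gcloud-copy itself uses gcloud storage, gsutil, or a client library is not stated, so the helper below (a hypothetical name) only builds an illustrative command. Note that a *.cram pattern does not copy .crai index files.

```python
def gcloud_copy_command(bucket_path, cram_dir):
    """Hypothetical helper: build a `gcloud storage cp` command for all CRAMs under a bucket path."""
    if not bucket_path.startswith("gs://"):
        raise ValueError("bucket path must start with gs://")
    pattern = bucket_path.rstrip("/") + "/*.cram"
    return ["gcloud", "storage", "cp", pattern, cram_dir]
```

Passing the list to subprocess.run(cmd, check=True) runs it without a shell, so gcloud, not the shell, expands the wildcard.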

lpa-realign

Realign reads in the LPA KIV-2 region for all CRAMs in a directory.

grid lpa-realign [OPTIONS]

Options

-C, --cram-dir <cram_dir>

Required Directory with CRAM files

-o, --output-file <output_file>

Required Output TSV file

-r, --ref-fasta <ref_fasta>

Required Reference genome FASTA

-f, --lpa-ref-fasta <lpa_ref_fasta>

Required LPA KIV-2 reference FASTA for realignment

-P, --positions-file <positions_file>

Required File of hardcoded positions for the LPA KIV-2 region

-g, --genome-build <genome_build>

Genome build

Default:

'hg38'

Options:

hg19 | hg37 | hg38

-c, --chrom <chrom>

Required Chromosome (e.g., chr6)

-s, --start <start>

Required Start position (0-based)

-e, --end <end>

Required End position (0-based)

-t, --threads <threads>

Number of threads for parallel processing (default: 1)

mosdepth

Run mosdepth on all CRAMs in a directory and extract coverage for the LPA VNTR region.

grid mosdepth [OPTIONS]

Options

-C, --cram-dir <cram_dir>

Required Directory with CRAM files

-o, --output-file <output_file>

Required Output TSV file

-r, --ref-fasta <ref_fasta>

Required Reference genome FASTA

-c, --chrom <chrom>

Required Chromosome (e.g., chr6)

-s, --start <start>

Required Start position

-e, --end <end>

Required End position

-n, --region-name <region_name>

Required Name of the region for mosdepth output (default: LPA_VNTR)

-w, --work-dir <work_dir>

Working directory for intermediate files

--by <by>

Bin size for mosdepth --by (default: 1000)

--fast, --no-fast

Enable or disable mosdepth --fast-mode

-t, --threads <threads>

Number of threads for parallel processing (default: 1)
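With an integer --by, mosdepth writes mean coverage per fixed-size window to <prefix>.regions.bed.gz (columns: chrom, start, end, mean depth), and --fasta supplies the reference needed to decode CRAM. The sketch below builds such a command and keeps the windows overlapping the target region. The helper names are hypothetical, the fast default is a guess because GRiD's default is not shown, and adding --no-per-base (to skip the large per-base output) is an assumption about what GRiD passes.

```python
def mosdepth_command(cram, prefix, ref_fasta, by=1000, fast=False, threads=1):
    """Hypothetical helper: build a mosdepth command producing binned coverage."""
    cmd = ["mosdepth", "--threads", str(threads), "--fasta", ref_fasta,
           "--by", str(by), "--no-per-base"]
    if fast:
        cmd.append("--fast-mode")
    return cmd + [prefix, cram]


def bins_in_region(bed_lines, chrom, start, end):
    """Yield (start, end, mean_depth) for mosdepth windows overlapping [start, end)."""
    for line in bed_lines:
        c, s, e, depth = line.split("\t")[:4]
        s, e = int(s), int(e)
        if c == chrom and s < end and e > start:
            yield s, e, float(depth)
```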

normalize-mosdepth

Normalize mosdepth coverage across all samples in a directory.

grid normalize-mosdepth [OPTIONS]

Options

-M, --mosdepth-dir <mosdepth_dir>

Required Directory with mosdepth files

-o, --output-file <output_file>

Required Normalized output file (.gz)

-r, --repeat-mask <repeat_mask>

Required Repeat mask BED file

-c, --chrom <chrom>

Required Chromosome (e.g., chr6)

-s, --start <start>

Required Start coordinate

-e, --end <end>

Required End coordinate

--min-depth <min_depth>

Minimum depth to keep (default: 20)

--max-depth <max_depth>

Maximum depth to keep (default: 100)

--top-frac <top_frac>

Top fraction of high-variance regions to keep (default: 0.1)

-t, --threads <threads>

Number of threads for parallel processing (default: 1)
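How normalize-mosdepth combines its filters is not spelled out, so the following is only one plausible reading, with every step an assumption. The sketch keeps bins whose mean depth across samples lies in [min_depth, max_depth], keeps the top_frac of those bins with the highest variance, and converts each kept bin to z-scores across samples. The repeat-mask filter is not modeled, and the helper name and dictionary input are hypothetical.

```python
import math
from statistics import mean, pstdev, pvariance


def normalize_depths(depths, min_depth=20, max_depth=100, top_frac=0.1):
    """Hypothetical helper: filter bins, then z-score each kept bin across samples.

    depths: sample ID -> list of bin depths (same bins, same order, for every sample).
    """
    samples = list(depths)
    n_bins = len(depths[samples[0]])
    columns = [[depths[s][i] for s in samples] for i in range(n_bins)]
    # 1. Depth filter on each bin's across-sample mean.
    kept = [i for i, col in enumerate(columns) if min_depth <= mean(col) <= max_depth]
    # 2. Keep the highest-variance fraction of the remaining bins (at least one).
    kept.sort(key=lambda i: pvariance(columns[i]), reverse=True)
    kept = sorted(kept[:max(1, math.ceil(top_frac * len(kept)))])
    # 3. z-score each kept bin across samples; zero-variance bins become 0.0.
    out = {s: [] for s in samples}
    for i in kept:
        mu, sd = mean(columns[i]), pstdev(columns[i])
        for s, x in zip(samples, columns[i]):
            out[s].append((x - mu) / sd if sd > 0 else 0.0)
    return out
```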

subset

Subset a CRAM file to a specific genomic region.

Args:

cram: Input CRAM file.
region: Genomic region in 'chr:start-end' format.
chrom: Chromosome (if region not provided).
start: Start position (if region not provided).
end: End position (if region not provided).
output: Output CRAM file path.
reference: Reference genome FASTA.

Returns:

Path to the subset CRAM file.

grid subset [OPTIONS]

Options

-c, --cram <cram>

Required Input CRAM file

-r, --region <region>

Genomic region (e.g., 'chr6:160000000-160100000')

-C, --chrom <chrom>

Chromosome (e.g., chr6)

-s, --start <start>

Start position (e.g., 160000000)

-e, --end <end>

End position (e.g., 160100000)

-o, --output <output>

Required Output subset CRAM file

-R, --reference <reference>

Required Reference genome FASTA
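The two ways of specifying the region (--region, or --chrom/--start/--end) can be reconciled into a single chrom:start-end string and passed to samtools view, whose -C, -T and -o options request CRAM output, give the reference and name the output file. This is a hedged sketch: the helper names are hypothetical and GRiD's subset may be implemented differently (for example, via pysam).

```python
import re

REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def resolve_region(region=None, chrom=None, start=None, end=None):
    """Hypothetical helper: return 'chrom:start-end' from either input form."""
    if region:
        m = REGION_RE.match(region)
        if not m:
            raise ValueError(f"region must look like 'chr:start-end', got {region!r}")
        chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    elif chrom is None or start is None or end is None:
        raise ValueError("provide --region or all of --chrom, --start and --end")
    if int(start) > int(end):
        raise ValueError("start must not exceed end")
    return f"{chrom}:{start}-{end}"


def subset_command(cram, region, output, reference):
    """Hypothetical helper: samtools view command writing reads in `region` to a CRAM."""
    return ["samtools", "view", "-C", "-T", reference, "-o", output, cram, region]
```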

Module Documentation