Modules
This section documents the Python modules in GRiD.
Command-Line Interface
grid
GRiD - Genomic Repeat inference from Depth
A modular pipeline for estimating copy number variants in the LPA gene’s KIV-2 VNTR region.
grid [OPTIONS] COMMAND [ARGS]...
Options
- -v, --version
Show the GRiD version
compute-dipcn
Compute diploid copy numbers for LPA KIV-2 repeats.
Uses neighbor-based normalization to compute the diploid copy number (dipCN) for each exon type: 1B_KIV3, 1B_notKIV3, 1B, and 1A.
grid compute-dipcn [OPTIONS]
Options
- -c, --count-file <count_file>
Required Realignment output file with read counts
- -n, --neighbor-file <neighbor_file>
Required Output file from find-neighbors (can be gzipped)
- -o, --output-prefix <output_prefix>
Required Output prefix for diploid CN files
- -N, --n-neighbors <n_neighbors>
Number of top neighbors to use
- Default:
200
count-reads
Count properly paired reads in a specified LPA VNTR region for all CRAMs.
grid count-reads [OPTIONS]
Options
- -C, --cram-dir <cram_dir>
Required Directory with CRAM files
- -o, --output-file <output_file>
Required Output TSV file
- -r, --ref-fasta <ref_fasta>
Required Reference genome FASTA
- -c, --chrom <chrom>
Required Chromosome (e.g., chr6)
- -s, --start <start>
Required Start position
- -e, --end <end>
Required End position
- --config <config>
Required Path to YAML config file
- -t, --threads <threads>
Number of threads for parallel processing (default: 1)
crai
Ensure CRAI index exists for a CRAM file.
- Args:
cram: Path to the CRAM file.
reference: Path to the reference genome FASTA.
- Returns:
Path to the CRAI index file.
grid crai [OPTIONS]
Options
- -c, --cram <cram>
Required Input CRAM file
- -r, --reference <reference>
Required Reference genome FASTA
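As a rough illustration of what checking for an index involves, the sketch below looks for an index file next to the CRAM. It assumes the conventional <name>.cram.crai naming that samtools index produces. The helper names expected_crai_path and has_index and the file name sample1.cram are hypothetical and are not taken from GRiD.

```python
from pathlib import Path


def expected_crai_path(cram: str) -> Path:
    """Return the conventional index path for a CRAM file (hypothetical helper)."""
    # Assumption: the index sits next to the CRAM as <name>.cram.crai.
    return Path(cram + ".crai")


def has_index(cram: str) -> bool:
    """Report whether the expected index file is already on disk."""
    return expected_crai_path(cram).is_file()


# Hypothetical file name, used only to show the naming convention.
print(expected_crai_path("sample1.cram"))
print(has_index("sample1.cram"))
```

A command such as grid crai would presumably build the index when this check fails; how GRiD does that is not shown here.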
estimate-kiv
Estimate LPA KIV2 copy numbers from exon1A and exon1B diploid copy numbers.
- Formula:
diploid_estimate = 34.9 × exon1A + 5.2 × exon1B - 1
haploid_estimate = diploid_estimate / 2
grid estimate-kiv [OPTIONS]
Options
- -o, --output <output>
Required Output file path for KIV2 copy number estimates
- -f, --format <format>
Output file format
- Default:
'tsv'
- Options:
tsv | txt | csv
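The formula above can be written out as a small function to make the arithmetic concrete. This is a minimal sketch: the function name estimate_kiv2 and the input values are hypothetical, and only the coefficients come from the formula documented for this command.

```python
def estimate_kiv2(exon1a: float, exon1b: float) -> tuple[float, float]:
    """Apply the documented formula to exon 1A and exon 1B diploid copy numbers."""
    diploid_estimate = 34.9 * exon1a + 5.2 * exon1b - 1
    haploid_estimate = diploid_estimate / 2
    return diploid_estimate, haploid_estimate


# Hypothetical inputs chosen only to exercise the arithmetic.
diploid, haploid = estimate_kiv2(exon1a=1.0, exon1b=2.0)
print(f"diploid={diploid:.2f} haploid={haploid:.2f}")
```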
extract-reference
Extract FASTA sequences from a reference genome based on BED regions.
grid extract-reference [OPTIONS]
Options
- -r, --reference-fa <reference_fa>
Required Path to reference genome FASTA (e.g., hs37d5.fa)
- -b, --bed-file <bed_file>
Required BED file defining regions to extract
- -o, --output-dir <output_dir>
Required Output directory for FASTA file
- -f, --output-prefix <output_prefix>
Prefix for output FASTA file
- Default:
'ref_lpa'
find-neighbors
Find nearest neighbors among individuals using normalized z-depths.
grid find-neighbors [OPTIONS]
Options
- -i, --input-file <input_file>
Required Input normalized z-depth file (.gz) from normalize_mosdepth
- -o, --output-file <output_file>
Required Output file (will be gzipped and converted to .zMax{zmax}.txt.gz)
- -z, --zmax <zmax>
Maximum z-score clipping threshold
- Default:
2.0
- -n, --n-neighbors <n_neighbors>
Number of nearest neighbors to find
- Default:
500
- --sigma2-max <sigma2_max>
Maximum variance threshold for filtering regions
- Default:
1000.0
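To show how a clipping threshold and a neighbor count interact, here is a toy sketch of a nearest-neighbor search on clipped z-depths. It assumes squared Euclidean distance as the similarity measure, which this page does not confirm, so the actual GRiD implementation may differ. The function names and the sample profiles are made up for illustration.

```python
def clip(values, zmax):
    """Clamp each z-depth to the interval [-zmax, zmax]."""
    return [max(-zmax, min(zmax, v)) for v in values]


def nearest_neighbors(profiles, sample, n_neighbors, zmax=2.0):
    """Rank other samples by squared Euclidean distance on clipped z-depths.

    profiles maps a sample ID to a list of z-depths, one per region.
    The distance measure is an assumption of this sketch.
    """
    target = clip(profiles[sample], zmax)
    distances = []
    for other, values in profiles.items():
        if other == sample:
            continue
        clipped = clip(values, zmax)
        dist = sum((a - b) ** 2 for a, b in zip(target, clipped))
        distances.append((dist, other))
    distances.sort()
    return [other for _, other in distances[:n_neighbors]]


# Toy z-depth profiles (made-up values) for four samples over three regions.
profiles = {
    "S1": [0.1, -0.4, 5.0],
    "S2": [0.2, -0.5, 9.0],
    "S3": [1.5, 1.0, -3.0],
    "S4": [0.0, 0.0, 0.0],
}
print(nearest_neighbors(profiles, "S1", n_neighbors=2))
```

In this toy data, clipping at 2.0 makes the third region of S1 and S2 identical, so a single extreme region does not dominate the ranking.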
gcloud-copy
Copy CRAM files from a Google Cloud bucket to a local directory.
grid gcloud-copy [OPTIONS]
Options
- -b, --bucket-path <bucket_path>
Required Google Cloud Storage bucket path (e.g., gs://your-bucket/path/)
- -C, --cram-dir <cram_dir>
Local directory to copy CRAM files into
lpa-realign
Realign reads in LPA KIV-2 region for all CRAMs in a directory.
grid lpa-realign [OPTIONS]
Options
- -C, --cram-dir <cram_dir>
Required Directory with CRAM files
- -o, --output-file <output_file>
Required Output TSV file
- -r, --ref-fasta <ref_fasta>
Required Reference genome FASTA
- -f, --lpa-ref-fasta <lpa_ref_fasta>
Required LPA KIV-2 reference FASTA for realignment
- -P, --positions-file <positions_file>
Required Hardcoded positions file for LPA KIV-2
- -g, --genome-build <genome_build>
Genome build
- Default:
'hg38'
- Options:
hg19 | hg37 | hg38
- -c, --chrom <chrom>
Required Chromosome (e.g., chr6)
- -s, --start <start>
Required Start position (0-based)
- -e, --end <end>
Required End position (0-based)
- -t, --threads <threads>
Number of threads for parallel processing (default: 1)
mosdepth
Run mosdepth on all CRAMs in a directory and extract coverage for the LPA VNTR region.
grid mosdepth [OPTIONS]
Options
- -C, --cram-dir <cram_dir>
Required Directory with CRAM files
- -o, --output-file <output_file>
Required Output TSV file
- -r, --ref-fasta <ref_fasta>
Required Reference genome FASTA
- -c, --chrom <chrom>
Required Chromosome (e.g., chr6)
- -s, --start <start>
Required Start position
- -e, --end <end>
Required End position
- -n, --region-name <region_name>
Required Name of the region for mosdepth output (default: LPA_VNTR)
- -w, --work-dir <work_dir>
Working directory for intermediate files
- --by <by>
Bin size for mosdepth --by (default: 1000)
- --fast, --no-fast
Enable or disable mosdepth --fast-mode
- -t, --threads <threads>
Number of threads for parallel processing (default: 1)
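The options above map naturally onto mosdepth's own command-line flags. The sketch below assembles one such call for a single CRAM. The flags --threads, --fasta, --chrom, --by and --fast-mode are real mosdepth options, but how GRiD actually invokes mosdepth is an assumption here, and the function name and file paths are hypothetical.

```python
import shlex


def build_mosdepth_command(cram, prefix, ref_fasta, chrom,
                           by=1000, fast=True, threads=1):
    """Assemble a mosdepth argument list from GRiD-style option values."""
    cmd = [
        "mosdepth",
        "--threads", str(threads),
        "--fasta", ref_fasta,   # reference needed to decode CRAM
        "--chrom", chrom,       # restrict to one chromosome
        "--by", str(by),        # window size in bases
    ]
    if fast:
        cmd.append("--fast-mode")
    # mosdepth takes the output prefix first, then the alignment file.
    cmd += [prefix, cram]
    return cmd


cmd = build_mosdepth_command(
    cram="crams/sample1.cram",   # hypothetical paths
    prefix="work/sample1",
    ref_fasta="ref/genome.fa",
    chrom="chr6",
)
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run without a shell, which avoids quoting problems with unusual file names.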
normalize-mosdepth
Normalize mosdepth coverage across all samples in a directory.
grid normalize-mosdepth [OPTIONS]
Options
- -M, --mosdepth-dir <mosdepth_dir>
Required Directory with mosdepth files
- -o, --output-file <output_file>
Required Normalized output file (.gz)
- -r, --repeat-mask <repeat_mask>
Required Repeat mask BED file
- -c, --chrom <chrom>
Required Chromosome (e.g., chr6)
- -s, --start <start>
Required Start coordinate
- -e, --end <end>
Required End coordinate
- --min-depth <min_depth>
Minimum depth to keep (default: 20)
- --max-depth <max_depth>
Maximum depth to keep (default: 100)
- --top-frac <top_frac>
Top fraction of high-variance regions to keep (default: 0.1)
- -t, --threads <threads>
Number of threads for parallel processing (default: 1)
subset
Subset a CRAM file to a specific genomic region.
- Args:
cram: Input CRAM file.
region: Genomic region in ‘chr:start-end’ format.
chrom: Chromosome (if region not provided).
start: Start position (if region not provided).
end: End position (if region not provided).
output: Output CRAM file path.
reference: Reference genome FASTA.
- Returns:
Path to the subset CRAM file.
grid subset [OPTIONS]
Options
- -c, --cram <cram>
Required Input CRAM file
- -r, --region <region>
Genomic region (e.g., ‘chr6:160000000-160100000’)
- -C, --chrom <chrom>
Chromosome (e.g., chr6)
- -s, --start <start>
Start position (e.g., 160000000)
- -e, --end <end>
End position (e.g., 160100000)
- -o, --output <output>
Required Output subset CRAM file
- -R, --reference <reference>
Required Reference genome FASTA
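Because the region can be given either as a single string or as separate chromosome, start and end values, the two forms have to be reconciled somewhere. The sketch below shows one way to do that. The function name resolve_region is hypothetical, and the choice to let an explicit region take precedence is an assumption rather than documented GRiD behavior.

```python
def resolve_region(region=None, chrom=None, start=None, end=None):
    """Return a 'chr:start-end' string from either form of input."""
    if region:
        # Assumption: an explicit region string wins over chrom/start/end.
        return region
    if chrom is None or start is None or end is None:
        raise ValueError("provide region, or all of chrom, start and end")
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chrom}:{start}-{end}"


print(resolve_region(chrom="chr6", start=160000000, end=160100000))
print(resolve_region(region="chr6:160000000-160100000"))
```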
Module Documentation
Core Modules
Utility Modules
Support Modules