Helper Functions

Overview

Helper Utilities

Shared utility functions used across multiple GRiD modules. These provide common functionality for file handling, configuration management, and output formatting.

Modules

create_region.py

Purpose: Format genomic region strings for samtools/pysam

Functions:

  • create_region_string(region, chrom, start, end) - Creates standardized region strings (e.g., “chr6:160000000-160100000”)

Usage:

from grid.utils.helper_dir.create_region import create_region_string

# From separate components
region = create_region_string(None, "chr6", 160000000, 160100000)
# Returns: "chr6:160000000-160100000"

# From existing region string
region = create_region_string("chr6:160000000-160100000", None, None, None)
# Returns: "chr6:160000000-160100000"
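
The two call styles above suggest that a full region string, when given, is returned as-is, and that the separate components are joined otherwise. The following is a minimal sketch of that behavior, not the GRiD source: the name sketch_region_string, the default arguments, and the validation rules are assumptions made for illustration.

```python
def sketch_region_string(region=None, chrom=None, start=None, end=None):
    """Hypothetical stand-in for create_region_string (illustration only)."""
    if region is not None:
        # Assumption: a full region string takes precedence and is returned unchanged.
        return region
    if chrom is None or start is None or end is None:
        raise ValueError("Provide either region, or all of chrom, start, and end")
    if start > end:
        # Assumption: an inverted interval is treated as an error.
        raise ValueError(f"start ({start}) must not exceed end ({end})")
    return f"{chrom}:{start}-{end}"


print(sketch_region_string(None, "chr6", 160000000, 160100000))
```

Failing early on missing components keeps a malformed region string from reaching samtools or pysam, where the resulting error is usually harder to trace.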

display_results.py

Purpose: Pretty-print results to console using Rich library

Functions:

  • display_results(results, title) - Display tabular results with formatting

Usage:

from grid.utils.helper_dir.display_results import display_results

results = [
    {"sample": "S001", "count": 1234},
    {"sample": "S002", "count": 2345}
]
display_results(results, "Read Counts")

find_all_cram_files.py

Purpose: Discover CRAM files in directories

Functions:

  • find_all_cram_files(directory) - Recursively find all .cram files

Usage:

from grid.utils.helper_dir.find_all_cram_files import find_all_cram_files

cram_files = find_all_cram_files("data/crams/")
# Returns: List of Path objects
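
Recursive discovery of this kind can be written with pathlib alone. The sketch below is one possible approach and makes no claim about how GRiD implements it: the name sketch_find_cram_files, the sorted return order, and the error raised for a missing directory are assumptions.

```python
from pathlib import Path


def sketch_find_cram_files(directory):
    """Hypothetical recursive search for *.cram files (illustration only)."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    # rglob descends into subdirectories; sorting makes the order deterministic.
    return sorted(p for p in root.rglob("*.cram") if p.is_file())
```

Note that the pattern "*.cram" does not match index files ending in ".cram.crai", so indexes sitting next to the CRAM files are not returned.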

load_flags_from_yaml.py

Purpose: Load SAM flag filters from YAML configuration

Functions:

  • load_flags_from_yaml(config_path) - Parse SAM flags from config

Usage:

from grid.utils.helper_dir.load_flags_from_yaml import load_flags_from_yaml

flags = load_flags_from_yaml("config.yaml")
# Returns: Dict with 'required_flags' and 'excluded_flags'

Config Format:

sam_flags:
  required_flags:
    - PROPER_PAIR
    - READ_MAPPED
  excluded_flags:
    - DUPLICATE
    - SECONDARY
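
To show how required and excluded flag names could be applied to a read, here is a hedged sketch that turns names into bitmasks and tests a SAM FLAG value against them. The bit values come from the SAM specification; the name-to-bit mapping, the helper names build_mask and read_passes, and the already-parsed config dict are assumptions rather than GRiD's actual code. READ_MAPPED is left out because "mapped" means the unmapped bit (0x4) is absent, so it has no single bit of its own.

```python
# Assumed mapping from config names to SAM FLAG bits (values per the SAM spec).
SAM_FLAG_BITS = {
    "PROPER_PAIR": 0x2,
    "SECONDARY": 0x100,
    "DUPLICATE": 0x400,
}


def build_mask(names):
    """Combine flag names into one integer bitmask."""
    mask = 0
    for name in names:
        mask |= SAM_FLAG_BITS[name]  # KeyError signals an unknown flag name
    return mask


def read_passes(flag, required_mask, excluded_mask):
    """True if every required bit is set and no excluded bit is set."""
    return (flag & required_mask) == required_mask and (flag & excluded_mask) == 0


# A dict shaped like the parsed YAML above, minus READ_MAPPED.
config = {
    "sam_flags": {
        "required_flags": ["PROPER_PAIR"],
        "excluded_flags": ["DUPLICATE", "SECONDARY"],
    }
}
required = build_mask(config["sam_flags"]["required_flags"])
excluded = build_mask(config["sam_flags"]["excluded_flags"])
```

With these masks, a FLAG of 99 (paired, proper pair, mate reverse, first in pair) passes, while 1123 (the same read marked as a duplicate) does not.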

setup_output_file.py

Purpose: Prepare output files and directories

Functions:

  • setup_output_file(output_path, create_dirs=True) - Ensure output path exists

Usage:

from grid.utils.helper_dir.setup_output_file import setup_output_file

output_file = setup_output_file("results/analysis/output.tsv")
# Creates 'results/analysis/' directory if needed
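
Creating the parent directory on demand is a short pathlib idiom. The sketch below follows the signature shown in the usage above. The name sketch_setup_output_file and the choice to return a Path are assumptions, and it should not be read as the library's implementation.

```python
from pathlib import Path


def sketch_setup_output_file(output_path, create_dirs=True):
    """Hypothetical helper: make sure the parent directory exists (illustration only)."""
    path = Path(output_path)
    if create_dirs:
        # parents=True builds intermediate directories; exist_ok=True makes
        # the call safe to repeat when the directory is already there.
        path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

The sketch only prepares the directory. Whether the real helper also creates the file or writes a header is not something this example tries to settle.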

write_result_to_file.py

Purpose: Write analysis results to disk

Functions:

  • write_result_to_file(results, output_file, header) - Write TSV output

  • append_result_to_file(result, output_file) - Append single result

Usage:

from grid.utils.helper_dir.write_result_to_file import write_result_to_file

results = [
    {"sample": "S001", "reads": 1234, "coverage": 45.2},
    {"sample": "S002", "reads": 2345, "coverage": 52.1}
]
write_result_to_file(
    results,
    "output.tsv",
    header=["sample", "reads", "coverage"]
)
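
For reference, writing a list of dicts as TSV with an explicit header can be done with the standard csv module. This is a minimal sketch under the assumption that each result is a dict keyed by the header names. The name sketch_write_tsv is hypothetical and the code is not taken from GRiD.

```python
import csv


def sketch_write_tsv(results, output_file, header):
    """Hypothetical TSV writer built on csv.DictWriter (illustration only)."""
    with open(output_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=header,
            delimiter="\t",
            lineterminator="\n",  # csv defaults to "\r\n"; plain "\n" is typical for TSV
        )
        writer.writeheader()
        writer.writerows(results)
```

DictWriter raises ValueError when a row contains a key that is not in the header, which catches mismatched column names before they reach the output file.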

Design Principles

These helpers follow DRY (Don’t Repeat Yourself) principles:

  • Reusable: Used across multiple pipeline modules

  • Focused: Each module does one thing well

  • Flexible: Accept various input formats when reasonable

  • Type-safe: Use pathlib.Path for file operations

Common Patterns

Most utilities follow these conventions:

  • Accept Path or str for file paths

  • Return standardized types (Path, dict, list)

  • Raise descriptive exceptions on errors

  • Support both programmatic and CLI usage

Python API Documentation

grid.utils.helper_dir

grid.utils.helper_dir.create_region_string(region, chrom, start, end)

Create a genomic region string in ‘chr:start-end’ format.

Parameters:
  • region (str) – Full region string (e.g., ‘chr6:160000000-160100000’)

  • chrom (str) – Chromosome name (e.g., ‘chr6’)

  • start (int) – Start position (e.g., 160000000)

  • end (int) – End position (e.g., 160100000)

Returns:

Genomic region string.

Return type:

str

grid.utils.helper_dir.find_cram_files(cram_dir)

Return all CRAM files in a directory.

Parameters:

cram_dir (str) – Path to the directory containing CRAM files.

Returns:

List of paths to CRAM files.

Return type:

List[str]

grid.utils.helper_dir.load_flags(config_file, parameter)

Load read count flags from a YAML configuration file.

Parameters:
  • config_file (str) – Path to the YAML configuration file.

  • parameter (str) – The parameter name to look for in the YAML file.

Returns:

A set of read count flags.

Return type:

Set[int]

grid.utils.helper_dir.print_batch_summary(total_processed, failed_items, operation='processed')

Print summary of a batch operation.

Parameters:
  • total_processed (int) – Total number of items processed

  • failed_items (list) – List of failed item names/basenames

  • operation (str) – Description of operation (e.g., “indexed”, “processed”, “counted”)

Return type:

None

grid.utils.helper_dir.print_individual_error(basename, error, progress_console=None)

Print error message for an individual item.

Parameters:
  • basename (str) – Item basename/name

  • error (str) – Error message

  • progress_console – Optional Progress console for proper rendering

Return type:

None

grid.utils.helper_dir.print_individual_success(basename, message, progress_console=None)

Print success message for an individual item.

Parameters:
  • basename (str) – Item basename/name

  • message (str) – Success message (e.g., “done: reads=1234” or “indexed”)

  • progress_console – Optional Progress console for proper rendering

Return type:

None

grid.utils.helper_dir.setup_output_file(output_file, chrom, start, end)

Create output file with header.

Parameters:
  • output_file (str) – Path to output TSV file

  • chrom (str) – Chromosome name

  • start (int) – Start position

  • end (int) – End position

Returns:

Path object for the output file

Return type:

Path

grid.utils.helper_dir.write_result_to_file(output_file, basename, count, write_lock, progress_console=None)

Write a single result to the output file in a thread-safe manner.

Parameters:
  • output_file (Path) – Path to output file

  • basename (str) – Sample basename

  • count (int | str) – Read count or “Error”

  • write_lock (lock) – Threading lock for safe file writing

  • progress_console – Optional Progress console for proper rendering

Return type:

None
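
The thread-safe write described above can be illustrated with a lock that serializes appends from worker threads. This sketch mirrors the documented parameters but omits progress_console. The name sketch_write_result and the tab-separated "basename, count" line layout are assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sketch_write_result(output_file, basename, count, write_lock):
    """Hypothetical lock-guarded append of one result line (illustration only)."""
    # Holding the lock for the whole open/write/close cycle keeps lines
    # written by different threads from interleaving.
    with write_lock:
        with Path(output_file).open("a", encoding="utf-8") as handle:
            handle.write(f"{basename}\t{count}\n")


def demo(output_file, n_samples=8):
    """Write n_samples lines from a small thread pool, sharing one lock."""
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for i in range(n_samples):
            pool.submit(sketch_write_result, output_file, f"S{i:03d}", i, lock)
```

All workers must share the same lock object. A lock created inside each task would protect nothing, which is presumably why write_lock is passed in as a parameter.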