Tufts HPC Best Practices

This guide provides advanced tips and best practices for using the tucca-rna-seq workflow on the Tufts HPC cluster efficiently and effectively.


Recommended organization:

/cluster/tufts/your_lab/your_utln/
├── projects/
│   ├── project_1/
│   │   ├── tucca-rna-seq/
│   │   ├── data/
│   │   └── results/
│   └── project_2/
└── shared_resources/
    ├── reference_genomes/
    └── software_cache/
  • Home directory (/cluster/home/your_utln): 30 GB, fixed — too small to run from
  • Lab workspace (/cluster/tufts/your_lab/your_utln): from 50 GB up; request more as needed (usage over 10 TB becomes chargeable starting July 1, 2026)
  • Monitor usage: module load hpctools && hpctools (project-storage and directory-usage options)
  • Archive old projects to free up space
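
The recommended layout can be created in one step. The following is a minimal sketch, not part of the workflow: `make_project_layout` is a hypothetical helper name, and both arguments are placeholders for your own workspace path and project name. It creates only the empty directories; the `tucca-rna-seq/` directory is expected to come from cloning or copying the workflow into the project yourself.

```shell
# Sketch: create the recommended directory skeleton under a lab workspace.
# $1 = workspace root (e.g. /cluster/tufts/your_lab/your_utln)
# $2 = project name   (e.g. project_1)
make_project_layout() {
    mkdir -p \
        "$1/projects/$2/data" \
        "$1/projects/$2/results" \
        "$1/shared_resources/reference_genomes" \
        "$1/shared_resources/software_cache"
}
```

For example, `make_project_layout /cluster/tufts/your_lab/your_utln project_1` would set up the first project shown in the tree. Because `mkdir -p` is idempotent, running it again for `project_2` leaves the existing directories untouched.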

Quotas and policies are set by Tufts, not this workflow — always confirm the current numbers in the storage guide below before relying on them.

TTS Research Technology Lab Research Project Storage Guide

To avoid filling your home directory and to save time by reusing software environments across multiple projects, it is highly recommended to set up a permanent, shared cache. You can do this by defining environment variables in your shell’s startup file (e.g., ~/.bashrc).

This is also a good place to add the module load commands you use frequently.

# Add to your ~/.bashrc
# Load required software
module load snakemake/8.27.1
module load singularity/3.8.4
module load miniforge/24.11.2-py312
# Set permanent cache directories for Snakemake
export SNAKEMAKE_CONDA_PREFIX="/cluster/tufts/your_lab/shared/conda_envs"
export SNAKEMAKE_APPTAINER_PREFIX="/cluster/tufts/your_lab/shared/apptainer_images"
Tufts HPC Guide: Conda Environments
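
Creating the cache directories up front makes permission problems surface before a run rather than during one. The following is a minimal sketch, assuming the two variables above are already exported; `ensure_snakemake_caches` is a hypothetical helper name, not part of Snakemake or the Tufts tooling.

```shell
# Sketch: create the cache directories named by the two variables, if they are set.
ensure_snakemake_caches() {
    for dir in "${SNAKEMAKE_CONDA_PREFIX:-}" "${SNAKEMAKE_APPTAINER_PREFIX:-}"; do
        # Skip a variable that is unset or empty.
        [ -n "$dir" ] || continue
        mkdir -p "$dir" || return 1
        # Fail early if the current user cannot write to the directory.
        [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
    done
}
```

Calling `ensure_snakemake_caches` once after editing `~/.bashrc` (and opening a new shell) should be enough; if the cache is shared across a lab, group permissions on the directories are a separate matter that this sketch does not handle.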

Creating Private Modules for Custom Software


For any software you install manually (like a specific version of Snakemake via Mambaforge), creating a private environment module is an excellent way to integrate it seamlessly into your HPC workflow. This makes loading your custom tools as easy as loading system-wide modules (e.g., module load my-snakemake).

The process involves creating a simple text file that defines the necessary environment variables (like PATH) for your software and placing it in a specific directory (~/privatemodules on the Tufts cluster).

Tufts HPC Guide: Creating Private Modules
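
To make the idea concrete, the following sketch writes a minimal Tcl modulefile that only prepends one directory to `PATH`. The helper name `write_private_module` and the example install path are hypothetical, the sketch assumes the path contains no spaces, and the Tufts guide linked above remains the authority on how private modules are enabled on the cluster.

```shell
# Sketch: write a minimal modulefile at <modules dir>/<name>/<version>.
# $1 = module name, $2 = version, $3 = bin directory of the installed tool,
# $4 = modules directory (optional, defaults to ~/privatemodules).
write_private_module() {
    mod_root="${4:-$HOME/privatemodules}"
    mkdir -p "$mod_root/$1" || return 1
    cat > "$mod_root/$1/$2" <<EOF
#%Module1.0
## Private module for $1 $2
prepend-path PATH $3
EOF
}
```

With a hypothetical install under `/cluster/tufts/your_lab/your_utln/tools/snakemake/bin`, running `write_private_module my-snakemake 8.27.1 /cluster/tufts/your_lab/your_utln/tools/snakemake/bin` would produce a file that `module load my-snakemake/8.27.1` can pick up once the private modules directory is enabled.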

The Snakemake-specific environment variables described in the previous section are an excellent way to control software caching per-workflow.

As a more general alternative, you can configure Conda’s default settings globally using a .condarc file in your home directory. This tells Conda where to store package caches and environments for all uses, not just for this Snakemake workflow. This is a powerful, “set-it-and-forget-it” solution that is a common best practice on many HPC systems.

To configure this, you add the desired paths to a .condarc file:

~/.condarc
envs_dirs:
- /path/to/your/shared/storage/conda_envs/
pkgs_dirs:
- /path/to/your/shared/storage/conda_pkgs/

Use symbolic links for large data:

# Link reference genomes to avoid duplication
ln -s /cluster/tufts/kaplanlab/shared/reference_genomes/human_GRCh38/ \
    resources/reference_genomes/human_GRCh38
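
A bare `ln -s` will happily create a dangling link or nest a link inside an existing directory. The following is a small defensive sketch; `link_reference` is a hypothetical helper name, and the paths you pass are your own.

```shell
# Sketch: link a shared reference directory into a project without clobbering anything.
# $1 = existing shared directory (use an absolute path), $2 = link path in the project.
link_reference() {
    [ -e "$1" ] || { echo "source not found: $1" >&2; return 1; }
    if [ -e "$2" ] || [ -L "$2" ]; then
        echo "destination already exists: $2" >&2
        return 1
    fi
    # Create the parent directory of the link, then the link itself.
    mkdir -p "$(dirname "$2")" && ln -s "$1" "$2"
}
```

Passing an absolute source path is a deliberate choice: a relative target is resolved from the link's own directory, which is an easy way to end up with a broken link.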

Compress old results:

# Archive completed projects
tar -czf project_archive_$(date +%Y%m%d).tar.gz project_directory/
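
Before deleting anything to reclaim space, it is worth confirming that the archive can actually be read back. The following is a minimal sketch; `archive_project` is a hypothetical helper name, and it intentionally never removes the original directory.

```shell
# Sketch: archive a directory, then confirm the archive is readable.
# $1 = project directory, $2 = archive path (optional, defaults to <name>_<YYYYMMDD>.tar.gz).
archive_project() {
    name="$(basename "$1")"
    archive="${2:-${name}_$(date +%Y%m%d).tar.gz}"
    # -C keeps paths inside the archive relative to the project's parent directory.
    tar -czf "$archive" -C "$(dirname "$1")" "$name" || return 1
    # Listing the archive reads it end to end; a corrupt or truncated file fails here.
    tar -tzf "$archive" > /dev/null || return 1
    echo "$archive"
}
```

The function prints the archive path on success, so the removal of the original can be a separate, deliberate step once you have checked the result.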

Basic interactive session:

srun -p batch --pty bash

Customized interactive session:

srun -p batch -n 4 --mem=8G --time=2:00:00 --pty bash

Parameters:

  • -n: Number of CPU cores
  • --mem: Memory allocation
  • --time: Maximum runtime (the batch, gpu, and preempt partitions each cap at 2 days)
  • -p: Partition — batch (CPU, default), gpu, or preempt

For running Snakemake on the Tufts HPC, we recommend launching the workflow from within a long-running interactive session, rather than submitting the entire workflow as a single batch job.

The main Snakemake process is not computationally intensive; its primary role is to manage and submit the actual resource-intensive tasks to the cluster scheduler. Running it interactively makes it much easier to monitor progress, view logs in real time, and debug any issues that arise.

For detailed instructions on how to execute the workflow, please refer to the main Running Guide. The sections below provide supplementary, Tufts-specific best practices.

  1. Start a long-running screen or tmux session. A terminal multiplexer is essential for preventing your workflow from being terminated if your local connection to the HPC is interrupted.

    # Start a screen session
    screen -S snakemake_session
    # To detach: Ctrl+A, then D
    # To reattach later: screen -r snakemake_session
  2. Request an interactive node. From within your screen session, request an interactive node with enough time for your workflow to complete. The batch partition caps at 2 days, so request up to that limit.

    # Request a 2-day interactive session (the partition maximum)
    srun -p batch --time=2-00:00:00 --pty bash
  3. Prepare your environment and run the workflow. Once in the interactive session, navigate to your project directory, load the necessary modules, and launch Snakemake.

    # Navigate to your project
    cd /cluster/tufts/your_lab/your_utln/my_rnaseq_project/tucca-rna-seq
    # Load modules
    module purge
    module load snakemake/8.27.1 singularity/3.8.4 miniforge/24.11.2-py312
    # Launch the workflow
    snakemake all --workflow-profile profiles/slurm
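
A quick preflight check after loading modules can catch a missing tool before a two-day session is spent on it. The following is a minimal sketch; `check_tools` is a hypothetical helper name, and the command names passed to it (for example `snakemake`, `singularity`, `conda`) are assumptions about what the modules above put on `PATH`.

```shell
# Sketch: report which of the named commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One way to use it is `check_tools snakemake singularity conda && snakemake all --workflow-profile profiles/slurm -n`, where `-n` asks Snakemake for a dry run so the planned jobs can be reviewed before anything is submitted.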

If you cannot maintain a persistent interactive session, you can submit the workflow as a “fire-and-forget” batch job. You will be notified by email when the job completes or fails, but you lose the ability to easily monitor the workflow in real time.

Example submission script:

# Create submission script
cat > submit_workflow.sh << 'EOF'
#!/bin/bash
#SBATCH -p batch
#SBATCH --time=2-00:00:00
#SBATCH --mem=32G
#SBATCH --cpus-per-task=12
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your.email@tufts.edu
module purge
module load snakemake/8.27.1 singularity/3.8.4 miniforge/24.11.2-py312
snakemake all --workflow-profile profiles/slurm
EOF
# Submit the job
sbatch submit_workflow.sh
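
Since a batch submission gives no live view, keeping the job ID makes later checks easier. By default `sbatch` reports `Submitted batch job <id>`; the sketch below extracts the number from that line. `parse_job_id` is a hypothetical helper name, not a Slurm command.

```shell
# Sketch: read sbatch output on stdin and print only the numeric job ID.
# Prints nothing if the input does not contain a "Submitted batch job" line.
parse_job_id() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}
```

For example, `job_id="$(sbatch submit_workflow.sh | parse_job_id)"` stores the ID, after which `squeue -j "$job_id"` shows the job while it is queued or running and `seff "$job_id"` summarizes its resource use once it has finished.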

Always purge modules first:

module purge
module load snakemake/8.27.1
module load singularity/3.8.4
module load miniforge/24.11.2-py312

Check module conflicts:

module list
module show snakemake/8.27.1

Tune workflow resources:

The slurm profile allows you to specify default and rule-specific resources. For a complete guide on how this works, see the main Configuration Guide. Below is a Tufts-specific example.

profiles/slurm/config.v8+.yaml
default-resources:
  slurm_partition: "batch"
  # slurm_account: omit unless your lab uses Slurm accounts
  runtime: 1440 # 24 h (batch caps at 2 days = 2880 min)
  mem_mb: 32000 # 32 GB RAM
  cpus_per_task: 12 # 12 CPU cores
# Override for specific rules (keep runtime <= 2880; the partition cap is 2 days)
set-resources:
  star_index:
    mem_mb: 64000 # 64 GB RAM for genome indexing
    runtime: 2880 # 2 days (partition maximum)
  salmon_quant:
    cpus_per_task: 8 # 8 cores for quantification
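
The profile expresses `runtime` in minutes, while `srun` and `sbatch` take `--time` values such as `2-00:00:00`. The following sketch converts between the two so a value can be checked against the 2880-minute cap. `slurm_time_to_minutes` is a hypothetical helper name, and it handles only the `D-HH:MM:SS` and `HH:MM:SS` forms, not every time format Slurm accepts.

```shell
# Sketch: convert a Slurm time string to whole minutes (seconds round up).
# Accepts D-HH:MM:SS or HH:MM:SS; any other shape returns a non-zero status.
slurm_time_to_minutes() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 4 { m = $1 * 1440 + $2 * 60 + $3; if ($4 > 0) m++; print m; next }
        NF == 3 { m = $1 * 60 + $2;             if ($3 > 0) m++; print m; next }
        { exit 1 }'
}
```

Here `slurm_time_to_minutes 2-00:00:00` prints `2880` and `slurm_time_to_minutes 24:00:00` prints `1440`, matching the two `runtime` values used in the profile.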

Parallel execution:

# Limit concurrent jobs
snakemake all --workflow-profile profiles/slurm --jobs 50
# Use all available cores locally
snakemake all --use-conda --cores all

Job and resource monitoring on Tufts HPC is standard Slurm plus a Tufts helper — this workflow doesn’t change any of it, so use the cluster’s own tooling:

  • Your queued/running jobs: squeue -u $USER
  • Why a finished job over/under-ran its resources: seff <job_id>
  • Free CPU/GPU/memory, project quota, directory usage: module load hpctools && hpctools (its menu is the canonical Tufts answer for “what’s free / how much have I used”)
  • Workflow progress: tail the Snakemake log under .snakemake/log/, and re-run with --show-failed-logs to surface a failed rule’s job log
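
Each Snakemake invocation writes a new timestamped file under `.snakemake/log/`, so following progress means finding the newest one first. The following is a minimal sketch; `latest_snakemake_log` is a hypothetical helper name, and it assumes the log files end in `.snakemake.log` and have no spaces in their names.

```shell
# Sketch: print the path of the most recently modified Snakemake log.
# $1 = log directory (optional, defaults to .snakemake/log in the current directory).
latest_snakemake_log() {
    log_dir="${1:-.snakemake/log}"
    # ls -t sorts newest first; errors are silenced so an empty directory prints nothing.
    ls -t "$log_dir"/*.snakemake.log 2> /dev/null | head -n 1
}
```

From the workflow directory, `tail -f "$(latest_snakemake_log)"` then follows the current run from a second terminal or screen window.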

For the full Slurm monitoring reference (squeue, sacct, seff, utilization), see the Tufts SLURM monitoring guide.


| Issue | Cause | Solution |
|-------|-------|----------|
| Jobs stuck in queue | Resource limits too high | Reduce memory/CPU requirements |
| Module not found | Module not available | Check module avail |
| Permission denied | Wrong directory | Use lab workspace, not home |
| Quota exceeded | Storage limit reached | Archive old projects |
| Job killed | Time limit exceeded | Increase --time (max 2 days per partition); split long rules or use preempt |

TTS Research Technology:

Workflow-specific issues:


After mastering these best practices:

  1. Optimize your workflow for your specific analysis needs
  2. Share resources with lab members when possible
  3. Contribute to the workflow development
  4. Help others in your lab get started

These best practices will help you use the Tufts HPC cluster efficiently and avoid common pitfalls. Remember to always check the main documentation first!

For complete workflow documentation, see the main guides.

Linked external resources are independent of TUCCA and Tufts University and remain under their own licenses.