Tufts HPC Best Practices
This guide provides advanced tips and best practices for using the tucca-rna-seq
workflow on the Tufts HPC cluster efficiently and effectively.
Storage Management
Directory Structure
Recommended organization:

    /cluster/tufts/your_lab/your_utln/
    ├── projects/
    │   ├── project_1/
    │   │   ├── tucca-rna-seq/
    │   │   ├── data/
    │   │   └── results/
    │   └── project_2/
    └── shared_resources/
        ├── reference_genomes/
        └── software_cache/

Storage Quotas
- Home directory (/cluster/home/your_utln): 30 GB, fixed; too small to run from
- Lab workspace (/cluster/tufts/your_lab/your_utln): from 50 GB up; request more as needed (usage over 10 TB becomes chargeable starting July 1, 2026)
- Monitor usage: module load hpctools && hpctools (project-storage and directory-usage options)
- Archive old projects to free up space
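One way to decide what to archive is to rank a workspace's subdirectories by size. The sketch below uses only the standard du, sort, and head tools. The function name largest_dirs and its default path (the placeholder used in this guide) are illustrative assumptions, and hpctools remains the Tufts-provided way to check quota.

```shell
#!/usr/bin/env bash
# Sketch: list the largest immediate subdirectories of a workspace.
# The default path is the placeholder from this guide, not a real directory.
largest_dirs() {
    local root="${1:-/cluster/tufts/your_lab/your_utln}"
    local count="${2:-10}"
    # du -s prints one total per argument; -k reports KiB so a numeric sort works
    du -sk -- "$root"/*/ 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, largest_dirs /cluster/tufts/your_lab/your_utln/projects 5 would print the five largest project directories with their sizes in KiB. On a large workspace du can take a while, so it is best run from a compute node rather than a login node.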
Quotas and policies are set by Tufts, not this workflow — always confirm the current numbers in the storage guide below before relying on them.
TTS Research Technology Lab Research Project Storage Guide
Persistent Software Caching
To avoid filling your home directory and to save time by reusing software
environments across multiple projects, it is highly recommended to set up a
permanent, shared cache. You can do this by defining environment variables in
your shell’s startup file (e.g., ~/.bashrc).
This is also a good place to add the module load commands you use frequently.

    # Add to your ~/.bashrc

    # Load required software
    module load snakemake/8.27.1
    module load singularity/3.8.4
    module load miniforge/24.11.2-py312

    # Set permanent cache directories for Snakemake
    export SNAKEMAKE_CONDA_PREFIX="/cluster/tufts/your_lab/shared/conda_envs"
    export SNAKEMAKE_APPTAINER_PREFIX="/cluster/tufts/your_lab/shared/apptainer_images"

Creating Private Modules for Custom Software
For any software you install manually (like a specific version of Snakemake via
Mambaforge), creating a private environment module is an excellent way to
integrate it seamlessly into your HPC workflow. This makes loading your custom
tools as easy as loading system-wide modules (e.g., module load my-snakemake).
The process involves creating a simple text file that defines the necessary
environment variables (like PATH) for your software and placing it in a
specific directory (~/privatemodules on the Tufts cluster).
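As an illustration, the sketch below writes a minimal Tcl-style modulefile that prepends a tool's bin directory to PATH. The function name write_private_module, the module name, and the install path are hypothetical. It also assumes the cluster's module system reads Tcl modulefiles from ~/privatemodules, which you should confirm against the Tufts HPC guides.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal Tcl modulefile for a manually installed tool.
# All names and paths passed in are the caller's own; nothing here is Tufts-specific.
write_private_module() {
    local name="$1"                                # module name, e.g. my-snakemake
    local bin_dir="$2"                             # directory holding the executables
    local module_dir="${3:-$HOME/privatemodules}"  # where private modulefiles live

    mkdir -p "$module_dir"
    # The first line is the magic cookie that marks a Tcl modulefile.
    cat > "$module_dir/$name" <<EOF
#%Module1.0
## Private module for $name (illustrative example)
prepend-path PATH $bin_dir
EOF
}
```

For example, write_private_module my-snakemake /path/to/your/install/bin would create ~/privatemodules/my-snakemake. On many clusters private modules become visible after module load use.own; whether Tufts requires that step is an assumption to verify before relying on it.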
Global Conda Configuration (.condarc)
The Snakemake-specific environment variables described under Persistent Software Caching control where Snakemake stores its software environments and container images.
As a more general alternative, you can configure Conda’s default settings
globally using a .condarc file in your home directory. This tells Conda where
to store package caches and environments for all uses, not just for this
Snakemake workflow. This is a powerful, “set-it-and-forget-it” solution that
is a common best practice on many HPC systems.
To configure this, you add the desired paths to a .condarc file:

    envs_dirs:
      - /path/to/your/shared/storage/conda_envs/
    pkgs_dirs:
      - /path/to/your/shared/storage/conda_pkgs/

Data Management
Use symbolic links for large data:

    # Link reference genomes to avoid duplication
    ln -s /cluster/tufts/kaplanlab/shared/reference_genomes/human_GRCh38/ \
        resources/reference_genomes/human_GRCh38

Compress old results:

    # Archive completed projects
    tar -czf project_archive_$(date +%Y%m%d).tar.gz project_directory/

Resource Allocation
Interactive Sessions
Basic interactive session:

    srun -p batch --pty bash

Customized interactive session:

    srun -p batch -n 4 --mem=8G --time=2:00:00 --pty bash

Parameters:
- -n: Number of tasks (each task gets one CPU core by default)
- --mem: Memory allocation
- --time: Maximum runtime (the batch, gpu, and preempt partitions each cap at 2 days)
- -p: Partition, one of batch (CPU, default), gpu, or preempt
Workflow Execution Strategy
For running Snakemake on the Tufts HPC, we recommend launching the workflow from within a long-running interactive session, rather than submitting the entire workflow as a single batch job.
The main Snakemake process is not computationally intensive; its primary role is to manage and submit the actual resource-intensive tasks to the cluster scheduler. Running it interactively makes it much easier to monitor progress, view logs in real-time, and debug any issues that arise.
For detailed instructions on how to execute the workflow, please refer to the main Running Guide. The sections below provide supplementary, Tufts-specific best practices.
Recommended: Interactive Session
- Start a long-running screen or tmux session. A terminal multiplexer is essential for preventing your workflow from being terminated if your local connection to the HPC is interrupted.

      # Start a screen session
      screen -S snakemake_session
      # To detach: Ctrl+A, then D
      # To reattach later: screen -r snakemake_session

- Request an interactive node. From within your screen session, request an interactive node with enough time for your workflow to complete. The batch partition caps at 2 days, so request up to that limit.

      # Request a 2-day interactive session (the partition maximum)
      srun -p batch --time=2-00:00:00 --pty bash

- Prepare your environment and run the workflow. Once in the interactive session, navigate to your project directory, load the necessary modules, and launch Snakemake.

      # Navigate to your project
      cd /cluster/tufts/your_lab/your_utln/my_rnaseq_project/tucca-rna-seq
      # Load modules
      module purge
      module load snakemake/8.27.1 singularity/3.8.4 miniforge/24.11.2-py312
      # Launch the workflow
      snakemake all --workflow-profile profiles/slurm
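The steps above can be sanity-checked before launching with a small preflight function. This is a sketch, not part of the workflow: the name preflight is hypothetical, and it relies on the environment variables that screen (STY), tmux (TMUX), and Slurm (SLURM_JOB_ID) normally set, on the assumption that srun passes your screen or tmux environment through to the compute node.

```shell
#!/usr/bin/env bash
# Sketch: warn about missing prerequisites before launching Snakemake.
# Returns the number of problems found, so 0 means "ready to launch".
preflight() {
    local profile_dir="${1:-profiles/slurm}"
    local problems=0

    # screen sets STY and tmux sets TMUX inside their sessions
    if [ -z "${STY:-}" ] && [ -z "${TMUX:-}" ]; then
        echo "warning: not inside screen or tmux; a dropped connection may end the run"
        problems=$((problems + 1))
    fi
    # Slurm sets SLURM_JOB_ID inside an allocation, including srun --pty sessions
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "warning: not inside a Slurm job; request an interactive node first"
        problems=$((problems + 1))
    fi
    # The profile path is relative, so this also catches a wrong working directory
    if [ ! -d "$profile_dir" ]; then
        echo "warning: profile directory '$profile_dir' not found"
        problems=$((problems + 1))
    fi
    return "$problems"
}
```

A typical use would be preflight && snakemake all --workflow-profile profiles/slurm, so that Snakemake only starts when all three checks pass.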
Alternative: Batch Submission
If you cannot maintain a persistent interactive session, you can submit the workflow as a “fire-and-forget” batch job. You will be notified by email when the job completes or fails, but you lose the ability to monitor the workflow in real-time easily.
Example submission script:

    # Create submission script
    cat > submit_workflow.sh << 'EOF'
    #!/bin/bash
    #SBATCH -p batch
    #SBATCH --time=2-00:00:00
    #SBATCH --mem=32G
    #SBATCH --cpus-per-task=12
    #SBATCH --mail-type=ALL
    #SBATCH --mail-user=your.email@tufts.edu

    module purge
    module load snakemake/8.27.1 singularity/3.8.4 miniforge/24.11.2-py312

    snakemake all --workflow-profile profiles/slurm
    EOF

    # Submit the job
    sbatch submit_workflow.sh

Performance Optimization
Module Management
Always purge modules first:

    module purge
    module load snakemake/8.27.1
    module load singularity/3.8.4
    module load miniforge/24.11.2-py312

Check module conflicts:

    module list
    module show snakemake/8.27.1

Workflow Optimization
Tune workflow resources:
The slurm profile allows you to specify default and rule-specific resources.
For a complete guide on how this works, see the main
Configuration Guide. Below is a
Tufts-specific example.

    default-resources:
      slurm_partition: "batch"
      # slurm_account: omit unless your lab uses Slurm accounts
      runtime: 1440       # 24 h (batch caps at 2 days = 2880 min)
      mem_mb: 32000       # 32GB RAM
      cpus_per_task: 12   # 12 CPU cores

    # Override for specific rules (keep runtime <= 2880; the partition cap is 2 days)
    set-resources:
      star_index:
        mem_mb: 64000     # 64GB RAM for genome indexing
        runtime: 2880     # 2 days (partition maximum)
      salmon_quant:
        cpus_per_task: 8  # 8 cores for quantification

Parallel execution:

    # Limit concurrent jobs
    snakemake all --workflow-profile profiles/slurm --jobs 50

    # Use all available cores locally
    snakemake all --use-conda --cores all

Monitoring and Debugging
Job and resource monitoring on Tufts HPC is standard Slurm plus a Tufts helper. This workflow doesn’t change any of it, so use the cluster’s own tooling:
- Your queued/running jobs: squeue -u $USER
- Why a finished job over/under-ran its resources: seff <job_id>
- Free CPU/GPU/memory, project quota, directory usage: module load hpctools && hpctools (its menu is the canonical Tufts answer for “what’s free / how much have I used”)
- Workflow progress: tail the Snakemake log under .snakemake/log/, and re-run with --show-failed-logs to surface a failed rule’s job log
For the full Slurm monitoring reference (squeue, sacct, seff, utilization),
see the Tufts SLURM monitoring guide.
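To follow workflow progress without hunting for the newest file, a small helper can pick the most recent log for you. This is a minimal sketch: the function name latest_snakemake_log is hypothetical, and it assumes Snakemake's log files end in .snakemake.log, which is worth checking against what you see in .snakemake/log/.

```shell
#!/usr/bin/env bash
# Sketch: print the path of the most recently modified Snakemake log.
# Prints nothing if the directory has no matching logs yet.
latest_snakemake_log() {
    local log_dir="${1:-.snakemake/log}"
    # ls -t sorts newest first; head keeps only the first entry
    ls -t "$log_dir"/*.snakemake.log 2>/dev/null | head -n 1
}
```

From the workflow directory, tail -f "$(latest_snakemake_log)" would then stream the current run's log.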
Troubleshooting
Common Issues

| Issue | Cause | Solution |
|-------|-------|----------|
| Jobs stuck in queue | Resource requests too high | Reduce memory/CPU requests |
| Module not found | Module not available | Check module avail |
| Permission denied | Wrong directory | Use lab workspace, not home |
| Quota exceeded | Storage limit reached | Archive old projects |
| Job killed | Time limit exceeded | Increase --time (max 2 days per partition); split long rules or use preempt |
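
When raising a time limit, note that srun and sbatch take --time as D-HH:MM:SS while the Snakemake profile's runtime field is in minutes, so the 2-day cap is 2-00:00:00 in one place and 2880 in the other. The helper below is a sketch for converting between them. The name slurm_time_to_minutes is hypothetical, and it handles only the D-HH:MM:SS and HH:MM:SS forms, although Slurm accepts others.

```shell
#!/usr/bin/env bash
# Sketch: convert a Slurm time string to whole minutes for a profile's runtime.
# Supports D-HH:MM:SS and HH:MM:SS only; leftover seconds round up to a minute.
slurm_time_to_minutes() {
    echo "$1" | awk -F'[-:]' '
        NF != 3 && NF != 4 { exit 1 }                     # unsupported format
        NF == 4 { d = $1; h = $2; m = $3; s = $4 }        # D-HH:MM:SS
        NF == 3 { d = 0;  h = $1; m = $2; s = $3 }        # HH:MM:SS
        { print d * 1440 + h * 60 + m + (s > 0 ? 1 : 0) }
    '
}
```

For instance, slurm_time_to_minutes 2-00:00:00 gives the 2880 used for star_index in the profile example, and any result above 2880 exceeds the partition cap.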
Getting Help
TTS Research Technology:
- Email: tts-research@tufts.edu
- HPC Guides: https://rtguides.it.tufts.edu/hpc/
Workflow-specific issues:
- GitHub Issues: Report problems
- Team Contact: benjamin.bromberg@tufts.edu
Next Steps
After mastering these best practices:
- Optimize your workflow for your specific analysis needs
- Share resources with lab members when possible
- Contribute to the workflow development
- Help others in your lab get started
These best practices will help you use the Tufts HPC cluster efficiently and avoid common pitfalls. Remember to always check the main documentation first!
For complete workflow documentation, see the main guides.
Linked external resources are independent of TUCCA and Tufts University and remain under their own licenses.