
Workflow Installation for New Users

tip

🔄 Setting Up a Fresh Project

If you've already run the tucca-rna-seq workflow but want to start a new project or analysis without affecting your previous work—which is crucial for maintaining reproducibility—check out our guide on setting up a fresh project:

1. Log in to the Tufts HPC

You can SSH into the Tufts HPC from the command line of your preferred terminal, or you can access the cluster through Tufts' OnDemand service. You will need a Tufts Cluster Account to log in to the Tufts HPC.

warning

The process of requesting and being approved for a Tufts HPC account can take multiple business days.

Requirements

  • You must complete an online account request form and be approved to use the Tufts Research Cluster.
  • Account requests require a valid Tufts Username and Tufts Password.
  • Guest and student accounts require faculty or researcher sponsorship.

SSH, or Secure Shell, is a protocol that allows you to securely connect to remote servers over a network. An HPC, or High-Performance Computing system, provides powerful computational resources for executing complex simulations and data analyses. Using SSH to access the HPC ensures that your data and commands are transmitted securely. The Tufts HPC enables researchers and students to perform large-scale computations efficiently.

info

If you are not familiar with accessing the Tufts HPC, check out these fantastic resources to learn more:

Box folder with many helpful introductory PDFs and slideshows that outline the ins-and-outs of the Tufts HPC.

The most recent version of the Intro to Tufts HPC workshop and the New User Guide PAX.pdf are highly recommended.

tip

For this workflow, we highly recommend SSHing into the cluster via VSCode, as it provides a convenient user experience.

Visual Studio Code (VSCode) is a free, open-source code editor developed by Microsoft. It's widely used for software development due to its versatility, rich feature set, and extensive extension ecosystem. In the context of interacting with the Tufts High-Performance Computing (HPC) resources, VSCode offers several advantages:

  • Remote Development: VSCode can seamlessly connect to remote servers, allowing you to edit files, run code, and manage projects directly on the HPC without leaving your local environment.

  • Integrated Terminal: Access the HPC terminal within VSCode, enabling efficient command-line operations alongside your coding activities.

  • Extensions and Customization: Enhance your development experience with extensions tailored for specific programming languages, debugging tools, and workflow optimizations.

Additionally, you can connect via the OnDemand service or your preferred terminal.
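Whether you connect through VSCode's Remote-SSH extension or a plain terminal, a host entry in your ~/.ssh/config saves retyping the full address. This is a sketch only: the hostname below is an assumption based on Tufts' PAX cluster naming, so confirm it against the New User Guide linked above, and replace your_utln with your own Tufts username.

```
Host tufts-hpc
    HostName login.pax.tufts.edu
    User your_utln
```

With this entry in place, `ssh tufts-hpc` connects from any terminal, and VSCode's Remote-SSH extension will list tufts-hpc as a saved host.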


2. Get off a login node

Once logged in to the Tufts HPC, run the following to get off a login node and onto a compute node so that you can start performing computations.

After logging in, you are on a Login Node of the cluster (login-prod-0x) and in your Home Directory (~ or /cluster/home/your_utln). Your terminal prompt should look like this:

[your_utln@login-prod-02 ~]$

To get off a login node run:

srun -p interactive --pty bash

This will allocate a compute node with a bash shell, 1 CPU core (default), a 4-hour runtime (the interactive partition's default and maximum), and 2 GB of memory (default). If desired, modify the request with -n (number of CPU cores), --mem= (memory), -p (an alternative partition), and/or --time= (runtime).
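For example, a larger request could look like the following. The flags are the standard Slurm options described above; the specific values (4 cores, 8 GB, 2 hours) are illustrative only, so adjust them to your analysis.

```shell
# Request 4 CPU cores, 8 GB of memory, and a 2-hour limit
# on the interactive partition (values are examples only).
srun -p interactive -n 4 --mem=8g --time=2:00:00 --pty bash
```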

Depending on the current usage of the Tufts HPC, you may need to wait in a queue before enough resources are allocated.


3. Clone the repository

As explained on GitHub's Docs...

Cloning a repository pulls down a full copy of all the repository data that GitHub.com has at that point in time, including all versions of every file and folder for the project. You can push your changes to the remote repository on GitHub.com, or pull other people's changes from GitHub.com. For more information, see Using Git.

Navigate to the desired parent directory (aka folder) on the Tufts HPC that you would like to clone (aka copy) the template workflow repository into. Then clone the repository with the following command:

git clone https://github.com/benjibromberg/tucca-rna-seq.git

After cloning the repo, you can change the name of the cloned directory to suit your project. The default name will always be tucca-rna-seq.
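Alternatively, git clone accepts an optional target directory, so you can clone and name the directory in one step. Here my_rnaseq_project is a hypothetical project name:

```shell
# Clone directly into a project-specific directory name
# ("my_rnaseq_project" is just an example).
git clone https://github.com/benjibromberg/tucca-rna-seq.git my_rnaseq_project
```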

tip

For Kaplan Lab users, it is recommended to clone the repository into your personal directory in the kaplanlab workspace. Refer to the Tufts HPC resources linked above if this does not make sense.


4. Install dependencies

For improved reproducibility and reusability of the workflow, each step of the workflow runs in its own Conda virtual environment. As a consequence, running this workflow has very few individual dependencies.

You will need to add the bioconda channel to your base conda environment, which loads every time you load a conda manager such as miniforge. If this is problematic for some reason, you will instead need to load another conda environment that includes the conda-forge, bioconda, and defaults channels before activating the snakemake conda environment, which requires manually modifying the run.sh runner script.

To add the bioconda channel to your base conda environment run:

module load miniforge/24.7.1-py312
conda activate base
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
conda config --show channels

The result of the final command should match the following:

channels:
- conda-forge
- bioconda
- defaults

Then run the following to create the snakemake conda environment:

conda env create -f tucca-rna-seq/install/snakemake.yaml
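Once the environment is created, you can activate it and confirm that Snakemake is available. This sketch assumes the environment defined in install/snakemake.yaml is named snakemake; if activation fails, check the name field at the top of that file.

```shell
module load miniforge/24.7.1-py312
conda activate snakemake   # environment name assumed from install/snakemake.yaml
snakemake --version        # confirms Snakemake is on your PATH
```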