Workflow Installation for New Users
1. Log in to the Tufts HPC
You can SSH into the Tufts HPC from the command line of your preferred terminal, or you can access the cluster through Tufts' OnDemand service. Either way, you will need a Tufts HPC cluster account to log in.
The process of requesting and being approved for a Tufts HPC account can take multiple business days.
Requirements
- You must complete an online account request form and be approved to use the Tufts Research Cluster.
- Account requests require a valid Tufts username and password.
- Guest and student accounts require faculty or researcher sponsorship.
SSH, or Secure Shell, is a protocol that allows you to securely connect to remote servers over a network. An HPC, or High-Performance Computing system, provides powerful computational resources for executing complex simulations and data analyses. Using SSH to access the HPC ensures that your data and commands are transmitted securely. The Tufts HPC enables researchers and students to perform large-scale computations efficiently.
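As a minimal sketch of an SSH login from a terminal, the function below wraps the ssh command. The login host login.pax.tufts.edu is an assumption here (confirm it on the Tufts HPC pages), and hpc_login, HPC_HOST and DRY_RUN are hypothetical names used only for illustration. With DRY_RUN=1 the function prints the command instead of connecting.

```shell
#!/usr/bin/env bash
# Sketch: log in to the Tufts HPC over SSH.
# ASSUMPTION: login.pax.tufts.edu is the login host; set HPC_HOST to override it.
# hpc_login, HPC_HOST and DRY_RUN are hypothetical names for illustration.
hpc_login() {
  local utln="${1:?usage: hpc_login <your_utln>}"   # your Tufts username
  local host="${HPC_HOST:-login.pax.tufts.edu}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of opening a connection.
    echo "ssh ${utln}@${host}"
  else
    ssh "${utln}@${host}"
  fi
}
```

For example, hpc_login your_utln opens the connection and prompts for your Tufts password.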
If you are not familiar with accessing the Tufts HPC, check out these fantastic resources to learn more:
- Tufts HPC Home Page
- Tufts HPC's Welcome Page
- Tufts HPC's New User Support
For this workflow, we highly recommend SSHing into the cluster via VSCode, as it provides a convenient user experience.
Visual Studio Code (VSCode) is a free, open-source code editor developed by Microsoft. It's widely used for software development due to its versatility, rich feature set, and extensive extension ecosystem. In the context of interacting with the Tufts High-Performance Computing (HPC) resources, VSCode offers several advantages:
- Remote Development: VSCode can seamlessly connect to remote servers, allowing you to edit files, run code, and manage projects directly on the HPC without leaving your local environment.
- Integrated Terminal: Access the HPC terminal within VSCode, enabling efficient command-line operations alongside your coding activities.
- Extensions and Customization: Enhance your development experience with extensions tailored for specific programming languages, debugging tools, and workflow optimizations.
Alternatively, you can connect via the OnDemand service or your preferred terminal.
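Whether you connect from VSCode or a plain terminal, an entry in ~/.ssh/config saves retyping the host and username. The sketch below prints such an entry. The host name login.pax.tufts.edu and the alias tufts-hpc are assumptions (check the host name against the Tufts HPC documentation), and ssh_config_entry is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print an ~/.ssh/config entry for the Tufts HPC.
# ASSUMPTIONS: login.pax.tufts.edu is the login host and "tufts-hpc" is just a
# placeholder alias; ssh_config_entry is a hypothetical helper name.
ssh_config_entry() {
  local utln="${1:?usage: ssh_config_entry <your_utln> [alias]}"
  local alias_name="${2:-tufts-hpc}"
  local host="${HPC_HOST:-login.pax.tufts.edu}"
  # Standard OpenSSH config keywords: Host (alias), HostName, User.
  printf 'Host %s\n    HostName %s\n    User %s\n' "$alias_name" "$host" "$utln"
}
```

One way to use it is mkdir -p ~/.ssh followed by ssh_config_entry your_utln >> ~/.ssh/config. After that, ssh tufts-hpc should connect from a terminal, and the VSCode Remote - SSH extension's "Remote-SSH: Connect to Host..." command should list tufts-hpc as a host.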
2. Get off of a login node
Once you are logged in to the Tufts HPC, move off the login node and onto a compute node before you start performing computations.
After logging in, you are on a login node of the cluster (login-prod-0x) and in your home directory (~ or /cluster/home/your_utln). Your terminal prompt should look like this:
[your_utln@login-prod-02 ~]$
To get off a login node, run:
srun -p interactive --pty bash
This allocates a compute node running a bash shell with 1 CPU core (default), 4 hours of runtime (the interactive partition's default and maximum value), and 2 GB of CPU memory (default). If desired, modify the request with the number of CPU cores (-n), memory (--mem=), an alternative partition (-p), and/or runtime (--time=).
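The options above can be combined into a single request. The sketch below builds the srun command from the same flags (-p, -n, --mem=, --time=, --pty) with the stated defaults; interactive_srun and DRY_RUN are hypothetical names, and the example values are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: request an interactive compute node with custom resources.
# Uses only the srun flags described above; interactive_srun and DRY_RUN are
# hypothetical names. Arguments: cores, memory, runtime, partition.
interactive_srun() {
  local cores="${1:-1}"              # CPU cores (default 1)
  local mem="${2:-2G}"               # CPU memory (default 2 GB)
  local runtime="${3:-04:00:00}"     # 4 hours is the interactive partition maximum
  local partition="${4:-interactive}"
  local cmd=(srun -p "$partition" -n "$cores" --mem="$mem" --time="$runtime" --pty bash)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"                 # print the command instead of submitting it
  else
    "${cmd[@]}"
  fi
}
```

For example, interactive_srun 4 8G 02:00:00 asks for 4 cores, 8 GB of memory and 2 hours on the interactive partition.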
Depending on the current usage of the Tufts HPC, you may need to wait in a queue before enough resources are allocated, which can take some time.
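Because srun --pty keeps your terminal busy while the request waits, you may want to check on it from a second terminal on the login node. A minimal sketch using squeue's -u (user), -h (no header) and -t (job state) options follows; pending_jobs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count your jobs that are still waiting in the Slurm queue.
# squeue -u <user> -h -t PENDING lists only pending jobs, one per line, without a header.
# pending_jobs is a hypothetical helper name; it defaults to the current user.
pending_jobs() {
  local user="${1:-$USER}"
  squeue -u "$user" -h -t PENDING | wc -l
}
```

Running plain squeue -u "$USER" instead shows all of your jobs along with their state.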
3. Clone the repository
Cloning a repository pulls down a full copy of all the repository data that GitHub.com has at that point in time, including all versions of every file and folder for the project. You can push your changes to the remote repository on GitHub.com, or pull other people's changes from GitHub.com. For more information, see Using Git.
Navigate to the desired parent directory (aka folder) on the Tufts HPC that you would like to clone (aka copy) the template workflow repository into. Then clone the repository with the following command:
git clone https://github.com/benjibromberg/tucca-rna-seq.git
After cloning the repo, you can change the name of the cloned directory to suit your project. The default name will always be tucca-rna-seq.
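Instead of renaming afterwards, git clone also accepts a target directory as its second argument, so the clone can land under a project-specific name in one step. The sketch below wraps that; clone_workflow and DRY_RUN are hypothetical names, and the parent directory and project name are yours to choose.

```shell
#!/usr/bin/env bash
# Sketch: clone the workflow into <parent_dir>/<project_name>.
# git clone <url> <directory> names the clone at creation time.
# clone_workflow and DRY_RUN are hypothetical names for illustration.
clone_workflow() {
  local parent="${1:?usage: clone_workflow <parent_dir> [project_name]}"
  local name="${2:-tucca-rna-seq}"
  local url="https://github.com/benjibromberg/tucca-rna-seq.git"
  local cmd=(git clone "$url" "${parent%/}/${name}")   # strip a trailing slash from parent
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

If you use a custom name, substitute it wherever these instructions refer to the tucca-rna-seq directory.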
For Kaplan Lab users, we recommend cloning the repository into your personal directory in the kaplanlab workspace. Refer to the Tufts HPC resources linked above if this does not make sense.
4. Dependencies installation
For improved reproducibility and reusability, each individual step of the workflow runs in its own Conda virtual environment. As a consequence, the workflow itself has very few dependencies that you need to install up front.
You will need to add the bioconda channel to your base conda environment, which is loaded every time you load a conda manager such as Miniforge.
If this is problematic for some reason, you will need to activate another conda environment that includes the conda-forge, bioconda, and defaults channels before activating the snakemake conda environment, which requires manually modifying the run.sh runner script.
To add the bioconda and conda-forge channels to your base conda environment and enable strict channel priority, run:
module load miniforge/24.7.1-py312
conda activate base
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
conda config --show channels
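Because conda config --add places each new channel at the top of the list, adding bioconda first and conda-forge second puts conda-forge ahead of bioconda. The result of the final command should match the following: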
channels:
- conda-forge
- bioconda
- defaults
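To check the channel order without comparing it by eye, the output of conda config --show channels can be piped into a small comparison. This is a minimal sketch; check_channels is a hypothetical helper that hard-codes the order listed above.

```shell
#!/usr/bin/env bash
# Sketch: verify the channel order printed by `conda config --show channels`.
# check_channels is a hypothetical helper; it reads that command's output on stdin
# and succeeds only if the channels are conda-forge, bioconda, defaults, in that order.
check_channels() {
  local expected=$'conda-forge\nbioconda\ndefaults'
  local actual
  # Keep only the "  - name" list items, with the leading dash and spaces removed.
  actual="$(sed -n 's/^[[:space:]]*-[[:space:]]*//p')"
  [ "$actual" = "$expected" ]
}
```

For example, conda config --show channels | check_channels && echo "channel order OK".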
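Then, from the parent directory you cloned into, run the following to create the snakemake conda environment (replace tucca-rna-seq with your directory's name if you renamed it):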
conda env create -f tucca-rna-seq/install/snakemake.yaml
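To confirm the environment was created, you can look for it in the output of conda env list. This sketch assumes the environment defined in install/snakemake.yaml is named snakemake, as the text above suggests; has_conda_env is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a named environment appears in `conda env list` output (read on stdin).
# ASSUMPTION: the environment from install/snakemake.yaml is named "snakemake".
# has_conda_env is a hypothetical helper name.
has_conda_env() {
  local name="${1:?usage: has_conda_env <env_name>}"
  # Skip "#" comment lines; the first column of each remaining line is the environment name.
  awk -v n="$name" '!/^#/ && $1 == n { found = 1 } END { exit !found }'
}
```

For example, conda env list | has_conda_env snakemake && echo "snakemake environment found".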