Update: Jupyter Lab on Compute Node Using Python/miniforge-25.3.0: Setup and Connection Documentation

Key Takeaway: This guide provides a complete SLURM job script, detailed connection instructions, session management procedures, alternative ThinLinc access, and reconnection steps for running a Jupyter Lab session on an RCC compute node using the python/miniforge-25.3.0 module.

The documentation below covers every step from job submission to session termination, along with troubleshooting pointers for installation and disk-related issues.

1. Complete SLURM Job Script for Jupyter Lab

Place the following script in your SLURM submission file. This job script assigns a random port, loads the specific Python module (python/miniforge-25.3.0), activates your conda environment, and starts Jupyter Lab (instead of Notebook).

#!/bin/bash
#SBATCH --job-name=jupyter_lab
#SBATCH --time=05:00:00
#SBATCH --output=jupyter_lab_%j.txt
#SBATCH --error=jupyter_lab_%j.err
#SBATCH --account=pi-<group>
#SBATCH --mem=16gb

# Assign a random port between 8000 and 9000
PORT_NUM=$(shuf -i8000-9000 -n1)
node=$(hostname -s)
user=$(whoami)
cluster="midway3"

# Display SSH tunnel creation instructions to STDERR
cat 1>&2 <<END
1. Create an SSH tunnel from your workstation with the following command:

   ssh -N -f -L $PORT_NUM:${node}:$PORT_NUM ${user}@${cluster}.rcc.uchicago.edu

   Then, open your browser at http://localhost:$PORT_NUM

2. When finished with Jupyter Lab, terminate the session by:
   a. Exiting the Lab interface.
   b. Issuing the command: scancel -f ${SLURM_JOB_ID} from the login node.
END

# Load the Miniforge Python module
module load python/miniforge-25.3.0

# Activate your specific environment (replace 'vnev' with your environment name)
source activate vnev

# Start Jupyter Lab without launching a browser; specify the working directory as needed
jupyter lab --no-browser --ip=${node} --port=$PORT_NUM --notebook-dir="path_to_working_directory"

# Log a final message (with a trailing newline) once Jupyter Lab has exited
printf 'Jupyter Lab session exited\n' 1>&2

Note: Replace <group>, vnev, and "path_to_working_directory" with your account group, conda environment name, and desired working directory, respectively.
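
Submitting the script and finding its log can be sketched as follows. This is a minimal sketch under assumptions: the file name jupyter_lab.sbatch and the helper err_file_from_sbatch_output are hypothetical, and the helper relies on sbatch printing "Submitted batch job <jobnumber>" on success and on the --error pattern used in the script above.

```shell
# Hypothetical helper: derive the .err log name from sbatch's confirmation line.
# Assumes the "#SBATCH --error=jupyter_lab_%j.err" pattern from the job script.
err_file_from_sbatch_output() {
  # sbatch prints "Submitted batch job <jobnumber>"; the job number is the last field.
  printf '%s\n' "$1" | awk '{ print "jupyter_lab_" $NF ".err" }'
}

# Usage on a login node (jupyter_lab.sbatch is an assumed file name):
#   out=$(sbatch jupyter_lab.sbatch)
#   err_file_from_sbatch_output "$out"
```

The log file only appears once the job has left the queue and started running, so it may not exist immediately after submission.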

2. Step-by-Step Connection Instructions

Starting a New Job

Once the SLURM job starts, a file named jupyter_lab_<jobnumber>.err is created in the directory from which you submitted the job.
This file contains the tunnel instructions with the node name and the chosen port filled in (e.g., port 8631 on node midway3-0184).
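
Rather than reading the whole log, the ready-made tunnel command can be pulled out of it. The helper name tunnel_cmd_from_log below is hypothetical, and the sketch assumes the log contains the "ssh -N -f -L" line exactly as the job script prints it.

```shell
# Hypothetical helper: print the tunnel command written by the job script.
# Assumes the log contains a line with "ssh -N -f -L ..." as in the script above.
tunnel_cmd_from_log() {
  # $1 = path to jupyter_lab_<jobnumber>.err
  # Take the first matching line and strip its leading indentation.
  grep -m1 'ssh -N -f -L' "$1" | sed 's/^[[:space:]]*//'
}

# Usage (run where the log file is visible):
#   tunnel_cmd_from_log jupyter_lab_<jobnumber>.err
```

Copy the printed command and run it on your workstation, not on the cluster.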

Creating the SSH Tunnel

From Your Local Machine:

Open a terminal and run the following command, filling in the port, node, and user printed in the .err file:
```
ssh -N -f -L <PORT_NUM>:<node>:<PORT_NUM> <user>@midway3.rcc.uchicago.edu
```
For example:
```
ssh -N -f -L 8631:midway3-0184:8631 pnsinha@midway3.rcc.uchicago.edu
```
Then open your browser and navigate to:

http://127.0.0.1:8631 or http://localhost:8631

This forwards the port on the compute node to your local machine, giving you access to Jupyter Lab from your browser.

Alternative Access via ThinLinc

If you prefer not to use SSH tunnels, you can use the ThinLinc client:
1. Open Firefox (or your preferred browser) within the ThinLinc environment.
2. Navigate to the server address using the compute node’s hostname and the port number, for example:
```
http://midway3-0184:8631/?token=<your_token_here>
```
This method bypasses the need for local SSH tunneling and is especially useful if firewall or network restrictions apply.
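
The token for the URL is normally written to the job's .err file, because Jupyter logs its startup URLs to STDERR. The following is a minimal sketch for extracting it; the helper name token_from_log is hypothetical, and it assumes the log contains a URL of the form ...?token=<hex string>.

```shell
# Hypothetical helper: extract the first login token found in the job log.
# Assumes Jupyter wrote a URL containing "token=<hex digits>" to the log.
token_from_log() {
  # $1 = path to jupyter_lab_<jobnumber>.err
  grep -o 'token=[0-9a-f]*' "$1" | head -n 1 | cut -d= -f2
}

# Usage:
#   token_from_log jupyter_lab_<jobnumber>.err
```

If nothing is printed, the server may not have finished starting yet, or the log format may differ from what this sketch assumes.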

3. Session Management Procedures

Managing the Live Session

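Monitoring: Because the tunnel command uses -f, ssh moves to the background after you authenticate and your terminal prompt returns. The tunnel stays open until you stop the ssh process or the connection drops, but it only reaches Jupyter Lab while the job is running.
Termination: To safely terminate your Jupyter Lab session:
- Exit Jupyter Lab from your browser.
- Then log in to the RCC login node and run the following, replacing <jobnumber> with the job number shown by squeue -u $USER and in the name of the .err file (${SLURM_JOB_ID} is only defined inside the job itself):
```
scancel -f <jobnumber>
```
This stops the job on the compute node, which shuts down Jupyter Lab.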
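
Cancelling the job does not close the SSH tunnel on your workstation, since it was started in the background. One way to find and stop it is sketched below; the helper name tunnel_pattern is hypothetical, and the sketch assumes the tunnel was started with exactly the command shown in this guide.

```shell
# Hypothetical helper: build the pattern that matches the tunnel's ssh process.
# Assumes the tunnel was started as "ssh -N -f -L <port>:<node>:<port> ...".
tunnel_pattern() {
  # $1 = port, $2 = compute node
  printf 'ssh -N -f -L %s:%s:%s' "$1" "$2" "$1"
}

# Usage on your workstation:
#   pgrep -f "$(tunnel_pattern 8631 midway3-0184)"   # list matching process IDs first
#   pkill -f "$(tunnel_pattern 8631 midway3-0184)"   # then stop the tunnel
```

Checking with pgrep before pkill avoids stopping an unrelated ssh process by mistake.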

4. Reconnecting to an Existing Session

If you need to reconnect to a Jupyter Lab session that is still running, follow these steps:

Identify the Running Job:
- On the RCC login node, run:
```
squeue -u $USER
```
- Note the compute node (e.g., midway3-0007) where Jupyter Lab is running.
Retrieve the Session Details:
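- SSH to that compute node, load the python/miniforge-25.3.0 module and activate the same environment so that the jupyter command is available, then run: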
```
jupyter lab list
```
- The command will output the URLs and tokens of the running session.
Recreate the SSH Tunnel:
- Use the provided host and port parameters to recreate the SSH tunnel:
```
ssh -N -f -L <PORT_NUM>:<node>:<PORT_NUM> <user>@midway3.rcc.uchicago.edu
```
- For example, if the running session is on port 8888 on node midway3-0007:
```
ssh -N -f -L 8888:midway3-0007:8888 pnsinha@midway3.rcc.uchicago.edu
```
Connect:
- Open your browser and head to:
```
http://localhost:8888/?token=<your_token_here>
```
- Alternatively, copy and paste the complete URL from the output of the jupyter lab list command.
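
The reconnection steps above can be sketched as a small helper that turns a URL reported by jupyter lab list into the matching tunnel command. The function names are hypothetical, and the sketch assumes the URL has the form http://<node>:<port>/?token=... as in the examples in this guide.

```shell
# Hypothetical helpers: split a URL such as http://midway3-0007:8888/?token=...
node_from_url() { printf '%s\n' "$1" | sed -E 's#^https?://([^:/]+):.*#\1#'; }
port_from_url() { printf '%s\n' "$1" | sed -E 's#^https?://[^:/]+:([0-9]+).*#\1#'; }

# Print the tunnel command for a running session.
reconnect_cmd() {
  # $1 = URL from "jupyter lab list", $2 = your cluster user name
  node=$(node_from_url "$1")
  port=$(port_from_url "$1")
  printf 'ssh -N -f -L %s:%s:%s %s@midway3.rcc.uchicago.edu\n' "$port" "$node" "$port" "$2"
}

# Usage:
#   squeue -u "$USER" -h -o '%i %N'   # on the login node: job number and node
#   reconnect_cmd 'http://midway3-0007:8888/?token=<your_token_here>' <user>
```

Run the printed ssh command on your workstation, then open the localhost URL with the same token.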

5. Troubleshooting and Installation-Related Considerations

Important: Be aware of potential installation or disk-related issues:

Library/Missing File Issues:
If your lab session fails to start due to missing libraries, ensure that your conda environment has all required dependencies installed. Verify that the python/miniforge-25.3.0 module is correctly loaded and that your environment has been updated.

Disk Errors:
If you experience errors writing files (as observed in some reported node issues), double-check that the path_to_working_directory has proper read/write permissions and that you are not running on nodes affected by known storage issues.

Node-Specific Problems:
Should your job terminate unexpectedly, or if files are not being written, use the error logs (e.g., jupyter_lab_<jobnumber>.err) to diagnose potential disk quota issues or misconfigurations.
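
Before submitting a job, a quick write test on the working directory can rule out permission and full-disk problems. This is a minimal sketch; the helper name check_writable and the probe file name are hypothetical, and it only tests that a file can be created, not how much quota remains.

```shell
# Hypothetical helper: check that a directory exists and accepts new files.
check_writable() {
  # $1 = directory to test (e.g., your path_to_working_directory)
  [ -d "$1" ] || { echo "not a directory: $1" >&2; return 1; }
  # Create and remove a small probe file to confirm that writes succeed.
  probe=$(mktemp "$1/.jupyter_write_test.XXXXXX" 2>/dev/null) || {
    echo "cannot write to: $1" >&2
    return 1
  }
  rm -f "$probe"
  echo "writable: $1"
}

# Usage:
#   check_writable "path_to_working_directory"
#   df -h "path_to_working_directory"   # free space on that filesystem
```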

By following these detailed instructions, you can seamlessly launch, connect, manage, and reconnect to a Jupyter Lab session on RCC compute nodes using the Python/miniforge-25.3.0 module.

Conclusion

This documentation provides a complete setup—from the SLURM job script to step-by-step connectivity and session management—ensuring that you can effectively run Jupyter Lab on RCC compute nodes. Use the instructions and troubleshooting tips provided to handle connection issues, disk errors, or missing dependencies, thereby improving the robustness of your compute sessions. Happy coding!