Jupyter Notebook on SGE Cluster with Reverse Tunneling
In our current HPC setup we don’t have direct access to the child nodes, so the only way to connect to a web server application running in an SGE queue is through reverse tunneling via the head/parent node. This is a guide to accessing a Jupyter notebook server running on an SGE cluster.
There are two primary methods to submit a job in an SGE cluster: qsub for batch submission and qrsh for interactive sessions. If the analysis is going to take more than a couple of hours, it’s better to submit the job with qsub. Also, before starting the notebook server, set up password-based authentication to avoid unauthorized access (for the classic Notebook server, this can be done by running jupyter notebook password inside the notebook environment).
Notebook in SGE queue
The queue submission process is similar to that of any other batch job and can be done with a standard SGE script. Following is an example SGE script for a notebook server. Using the -q option, an appropriate queue or target node (queue_name@node_address) can be specified.
#!/bin/bash
#$ -N notebook
#$ -cwd
#$ -q <queue_name>
#$ -e $JOB_ID_$JOB_NAME.err
#$ -o $JOB_ID_$JOB_NAME.out
#activate conda env for notebook
source <path_to_conda>/miniconda3/etc/profile.d/conda.sh
conda activate <notebook_env_name>
#start the notebook server
jupyter notebook --ip '*' --no-browser --port 8898 \
--notebook-dir /some/path
In the above example the notebook server binds to all network interfaces on the node. To find out which node is running the job, we can use the qstat command or check the job’s .err file; the node name will be required to connect to the notebook server.
#Example
[I 01:17:11.023 NotebookApp] Jupyter Notebook 6.5.2 is running at:
[I 01:17:11.023 NotebookApp] http://compute-0-5.local:8898/
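The node name and port can also be pulled out of the log with standard tools. The following is a minimal sketch, assuming the server URL appears in the log in the form http://<host>:<port>/ as in the example above; the function name notebook_endpoint and the log file name in the usage comment are hypothetical.

```shell
#!/bin/bash
# Print "<host> <port>" for the first server URL found in a Jupyter log file.
# Assumption: the URL looks like http://<host>:<port>/ (as in the example log).
notebook_endpoint() {
    local log_file="$1"
    # -m 1: stop at the first matching line; -o: print only the matched URL part
    grep -m 1 -oE 'https?://[^/:[:space:]]+:[0-9]+' "$log_file" \
        | sed -E 's#^https?://([^:]+):([0-9]+)$#\1 \2#'
}

# Usage (hypothetical file name, following the -e pattern of the script above):
#   notebook_endpoint 12345_notebook.err
```

For the two log lines shown above, this prints compute-0-5.local 8898. If the log contains several URLs, only the first one is reported, so it is worth checking the result against the qstat output.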
Notebook in an interactive session
To run a Jupyter notebook server in an interactive session, we first activate the conda/python environment and then use the qrsh command.
# Example
qrsh -cwd -q <queue_name> -N test -V "jupyter notebook --ip '*' --no-browser --port 8898"
The node name running the server will be available in the stdout stream.
Connect with the server
As the notebook server is only accessible from within the HPC’s intranet, an SSH tunnel via the parent node has to be established.
ssh -N -L 8898:compute-0-5.local:8898 <user_name>@<parent_node_ip>
Once the connection has been established, the notebook server will be available at http://localhost:8898. The overall process is a bit cumbersome and, to the best of my knowledge, no better solution to circumvent this issue is available yet.
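Because the node and port can change from job to job, a small helper that assembles the tunnel command may save some typing. This is only a sketch: the function name tunnel_command and its argument order are my own assumptions, and it prints the command rather than running it so that it can be checked first.

```shell
#!/bin/bash
# Build the ssh port-forwarding command for a notebook running on a compute node.
# Arguments (all supplied by the caller): <node> <port> <user>@<parent_node>
tunnel_command() {
    local node="$1"
    local port="$2"
    local login="$3"
    # -N: do not run a remote command
    # -L: forward local <port> to <node>:<port> through the parent node
    printf 'ssh -N -L %s:%s:%s %s\n' "$port" "$node" "$port" "$login"
}

# Usage, with the placeholder login from the command above:
#   tunnel_command compute-0-5.local 8898 "<user_name>@<parent_node_ip>"
```

The same port is used on both ends of the tunnel here to match the ssh command shown earlier; a different local port would only require changing the first value after -L.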
Update Feb, 2023
Vince Buffalo recently shared a tool to manage remote Jupyter sessions. The tool is very useful if you are running multiple Jupyter instances and can’t expose network ports.