Crash Course on using SoC Compute Clusters
Need access to NUS computing resources but not sure how? Here’s a quick crash course!
Logging In
# VPN
ssh larrylaw@xgpc2.comp.nus.edu.sg
# Otherwise through sunfire
ssh larrylaw@sunfire.comp.nus.edu.sg
ssh xgpc2
- To skip the manual hop through sunfire, either use SSH tunneling or the SoC VPN.
- Compute cluster hardware configuration here.
- Use an RSA key to skip typing your password. Guide here.
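The sunfire hop can be folded into an SSH config entry, so that a single command reaches the compute node. This is a minimal sketch, assuming OpenSSH 7.3 or newer on your local machine (for `ProxyJump`). The alias `xgpc2-via-sunfire`, the `SSH_CONFIG` and `SOC_USER` variables, and the scratch file name are all made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: write SSH config entries that hop through sunfire automatically.
# Assumptions: OpenSSH 7.3+ (ProxyJump); the alias "xgpc2-via-sunfire" is hypothetical.
set -euo pipefail

# Defaults to a scratch file so nothing existing is touched;
# point SSH_CONFIG at ~/.ssh/config once you are happy with the entries.
SSH_CONFIG="${SSH_CONFIG:-./soc_ssh_config}"
SOC_USER="${SOC_USER:-larrylaw}"

# Append (not overwrite) so existing entries in the target file survive.
cat >> "$SSH_CONFIG" <<EOF
Host sunfire
    HostName sunfire.comp.nus.edu.sg
    User ${SOC_USER}

Host xgpc2-via-sunfire
    HostName xgpc2.comp.nus.edu.sg
    User ${SOC_USER}
    ProxyJump sunfire
EOF

echo "Wrote SSH config entries to ${SSH_CONFIG}"
```

With the scratch file in place, `ssh -F ./soc_ssh_config xgpc2-via-sunfire` should connect in one step. For the key-based login, `ssh-keygen -t rsa -b 4096` followed by `ssh-copy-id larrylaw@sunfire.comp.nus.edu.sg` is the usual route, assuming `ssh-copy-id` is available on your machine.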
Transferring Data
# From tembusu cluster to local
## VPN
scp -r larrylaw@xgpc2.comp.nus.edu.sg:~/NM2/results/exp-e ./
## Otherwise through sunfire: on the cluster node, push to sunfire
scp -r results/rs-obs/net_75/ larrylaw@sunfire.comp.nus.edu.sg:~/
## Then, on your local machine, pull from sunfire
scp -r larrylaw@sunfire.comp.nus.edu.sg:~/net_75 .
# From local to tembusu cluster
scp lab1.tar.gz larrylaw@xcne2.comp.nus.edu.sg:~/
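Without the VPN, the two-step copy through sunfire can likely be collapsed into one command by letting scp use sunfire as a jump host. The sketch below assumes OpenSSH 7.3 or newer locally (for the `ProxyJump` option). The function name `fetch_from_cluster` and the `DRY_RUN` and `SOC_USER` variables are hypothetical helpers, not part of any SoC tooling.

```shell
#!/usr/bin/env bash
# Sketch: copy a directory from a cluster node to local, jumping through sunfire.
# Assumption: OpenSSH 7.3+ locally, so "-o ProxyJump=..." is understood by scp.
set -euo pipefail

SOC_USER="${SOC_USER:-larrylaw}"

# fetch_from_cluster NODE REMOTE_PATH [LOCAL_DIR]
# With DRY_RUN=1 the command is only printed, which is handy for checking paths.
fetch_from_cluster() {
  local node="$1" remote_path="$2" local_dir="${3:-.}"
  local cmd=(scp -r
    -o "ProxyJump=${SOC_USER}@sunfire.comp.nus.edu.sg"
    "${SOC_USER}@${node}.comp.nus.edu.sg:${remote_path}"
    "$local_dir")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Preview only; drop DRY_RUN=1 to actually transfer.
DRY_RUN=1 fetch_from_cluster xgpc2 '~/NM2/results/exp-e' ./
```

The remote path is single-quoted so that `~` is expanded on the cluster rather than on your own machine.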
Too lazy to manually check cluster availability?
This bash script records the GPU memory usage of the specified nodes in output.txt, so you can see which ones are free.
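#!/usr/bin/env bash
# Record GPU memory usage for every candidate node in output.txt.
echo "Checking all remote! /prays hard"
declare -a nodes=("xgpc" "xgpd" "xgpg")
# -f: do not complain if output.txt does not exist yet
rm -f output.txt
for ((i = 0; i < 10; i++)); do
  for node in "${nodes[@]}"; do
    node_idx="${node}${i}"
    echo "$node_idx" >> output.txt
    # -n keeps ssh off stdin; accept-new (OpenSSH 7.6+) trusts first-seen host keys
    ssh -n -o ConnectTimeout=10 -o StrictHostKeyChecking=accept-new \
      "larrylaw@${node_idx}.comp.nus.edu.sg" nvidia-smi | grep "MiB /" >> output.txt
  done
done
echo "Go get em!"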
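The memory lines collected by the script still have to be read by eye. As a possible refinement, nvidia-smi can emit machine-readable CSV through its `--query-gpu` and `--format` flags, which a small filter can turn into a list of idle GPUs. The sketch below is one way to do that. The function name `free_gpus` and the 500 MiB threshold are assumptions, and the sample lines are made up, not real cluster output.

```shell
#!/usr/bin/env bash
# Sketch: list GPU indices whose used memory is below a threshold.
# Input format (one GPU per line): "index, memory.used, memory.total" in MiB, as produced by
#   nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
set -euo pipefail

# free_gpus [THRESHOLD_MIB]  (reads CSV lines on stdin; threshold defaults to 500)
free_gpus() {
  local threshold_mib="${1:-500}"
  # Split on a comma plus optional spaces; "+ 0" forces a numeric comparison.
  awk -F', *' -v limit="$threshold_mib" '$2 + 0 < limit { print $1 }'
}

# Made-up sample: GPU 0 is busy, GPU 1 is nearly idle.
printf '0, 10240, 11178\n1, 3, 11178\n' | free_gpus 500
# prints: 1
```

Against a real node, the same filter could be fed with `ssh larrylaw@xgpc2.comp.nus.edu.sg nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits | free_gpus`.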
Development
- pyenv for managing Python versions and pyvenv for virtual environments.
- tmux to keep processes running after ending the ssh session. Help here.
- nvidia-smi to check GPU usage (before sending jobs).
- Remote development on VSCode. Help here.
- Speed up computation (significantly) by storing data and outputs in /temp.
- View tensorboard on remote. Help here.
- Run on specific GPUs by prepending CUDA_VISIBLE_DEVICES to the command, e.g. CUDA_VISIBLE_DEVICES=2,3 python xxx.py
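Prepending the variable works because the shell sets it only for that one command, leaving the rest of the session untouched. The sketch below wraps this in a small helper. The name `run_on_gpus` is hypothetical, and the `sh -c` call merely stands in for a real training script.

```shell
#!/usr/bin/env bash
# Sketch: run any command with a restricted set of visible GPUs.
set -euo pipefail

# run_on_gpus GPU_LIST COMMAND [ARGS...]
# The variable is set for the child command only, not for the calling shell.
run_on_gpus() {
  local gpus="$1"
  shift
  CUDA_VISIBLE_DEVICES="$gpus" "$@"
}

# Stand-in for "python xxx.py": show what the child process sees.
run_on_gpus 2,3 sh -c 'echo "visible GPUs: $CUDA_VISIBLE_DEVICES"'
# prints: visible GPUs: 2,3
```

To combine this with the tmux tip, something like `tmux new-session -d -s train 'CUDA_VISIBLE_DEVICES=2,3 python xxx.py'` starts the job in a detached session (the session name `train` is arbitrary), and `tmux attach -t train` reconnects to it later.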