Release Notes for the new Talapas2 (2024)
The HPC cluster has been updated from Talapas to Talapas2.
Pronounced tah-lah-paas
Newer hardware, operating system, and infrastructure.
Notable updates
Operating system - Red Hat Enterprise Linux 8 (RHEL8)
Kernel - 4.18
Processors - 3rd generation Intel (Ice Lake) and AMD (Milan)
GPUs - Nvidia Ampere A100s
Memory - DDR4 3200MT/s and Intel Optane memory in the memory partitions
Storage - 250GB home directories
Login
DuckIDs
Talapas authenticates through the UO Identity Access Management system, Microsoft Active Directory, so every user must have a valid UO DuckID.
Links are provided below for external collaborators, graduating researchers, or automation accounts to continue their access to the cluster.
External collaborators (2 options):
Graduating researchers:
Automation Accounts (Role Accounts)
Talapas VPN
A virtual private network (VPN) connection is recommended to access the cluster. This adds an extra layer of security.
Instructions here: Article - Getting Started with UO VPN (uoregon.edu)
UO VPN includes a Talapas profile, which should provide all the same capabilities as the standard UO VPN profile and adds access to Talapas.
Use “uovpn.uoregon.edu/talapas” as the connection URL, and log in with your DuckID and password.
Do not repeatedly attempt to log in when you’re getting error messages. As with other uses of your DuckID at UO, if you generate a large number of login failures, all DuckID access (including things like e-mail) will be locked University-wide. Similarly, be aware of automated processes like cron jobs that might trigger this situation without your notice.
Once you’re on the VPN, you can access any one of the cluster login nodes:
login1.talapas.uoregon.edu
login2.talapas.uoregon.edu
login3.talapas.uoregon.edu
login4.talapas.uoregon.edu
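Connecting to a login node comes down to a single ssh command. The sketch below wraps it in a small helper; the function name talapas_login and the value your_duckid are placeholders introduced here for illustration, not part of the cluster setup.

```shell
#!/bin/bash
# Placeholder: replace with your own DuckID.
DUCKID="your_duckid"

# Open an SSH session on login node N (1 through 4); defaults to login1.
talapas_login() {
    ssh "${DUCKID}@login${1:-1}.talapas.uoregon.edu"
}

# Usage, once the VPN is connected:
#   talapas_login 2
```

Typing ssh your_duckid@login2.talapas.uoregon.edu directly is equivalent; the helper only saves retyping the domain.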
Load balancer
If you can’t use the UO VPN, you can also connect to the login load balancer at login.talapas.uoregon.edu
A load balancer is used to redirect SSH connections to different login nodes to spread the load. The load balancer’s choice of login node is “sticky”: repeated connections from your IP address go to the same login node, as long as there has been some activity within the last 24 hours.
Slurm
List of shared partitions
compute
computelong
gpu
gpulong
interactive
interactivegpu
memory
memorylong
Job control
A Slurm account is still required for each job; use --account=<your-PIRG>
There is no default partition; you must specify partition(s) with --partition
The default memory per CPU is 4GB; use --mem=<size>, --mem-per-cpu, or --mem-per-gpu to adjust as needed
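Put together, the three settings above might look like this in a batch script. This is a minimal sketch: your_pirg is a placeholder account, and the partition, CPU count, memory size, and time limit are arbitrary values chosen for illustration.

```shell
#!/bin/bash
# Placeholder account: replace your_pirg with your PIRG name.
#SBATCH --account=your_pirg
# No default partition exists, so one must be named explicitly.
#SBATCH --partition=compute
#SBATCH --cpus-per-task=2
# Override the 4GB-per-CPU default: 8GB per CPU, 16GB in total for this single-task job.
#SBATCH --mem-per-cpu=8G
#SBATCH --time=00:10:00

# Outside Slurm the job ID is unset, so fall back to a label.
JOB_ID="${SLURM_JOB_ID:-no-job-id}"
NODE="$(uname -n)"
echo "Job ${JOB_ID} running on ${NODE}"
```

Saved under a name such as hello.sh (a hypothetical filename), the script would be submitted with sbatch hello.sh.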
Slurm features
Each node in the cluster has, at a minimum, Slurm feature tags for processor make, generation, and model. For example,
n0173 amd,milan,7713
Nodes with GPUs include Slurm feature tags with GPU model and GPU memory size. For example,
n0172 amd,milan,7413,a100,gpu-40gb
Nodes with large memory include Slurm feature tags with memory size. For example,
Request a node based on processor
Nodes with AMD and Intel processors are available on Talapas2.
Constrain a job to allocate a node with a legacy Intel Broadwell processor,
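A minimal sketch of such a request in a batch script. The feature tags intel and broadwell are assumptions that follow the make,generation pattern of the tags shown earlier, and the account and partition are placeholders; confirm the real tag names on the cluster before relying on them.

```shell
#!/bin/bash
# Placeholder account and an assumed partition; adjust both for your job.
#SBATCH --account=your_pirg
#SBATCH --partition=compute
# Assumed feature tags (make,generation); not confirmed by these notes.
#SBATCH --constraint=intel,broadwell
#SBATCH --time=00:05:00

# Report the processor the job landed on; fall back to "unknown" if unavailable.
CPU_MODEL="$(grep -m1 'model name' /proc/cpuinfo 2>/dev/null || echo 'model name: unknown')"
echo "${CPU_MODEL}"
```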
Request a node based on GPU feature
Nodes with 10GB, 40GB, or 80GB of GPU memory are available on Talapas2.
Constrain a job to allocate a node with 10GB of GPU memory,
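A minimal sketch of such a request in a batch script. The tag gpu-10gb is an assumption that follows the gpu-40gb naming shown above, --gres=gpu:1 is the generic Slurm way to ask for one GPU and may differ from the resource names configured on Talapas2, and your_pirg is a placeholder account.

```shell
#!/bin/bash
# Placeholder account: replace your_pirg with your PIRG name.
#SBATCH --account=your_pirg
#SBATCH --partition=gpu
# Generic Slurm request for one GPU; the exact gres name on Talapas2 is an assumption.
#SBATCH --gres=gpu:1
# Assumed tag, following the gpu-40gb naming pattern.
#SBATCH --constraint=gpu-10gb
#SBATCH --time=00:05:00

# Slurm normally exports CUDA_VISIBLE_DEVICES for GPU jobs; default to "none" elsewhere.
VISIBLE_GPUS="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${VISIBLE_GPUS}"
```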
CUDA A100 MIG slicing
Due to limitations with CUDA MIG slicing, it appears that a job can only use one slice (GPU) per host. That means one GPU per job unless MPI is being used to orchestrate GPU usage on multiple hosts. See NVIDIA Multi-Instance GPU User Guide :: NVIDIA Tesla Documentation. MIG mode is not enabled on nodes that have 80GB GPUs. Request these nodes using --constraint=gpu-80gb,no-mig
Request a node based on memory feature
Nodes with 1TB, 2TB, or 4TB of memory are available on Talapas2.
Constrain a job to allocate a node with 1TB of memory,
For the complete list of features run,
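One way to do this, sketched with the standard Slurm sinfo command: the %N and %f format fields print node names and the feature tags attached to them. The helper names list_features and nodes_with_feature are made up for this example.

```shell
#!/bin/bash
# Print one line per group of nodes: node names (%N) followed by feature tags (%f).
list_features() {
    sinfo --noheader --format="%N %f"
}

# Narrow the list to nodes carrying a given tag, for example a100.
nodes_with_feature() {
    list_features | grep -w -- "$1"
}

# Usage on a login node:
#   list_features
#   nodes_with_feature a100
```

Adding --partition=<name> to the sinfo call limits the listing to one partition, which helps when checking whether a feature is available where you plan to submit.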
Note: Not all features are available in every partition. You may need to submit your job to the preempt partition in order to access the desired features, e.g. --constraint=h100
Processor architectures
Talapas2 is made up of nodes from multiple separate purchases over the course of several years. It therefore has several generations of processors from multiple vendors (Intel and AMD).
Here is the current architecture layout (this is subject to change):
Storage
See the Directory Structure document.
Software
Some existing software will run fine on the new cluster.
However, with the operating system update to RHEL8, some software will likely need to be rebuilt.
Issues will generally stem from differences in the shared libraries in RHEL8. If you compile software in a way that assumes one specific architecture (e.g. Intel Ice Lake), it might not run on all nodes.
LMOD
All centrally provided software, including conda environments, is available through LMOD.
Conda
Talapas2 uses miniconda3, and new conda environments will be built with this base environment. If you have personal conda environments, you might need/want to recreate them using miniconda3. Note that using existing conda environments should work fine; it’s making changes to them that might cause problems.
Note: miniconda-t2 is being deprecated in favor of miniconda3, which includes the libmamba solver.
To use the libmamba solver, include --solver=libmamba in your conda create or conda env create command line.
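As a rough sketch of the full sequence, assuming the central module is named miniconda3 as above: load it through LMOD, then create an environment with the libmamba solver. The environment name myenv, the package python, and the helper name make_env are illustrative choices only.

```shell
#!/bin/bash
# Hypothetical environment name, for illustration only.
ENV_NAME="myenv"

# Load the central miniconda3 module, then create the environment with libmamba.
make_env() {
    module load miniconda3
    conda create --solver=libmamba --name "${ENV_NAME}" --yes python
}

# Usage on the cluster:
#   make_env
```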
Spack
Talapas2 uses spack-rhel8, and software provided centrally by this platform will be built using this instance with gcc 13.1.0 on Intel Broadwell nodes.
Open OnDemand
An updated Open OnDemand is available on Talapas2. Use Google Chrome or Firefox and navigate to,
https://ondemand.talapas.uoregon.edu/
Use your DuckID to log in.
Globus
A Talapas2 Globus endpoint was recently deployed: University of Oregon - Talapas2 Overview | Globus
Technical Differences
These probably won’t affect you, but they are visible differences that you might notice.
The Talapas2 domain name is talapas.uoregon.edu
Hostnames now use the long form, for example login1.talapas.uoregon.edu
Use the long form of hostnames to access other campus hosts, for example some-other-host.uoregon.edu
Linux user IDs (UID) are centrally managed in Active Directory (AD).
Linux group IDs (GID) are centrally managed in Active Directory (AD), and the group names are longer, for example is.racs.pirg.racs instead of just racs