...

Because of limitations in CUDA's MIG support, a job appears to be able to use only one MIG slice (one GPU instance) per host. In practice that means one slice per job, unless MPI is used to coordinate GPU usage across multiple hosts. See NVIDIA Multi-Instance GPU User Guide :: NVIDIA Tesla Documentation.
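As a rough sketch of what "one slice per process" looks like in practice: a CUDA program can be pointed at a single MIG slice by putting that slice's UUID in CUDA_VISIBLE_DEVICES. The helper name first_mig_uuid and the program name are made up for illustration, and the scheduler may already set CUDA_VISIBLE_DEVICES for you, in which case none of this is needed.

```shell
# Hypothetical helper: read `nvidia-smi -L` output on stdin and print the
# UUID of the first MIG device listed (lines of the form "(UUID: MIG-...)").
first_mig_uuid() {
    sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p' | head -n 1
}

# Usage on a GPU node (commented out so the sketch is safe to source anywhere):
# export CUDA_VISIBLE_DEVICES="$(nvidia-smi -L | first_mig_uuid)"
# ./my_cuda_program    # hypothetical binary; it will see exactly one slice
```

Listing more than one MIG UUID in CUDA_VISIBLE_DEVICES is not expected to help, since a single CUDA process only uses one slice.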

RHEL 8 libcrypto botch vs miniconda

Red Hat added a patch to their libcrypto libraries that collides with miniconda. See SSL library conflicts on CentOS 8 · Issue #10241 · conda/conda (github.com).

So, for example, you might see things like this:

Code Block
$ module load miniconda
$ emacs
emacs: symbol lookup error: /lib64/libk5crypto.so.3: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b
$ ssh localhost
ssh: symbol lookup error: /lib64/libk5crypto.so.3: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b
$ curl https://www.google.com
curl: symbol lookup error: /lib64/libk5crypto.so.3: undefined symbol: EVP_KDF_ctrl, version OPENSSL_1_1_1b
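To check whether a given command is affected, it can help to see which libcrypto the dynamic linker resolves for it. The sketch below assumes the collision comes from the miniconda module putting its own libcrypto ahead of the system one on the library search path, as the linked conda issue suggests; the helper name libcrypto_path is made up for illustration.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the path that
# libcrypto resolves to (third field of "libcrypto.so.X => /path (0x...)").
libcrypto_path() {
    awk '/libcrypto/ { print $3 }'
}

# Usage (commented out; run it before and after `module load miniconda`):
# ldd /lib64/libk5crypto.so.3 | libcrypto_path
# ldd "$(command -v curl)" | libcrypto_path
```

If the path moves from /lib64 to somewhere under the miniconda install after loading the module, that command is likely to hit the symbol lookup error.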

Not all distribution commands will fail, but quite a few do. For now, the workaround is either to load miniconda only around the commands that actually need it, or to unload it before running a command that exhibits the bug. For example, something like this:

Code Block
curl yada
(module load miniconda && conda activate myfavoriteenv && mycommandinthatenv someargs)
curl yada

or something like this:

Code Block
module load miniconda
conda activate myfavoriteenv
(module purge && curl yada)
mycommandinthatenv someargs
(module purge && curl yada)

Obviously, both are pretty awful. We’ll look for a proper fix, but it might be a while.
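If you use the second pattern a lot, a small shell function takes some of the sting out of it. This is only a sketch: the name sysrun is made up, and it assumes the `module` command is available in your shell, as in the examples above.

```shell
# Hypothetical wrapper: run one command with all modules purged.
# The parentheses start a subshell, so the purge does not touch the
# calling shell and miniconda stays loaded there.
sysrun() {
    ( module purge && "$@" )
}

# Usage (commented out):
# module load miniconda
# conda activate myfavoriteenv
# sysrun curl yada
# mycommandinthatenv someargs
```

Putting the function in your ~/.bashrc makes it available in every session, but it is still the same workaround, not a fix.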

Technical Differences

These probably won’t affect you, but they are visible differences that you might notice.

...