...
We will start by launching an interactive GPU session from one of the Talapas login nodes
Code Block language bash
$ srun --account=<your account> --pty --gres=gpu:1 --mem=4G --time=60 --partition=testgpu bash
Wait for your interactive session to start
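If you want to verify that the session has started and that you landed on a GPU node, these standard Slurm and shell commands are a quick sanity check (node names on Talapas will vary):
Code Block language bash
# List your jobs; the interactive session should show state R (running)
$ squeue -u $USER
# Print the name of the node your shell is running on
$ hostname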
Load the modules for TensorFlow
Code Block language bash
$ module load cuda/9.0
$ module load python3
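To confirm the modules loaded correctly, you can list your loaded modules and check which python3 the shell will now use (exact module versions on your system may differ):
Code Block language bash
# cuda/9.0 and python3 should appear in this list
$ module list
# The python3 binary should now come from the module environment
$ which python3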
Check what GPU resources are available
Code Block language bash
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:04:00.0 Off |                  Off |
| N/A   35C    P0    60W / 149W |     97MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
This shows that we have successfully reserved one Tesla K80.
Launch Python 3. (Note: the plain python command gives you the system default Python 2; use python3 instead.)
Code Block language bash
$ python3
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2018-09-19 11:11:34.399858: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-19 11:11:34.524069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:04:00.0
totalMemory: 11.92GiB freeMemory: 11.75GiB
2018-09-19 11:11:34.524110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-09-19 11:11:34.796876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-19 11:11:34.796918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-09-19 11:11:34.796926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-09-19 11:11:34.797220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/device:GPU:0 with 11399 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13411014324454836610
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11953517364
locality {
  bus_id: 1
  links {
  }
}
incarnation: 133570401343557472
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7"
]
This also confirms that you are correctly using the GPU build of TensorFlow and have access to one CPU device and one Tesla K80.
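While you are still in the interpreter, you can also pin a computation to the GPU explicitly. The following is a minimal sketch using the TensorFlow 1.x session API; log_device_placement=True makes TensorFlow report which device each operation actually ran on:
Code Block language py
import tensorflow as tf

# Place the matrix multiply on the first GPU explicitly
with tf.device('/device:GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
    c = tf.matmul(a, b)

# log_device_placement=True logs the device assignment of every op
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))
When you are done exploring, press CTRL+D to exit the Python interpreter.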
Let's try to fit a simple model. Copy the following text into a file called my_test.py using your favorite text editor (for example, emacs).
Code Block language py title my_test.py
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Note: this will download the MNIST dataset to ~/.keras/datasets the first time you run it
print(x_train.shape)

# Let's create a 2-hidden-layer neural network
input = tf.keras.layers.Input(shape=(28, 28))
network = tf.keras.layers.Flatten()(input)
network = tf.keras.layers.Dense(10)(network)
network = tf.keras.layers.LeakyReLU()(network)
network = tf.keras.layers.Dropout(0.2)(network)
network = tf.keras.layers.Dense(10)(network)
network = tf.keras.layers.LeakyReLU()(network)
output = tf.keras.layers.Dense(10, activation='softmax')(network)

my_model = tf.keras.models.Model(input, output)
my_model.summary()
my_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
my_model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)
Info This model is trained to identify a handwritten digit from a 28x28-pixel image. We use two fully connected hidden layers, plus dropout (to prevent overfitting).
Example Data: https://en.wikipedia.org/wiki/MNIST_database
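If you are curious what the inputs look like, this optional snippet (not part of my_test.py) prints the first training image as rough ASCII art along with its label:
Code Block language py
import tensorflow as tf

# Load (or reuse the cached copy of) the MNIST dataset
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

# Each image is a 28x28 array of 0-255 grayscale values; render it coarsely
for row in x_train[0]:
    print(''.join('#' if pixel > 127 else '.' for pixel in row))
print('Label:', y_train[0])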
Let's run it
Code Block language bash
$ python3 my_test.py
(60000, 28, 28)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 28, 28)            0
_________________________________________________________________
flatten_1 (Flatten)          (None, 784)               0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                7850
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 10)                0
_________________________________________________________________
dropout_1 (Dropout)          (None, 10)                0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                110
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 10)                0
_________________________________________________________________
dense_3 (Dense)              (None, 10)                110
=================================================================
Total params: 8,070
Trainable params: 8,070
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
2018-09-19 11:52:13.929548: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-19 11:52:14.058190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:04:00.0
totalMemory: 11.92GiB freeMemory: 11.75GiB
2018-09-19 11:52:14.058229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-09-19 11:52:14.335734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-19 11:52:14.335778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-09-19 11:52:14.335786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-09-19 11:52:14.336087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11399 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
60000/60000 [==============================] - 7s 123us/step - loss: 4.1213 - acc: 0.6669 - val_loss: 1.4595 - val_acc: 0.8431
Epoch 2/10
60000/60000 [==============================] - 7s 109us/step - loss: 1.3066 - acc: 0.7841 - val_loss: 0.4579 - val_acc: 0.8820
Epoch 3/10
60000/60000 [==============================] - 7s 109us/step - loss: 0.6586 - acc: 0.8106 - val_loss: 0.3734 - val_acc: 0.8948
Epoch 4/10
60000/60000 [==============================] - 6s 108us/step - loss: 0.5859 - acc: 0.8254 - val_loss: 0.3708 - val_acc: 0.8942
Epoch 5/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.5356 - acc: 0.8412 - val_loss: 0.3513 - val_acc: 0.9021
Epoch 6/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.5148 - acc: 0.8457 - val_loss: 0.3542 - val_acc: 0.9043
Epoch 7/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.5036 - acc: 0.8484 - val_loss: 0.3397 - val_acc: 0.9064
Epoch 8/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.4928 - acc: 0.8533 - val_loss: 0.3207 - val_acc: 0.9140
Epoch 9/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.4837 - acc: 0.8549 - val_loss: 0.3309 - val_acc: 0.9085
Epoch 10/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.4819 - acc: 0.8560 - val_loss: 0.3224 - val_acc: 0.9127
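The validation accuracy climbs to roughly 91% by the final epoch. If you want to keep the trained network for later use, you could append something like the following to the end of my_test.py; this is a minimal sketch, and the filename my_model.h5 is just an example:
Code Block language py
# Evaluate the trained model on the held-out test set
test_loss, test_acc = my_model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)

# Save the full model (architecture + weights) to an HDF5 file
my_model.save('my_model.h5')

# Later, reload it without rebuilding the architecture by hand
reloaded_model = tf.keras.models.load_model('my_model.h5')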
Congratulations, you've trained your first network on Talapas! Let's try one more time, this time in batch mode.
Use CTRL+D to close your interactive session
Write a new submit script to run the same training code by putting the following into a file named submit_gpu_test
Code Block language bash title submit_gpu_test
#!/bin/bash
#SBATCH --job-name=GPUMnistTest    ### Job Name
#SBATCH --partition=gpu            ### Partition (like a queue in PBS)
#SBATCH --time=0-01:00:00          ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1                  ### Node count required for the job
#SBATCH --ntasks-per-node=1        ### Number of tasks to be launched per node
#SBATCH --gres=gpu:1               ### Generic RESource request: gpu:<number of GPUs>
#SBATCH --account=<your account>

module load cuda/9.0
module load python3

python3 my_test.py > my_test_output
Submit this job
Code Block language bash
$ sbatch submit_gpu_test
That's it! Wait for your job to finish, and you'll find the training log in my_test_output in the directory from which you submitted the job.
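While you wait, you can monitor the job with standard Slurm commands; replace <jobid> with the ID that sbatch printed:
Code Block language bash
# Check the state of your jobs (PD = pending, R = running)
$ squeue -u $USER
# Follow the training log as it is being written
$ tail -f my_test_output
# After the job completes, review its accounting record
$ sacct -j <jobid>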
...