...
We will start by launching an interactive GPU session from one of the Talapas login nodes
Code Block language bash
$ srun --account=<your account> --pty --gres=gpu:1 --mem=4G --time=60 --partition=testgpu bash
Wait for your interactive session to start
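If you want to verify that the session has started and that you landed on a GPU node, these standard Slurm and shell commands are a quick sanity check (node names on Talapas will vary):
Code Block language bash
# List your jobs; the interactive session should show state R (running)
$ squeue -u $USER
# Print the name of the node your shell is running on
$ hostname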
Load the modules for TensorFlow
Code Block language bash
$ module load cuda/9.0
$ module load python3
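To confirm the modules loaded correctly, you can list your loaded modules and check which python3 the shell will now use (exact module versions on your system may differ):
Code Block language bash
# cuda/9.0 and python3 should appear in this list
$ module list
# The python3 binary should now come from the module environment
$ which python3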
Check what GPU resources are available
Code Block language bash
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:04:00.0 Off |                  Off |
| N/A   35C    P0    60W / 149W |     97MiB / 12206MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
This shows that we have successfully reserved one Tesla K80.
Launch Python 3. (Note: the plain python command gives you the system default Python 2; use python3 instead.)
Code Block language bash
$ python3
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2018-09-19 11:11:34.399858: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-19 11:11:34.524069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:04:00.0
totalMemory: 11.92GiB freeMemory: 11.75GiB
2018-09-19 11:11:34.524110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-09-19 11:11:34.796876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-19 11:11:34.796918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-09-19 11:11:34.796926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-09-19 11:11:34.797220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/device:GPU:0 with 11399 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13411014324454836610
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11953517364
locality {
  bus_id: 1
  links {
  }
}
incarnation: 133570401343557472
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7"
]
This also confirms that you are correctly using the GPU build of TensorFlow and have access to one CPU device and one Tesla K80.
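While you are still in the interpreter, you can also pin a computation to the GPU explicitly. The following is a minimal sketch using the TensorFlow 1.x session API; log_device_placement=True makes TensorFlow report which device each operation actually ran on:
Code Block language py
import tensorflow as tf

# Place the matrix multiply on the first GPU explicitly
with tf.device('/device:GPU:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
    c = tf.matmul(a, b)

# log_device_placement=True logs the device assignment of every op
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))
When you are done exploring, press CTRL+D to exit the Python interpreter.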
Let's try to fit a simple model. Copy the following text into a file called my_test.py using your favorite text editor (for example, emacs).
Code Block language py title my_test.py
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Note: this will download the MNIST dataset to ~/.keras/datasets the first time you run it
print(x_train.shape)

# Let's create a 2-hidden-layer neural network
input = tf.keras.layers.Input(shape=(28, 28))
network = tf.keras.layers.Flatten()(input)
network = tf.keras.layers.Dense(10)(network)
network = tf.keras.layers.LeakyReLU()(network)
network = tf.keras.layers.Dropout(0.2)(network)
network = tf.keras.layers.Dense(10)(network)
network = tf.keras.layers.LeakyReLU()(network)
output = tf.keras.layers.Dense(10, activation='softmax')(network)

my_model = tf.keras.models.Model(input, output)
my_model.summary()
my_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['acc'])
my_model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)
Info This model is trained to identify a handwritten digit from a 28x28-pixel image. We use two fully connected hidden layers, plus dropout (to prevent overfitting).
Example Data: https://en.wikipedia.org/wiki/MNIST_database
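If you are curious what the inputs look like, this optional snippet (not part of my_test.py) prints the first training image as rough ASCII art along with its label:
Code Block language py
import tensorflow as tf

# Load (or reuse the cached copy of) the MNIST dataset
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

# Each image is a 28x28 array of 0-255 grayscale values; render it coarsely
for row in x_train[0]:
    print(''.join('#' if pixel > 127 else '.' for pixel in row))
print('Label:', y_train[0])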
Let's run it
Code Block language bash
$ python3 my_test.py
(60000, 28, 28)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 28, 28)            0
_________________________________________________________________
flatten_1 (Flatten)          (None, 784)               0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                7850
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 10)                0
_________________________________________________________________
dropout_1 (Dropout)          (None, 10)                0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                110
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 10)                0
_________________________________________________________________
dense_3 (Dense)              (None, 10)                110
=================================================================
Total params: 8,070
Trainable params: 8,070
Non-trainable params: 0
_________________________________________________________________
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
2018-09-19 11:52:13.929548: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-19 11:52:14.058190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:04:00.0
totalMemory: 11.92GiB freeMemory: 11.75GiB
2018-09-19 11:52:14.058229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-09-19 11:52:14.335734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-19 11:52:14.335778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-09-19 11:52:14.335786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-09-19 11:52:14.336087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11399 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
60000/60000 [==============================] - 7s 123us/step - loss: 4.1213 - acc: 0.6669 - val_loss: 1.4595 - val_acc: 0.8431
Epoch 2/10
60000/60000 [==============================] - 7s 109us/step - loss: 1.3066 - acc: 0.7841 - val_loss: 0.4579 - val_acc: 0.8820
Epoch 3/10
60000/60000 [==============================] - 7s 109us/step - loss: 0.6586 - acc: 0.8106 - val_loss: 0.3734 - val_acc: 0.8948
Epoch 4/10
60000/60000 [==============================] - 6s 108us/step - loss: 0.5859 - acc: 0.8254 - val_loss: 0.3708 - val_acc: 0.8942
Epoch 5/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.5356 - acc: 0.8412 - val_loss: 0.3513 - val_acc: 0.9021
Epoch 6/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.5148 - acc: 0.8457 - val_loss: 0.3542 - val_acc: 0.9043
Epoch 7/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.5036 - acc: 0.8484 - val_loss: 0.3397 - val_acc: 0.9064
Epoch 8/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.4928 - acc: 0.8533 - val_loss: 0.3207 - val_acc: 0.9140
Epoch 9/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.4837 - acc: 0.8549 - val_loss: 0.3309 - val_acc: 0.9085
Epoch 10/10
60000/60000 [==============================] - 6s 107us/step - loss: 0.4819 - acc: 0.8560 - val_loss: 0.3224 - val_acc: 0.9127
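The validation accuracy climbs to roughly 91% by the final epoch. If you want to keep the trained network for later use, you could append something like the following to the end of my_test.py; this is a minimal sketch, and the filename my_model.h5 is just an example:
Code Block language py
# Evaluate the trained model on the held-out test set
test_loss, test_acc = my_model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)

# Save the full model (architecture + weights) to an HDF5 file
my_model.save('my_model.h5')

# Later, reload it without rebuilding the architecture by hand
reloaded_model = tf.keras.models.load_model('my_model.h5')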
Congratulations, you've trained your first network on Talapas! Let's try one more time, this time in batch mode.
Use CTRL+D to close your interactive session
Write a new submit script to run the same training code by putting the following into a file named submit_gpu_test
Code Block language bash title submit_gpu_test
#!/bin/bash
#SBATCH --job-name=GPUMnistTest    ### Job Name
#SBATCH --partition=gpu            ### Partition (like a queue in PBS)
#SBATCH --time=0-01:00:00          ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --nodes=1                  ### Node count required for the job
#SBATCH --ntasks-per-node=1        ### Number of tasks to be launched per node
#SBATCH --gres=gpu:1               ### Generic RESource request: gpu:<number of GPUs>
#SBATCH --account=<your account>

module load cuda/9.0
module load python3

python3 my_test.py > my_test_output
Submit this job
Code Block language bash
$ sbatch submit_gpu_test
That's it! Wait for your job to finish, and you'll find the training log in my_test_output in the directory from which you submitted the job.
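While you wait, you can monitor the job with standard Slurm commands; replace <jobid> with the ID that sbatch printed:
Code Block language bash
# Check the state of your jobs (PD = pending, R = running)
$ squeue -u $USER
# Follow the training log as it is being written
$ tail -f my_test_output
# After the job completes, review its accounting record
$ sacct -j <jobid>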
...