...

  1. We will start by launching an interactive GPU session from one of the Talapas login nodes

    Code Block
    languagebash
    $ srun --account=<your account> --pty --gres=gpu:1 --mem=4G --time=60 --partition=gpu bash


  2. Wait for your interactive session to start
  3. Load the modules for TensorFlow

    Code Block
    $ module load cuda/9.0
    $ module load python3


  4. Check what GPU resources are available

    Code Block
    $ nvidia-smi 
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.46                 Driver Version: 390.46                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 00000000:04:00.0 Off |                  Off |
    | N/A   35C    P0    60W / 149W |     97MiB / 12206MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
                                                                                   

    This shows that we have successfully reserved one Tesla K80.

  5. Launch Python 3 (note: the python command will give you the system's default Python 2, so use python3)

    Code Block
    languagebash
    $ python3
    >>> from tensorflow.python.client import device_lib
    >>> print(device_lib.list_local_devices())
    
    2018-09-19 11:11:34.399858: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2018-09-19 11:11:34.524069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
    name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
    pciBusID: 0000:04:00.0
    totalMemory: 11.92GiB freeMemory: 11.75GiB
    2018-09-19 11:11:34.524110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
    2018-09-19 11:11:34.796876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-09-19 11:11:34.796918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
    2018-09-19 11:11:34.796926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
    2018-09-19 11:11:34.797220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/device:GPU:0 with 11399 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
    [name: "/device:CPU:0"
    device_type: "CPU"
    memory_limit: 268435456
    locality {
    }
    incarnation: 13411014324454836610
    , name: "/device:GPU:0"
    device_type: "GPU"
    memory_limit: 11953517364
    locality {
      bus_id: 1
      links {
      }
    }
    incarnation: 133570401343557472
    physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7"
    ]
    
    
    

    This also confirms that you are using the GPU build of TensorFlow and have access to one CPU device and one Tesla K80 GPU. Press CTRL+D to exit the Python interpreter.
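    As a sanity check on the numbers above: the memory_limit field is reported in bytes, while the device-creation log line reports the size in MB (really MiB). A quick bit of Python arithmetic (just an illustration, not part of the tutorial output) confirms the two figures agree:

```python
# memory_limit from the GPU entry above, in bytes
gpu_memory_limit = 11953517364

# Convert bytes to MiB (2**20 bytes per MiB); this matches the
# "11399 MB memory" figure in the device-creation log line
mib = gpu_memory_limit // 2**20
print(mib)  # 11399
```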

  6. Let's try to fit a simple model. Copy the following text into a file called my_test.py using your favorite text editor (for example, emacs)


    Code Block
    languagepy
    titlemy_test.py
    #
    import tensorflow as tf
    
    (x_train,y_train),(x_test,y_test)=tf.keras.datasets.mnist.load_data()
    #Note: this will download the MNIST dataset to ~/.keras/datasets the first time you run it
    
    
    #Let's create a neural network with 2 hidden layers
    input=tf.keras.layers.Input(shape=(28,28))
    network=tf.keras.layers.Flatten()(input)
    network=tf.keras.layers.Dense(10)(network)
    network=tf.keras.layers.LeakyReLU()(network)
    network=tf.keras.layers.Dropout(0.2)(network)
    network=tf.keras.layers.Dense(10)(network)
    network=tf.keras.layers.LeakyReLU()(network)
    output=tf.keras.layers.Dense(10,activation='softmax')(network)
    
    my_model=tf.keras.models.Model(input,output)
    my_model.summary()
    my_model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['acc'])
    my_model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=10)
    
    
    

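    For intuition about the loss used above: sparse_categorical_crossentropy takes an integer class label and the softmax output, and computes -log of the probability the model assigns to the true class. A minimal pure-Python sketch of the math (an illustration, not TensorFlow's implementation):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, logits):
    # Loss is -log of the probability assigned to the true class
    probs = softmax(logits)
    return -math.log(probs[label])

# With 10 identical logits the model is maximally unsure, so every class
# gets probability 0.1 and the loss is -log(0.1) ~= 2.303
loss = sparse_categorical_crossentropy(3, [0.0] * 10)
print(round(loss, 3))  # 2.303
```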


    Info

    This model is trained to identify a handwritten digit from a 28x28 pixel image. We use two fully connected hidden layers, and dropout (to prevent overfitting).

    Example Data: https://en.wikipedia.org/wiki/MNIST_database

     

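    The Dropout(0.2) layer randomly zeroes 20% of its inputs during training and scales the survivors up to compensate, which discourages the network from relying on any single unit. A minimal sketch of the idea in plain Python (TensorFlow's actual implementation differs):

```python
import random

def dropout(values, rate, rng):
    # Zero each value with probability `rate` and scale the survivors
    # by 1/(1 - rate) so the expected sum is unchanged
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)  # seeded for reproducibility
activations = [1.0] * 10
dropped = dropout(activations, 0.2, rng)
# Roughly 2 of the 10 activations are zeroed; survivors become 1.25
print(dropped)
```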

  7. Let's run it

    Code Block
    languagebash
    $ python3 my_test.py 
    
    (60000, 28, 28)
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_1 (InputLayer)         (None, 28, 28)            0         
    _________________________________________________________________
    flatten_1 (Flatten)          (None, 784)               0         
    _________________________________________________________________
    dense_1 (Dense)              (None, 10)                7850      
    _________________________________________________________________
    leaky_re_lu_1 (LeakyReLU)    (None, 10)                0         
    _________________________________________________________________
    dropout_1 (Dropout)          (None, 10)                0         
    _________________________________________________________________
    dense_2 (Dense)              (None, 10)                110       
    _________________________________________________________________
    leaky_re_lu_2 (LeakyReLU)    (None, 10)                0         
    _________________________________________________________________
    dense_3 (Dense)              (None, 10)                110       
    =================================================================
    Total params: 8,070
    Trainable params: 8,070
    Non-trainable params: 0
    _________________________________________________________________
    Train on 60000 samples, validate on 10000 samples
    Epoch 1/10
    2018-09-19 11:52:13.929548: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2018-09-19 11:52:14.058190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
    name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
    pciBusID: 0000:04:00.0
    totalMemory: 11.92GiB freeMemory: 11.75GiB
    2018-09-19 11:52:14.058229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
    2018-09-19 11:52:14.335734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-09-19 11:52:14.335778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
    2018-09-19 11:52:14.335786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
    2018-09-19 11:52:14.336087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11399 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
    60000/60000 [==============================] - 7s 123us/step - loss: 4.1213 - acc: 0.6669 - val_loss: 1.4595 - val_acc: 0.8431
    Epoch 2/10
    60000/60000 [==============================] - 7s 109us/step - loss: 1.3066 - acc: 0.7841 - val_loss: 0.4579 - val_acc: 0.8820
    Epoch 3/10
    60000/60000 [==============================] - 7s 109us/step - loss: 0.6586 - acc: 0.8106 - val_loss: 0.3734 - val_acc: 0.8948
    Epoch 4/10
    60000/60000 [==============================] - 6s 108us/step - loss: 0.5859 - acc: 0.8254 - val_loss: 0.3708 - val_acc: 0.8942
    Epoch 5/10
    60000/60000 [==============================] - 6s 107us/step - loss: 0.5356 - acc: 0.8412 - val_loss: 0.3513 - val_acc: 0.9021
    Epoch 6/10
    60000/60000 [==============================] - 6s 107us/step - loss: 0.5148 - acc: 0.8457 - val_loss: 0.3542 - val_acc: 0.9043
    Epoch 7/10
    60000/60000 [==============================] - 6s 107us/step - loss: 0.5036 - acc: 0.8484 - val_loss: 0.3397 - val_acc: 0.9064
    Epoch 8/10
    60000/60000 [==============================] - 6s 107us/step - loss: 0.4928 - acc: 0.8533 - val_loss: 0.3207 - val_acc: 0.9140
    Epoch 9/10
    60000/60000 [==============================] - 6s 107us/step - loss: 0.4837 - acc: 0.8549 - val_loss: 0.3309 - val_acc: 0.9085
    Epoch 10/10
    60000/60000 [==============================] - 6s 107us/step - loss: 0.4819 - acc: 0.8560 - val_loss: 0.3224 - val_acc: 0.9127

    Congratulations, you've trained your first network on Talapas! Let's try one more time in batch mode.
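    The Param # column in the summary above is just (inputs x units) + units for each Dense layer, and the totals can be checked by hand:

```python
# Flatten turns each 28x28 image into a 784-element vector (no parameters)
flat = 28 * 28                      # 784

# Each Dense layer has (inputs * units) weights plus `units` biases
dense_1 = flat * 10 + 10            # 7850
dense_2 = 10 * 10 + 10              # 110
dense_3 = 10 * 10 + 10              # 110

total = dense_1 + dense_2 + dense_3
print(total)  # 8070, matching "Total params" in the summary
```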

  8. Use CTRL+D to close your interactive session
  9. Write a new submit script to run the same training code by putting the following into a file named submit_gpu_test

    Code Block
    languagebash
    titlesubmit_gpu_test
    #!/bin/bash                                                                                             
    #SBATCH --job-name=GPUMnistTest     ### Job Name                                                        
    #SBATCH --partition=gpu       ### Partition (like a queue in PBS)                                        
    #SBATCH --time=0-01:00:00     ### Wall clock time limit in Days-HH:MM:SS                                
    #SBATCH --nodes=1             ### Node count required for the job                                       
    #SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per Node                                
    #SBATCH --gres=gpu:1          ### General REServation of gpu:number of gpus                             
    #SBATCH --account=<your account> 
    
    module load cuda/9.0
    module load python3
    python3 my_test.py > my_test_output


  10. Submit this job

    Code Block
    languagebash
    sbatch submit_gpu_test

    That's it! Wait for your job to finish, and you'll see the training log in ~/my_test_output

...