Optimizing TensorFlow for CPU

A few days ago I attended a Brazilian developers conference. Among the many talks, one about Intel architecture was especially interesting.

Intel is working to provide more performance for Artificial Intelligence projects with different hardware and software optimizations, as you can see in the following figure. Intel has a TensorFlow distribution with many optimizations for the CPU.

In this post I show the results of my tests with different TensorFlow and Python distributions on Oracle Cloud.

For this test I used 2 Compute instances from Oracle Cloud:

  • Instance 1 = 1 OCPU + 7.5 GB memory
  • Instance 2 = 2 OCPUs + 30 GB memory

In both compute instances I created 3 different Anaconda environments:

  • Python + TensorFlow
    conda create -n py36tf tensorflow python=3.6
  • Python + TensorFlow-mkl (TensorFlow with Intel MKL DNN)
    conda create -n py36tfmkl tensorflow-mkl python=3.6
  • Python (Intel) + TensorFlow-mkl (TensorFlow with Intel MKL DNN)
    conda create -n py36tfmklintel tensorflow-mkl python=3.6 -c intel

To evaluate TensorFlow performance I used this Python script:

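import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.random_normal)
import time

# Fix the graph-level seed so the random matrices are reproducible across runs.
tf.set_random_seed(42)
A = tf.random_normal([10000, 10000])
B = tf.random_normal([10000, 10000])

def check():
    # The timer covers session creation as well as the computation itself.
    start_time = time.time()
    with tf.Session() as sess:
        # Multiply two 10000x10000 matrices and sum all elements of the product.
        print(sess.run(tf.reduce_sum(tf.matmul(A, B))))
    print("It took {} seconds".format(time.time() - start_time))

check()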
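
The script above times a single run, and that measurement also includes session startup. A possible way to get steadier numbers is to repeat the workload and look at the median. The following is only a minimal sketch using the standard library: the names benchmark and placeholder_workload are hypothetical and were not part of my test, and the placeholder would have to be replaced by the real TensorFlow call.

```python
import statistics
import time

def benchmark(fn, repeats=5, warmup=1):
    """Call fn several times and return the wall-clock time of each timed run in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not timed
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations

def placeholder_workload():
    # Hypothetical stand-in for the real work (for example a sess.run(...) call).
    sum(i * i for i in range(100000))

durations = benchmark(placeholder_workload)
print("median: {:.4f} s, min: {:.4f} s".format(
    statistics.median(durations), min(durations)))
```

The median is less sensitive than a single run to one-off delays such as other processes competing for the CPU.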

These were the results:

  • Python + TensorFlow

In instance 1 it took 70.50 seconds.

In instance 2 it took 35.95 seconds.

  • Python + TensorFlow-mkl (TensorFlow with Intel MKL DNN)

In instance 1 it took 15.69 seconds.

In instance 2 it took 8.00 seconds.

  • Python (Intel) + TensorFlow-mkl (TensorFlow with Intel MKL DNN)

In instance 1 it took 15.85 seconds.

In instance 2 it took 8.13 seconds.

In both compute instances, the slowest configuration took about 4.5 times as long as the fastest (roughly 450% of its run time)!
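
The ratio can be checked directly from the timings listed above. This is a small sketch in plain Python; the dictionary layout and the function name slowdown_ratios are my own choices for illustration.

```python
# Timings in seconds, copied from the results listed above.
timings = {
    "Python + TensorFlow": {"instance 1": 70.50, "instance 2": 35.95},
    "Python + TensorFlow-mkl": {"instance 1": 15.69, "instance 2": 8.00},
    "Python (Intel) + TensorFlow-mkl": {"instance 1": 15.85, "instance 2": 8.13},
}

def slowdown_ratios(timings):
    """Return, for each instance, the slowest time divided by the fastest time."""
    instances = next(iter(timings.values())).keys()
    ratios = {}
    for inst in instances:
        times = [per_env[inst] for per_env in timings.values()]
        ratios[inst] = max(times) / min(times)
    return ratios

ratios = slowdown_ratios(timings)
for inst, ratio in ratios.items():
    print("{}: slowest / fastest = {:.2f}x".format(inst, ratio))
```

Both instances give a ratio just under 4.5, which is where the figure of about 450% comes from.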

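Python + TensorFlow-mkl (TensorFlow with Intel MKL DNN) was the fastest configuration, and Python + TensorFlow was the slowest. Adding the Intel Python distribution on top of TensorFlow-mkl made practically no difference (15.85 vs. 15.69 seconds on instance 1).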

These results suggest that you can use the CPU to train and run your machine learning and deep learning projects, as long as you pick a TensorFlow build that is optimized for it.

Have a good time!

Author: Waslley Souza

Oracle consultant focused on Oracle Fusion Middleware and SOA technologies. Certified in Oracle WebCenter Portal, Oracle ADF, and Java.
