# Optimizing TensorFlow for CPU

A few days ago I attended a Brazilian developers' conference. Among the many talks, one about Intel architecture was particularly interesting.

Intel is working to improve performance for Artificial Intelligence projects through both hardware and software optimizations. One result of this work is a TensorFlow distribution with many optimizations for the CPU.

In this post I will show the results of my tests with different TensorFlow and Python distributions on Oracle Cloud.

For this test I used 2 Compute instances from Oracle Cloud:

• Instance 1 = 1 OCPU + 7.5 GB memory
• Instance 2 = 2 OCPUs + 30 GB memory

In both compute instances I created 3 different Anaconda environments:

• Python + TensorFlow

```
conda create -n py36tf tensorflow python=3.6
```

• Python + TensorFlow-mkl (TensorFlow with Intel MKL-DNN)

```
conda create -n py36tfmkl tensorflow-mkl python=3.6
```

• Python (Intel) + TensorFlow-mkl (TensorFlow with Intel MKL-DNN)

```
conda create -n py36tfmklintel tensorflow-mkl python=3.6 -c intel
```

To evaluate TensorFlow performance I used the following Python script:

```
import tensorflow as tf
import time

tf.set_random_seed(42)
A = tf.random_normal([10000, 10000])
B = tf.random_normal([10000, 10000])

def check():
    start_time = time.time()
    with tf.Session() as sess:
        print(sess.run(tf.reduce_sum(tf.matmul(A, B))))
    print("It took {} seconds".format(time.time() - start_time))

check()
```
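The gain from tensorflow-mkl comes mainly from Intel's MKL kernels backing operations such as `matmul`. A similar effect can be observed with plain NumPy, whose conda build also links against MKL. The sketch below mirrors the benchmark above at a smaller matrix size so it runs quickly; it is an illustration, not part of the original test:

```python
import time
import numpy as np

rng = np.random.RandomState(42)
n = 1000  # the original script uses 10000; reduced here so the sketch runs fast
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

start_time = time.time()
total = float(np.sum(A @ B))  # same reduce_sum(matmul(A, B)) workload
elapsed = time.time() - start_time

print("sum = {:.2f}, took {:.3f} seconds".format(total, elapsed))
```

Running this inside each conda environment gives a quick sanity check of which BLAS backend the environment is using.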

These were the results:

• Python + TensorFlow

In instance 1 it took 70.50 seconds.

In instance 2 it took 35.95 seconds.

• Python + TensorFlow-mkl (TensorFlow with Intel MKL DNN)

In instance 1 it took 15.69 seconds.

In instance 2 it took 8.00 seconds.

• Python (Intel) + TensorFlow-mkl (TensorFlow with Intel MKL DNN)

In instance 1 it took 15.85 seconds.

In instance 2 it took 8.13 seconds.

On both compute instances, the best case was about 4.5× faster than the worst case (70.50 s vs. 15.69 s on instance 1, and 35.95 s vs. 8.00 s on instance 2).
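The speedup can be computed directly from the measured times reported above:

```python
# Measured wall-clock times in seconds, taken from the results above
stock = {"instance1": 70.50, "instance2": 35.95}  # Python + TensorFlow
mkl = {"instance1": 15.69, "instance2": 8.00}     # Python + TensorFlow-mkl

for instance in sorted(stock):
    speedup = stock[instance] / mkl[instance]
    print("{}: {:.1f}x faster with MKL".format(instance, speedup))
# instance1: 4.5x faster with MKL
# instance2: 4.5x faster with MKL
```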

Python + TensorFlow-mkl (TensorFlow with Intel MKL-DNN) was the fastest combination, and stock Python + TensorFlow was the slowest.

These results show that you can use the CPU to train and run your machine learning and deep learning projects. All you have to do is use the right build.

Have a good time!

## Author: Waslley Souza

Oracle consultant focused on Oracle Fusion Middleware and SOA technologies. Certified in Oracle WebCenter Portal, Oracle ADF, and Java.

## 2 thoughts on “Optimizing TensorFlow for CPU”

1. Michael says:

Funnily enough, I came to this topic from another suggestion to use tensorflow-mkl from conda instead of pip.

The results look strange:
only 33% CPU usage on all 4 cores (8 threads) with tensorflow-mkl
up to 100% CPU usage on all 4 cores (8 threads) with tensorflow-eigen

CPU is Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2701 MHz, 4 cores, 8 threads
The time to complete the learning of the model took half(!) the time with tensorflow-eigen!
I use latest miniconda 64bit with python 3.7

2. Hi Waslley,

I tried to use Python + Tensorflow-mkl for my project, but it actually made it slower.

I am working on fast style transfer with tensorflow.

When I use Python 2.7 + Tensorflow 1.15 it takes 5 seconds to apply a style to a JPEG (2000x2000 pixels).

When I use Python2.7 + Tensorflow-mkl it takes 8 seconds for the same JPEG.

And when I use Python3 + Tensorflow-mkl it takes 9 seconds for the same JPEG.

I even tried to use intel-python. I managed to make it work only with intel-Python3.
But I got the same results (9 seconds for the same JPEG).

So I guess this optimization is not suited for style transfer?