float16 matmul is way slower than float32 matmul on CPU #24738

 Open

dchatterjee172 opened this issue on 7 Jan 2019 · 1 comment

Comments

dchatterjee172 commented on 7 Jan 2019

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): YES
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): 1.12.0
  • Python version: 3.5.2

You can collect some of this information using our environment capture script
You can also obtain the TensorFlow version with
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the current behavior
float16 matmul is way slower than float32 matmul on CPU

Code to reproduce the issue

import tensorflow as tf
import time
from datetime import timedelta

# Identical [1, 768, 768] operands in float16 and float32.
a = tf.random.normal(shape=[1, 768, 768], dtype=tf.float16)
b = tf.random.normal(shape=[1, 768, 768], dtype=tf.float16)

c = tf.random.normal(shape=[1, 768, 768], dtype=tf.float32)
d = tf.random.normal(shape=[1, 768, 768], dtype=tf.float32)

e = tf.matmul(a, b)  # float16 matmul
f = tf.matmul(c, d)  # float32 matmul

# CPU-only session (GPU disabled via device_count).
config = tf.ConfigProto(
    intra_op_parallelism_threads=24,
    inter_op_parallelism_threads=24,
    allow_soft_placement=True,
    device_count={"GPU": 0},
)

with tf.Session(config=config) as sess:
    # Alternate between the float16 and float32 matmuls and time each run.
    for i in range(100):
        if i % 2:
            print("16bit -- ", end="")
            op = e
        else:
            print("32bit -- ", end="")
            op = f
        start = time.monotonic()
        sess.run(op)
        end = time.monotonic()
        print(i, timedelta(seconds=end - start))

output

2019-01-07 16:06:19.698878: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
32bit -- 0 0:00:00.017297
16bit -- 1 0:00:00.275746
32bit -- 2 0:00:00.002908
16bit -- 3 0:00:00.261320
32bit -- 4 0:00:00.003028
16bit -- 5 0:00:00.253561
32bit -- 6 0:00:00.002849
16bit -- 7 0:00:00.256515
32bit -- 8 0:00:00.006011
16bit -- 9 0:00:00.255613
32bit -- 10 0:00:00.003996
16bit -- 11 0:00:00.242231
32bit -- 12 0:00:00.003338
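
A workaround sketch (not part of the original report, and the a32/b32/g names are just for illustration): if float32 compute on CPU is acceptable, cast the float16 operands up before the matmul and cast the result back down; the casts are cheap compared with the emulated float16 kernel. Reusing the tensors a and b defined above:

# Sketch: keep storage in float16 but do the arithmetic in float32 on CPU.
a32 = tf.cast(a, tf.float32)
b32 = tf.cast(b, tf.float32)
g = tf.cast(tf.matmul(a32, b32), tf.float16)  # same shape and dtype as e

Running sess.run(g) in the session above should take roughly as long as the float32 case plus the cast overhead.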

@jvishnuvardhan jvishnuvardhan self-assigned this on 9 Jan 2019

@jvishnuvardhan jvishnuvardhan added type:support type:others comp:ops labels on 9 Jan 2019

@jvishnuvardhan jvishnuvardhan assigned rmlarsen and unassigned jvishnuvardhan on 9 Jan 2019

naisy commented on 9 Jan 2019

The reason is simple: Intel CPUs do not support FP16 arithmetic natively, so float16 operations have to be emulated (converted to float32 and back), which makes the float16 matmul much slower.
See also:
https://stackoverflow.com/questions/49995594/half-precision-floating-point-arithmetic-on-intel-chips
https://stackoverflow.com/questions/15340781/python-numpy-data-types-performance
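
For illustration only (not from this thread): the same effect shows up in plain NumPy, since BLAS libraries ship float32/float64 GEMM kernels but no float16 one, so a float16 matmul falls back to a much slower generic path. A minimal timing sketch, assuming NumPy is installed:

import time
import numpy as np

# Compare matmul wall time for the same data in float16 and float32.
x16 = np.random.rand(768, 768).astype(np.float16)
x32 = x16.astype(np.float32)

for name, x in (("float16", x16), ("float32", x32)):
    start = time.monotonic()
    x @ x  # float16 has no BLAS GEMM kernel, so this runs a slow generic loop
    print(name, time.monotonic() - start)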

