System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): YES
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- TensorFlow installed from (source or binary): Binary
- TensorFlow version (use command below): 1.12.0
- Python version: 3.5.2
You can collect some of this information using our environment capture script
You can also obtain the TensorFlow version with
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
Describe the current behavior
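float16 matmul is far slower than float32 matmul on CPU: in the output below, a 768x768 float16 matmul takes about 250 ms, versus roughly 3 to 6 ms for float32 after the first run.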
Code to reproduce the issue
import tensorflow as tf
import time
from datetime import timedelta

a = tf.random.normal(shape=[1, 768, 768], dtype=tf.float16)
b = tf.random.normal(shape=[1, 768, 768], dtype=tf.float16)
c = tf.random.normal(shape=[1, 768, 768], dtype=tf.float32)
d = tf.random.normal(shape=[1, 768, 768], dtype=tf.float32)
e = tf.matmul(a, b)
f = tf.matmul(c, d)

config = tf.ConfigProto(
    intra_op_parallelism_threads=24,
    inter_op_parallelism_threads=24,
    allow_soft_placement=True,
    device_count={"GPU": 0},
)

with tf.Session(config=config) as sess:
    for i in range(100):
        if i % 2:
            print("16bit -- ", end="")
            op = e
        else:
            print("32bit -- ", end="")
            op = f
        start = time.monotonic()
        sess.run(op)
        end = time.monotonic()
        print(i, timedelta(seconds=end - start))
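The loop above times a single `sess.run` per iteration, so the first call of each op may also include one-time setup cost. A minimal sketch of a steadier measurement, assuming a hypothetical helper named `benchmark` (not part of TensorFlow) that discards a few warm-up calls and reports the median of the timed ones:

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=20):
    """Return the median wall-clock seconds of fn() over `repeats` timed calls.

    `benchmark` is a hypothetical helper for illustration, not a TensorFlow API.
    """
    # Untimed warm-up calls absorb one-time costs (e.g. first-run setup).
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(samples)
```

Inside the `with tf.Session(...)` block of the script above, this could be called as `benchmark(lambda: sess.run(e))` and `benchmark(lambda: sess.run(f))` to compare the two dtypes.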
Output
2019-01-07 16:06:19.698878: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
32bit -- 0 0:00:00.017297
16bit -- 1 0:00:00.275746
32bit -- 2 0:00:00.002908
16bit -- 3 0:00:00.261320
32bit -- 4 0:00:00.003028
16bit -- 5 0:00:00.253561
32bit -- 6 0:00:00.002849
16bit -- 7 0:00:00.256515
32bit -- 8 0:00:00.006011
16bit -- 9 0:00:00.255613
32bit -- 10 0:00:00.003996
16bit -- 11 0:00:00.242231
32bit -- 12 0:00:00.003338
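To put a number on the gap, here is a small sketch that parses the timing lines printed above and compares the per-dtype medians. The names `LOG`, `LINE`, and `median_seconds` are made up for this illustration; the data is copied from the output above.

```python
import re
import statistics

# Timing lines copied verbatim from the output above.
LOG = """\
32bit -- 0 0:00:00.017297
16bit -- 1 0:00:00.275746
32bit -- 2 0:00:00.002908
16bit -- 3 0:00:00.261320
32bit -- 4 0:00:00.003028
16bit -- 5 0:00:00.253561
32bit -- 6 0:00:00.002849
16bit -- 7 0:00:00.256515
32bit -- 8 0:00:00.006011
16bit -- 9 0:00:00.255613
32bit -- 10 0:00:00.003996
16bit -- 11 0:00:00.242231
32bit -- 12 0:00:00.003338
"""

# Matches "<label> -- <index> H:MM:SS[.ffffff]" as printed by str(timedelta).
LINE = re.compile(r"^(16bit|32bit) -- \d+ (\d+):(\d+):(\d+(?:\.\d+)?)$")


def median_seconds(log):
    """Return {label: median seconds} for the timing lines found in `log`."""
    samples = {}
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # Skip anything that is not a timing line.
        label, hours, minutes, seconds = match.groups()
        total = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        samples.setdefault(label, []).append(total)
    return {label: statistics.median(values) for label, values in samples.items()}


medians = median_seconds(LOG)
ratio = medians["16bit"] / medians["32bit"]
print("median 16bit: %.6f s" % medians["16bit"])
print("median 32bit: %.6f s" % medians["32bit"])
print("slowdown: %.1fx" % ratio)
```

On the lines shown here, the float16 median comes out roughly 77 times the float32 median. The median is used so that the slower first float32 run (about 17 ms) does not skew the comparison.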