Deep-Learning Processor (Work in Progress)

This post collects recent papers in the field of deep-learning processors.

2014

ASPLOS

Chen, Tianshi, et al. “DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning.” International Conference on Architectural Support for Programming Languages and Operating Systems (2014): 269-284.

MICRO

Chen, Yunji, et al. “DaDianNao: A Machine-Learning Supercomputer.” International Symposium on Microarchitecture (2014): 609-622.

2015

ISCA

Du, Zidong, et al. “ShiDianNao: shifting vision processing closer to the sensor.” International Symposium on Computer Architecture (2015): 92-104.

ASPLOS

Liu, Daofu, et al. “PuDianNao: A Polyvalent Machine Learning Accelerator.” International Conference on Architectural Support for Programming Languages and Operating Systems (2015): 369-381.

2016

MICRO

Zhang, Shijin, et al. “Cambricon-X: an accelerator for sparse neural networks.” International Symposium on Microarchitecture (2016): 1-12.

ISCA

Liu, Shaoli, et al. “Cambricon: an instruction set architecture for neural networks.” International Symposium on Computer Architecture (2016): 393-405.
Chen, Yu-Hsin, Joel Emer, and Vivienne Sze. “Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks.” International Symposium on Computer Architecture (2016): 367-379.
Han, Song, et al. “EIE: efficient inference engine on compressed deep neural network.” International Symposium on Computer Architecture (2016): 243-254.

ISSCC

Chen, Yu-Hsin, et al. “14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks.” International Solid-State Circuits Conference (2016): 262-263.
Sim, Jaehyeong, et al. “14.6 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems.” International Solid-State Circuits Conference (2016): 264-265.

VLSI

Moons, Bert, and Marian Verhelst. “A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets.” Symposium on VLSI Circuits (2016).

2017

ISSCC

Desoli, Giuseppe, et al. “14.1 A 2.9TOPS/W deep convolutional neural network SoC in FD-SOI 28nm for intelligent embedded systems.” International Solid-State Circuits Conference (2017): 238-239.
Shin, Dongjoo, et al. “14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks.” International Solid-State Circuits Conference (2017): 240-241.
Whatmough, Paul N., et al. “14.3 A 28nm SoC with a 1.2GHz 568nJ/prediction sparse deep-neural-network engine with >0.1 timing error rate tolerance for IoT applications.” International Solid-State Circuits Conference (2017): 242-243.
Price, Michael, James Glass, and Anantha P. Chandrakasan. “14.4 A scalable speech recognizer with deep-neural-network acoustic models and voice-activated power gating.” International Solid-State Circuits Conference (2017): 244-245.
Moons, Bert, et al. “14.5 Envision: A 0.26-to-10TOPS/W subword-parallel dynamic-voltage-accuracy-frequency-scalable Convolutional Neural Network processor in 28nm FDSOI.” International Solid-State Circuits Conference (2017): 246-247.
Bang, Suyoung, et al. “14.7 A 288µW programmable deep-learning processor with 270KB on-chip weight storage using non-uniform memory hierarchy for mobile intelligence.” International Solid-State Circuits Conference (2017): 250-251.

ISCA

Jouppi, Norman P., et al. “In-Datacenter Performance Analysis of a Tensor Processing Unit.” International Symposium on Computer Architecture (2017): 1-12.

VLSI

Yin, Shouyi, et al. “A 1.06-to-5.09 TOPS/W reconfigurable hybrid-neural-network processor for deep learning applications.” Symposium on VLSI Circuits (2017).

2018

ISSCC

Lee, Jinmook, et al. “UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision.” International Solid-State Circuits Conference (2018).

Hot Chips

ARM’s First Generation ML Processor (ARM).
The NVIDIA Deep Learning Accelerator (NVIDIA).
