Neural Networks on Emerging Devices
RRAM-based Approximate Computation
Approximate computing is a promising way to close the power-efficiency gap between current capabilities and future requirements. In this work, we introduce an RRAM-based, power-efficient framework for analog approximate computing. We first introduce a programmable RRAM-based approximate computing unit (RRAM-ACU) to accelerate numeric computation, and then build a scalable approximate computing framework on top of it. We also present a complete configuration scheme for the RRAM-ACU, comprising neural approximator training, a mapping from approximator parameters to RRAM states, and an RRAM writing scheme. A predictive compact model is developed to analyze the configuration overhead. Simulation results on a diverse set of benchmarks show that the RRAM-ACU achieves a speedup of 10.26× to 491.02× and a power efficiency of 24.59 to 567.98 GFLOPS/W, with an average quality loss of 8.72%. In addition, a system-level simulation of the HMAX application on the proposed framework demonstrates more than 12.8× higher power efficiency than its purely digital counterparts (CPU, GPU, and FPGA).
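As a rough illustration of the parameter-to-RRAM-state mapping step mentioned above, the sketch below maps a signed weight matrix onto a pair of conductance matrices and evaluates an ideal analog matrix-vector product as a difference of currents. The differential-pair scheme, the linear mapping, and the names `G_MIN`, `G_MAX`, `map_weights`, and `crossbar_mvm` are assumptions made for illustration; they are not taken from the RRAM-ACU configuration flow itself, and device non-idealities are ignored.

```python
# Hypothetical device limits in siemens; real values depend on the RRAM technology.
G_MIN = 1e-6
G_MAX = 1e-4


def map_weights(weights):
    """Map signed weights to (g_pos, g_neg) conductance matrices.

    Each weight is represented as the difference of two conductances,
    scaled so that the largest |w| uses the full [G_MIN, G_MAX] range.
    """
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = (G_MAX - G_MIN) / w_max
    g_pos = [[G_MIN + scale * max(w, 0.0) for w in row] for row in weights]
    g_neg = [[G_MIN + scale * max(-w, 0.0) for w in row] for row in weights]
    return g_pos, g_neg, scale


def crossbar_mvm(g_pos, g_neg, scale, voltages):
    """Ideal read-out: out[i] = sum_j (g_pos - g_neg)[i][j] * V[j] / scale."""
    out = []
    for row_p, row_n in zip(g_pos, g_neg):
        current = sum((gp - gn) * v for gp, gn, v in zip(row_p, row_n, voltages))
        out.append(current / scale)  # convert current back to weight units
    return out


weights = [[0.5, -1.0], [0.25, 0.75]]
g_pos, g_neg, scale = map_weights(weights)
result = crossbar_mvm(g_pos, g_neg, scale, [1.0, 2.0])
```

In this idealized model `result` reproduces the digital product of `weights` and the input vector up to floating-point rounding; any approximation error in a real system would come from effects the sketch leaves out.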
Spiking Neural Network with RRAM for Real-World Application?
Inspired by the human brain's function and efficiency, neuromorphic computing offers a promising solution for a wide range of tasks, from brain-machine interfaces to real-time classification. The spiking neural network (SNN), which encodes and processes information with bionic spikes, is an emerging neuromorphic model with great potential to improve the performance and efficiency of computing systems. However, the lack of an energy-efficient hardware implementation and the difficulty of training the model significantly limit the application of spiking neural networks. In this work, we address these issues by building an SNN-based, energy-efficient system for real-time classification with metal-oxide resistive switching random-access memory (RRAM) devices. We implement different SNN training algorithms, including spike-timing-dependent plasticity (STDP) and the neural sampling method. Our RRAM SNN systems for these two training algorithms show good power efficiency and recognition performance on real-time classification tasks such as MNIST digit recognition. Finally, we propose a possible direction for further improving classification accuracy by boosting multiple SNNs.
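For readers unfamiliar with STDP, a minimal sketch of the textbook pair-based rule follows: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise, with an exponential dependence on the time difference. The constants and function names are hypothetical, and this is the generic rule rather than the specific RRAM-based training circuit used in this work.

```python
import math

# Hypothetical learning parameters, chosen only for illustration.
A_PLUS = 0.01     # potentiation amplitude
A_MINUS = 0.012   # depression amplitude
TAU_PLUS = 20.0   # potentiation time constant (ms)
TAU_MINUS = 20.0  # depression time constant (ms)


def stdp_delta(t_pre, t_post):
    """Weight change for one pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: strengthen the synapse
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:   # post fires before pre: weaken the synapse
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0


def update_weight(w, t_pre, t_post, w_min=0.0, w_max=1.0):
    """Apply the STDP update and clip the weight to [w_min, w_max]."""
    return min(w_max, max(w_min, w + stdp_delta(t_pre, t_post)))
```

Clipping to a bounded range loosely mirrors the fact that a physical device can only be programmed within a finite conductance window.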
Large Scale Neural Network on GPU and FPGA
Training Neural Networks with GPU
Large-scale artificial neural networks (ANNs) are widely used in data processing applications, and training is the critical phase of a neural network. In recent years, graphics processing units (GPUs) have significantly sped up the training of large-scale neural networks by exploiting their massive parallelism. We are studying efficient parallel neural network training on servers equipped with multiple GPUs. Our early work includes an efficient GPU implementation of a large-scale recurrent neural network. The recurrent neural network (RNN) is a special type of neural network equipped with additional recurrent connections. However, its high computational complexity makes a recurrent neural network difficult to train effectively, which has significantly limited research on recurrent neural networks over the last 20 years. In this work, we explore the potential parallelism of the recurrent neural network and propose a fine-grained two-stage pipeline implementation. Experimental results show that the proposed GPU implementation achieves a 2× to 11× speedup over a basic CPU implementation that uses the Intel Math Kernel Library.
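To make the role of the recurrent connections concrete, here is a small pure-Python sketch of a simple (Elman-style) recurrence, h_t = tanh(W_in x_t + W_rec h_{t-1}). The function names and the choice of tanh are assumptions for illustration. The input term does not depend on earlier time steps, whereas the recurrent term does; whether this split corresponds to the two stages of the proposed pipeline is not stated here.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_forward(w_in, w_rec, inputs, h0):
    """Run h_t = tanh(W_in x_t + W_rec h_{t-1}) over a sequence of inputs."""
    h = h0
    states = []
    for x in inputs:
        a = matvec(w_in, x)   # input term: independent across time steps
        r = matvec(w_rec, h)  # recurrent term: needs the previous state
        h = [math.tanh(ai + ri) for ai, ri in zip(a, r)]
        states.append(h)
    return states
```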
Embedded CNN on FPGA
Convolutional neural networks (CNNs) have shown significant performance improvements in recent years. In ImageNet 2014, one of the largest and most challenging computer vision competitions in the world, GoogLeNet from Google won the classification task with a 6.65% top-5 error rate. These results point to a wide range of application scenarios for CNNs in the near future. Recent CNN models, despite their remarkable performance, are becoming increasingly complex in architecture, which makes it difficult to achieve real-time performance on embedded systems with limited energy and computation resources. Our group aims to use CNNs for object classification on FPGAs with real-time performance and high energy efficiency. Our contributions can be summarized as follows:
- We reduce the computational workload of CNN models at the arithmetic level while keeping the performance degradation as small as possible.
- We use fixed-point numbers with configurable precision instead of floating-point numbers in the hardware system because of the limited bandwidth. The precision of the fixed-point numbers is studied carefully to limit performance degradation.
- We design a customizable hardware architecture to fully utilize the computation resources and bandwidth of the available FPGA platform.
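The fixed-point point in the list above can be sketched as follows. This is a minimal example, assuming that "changeable precision" means choosing the number of fractional bits per group of values for a fixed total bit width; the selection criterion (minimum squared error) and the names `quantize` and `best_frac_bits` are hypothetical and not taken from the hardware design itself.

```python
def quantize(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value with the given bit widths."""
    step = 2.0 ** -frac_bits
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    q = max(q_min, min(q_max, round(x / step)))  # saturate on overflow
    return q * step


def best_frac_bits(values, total_bits):
    """Pick the fractional length that minimizes the total squared error."""
    def error(frac_bits):
        return sum((v - quantize(v, total_bits, frac_bits)) ** 2 for v in values)
    # Search a generous range so both large and very small values are covered.
    return min(range(-total_bits, 2 * total_bits + 1), key=error)
```

Fewer fractional bits leave more range for large values, while more fractional bits resolve small values better, so the best split depends on the distribution of the data being quantized.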
Publications
- Instruction Driven Cross-layer CNN Accelerator For Fast Detection on FPGA, to appear in ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2019.
- Learning the Sparsity for ReRAM: Mapping and Pruning Sparse Neural Network for ReRAM based Accelerator, to appear in Proceedings of the 24th Asia and South Pacific Design Automation Conference (ASP-DAC), 2019.
- Stuck-at Fault Tolerance in RRAM Computing Systems, in IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), vol.8, No.1, 2018, pp.102-115.
- Angel-Eye: A Complete Design Flow for Mapping CNN onto Embedded FPGA, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol.37, No.1, 2018, pp.35-47.
- MNSIM: Simulation Platform for Memristor-based Neuromorphic Computing System, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol.37, No.5, 2018, pp.1009-1022.
- Fault Tolerance for RRAM-Based Matrix Operations, to appear in International Test Conference (ITC), 2018.
- Training Low Bitwidth Convolutional Neural Networks on RRAM, in Proceedings of the 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), 2018, pp.117-122.
- Long Live TIME: Improving Lifetime for Training-In-Memory Engines by Structured Gradient Sparsification, in Design Automation Conference (DAC), 2018.
- A Peripheral Circuit Reuse Structure Integrated with a Retimed Data Flow for Low Power RRAM Crossbar-based CNN, in DATE, 2018, pp.1057-1062.
- Rescuing Memristor-based Computing with Non-linear Resistance Levels, in DATE, 2018, pp.407-412.
- Real-time object detection towards high power efficiency, in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, pp.704-708.
- Design of Fault-Tolerant Neuromorphic Computing Systems, in European Test Symposium, 2018.
- Low Power Driven and Multi-CLP aware Loop Tiling for RRAM Crossbar-based CNN, in ACM/SIGAPP Symposium On Applied Computing (SAC), 2018.
- Software–Hardware Codesign for Efficient Neural Network Acceleration, in IEEE Micro, vol.37, No.2, 2017, pp.18-25.
- A Compact Memristor-Based Dynamic Synapse for Spiking Neural Networks, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.36, No.8, 2017.
- Computation-Oriented Fault-Tolerance Schemes for RRAM Computing Systems, in Proceedings of the 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017, pp.794-799.
- Binary Convolutional Neural Network on RRAM, in Proceedings of the 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), 2017, pp.782-787.
- Fault-Tolerant Training with On-Line Fault Detection for RRAM-Based Neural Computing Systems, in DAC, 2017.
- TIME: A Training-in-memory Architecture for Memristor-based Deep Neural Network, in Design Automation Conference (DAC), 2017, pp.26:1-26:6.
- ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA, in ACM International Symposium on FPGA, 2017, pp.75-84.
- Circuit Design for Beyond Von Neumann Applications Using Emerging Memory: From Nonvolatile Logics to Neuromorphic Computing, in International Symposium on Quality Electronic Design (ISQED), 2017, pp.23-28.
- A 462GOPs/J RRAM-Based Nonvolatile Intelligent Processor for Energy Harvesting IoE System Featuring Nonvolatile Logics and Processing-In-Memory, in IEEE Symposium on VLSI Circuits (VLSIC), 2017.
- Exploring the Precision Limitation for RRAM-based Analog Approximate Computing, in IEEE Design & Test (D&T), vol.33, No.1, 2016, pp.51-58.
- Harmonica: A Framework of Heterogeneous Computing Systems With Memristor-Based Neuromorphic Computing Accelerators, in IEEE Transactions on Circuits and Systems I: Regular Papers, 2016.
- Towards Real-Time Object Detection on Embedded Systems, in IEEE Transactions on Emerging Topics in Computing, 2016.
- Technological Exploration of RRAM Crossbar Array for Matrix-Vector Multiplication, in Journal of Computer Science and Technology (JCST), vol.31, No.1, 2016, pp.3-19.
- All-Spin Artificial Neural Network based on Compound Spintronic Synapse and Neuron, in IEEE Transactions on Biomedical Circuits and Systems, 2016.
- RRAM Based Learning Acceleration, in Compilers, Architectures, and Synthesis of Embedded Systems (CASES) invited talk, 2016, pp.1-2.
- Switched by Input: Power Efficient Structure for RRAM-based Convolutional Neural Network, in Design Automation Conference (DAC), 2016, pp.125:1-125:6.
- MNSIM: Simulation Platform for Memristor-based Neuromorphic Computing System, in DATE, 2016, pp.469-474.
- Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, in ACM International Symposium on FPGA, 2016, pp.26-35.
- A Data Locality-aware Design Framework for Reconfigurable Sparse Matrix-Vector Multiplication Kernel, in International Conference On Computer Aided Design (ICCAD), 2016, pp.1-8.
- A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory, in The 43rd ACM/IEEE International Symposium on Computer Architecture, 2016, pp.1-14.
- Low Power Convolutional Neural Networks on a Chip, in ISCAS, 2016, pp.129-132.
- Angel-Eye: A Complete Design Flow for Mapping CNN onto Customized Hardware, in IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2016, pp.24-29.
- RRAM-based Analog Approximate Computing, in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol.34, No.12, 2015, pp.1905-1917.
- Technological Exploration of RRAM Crossbar Array For Matrix-Vector Multiplication, in Proceedings of the 20th Asia and South Pacific Design Automation Conference (ASP-DAC), 2015, pp.106-111.
- Merging the interface: Power, area and accuracy co-optimization for RRAM crossbar-based mixed-signal, in 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), 2015, pp.13:1-13:6.
- Reno: A Highly-efficient Reconfigurable Neuromorphic Computing Accelerator Design, in 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), 2015, pp.1-6.
- Spiking Neural Network with RRAM: Can We Use it for Real-World Application?, in Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, pp.860-865.
- FPGA Acceleration for Recurrent Neural Network Language Model, in Proceedings of the IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2015, pp.111-118.
- Energy Efficient RRAM Spiking Neural Network for Real Time Classification, in Proceedings of the 25th Edition on Great Lakes Symposium on VLSI (GLSVLSI), 2015, pp.189-194.
- Rebooting Computing and Low-Power Image Recognition Challenge, in IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2015, pp.927-932.
- Scaling-up Resistive Synaptic Arrays for Neuro-inspired Architecture: Challenges and Prospect, in IEEE International Electron Devices Meeting (IEDM), 2015, pp.451-454.
- Integrated photonic reservoir computing based on hierarchical time-multiplexing structure, in Optics Express, vol.22, No.25, 2014, pp.31356-31370.
- Training itself: Mixed-signal training acceleration for memristor-based neural network, in Proceedings of the 19th Asia and South Pacific Design Automation Conference (ASP-DAC), 2014, pp.361-366.
- The stochastic modeling of TiO2 memristor and its usage in neuromorphic system design, in Proceedings of the 19th Asia and South Pacific Design Automation Conference (ASP-DAC), 2014, pp.831-836.
- ICE: inline calibration for memristor crossbar-based computing engine, in Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014, pp.1-4.
- Energy efficient neural networks for big data analytics, in Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014, pp.1-2.
- Large Scale Recurrent Neural Network on GPU, in Proceedings of the International Joint Conference on Neural Networks (IJCNN), 2014, pp.4062-4069.
- Energy Efficient Spiking Neural Network Design with RRAM Devices, in Proceedings of the 14th International Symposium on Integrated Circuits (ISIC), 2014, pp.268-271.
- Memristor-based approximated computation, in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2013, pp.242-247.