Publications

Hardware


2021

 • Hengrui Zhang, Zhongming Yu, Guohao Dai, Guyue Huang, Yufei Ding, Yuan Xie, Yu Wang, Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective , 2021. pdf
 • Jincheng Yu, Zhilin Xu, Shulin Zeng, Chao Yu, Jiantao Qiu, Chaoyang Shen, Yuanfan Xu, Guohao Dai, Yu Wang, and Huazhong Yang, INCAME: Interruptible CNN accelerator for multi-robot exploration , in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2021. pdf
 • Yuanfan Xu, Zhaoliang Zhang, Jincheng Yu, Jianfei Cao, Haolin Dong, Zhengfeng Huang, Yu Wang, Huazhong Yang, GAME: Gaussian Mixture Model Mapping and Navigation Engine on Embedded FPGA , to appear in International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021. pdf
 • Zhongming Yu, Guohao Dai, Guyue Huang, Yu Wang, Huazhong Yang, Exploiting Online Locality and Reduction Parallelism for Sampled Dense Matrix Multiplication on GPUs , to appear in International Conference on Computer Design (ICCD), 2021. pdf

2020

 • Shulin Zeng, Guohao Dai, Hanbo Sun, Kai Zhong, Guangjun Ge, Kaiyuan Guo, Yu Wang, Huazhong Yang, Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud , in http://arxiv.org/abs/2003.12101, 2020. pdf
 • Guyue Huang, Guohao Dai, Yu Wang and Huazhong Yang, GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks , in https://arxiv.org/abs/2007.03179, 2020. pdf
 • Shulin Zeng, Hanbo Sun, Yu Xing, Xuefei Ning, Yi Shan, Xiaoming Chen, Yu Wang, Huazhong Yang, Black Box Search Space Profiling for Accelerator-Aware Neural Architecture Search , to appear in The 25th Asia and South Pacific Design Automation Conference (ASP-DAC 2020), 2020. pdf
 • Shulin Zeng, Guohao Dai, Hanbo Sun, Kai Zhong, Guangjun Ge, Kaiyuan Guo, Yu Wang, Huazhong Yang, Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud , to appear in International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2020. pdf
 • Shaoxia Fang, Shulin Zeng and Yu Wang, Optimizing CNN Accelerator with Improved Roofline Model , to appear in IEEE System-On-Chip Conference, 2020. pdf
 • Ziqian Wan, Guohao Dai, Yun Joon Soh, Jishen Zhao, Yu Wang, An Order Sampling Processing-in-Memory Architecture for Approximate Graph Pattern Mining , 2020. pdf slide
 • Xiaoming Chen, Yinhe Han, Yu Wang, Communication Lower Bound in Convolution Accelerators , in IEEE International Symposium on High Performance Computer Architecture, 2020. pdf
 • Guyue Huang, Guohao Dai, Yu Wang and Huazhong Yang, Towards Fast Graph Neural Network Training with Efficient and Framework-Compatible Sparse-Dense Matrix Multiplication , in MICRO-53 Student Research Competition (SRC), 2020. pdf
 • Guyue Huang, Guohao Dai, Yu Wang and Huazhong Yang, GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks , in The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2020. pdf

2019

 • Xiaoming Chen, Yinhe Han, Yu Wang, Communication Lower Bound in Convolution Accelerators , in https://arxiv.org/abs/1911.05662, 2019.
 • Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang and Huazhong Yang, A Survey of FPGA-Based Neural Network Inference Accelerator , in ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol.12, No.1, 2019. pdf

2018

 • Tianhao Huang, Guohao Dai, Yu Wang and Huazhong Yang, HyVE: Hybrid Vertex-Edge Memory Hierarchy for Energy-Efficient Graph Processing , in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, pp.973-978. pdf
 • Guohao Dai, Tianhao Huang, Yu Wang, Huazhong Yang, John Wawrzynek, NewGraph: Balanced Large-scale Graph Processing on FPGAs with Low Preprocessing Overheads , in International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2018, p.208. pdf
 • Shaoxia Fang, Lu Tian, Junbin Wang, Shuang Liang, Dongliang Xie, Zhongmin Chen, Lingzhi Sui, Qian Yu, Xiaoming Sun, Yi Shan, and Yu Wang, Real-time Object Detection and Semantic Segmentation Hardware System with Deep Learning Networks , in Proceedings of the International Conference on Field-Programmable Technology (FPT), 2018.
 • Gushu Li, Guohao Dai, Shuangchen Li, Yu Wang, Yuan Xie, GraphIA: An In-situ Accelerator for Large-scale Graph Processing , in International Symposium on Memory Systems (MEMSYS), 2018, pp.79-84. pdf

2017

 • Yuliang Sun, Lanjun Wang, Chen Wang, Yu Wang, Exploiting Stable Data Dependency in Stream Processing Acceleration on FPGAs , in ACM Transactions on Embedded Computing Systems (TECS), 2017. pdf
 • Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang, EAST: An Efficient and Accurate Scene Text Detector , in Computer Vision and Pattern Recognition (CVPR), 2017. pdf
 • Guohao Dai, Tianhao Huang, Yuze Chi, Ningyi Xu, Yu Wang, Huazhong Yang, ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture , in ACM International Symposium on FPGA (FPGA), 2017, pp.217-226. pdf slide
 • Baofu Zhao, Yubin Li, Yu Wang, Huazhong Yang, Streaming Sorting Network Based BWT Acceleration on FPGA for Lossless Compression , in International Conference on Field-Programmable Technology (FPT), 2017, pp.247-250. pdf

2016

 • Guohao Dai, Yuze Chi, Yu Wang, Huazhong Yang, FPGP: Graph Processing Framework on FPGA , in ACM International Symposium on FPGA (FPGA), 2016, pp.105-110. pdf slide
 • Xijie Jia, Kaiyuan Guo, Wenqiang Wang, Yu Wang, Huazhong Yang, SRI-SURF: A Better SURF Powered by Scaled-RAM Interpolator on FPGA , in International Conference on Field-Programmable Logic and Applications (FPL), 2016, pp.1-8. pdf slide
 • Yubin Li, Yuliang Sun, Guohao Dai, Qiang Xu, Yu Wang, Huazhong Yang, Approximate Frequent Itemset Mining for Streaming Data on FPGA , in International Conference on Field-Programmable Logic and Applications (FPL), 2016, pp.1-4. pdf
 • Yuze Chi, Guohao Dai, Yu Wang, Guangyu Sun, Guoliang Li, Huazhong Yang, NXgraph: An Efficient Graph Processing System on a Single Machine , in IEEE International Conference on Data Engineering (ICDE), 2016, pp.409-420. pdf slide

2015

 • Wenqiang Wang, Jing Yan, Ningyi Xu, Yu Wang, Feng-Hsiung Hsu, Real-time High-quality Stereo Vision System in FPGA , in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol.25, No.10, 2015, pp.1696-1708. pdf
 • Gushu Li, Xiaoming Chen, Guangyu Sun, Henry Hoffmann, Yongpan Liu, Yu Wang, Huazhong Yang, A STT-RAM-based Low-Power Hybrid Register File for GPGPUs , in 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), 2015, pp.103:1-103:6. pdf
 • Xinyu Niu, Wayne Luk, Yu Wang, EURECA: On-Chip Configuration Generation for Effective Dynamic Data Access , in Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2015, pp.74-83. pdf
 • Mengyuan Gu, Kaiyuan Guo, Wenqiang Wang, Yu Wang, Huazhong Yang, An FPGA-based Real-time Simultaneous Localization and Mapping System , in the International Conference on Field-Programmable Technology (FPT), 2015, pp.200-203. pdf
 • Yubin Li, Yuliang Sun, Guohao Dai, Yuzhi Wang, Jiacai Ni, Yu Wang, Guoliang Li, Huazhong Yang, A Self-aware Data Compression System on FPGA in Hadoop , in International Conference on Field-Programmable Technology (FPT), 2015, pp.196-199. pdf
 • Xiaolong Xie, Yun Liang, Yu Wang, Guangyu Sun, Tao Wang, Coordinated Static and Dynamic Cache Bypassing for GPUs , in Proceedings of the IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), 2015, pp.76-88. pdf

2014

 • Yi Shan, Yuchen Hao, Wenqiang Wang, Yu Wang, Xu Chen, Huazhong Yang, Wayne Luk, Hardware Acceleration for an Accurate Stereo Vision System Using Mini-Census Adaptive Support Region , in ACM Transactions on Embedded Computing Systems (TECS), vol.13, No.4s, 2014, pp.132:1-132:24. pdf
 • Fei Chen, Yi Shan, Yu Zhang, Yu Wang, Hubertus Franke, Xiaotao Chang, Kun Wang, Enabling FPGAs in the Cloud , in Proceedings of the 11th ACM Conference on Computing Frontiers, 2014, pp.3:1-3:10. pdf
 • Yuliang Sun, Zilong Wang, Sitao Huang, Lanjun Wang, Yu Wang, Rong Luo, Huazhong Yang, Accelerating Frequent Item Counting with FPGA , in Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2014, pp.109-112. pdf
 • Guohao Dai, Yi Shan, Fei Chen, Yu Zhang, Yu Wang, Kun Wang and Huazhong Yang, Online Scheduling for FPGA Computation in the Cloud , in International Conference on Field-Programmable Technology (FPT), 2014, pp.330-333. pdf
 • Wenqiang Wang, Kaiyuan Guo, Mengyuan Gu, Yuchun Ma, Yu Wang, A Universal FPGA-based Floating-point Matrix Processor for Mobile Systems , in Proceedings of the International Conference on Field-Programmable Technology (FPT), 2014, pp.139-146. pdf

2013

 • Xiang Chen, Ji Zhu, Ziyu Wen, Yu Wang, Huazhong Yang, BER Guaranteed Optimization and Implementation of Parallel Turbo Decoding on GPU , in Proceedings of the 8th International ICST Conference on Communications and Networking in China (CHINACOM), 2013, pp.183-188. pdf
 • Zilong Wang, Sitao Huang, Lanjun Wang, Hao Li, Yu Wang, Huazhong Yang, Accelerating subsequence similarity search based on dynamic time warping distance with FPGA , in Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays (FPGA), 2013, pp.53-62. pdf
 • Xinyu Niu, José Gabriel F. Coutinho, Yu Wang and Wayne Luk, Dynamic Stencil: Effective Exploitation of Run-time Resources in Reconfigurable Clusters , in Proceedings of the International Conference on Field-Programmable Technology (FPT), 2013, pp.214-221. pdf
 • Wenqiang Wang, Jing Yan, Ning-Yi Xu, Yu Wang and Feng-Hsiung Hsu, A Real-time High-quality Stereo Vision System on FPGA , in Proceedings of the International Conference on Field-Programmable Technology (FPT), 2013, pp.358-361. pdf
 • Sitao Huang, Guohao Dai, Yuliang Sun, Zilong Wang, Yu Wang, Huazhong Yang, DTW-Based Subsequence Similarity Search on AMD Heterogeneous Computing Platform , in IEEE International Conference on High Performance Computing and Communications & IEEE International Conference on Embedded and Ubiquitous Computing (HPCCEUC), 2013, pp.1054-1063. pdf

2012

 • Zhaoran Wang, Yu Zhang, Xiaotao Chang, Xiang Mi, Yu Wang, Kun Wang, Huazhong Yang, Pub/Sub on stream: a multi-core based message broker with QoS support , in Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems (DEBS), 2012, pp.127-138. pdf
 • Yi Shan, Zilong Wang, Wenqiang Wang, Yuchen Hao, Yu Wang, Kuen Hung Tsoi, Wayne Luk, Huazhong Yang, FPGA based memory efficient high resolution stereo vision system for video tolling , in Proceedings of the International Conference on Field-Programmable Technology (FPT), 2012, pp.29-32. pdf

2011

 • Jing Yan, Ning-Yi Xu, Xiong-Fei Cai, Rui Gao, Yu Wang, Rong Luo, Feng-Hsiung Hsu, An FPGA-based accelerator for LambdaRank in Web search engines , in ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol.4, No.3, 2011, pp.25:1-25:19. pdf
 • Tianji Wu, Di Wu, Yu Wang, Xiaorui Zhang, Hong Luo, Ningyi Xu, Huazhong Yang, Gemma in April: A matrix-like parallel programming architecture on OpenCL , in Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011, pp.703-708. pdf

2010

 • Yi Shan, Bo Wang, Jing Yan, Yu Wang, Ningyi Xu, Huazhong Yang, FPMR: MapReduce framework on FPGA , in Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), 2010, pp.93-102. pdf
 • Jing Yan, Ning-Yi Xu, Xiong-Fei Cai, Rui Gao, Yu Wang, Rong Luo, Feng-Hsiung Hsu, LambdaRank acceleration for relevance ranking in web search engines , in Proceedings of the 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), 2010, p.285.
 • Di Wu, Tianji Wu, Yi Shan, Yu Wang, Yong He, Ningyi Xu, Huazhong Yang, Making human connectome faster: GPU acceleration of brain network analysis , in Proceedings of the IEEE 16th International Conference on Parallel and Distributed Systems (ICPADS), 2010, pp.593-600. pdf
 • Tianji Wu, Bo Wang, Yi Shan, Feng Yan, Yu Wang, Ningyi Xu, Efficient PageRank and SpMV Computation on AMD GPUs , in Proceedings of the 39th International Conference on Parallel Processing (ICPP), 2010, pp.81-89. pdf
 • Yi Shan, Tianji Wu, Yu Wang, Bo Wang, Zilong Wang, Ningyi Xu, Huazhong Yang, FPGA and GPU implementation of large scale SpMV , in Proceedings of the IEEE 8th Symposium on Application Specific Processors (SASP), 2010, pp.64-70. pdf

2009

 • Jing Yan, Ning-Yi Xu, Xiong-Fei Cai, Rui Gao, Yu Wang, Rong Luo, Feng-Hsiung Hsu, FPGA-based acceleration of neural network for ranking in web search engine with a streaming architecture , in Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), 2009, pp.662-665. pdf
 • Bo Wang, Tianji Wu, Feng Yan, Ruirui Li, Ningyi Xu, Yu Wang, RankBoost Acceleration on both NVIDIA CUDA and ATI Stream platforms , in Proceedings of the 15th International Conference on Parallel and Distributed Systems (ICPADS), 2009, pp.284-291. pdf
 • Guangming Yu, Yu Wang, Huazhong Yang, Hui Wang, A fast-locking all-digital phase-locked loop with a novel counter-based mode switching controller , in Proceedings of the TENCON IEEE Region 10 Conference (TENCON), 2009, pp.1-5.
