Vision Related Acceleration in FPGA

Stereo Vision

Stereo vision is a well-known technique for acquiring depth information. Two cameras capture a pair of images, and the correspondence between the two images reflects the 3D structure of the scene. In recent years, algorithmic improvements have raised the accuracy of stereo vision systems. However, these algorithms are computationally expensive, and the resulting low processing speed restricts the application of stereo vision. The key part of stereo vision is stereo matching, which aims to find the correspondence between the stereo images. As shown in Fig. 1, each pixel in the left image is compared with multiple candidate pixels in the right image, and the most similar one is selected. The coordinate difference between the two matched pixels is called the disparity, which reflects the 3D information of the pixel.

pic1.png

Cost initialization, cost aggregation, disparity computation, and post-processing are the four common steps of a stereo matching algorithm. As shown in Fig. 2, the initial costs are calculated by pixel-to-pixel comparison. These initial costs are then aggregated over a support region to reduce ambiguity. Finally, the disparity with the minimum aggregated cost is selected as the result.

pic2.png
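
The first three steps can be sketched in a few lines of plain Python. This is only an illustrative software model, not the hardware design: the function name `stereo_match`, the absolute-difference cost, the square support region, and the parameters `max_disp` and `radius` are all assumptions made for the example, and post-processing is omitted.

```python
def stereo_match(left, right, max_disp, radius):
    """Return a disparity map for two equally sized grayscale images.

    left, right: lists of rows, each row a list of ints.
    max_disp:    largest disparity tested (hypothetical parameter).
    radius:      half-size of the square support region (hypothetical).
    """
    h, w = len(left), len(left[0])
    big = 255  # cost used when the right-image pixel falls outside the image

    # Step 1: cost initialization by pixel-to-pixel absolute difference.
    cost = [[[abs(left[y][x] - right[y][x - d]) if x - d >= 0 else big
              for d in range(max_disp + 1)]
             for x in range(w)]
            for y in range(h)]

    disparity = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_d, best_cost = 0, None
            for d in range(max_disp + 1):
                # Step 2: aggregate costs over the support region
                # (the region is clipped at the image borders).
                total = 0
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        total += cost[yy][xx][d]
                # Step 3: winner-takes-all, keep the minimum-cost disparity.
                if best_cost is None or total < best_cost:
                    best_d, best_cost = d, total
            disparity[y][x] = best_d
    return disparity
```

The nested loops make the cost of the naive approach visible: every pixel visits every window position for every candidate disparity.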

The computational complexity of stereo matching is O(Image Size × Window Size × Disparity Range), which is very high. A conventional CPU cannot process it in real time, so FPGAs are adopted to accelerate stereo matching algorithms. We focus on hardware acceleration and the related algorithm tuning for the binocular stereo matching problem, and have proposed several novel structures and systems.
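
To give a sense of scale for the complexity expression, the short calculation below multiplies the three factors out. The 1600×1200 resolution and 42 fps are the figures reported later on this page; the 9×9 window and the 128-level disparity range are hypothetical values chosen only for illustration.

```python
def matching_ops_per_second(width, height, window, disparity_range, fps):
    """Cost accumulations per second for brute-force window matching."""
    per_frame = width * height * window * window * disparity_range
    return per_frame * fps


# window=9 and disparity_range=128 are assumed values, not from the source.
ops = matching_ops_per_second(1600, 1200, window=9, disparity_range=128, fps=42)
print(f"{ops:.2e} cost accumulations per second")
```

Under these assumptions the count is on the order of 10^11 to 10^12 accumulations per second, before any data reuse is applied.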

Hybrid-D Parallelism 1

The box filter is a traditional data reuse technique in the cost aggregation step. However, a 2D box filter consumes a large amount of on-chip memory on an FPGA, because a whole row of aggregated costs must be stored to realize data reuse in the column direction.
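
The memory cost is easiest to see in a software model of the 2D box filter. In the sketch below (an illustration with assumed names, for a single disparity level), `col_sum` plays the role of the row buffer: it holds one partial sum per image column and has to persist from one row to the next.

```python
def box_filter_2d(cost, radius):
    """Sum `cost` over a (2*radius+1)^2 window at every fully covered pixel.

    Returns rows only for positions where the whole window fits in the image.
    """
    h, w = len(cost), len(cost[0])
    k = 2 * radius + 1
    # Row buffer: one vertical partial sum per column. In hardware this is
    # the whole-row on-chip storage that the box filter requires.
    col_sum = [sum(cost[y][x] for y in range(k)) for x in range(w)]
    out = []
    for top in range(h - k + 1):
        if top > 0:
            # Column-direction reuse: slide every column sum down one row.
            for x in range(w):
                col_sum[x] += cost[top + k - 1][x] - cost[top - 1][x]
        # Row-direction reuse: slide the window sum right one column at a time.
        window = sum(col_sum[:k])
        row = [window]
        for x in range(k, w):
            window += col_sum[x] - col_sum[x - k]
            row.append(window)
        out.append(row)
    return out
```

Each output needs only a few additions, but the buffer grows with the image width and must be replicated for every disparity level processed in parallel.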

To solve this problem, we propose a hybrid-D parallelism structure. As shown in Fig. 3, multiple rows are calculated in parallel. The first row is calculated by an adder tree, while the other rows are calculated by column data reuse. With this method, no on-chip memory is needed to store the matching costs, yet data reuse is realized in both the column and row directions. The proposed structure greatly increases the degree of parallelism with few additional resources.

pic3.png
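
One possible reading of the column-direction part of this structure is sketched below. It is an assumption-laden software model, not the published design: the function name and parameters are hypothetical, and the real datapath in Fig. 3 may be organized differently. The point it illustrates is that when several rows are produced at the same time, each row can borrow its neighbour's vertical sum directly instead of reading it back from a row buffer.

```python
def hybrid_column_sums(cost, top, x, k, rows):
    """Vertical window sums at column x for `rows` consecutive window positions.

    cost: 2D list of matching costs for one disparity level.
    top:  top row of the first window; k: window height;
    rows: number of rows computed in parallel (hypothetical parameter).
    """
    # First row: direct sum of k costs (an adder tree in hardware).
    sums = [sum(cost[top + i][x] for i in range(k))]
    # Remaining rows: reuse the neighbouring row's sum with one subtraction
    # (the cost leaving the window) and one addition (the cost entering it).
    for r in range(1, rows):
        sums.append(sums[-1] - cost[top + r - 1][x] + cost[top + r + k - 1][x])
    return sums
```

Because all `rows` sums exist in the same cycle, nothing has to be kept for the next pass over the image; row-direction reuse can then proceed with a sliding sum along each row.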

Inverted Variable Support Window 2

The support region is used to reduce the ambiguity of single-pixel comparison. However, with a fixed rectangular support region, the depth map is often blurred at depth discontinuities. The variable support window is a widely used technique for solving this problem. Among the related works, the cross-based variable support window is a special 2D variable window that enables 2D data reuse when aggregating costs. As shown in Fig. 4(a), it is composed of horizontal bars, and the aggregation can be divided into two steps: horizontal aggregation, then vertical aggregation.
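
The two-step aggregation can be sketched as follows. This is a minimal software illustration, assuming that each arm grows until the intensity differs from the anchor pixel by more than a threshold `tau` or the arm reaches `max_len`; both parameters and both function names are hypothetical, and published cross-based methods use more elaborate arm rules.

```python
def arm_length(img, y, x, dy, dx, max_len, tau):
    """Count steps from (y, x) along (dy, dx) while pixels stay similar."""
    h, w = len(img), len(img[0])
    n = 0
    while n < max_len:
        yy, xx = y + (n + 1) * dy, x + (n + 1) * dx
        if not (0 <= yy < h and 0 <= xx < w):
            break
        if abs(img[yy][xx] - img[y][x]) > tau:
            break
        n += 1
    return n


def cross_aggregate(img, cost, max_len, tau):
    """Aggregate `cost` (one disparity level) over cross-based regions of `img`."""
    h, w = len(img), len(img[0])
    # Step 1: horizontal aggregation along each pixel's own horizontal bar.
    horiz = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left = arm_length(img, y, x, 0, -1, max_len, tau)
            right = arm_length(img, y, x, 0, 1, max_len, tau)
            horiz[y][x] = sum(cost[y][x - left:x + right + 1])
    # Step 2: vertical aggregation of the bar sums along the vertical arm.
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            up = arm_length(img, y, x, -1, 0, max_len, tau)
            down = arm_length(img, y, x, 1, 0, max_len, tau)
            out[y][x] = sum(horiz[yy][x] for yy in range(y - up, y + down + 1))
    return out
```

Because the arms stop at intensity edges, the region tends not to straddle object boundaries, which is what keeps discontinuities sharp.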

We adopt the cross-based variable support window to improve the depth accuracy. However, it is difficult to implement on an FPGA because it requires a large amount of memory. Existing work realizes only a 1D variable window with a fixed vertical support window size.

pic4a.png
pic4b.png

To solve this problem, we propose an inverted variable support window, as shown in Fig. 4(b). The window is still variable in 2D, but multiple rows of initial costs no longer need to be stored in the cost aggregation step. A demo system has been implemented, and its accuracy improves on the previous work.

Real-time High-quality System 3

A complete real-time high-quality system is built on the previous work. By using AD-Census cost initialization, cross-based aggregation, and semi-global optimization, the system provides high-quality depth results for high-definition images. A special parallelism scheme and dataflow are designed to combine cross-based aggregation and semi-global optimization efficiently.
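
As a rough illustration of AD-Census cost initialization, the sketch below combines an absolute-difference term with the Hamming distance between census transforms. The 3×3 census window, the exponential normalization, and the `lam_ad` and `lam_census` values are assumptions taken from the common software formulation of AD-Census; the hardware system described here may use a different, more hardware-friendly variant.

```python
import math


def census(img, y, x, radius=1):
    """Bit string with a 1 wherever a neighbour is darker than the centre."""
    bits = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            bits = (bits << 1) | (img[y + dy][x + dx] < img[y][x])
    return bits


def ad_census_cost(left, right, y, x, d, lam_ad=10.0, lam_census=30.0):
    """Combined cost of left pixel (y, x) against right pixel (y, x - d).

    Both pixels must lie at least one pixel away from the image border.
    """
    ad = abs(left[y][x] - right[y][x - d])
    hamming = bin(census(left, y, x) ^ census(right, y, x - d)).count("1")
    # Map both terms into [0, 1) so that neither dominates the sum.
    return (1 - math.exp(-ad / lam_ad)) + (1 - math.exp(-hamming / lam_census))
```

The absolute difference responds to fine intensity detail, while the census term depends only on local ordering and so tolerates brightness differences between the two cameras.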

The structure of the demo system is shown in Fig. 5. This is the first complete real-time hardware system on FPGA that supports both cost aggregation on cross-based regions and semi-global optimization. The proposed system achieves the best depth accuracy among FPGA-based stereo vision works. It processes 1600×1200 images at 42 fps on a Stratix V FPGA.

pic5.png

FPGA-based SLAM System on UAV

Contributors: Yu Wang, Wenqiang Wang, Mengyuan Gu, Kaiyuan Guo, Jie Wang, Weiyi Kong

In recent years, UAV (unmanned aerial vehicle) technology has developed rapidly and has been used in many fields. Localization is a basic component of a UAV navigation system. Outdoors, localization is usually based on GPS and an IMU. Indoors, however, GPS becomes unstable, so this traditional approach cannot be used.

Image-based localization is a widely researched technology that suits indoor environments. In this project, we aim to build an indoor localization system based on the SLAM (simultaneous localization and mapping) algorithm. Traditional SLAM systems usually run on a PC because the computation is complex. However, the large size and high power consumption of a PC require a large UAV, which restricts indoor use. Another solution is to move the computation to a ground station, but this restricts use in uncontrollable environments.

We plan to migrate the whole SLAM algorithm to an FPGA board in this project. To handle the complex computation, we build a co-processor architecture, as shown in Fig. 6. A soft-core master processor handles the complex data scheduling, while the accelerators handle the computing tasks. Other modules, including image rectification and feature extraction, can also be placed on the FPGA chip. The system suits small indoor UAVs because of its simple peripherals and low power consumption. This project is currently under development.

pic6.png

References

  • Shaoxia Fang, Lu Tian, Junbin Wang, Shuang Liang, Dongliang Xie, Zhongmin Chen, Lingzhi Sui, Qian Yu, Xiaoming Sun, Yi Shan, and Yu Wang, Real-time Object Detection and Semantic Segmentation Hardware System with Deep Learning Networks, in Proceedings of the International Conference on Field-Programmable Technology (FPT), 2018.
  • Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang, EAST: An Efficient and Accurate Scene Text Detector, in Computer Vision and Pattern Recognition (CVPR), 2017.
  • Xijie Jia, Kaiyuan Guo, Wenqiang Wang, Yu Wang, Huazhong Yang, SRI-SURF: A Better SURF Powered by Scaled-RAM Interpolator on FPGA, in International Conference on Field-Programmable Logic and Applications (FPL), 2016, pp. 1-8.
  • Wenqiang Wang, Jing Yan, Ningyi Xu, Yu Wang, Feng-Hsiung Hsu, Real-time High-quality Stereo Vision System in FPGA, in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 25, no. 10, 2015, pp. 1696-1708.
  • Mengyuan Gu, Kaiyuan Guo, Wenqiang Wang, Yu Wang, Huazhong Yang, An FPGA-based Real-time Simultaneous Localization and Mapping System, in the International Conference on Field-Programmable Technology (FPT), 2015, pp. 200-203.
  • Yi Shan, Yuchen Hao, Wenqiang Wang, Yu Wang, Xu Chen, Huazhong Yang, Wayne Luk, Hardware Acceleration for an Accurate Stereo Vision System Using Mini-Census Adaptive Support Region, in ACM Transactions on Embedded Computing Systems (TECS), vol. 13, no. 4s, 2014, pp. 132:1-132:24.
  • Wenqiang Wang, Kaiyuan Guo, Mengyuan Gu, Yuchun Ma, Yu Wang, A Universal FPGA-based Floating-point Matrix Processor for Mobile Systems, in Proceedings of the International Conference on Field-Programmable Technology (FPT), 2014, pp. 139-146.
  • Wenqiang Wang, Jing Yan, Ning-Yi Xu, Yu Wang, and Feng-Hsiung Hsu, A Real-time High-quality Stereo Vision System on FPGA, in Proceedings of the International Conference on Field-Programmable Technology (FPT), 2013, pp. 358-361.
  • Yi Shan, Zilong Wang, Wenqiang Wang, Yuchen Hao, Yu Wang, Kuen Hung Tsoi, Wayne Luk, Huazhong Yang, FPGA based memory efficient high resolution stereo vision system for video tolling, in Proceedings of the International Conference on Field-Programmable Technology (FPT), 2012, pp. 29-32.

copyright 2019 © NICS Lab of Tsinghua University