With the coming of big data, there is an increasing throughput requirement of applications in many domains, challenging traditional time series data mining algorithms in terms of time cost and storage. The streaming data are generated at a high speed with unknown length. The information lies in both the data and the order of data. There are also high throughput and real time analysis requirements. The conventional CPU solution becomes harder and harder to satisfy the processing requirement, so there is an increasing trend to use parallel hardware.
Based on these features, we are inspired to pipeline and parallel processing.
FPGA shows its advantages in dealing with time series due to its gate-level fine-grained parallelism. We summarize the computation and data dependency patterns in the time series data mining, and illustrate the patterns with two typical applications:
- Dynamic Time Warping (DTW) Distance
- Space-Saving in Frequent Item Counting
We present a streaming algorithm model which covers a large range of time series mining algorithms. The model have two properties: 1) operations on each input tuple are comprised by homogeneous sub-functions; 2) sub-functions follow a stable data dependence pattern. Based on the model, we propose a general FPGA architecture, Processing Element Ring (PE-ring), to implement algorithms conform to the model. The PE-Ring fully exploits the fine-grained parallelism, together with provides more flexibility and scalability. We demonstrate the efficiency of our PE-ring architecture with two typical applications.
Dynamic Time Warping (DTW) Distance
Dynamic Time Warping (DTW) is a widely used distance metric in time series data mining. In spite of the great effort in software speedup techniques, including early abandoning strategies, lower bound, indexing, computation-reuse, DTW still cost too much time for many applications. Since DTW is a 2-Dimension sequential dynamic search with quite high data dependency, it is hard to use parallel hardware to accelerate it. We propose a novel framework for FPGA based subsequence similarity search and a novel PE-ring structure for DTW calculation. This framework utilizes the data reusability of continuous DTW calculations to reduce the bandwidth. The PE-ring supports on-line updating patterns of arbitrary lengths, and utilizes the hard-wired synchronization of FPGA to realize the fine-grained parallelism. It also achieves flexible parallelism degree to do performance-cost trade-off.
The experimental results show that we can achieve 2 to 5 orders of magnitude speedup in accelerating subsequence similarity search compared with the best software with different warping degree and datasets, 3699 times faster than the current GPU implementation and 203 times faster than the current FPGA implementation.
Space-Saving in Frequent Item Counting
With the rapid rising of data input speeds, the most challenging problem in frequent item counting is to meet the requirement of wire-speed processing. We propose a streaming oriented PE-ring framework on FPGA for counting frequent items. Compared with the best existing FPGA implementation, our basic PE-ring framework saves 50% lookup table resources cost and achieves the same throughput in a more scalable way. Furthermore, we adopt SIMD-like cascaded filter for further performance improvements. The improvements attribute to 3.2x speedup with respect to the state-of-the-art designs.
- Approximate Frequent Itemset Mining for Streaming Data on FPGA , in International Conference on Field-Programmable Logic and Applications (FPL), 2016, pp.1-4. pdf
- Accelerating frequent item counting with fpga , in Proceedings of the ACM/SIGDA international symposium on Field-programmable gate arrays (FPGA), 2014, pp.109-112. pdf
- Accelerating subsequence similarity search based on dynamic time warping distance with FPGA , in Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays (FPGA), 2013, pp.53-62. pdf
- DTW-Based Subsequence Similarity Search on AMD Heterogeneous Computing Platform , in Proceedings of the IEEE 10th International Conference on High Performance Computing and Communications & IEEE International Conference on Embedded and Ubiquitous Computing (HPCCEUC), 2013, pp.1054-1063. pdf
- Pub/Sub on stream: a multi-core based message broker with QoS support , in Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems (DEBS), 2012, pp.127-138. pdf