A Task Parallel Accelerator with Dynamic Pipeline Balancing

Authors

Tianyuan Xu University of Michigan, Ann Arbor
Yihan Wang University of Michigan, Ann Arbor
You Zhang University of Michigan, Ann Arbor
Yuang Lu University of Michigan, Ann Arbor

DOI:

https://doi.org/10.53469/wjimt.2024.07(06).11

Keywords:

Parallel Accelerator, Dynamic Pipeline Balancing, Coarse-grained reconfigurable array (CGRA)

Abstract

Coarse-grained reconfigurable array (CGRA) based accelerators are a promising architecture for accelerating data processing workloads. They adapt more flexibly to varied workloads than traditional application-specific accelerators while remaining powerful. However, the strength of CGRAs is limited by irregular data access and dependence patterns. The previously proposed task-based execution model enabled work-aware dynamic scheduling, but its effectiveness is still limited by the regular execution resources. To address these issues, we propose a heterogeneous architecture that improves parallelism for pipeline-enabled task streaming. We enable dynamic pipeline balancing on this architecture with minor modifications to the task stream annotation. We compare execution results on various heterogeneous structures against the regular configuration. Overall, we find that our architecture can improve performance with the same overall resources.

References

V. Dadu and T. Nowatzki, “TaskStream: accelerating task-parallel workloads by recovering program structure,” in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022, pp. 1–13. doi: 10.1145/3503222.3507706.

J. Cong, H. Huang, C. Ma, B. Xiao and P. Zhou, “A Fully Pipelined and Dynamically Composable Architecture of CGRA,” 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines, 2014, pp. 9-16, doi: 10.1109/FCCM.2014.12.

M. Wijtvliet, L. Waeijen and H. Corporaal, “Coarse grained reconfigurable architectures in the past 25 years: Overview and classification,” 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), 2016, pp. 235-244, doi: 10.1109/SAMOS.2016.7818353.

T. Nowatzki, V. Gangadhar, N. Ardalani and K. Sankaralingam, “Stream-dataflow acceleration,” 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), 2017, pp. 416-429, doi: 10.1145/3079856.3080255.

R. Prabhakar et al., “Plasticine: A reconfigurable architecture for parallel patterns,” 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), 2017, pp. 389-402, doi: 10.1145/3079856.3080256.

S. A. Chin et al., “CGRA-ME: A unified framework for CGRA modelling and exploration,” 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2017, pp. 184-189, doi: 10.1109/ASAP.2017.7995277.

M. Vilim, A. Rucker and K. Olukotun, “Aurochs: An Architecture for Dataflow Threads,” 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 2021, pp. 402-415, doi: 10.1109/ISCA52012.2021.00039.

C. Torng, P. Pan, Y. Ou, C. Tan and C. Batten, “Ultra-Elastic CGRAs for Irregular Loop Specialization,” 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 412-425, doi: 10.1109/HPCA51647.2021.00042.

D. Liu et al., “Data-Flow Graph Mapping Optimization for CGRA With Deep Reinforcement Learning,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 12, pp. 2271-2283, Dec. 2019, doi: 10.1109/TCAD.2018.2878183.

W. Lu, G. Yan, J. Li, S. Gong, Y. Han and X. Li, “FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks,” 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2017, pp. 553-564, doi: 10.1109/HPCA.2017.29.

O. Akbari, M. Kamal, A. Afzali-Kusha, M. Pedram and M. Shafique, “X-CGRA: An Energy-Efficient Approximate Coarse-Grained Reconfigurable Architecture,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 10, pp. 2558-2571, Oct. 2020, doi: 10.1109/TCAD.2019.2937738.
