Implementation of Numerical Integration to High-Order Elements on the GPUs
This article presents ways to implement a resource-intensive algorithm on hardware with a limited amount of memory, such as the GPU. Numerical integration for higher-order finite element approximation was chosen as the example algorithm. For the computational tests, we use a non-linear geometric element and solve the convection-diffusion-reaction problem. The calculations were performed on an NVIDIA Tesla K20m graphics card, based on the Kepler architecture, and an AMD Radeon R9 280X, based on the Tahiti XT architecture. The results of the computational experiments were compared with the theoretical performance of both GPUs, which allowed us to assess the performance actually achieved. Our research offers guidance on choosing both the optimal algorithm design and the right hardware for such a resource-demanding task.
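The abstract does not spell out the integration algorithm itself. As a rough illustration of what element-level numerical integration involves, the C sketch below computes the element matrix of a 1D convection-diffusion-reaction problem, -(a u')' + b u' + c u = f, using Lagrange shape functions of degree p and (p+1)-point Gauss-Legendre quadrature. It is a simplified, hypothetical analogue and not the authors' kernel: the paper works with higher-order elements with non-linear geometry on GPUs, while here the function names `lagrange` and `element_matrix`, the limit `MAX_P`, the constant coefficients a, b, c and the affine (constant-Jacobian) mapping are all assumptions made for brevity.

```c
#include <assert.h>

/* Hypothetical 1D simplification of element-level numerical integration. */
#define MAX_P 2           /* highest shape-function degree handled here */
#define MAX_N (MAX_P + 1) /* shape functions (and Gauss points) per element */

/* Gauss-Legendre points and weights on [-1, 1] for 1, 2 and 3 points. */
static const double GL_X[3][3] = {
    {0.0},
    {-0.57735026918962576451, 0.57735026918962576451},
    {-0.77459666924148337704, 0.0, 0.77459666924148337704}};
static const double GL_W[3][3] = {
    {2.0},
    {1.0, 1.0},
    {5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0}};

/* Value and reference derivative of Lagrange shape function i of degree p
   (equispaced nodes on [-1, 1]) at reference coordinate xi. Requires p >= 1. */
static void lagrange(int p, int i, double xi, double *val, double *dval) {
    double nodes[MAX_N];
    for (int k = 0; k <= p; ++k) nodes[k] = -1.0 + 2.0 * k / p;
    double v = 1.0, d = 0.0;
    for (int m = 0; m <= p; ++m) {
        if (m == i) continue;
        /* product rule: differentiate factor m, keep all the others */
        double term = 1.0 / (nodes[i] - nodes[m]);
        for (int k = 0; k <= p; ++k)
            if (k != i && k != m) term *= (xi - nodes[k]) / (nodes[i] - nodes[k]);
        d += term;
        v *= (xi - nodes[m]) / (nodes[i] - nodes[m]);
    }
    *val = v;
    *dval = d;
}

/* Element matrix on an element of length h:
   A[i][j] = sum_q w_q J (a dphi_j dphi_i + b dphi_j phi_i + c phi_j phi_i),
   where i indexes test functions, j trial functions, and J = h / 2. */
static void element_matrix(int p, double h, double a, double b, double c,
                           double A[MAX_N][MAX_N]) {
    assert(p >= 1 && p <= MAX_P);
    int n = p + 1;        /* p+1 Gauss points are exact up to degree 2p+1 */
    double jac = 0.5 * h; /* dx/dxi of the affine map from [-1, 1] */
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) A[i][j] = 0.0;
    for (int q = 0; q < n; ++q) { /* outer loop over integration points */
        double xi = GL_X[n - 1][q];
        double wq = GL_W[n - 1][q] * jac;
        double phi[MAX_N], dphi[MAX_N];
        for (int k = 0; k < n; ++k) {
            lagrange(p, k, xi, &phi[k], &dphi[k]);
            dphi[k] /= jac; /* chain rule: d/dx = (1/J) d/dxi */
        }
        for (int i = 0; i < n; ++i)     /* test functions */
            for (int j = 0; j < n; ++j) /* trial functions */
                A[i][j] += wq * (a * dphi[j] * dphi[i] + b * dphi[j] * phi[i] +
                                 c * phi[j] * phi[i]);
    }
}
```

For p = 1 the sketch should reproduce the familiar linear-element contributions: a/h [[1, -1], [-1, 1]] from diffusion, (b/2) [[-1, 1], [-1, 1]] from convection and (c h/6) [[2, 1], [1, 2]] from reaction. With a non-linear geometric element, the Jacobian would have to be recomputed at every integration point instead of being the constant h/2, which adds work and per-point data. On a GPU, the arrays of shape-function values and derivatives held for each integration point are the kind of data that competes for the limited registers and on-chip memory the abstract refers to.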
Keywords: GPU, Numerical Integration, Finite Element Method, OpenCL, CUDA