S10 Parallel computing

List of abstracts

ID 6: Accelerating computation of reduced order model of a structural system using GPU programming P. Gorecki, M. Kalinowski, Ł. Jeziorek, J. Broniszewski, T. Koziara

ID 17: Parallel optimization of automotive shock absorber P. Sebastjan, W. Kuś

ID 33: Parallel approach to the design of nanostructures W. Kuś

ID 37: Real-time Operational Load Monitoring of a composite aerostructure using FPGA-based computing system W. Mucha

ID 223: Matrix-free solver for fluid-structure interaction problems in ALE formulation M. Wichrowski, P. Krzyżanowski, S. Stupkiewicz, L. Heltai

ID 277: Cellular automata based multiscale simulations on low power microcomputers in edge architecture P. Hajder, Ł. Rauch


ID 6

Accelerating computation of reduced order model of a structural system using GPU programming

Piotr Gorecki1, Miłosz Kalinowski2, Łukasz Jeziorek2, Jakub Broniszewski1, Tomasz Koziara2

1 General Electric Poland, Poland
2 Warsaw Institute of Aviation, Poland


The Craig-Bampton (CB) method is a well-known substructuring technique for systems in which two or more subsystems are connected; it reduces the size of a finite element model (FEM) using a set of vibration modes. The reduction process can be computationally expensive, since it requires algebraic operations on FEM mode shapes and sparse FEM system matrices.
In this paper, we investigate the potential of GPU parallel processing to speed up the construction of a CB reduced-order model. A high-level Python approach employing the CuPy library on the GPU is compared with a CPU reference implementation using the SciPy library, as well as with optimized Fortran code. In side-by-side comparisons using the same inputs, the Python-GPU code is run on a single GPU device, while the Python-CPU and Fortran codes are run on a multi-core compute node. The CB reduction process was split into several parts, each dealing with a different kind of numerical problem: a sparse generalized eigenvalue problem, a sparse linear system solve, sparse/dense matrix multiplications, and modal assurance criterion (MAC) computation. The performance of each part is compared across all implemented approaches in terms of relative compute times for different problem sizes.
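As an illustration of one of the parts listed above, the modal assurance criterion between two sets of mode shapes can be sketched with the standard definition MAC(i, j) = |phi_i^H psi_j|^2 / ((phi_i^H phi_i)(psi_j^H psi_j)). This is a minimal pure-Python sketch, not the authors' implementation: the function name `mac_matrix` and the mode shape values are hypothetical, and a production code would operate on arrays (SciPy on the CPU, CuPy on the GPU) rather than nested lists.

```python
def mac_matrix(phi, psi):
    """Return MAC[i][j] between two lists of mode shape vectors of equal length."""

    def dot(u, v):
        # Hermitian inner product u^H v; works for real and complex entries.
        return sum(complex(a).conjugate() * b for a, b in zip(u, v))

    result = []
    for u in phi:
        row = []
        for v in psi:
            numerator = abs(dot(u, v)) ** 2
            denominator = (dot(u, u) * dot(v, v)).real
            row.append(numerator / denominator)
        result.append(row)
    return result


# Hypothetical mode shapes: two "full model" modes and two "reduced model" modes.
full_modes = [[1.0, 2.0, 1.0], [1.0, 0.0, -1.0]]
reduced_modes = [[2.0, 4.0, 2.0], [1.0, 0.1, -1.0]]

mac = mac_matrix(full_modes, reduced_modes)
for row in mac:
    print(["%.4f" % value for value in row])
```

Values close to 1 on the diagonal indicate that a reduced-model mode matches the corresponding full-model mode up to scaling; values close to 0 off the diagonal indicate nearly orthogonal shapes.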

ID 17

Parallel optimization of automotive shock absorber

Przemysław Sebastjan1, Wacław Kuś1

1 Faculty of Mechanical Engineering, Silesian University of Technology, Poland


The paper presents a method of selecting the number of processing units for a constrained structural optimization problem. The optimization of the automotive component uses Finite Element Method (FEM) results to compute the values of the objective function. The FEM analyses are the most time-consuming part of the optimization, so the goal is to parallelize the FEM-based calculations as efficiently as possible. A hybrid optimization method is used, combining gradient-based and evolutionary methods [1]. During each iteration, the objective function can be computed in parallel, to a degree that depends on the number of individuals in the evolutionary algorithm and the available computing units. The optimization is parallelized on two levels: the optimization algorithm level and the direct FEM-based solution level [2].
The numerical example is the optimization of an automotive shock absorber subjected to excessive compression loads, taking into account nonlinearities resulting from its unstable behavior [3]. Such loads can occur during misuse events at the vehicle level, such as driving over a curb at high velocity, so special care must be taken to create a design strong enough to withstand such impacts. The optimization aims to reduce the mass of the forged bracket that supports the shock absorber and connects it with the control arm. At the same time, the minimum buckling force is constrained to ensure that vehicle strength requirements are fulfilled. The paper presents the results of tests performed with different amounts of computing resources, on the basis of which a method of optimal resource management was formulated.
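The algorithm-level parallelization described above, in which the individuals of one evolutionary iteration are evaluated concurrently, can be sketched as follows. This is only an illustration under stated assumptions: `objective` is a hypothetical stand-in for one FEM analysis, the population is made up, and a thread pool is used on the assumption that each evaluation would launch an external solver process. The helper `evaluation_rounds` shows why the number of workers matters: when evaluations take roughly equal time, the wall time is driven by the number of sequential rounds.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def objective(design):
    # Hypothetical stand-in for one FEM analysis; returns a scalar value.
    return sum(x * x for x in design)


def evaluate_population(population, n_workers):
    # Evaluate all individuals concurrently; results keep the population order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(objective, population))


def evaluation_rounds(n_individuals, n_workers):
    # Sequential rounds needed if every evaluation takes about the same time.
    return math.ceil(n_individuals / n_workers)


population = [[0.1 * i, 1.0 - 0.1 * i] for i in range(10)]
fitness = evaluate_population(population, n_workers=4)
rounds = {w: evaluation_rounds(len(population), w) for w in (4, 5, 8, 10)}
print(rounds)
```

With 10 individuals, 5 workers and 8 workers both need two rounds, so under the equal-duration assumption the three extra workers bring no benefit at this level and could instead be assigned to the FEM solver level.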

The research presented in this paper was co-financed under grant no. DWD/3/7/2019, supported by the Ministry of Science and Higher Education in Poland, and by the research subsidy of the Mechanical Engineering Faculty, Silesian University of Technology.

[1] Sebastjan, P. and Kuś, W., 2022. Hybrid shape optimization of automotive spring seat. International Journal of Automotive Technology, in press.
[2] Burczyński, T., Kuś, W., Beluch, W., Długosz, A., Poteralski, A. and Szczepanik, M., 2020. Intelligent Computing in Optimal Design. Springer International Publishing.
[3] Sebastjan, P. and Kuś, W., 2021. Optimization of material distribution for forged automotive components using hybrid optimization techniques. Computer Methods in Material Science, 21(2).

ID 33

Parallel approach to the design of nanostructures

Wacław Kuś1

1 Department of Computational Mechanics and Engineering, Silesian University of Technology, Poland


The goal of the paper is to present a method for designing a 2D nanostructure with material properties defined a priori by the designer, and to show the importance of parallel computing in the process. The method is based on a global, bioinspired optimization algorithm, with molecular dynamics (MD) used to solve the direct problem. Bioinspired algorithms operate on a pool of individuals in each iteration and can be easily parallelized [1]. The objective function for each individual is computed from the direct problem solution for the 2D structure. The direct problem, stretching the nanostructure to obtain its stress-strain relation, requires considerable computational effort. Fortunately, the MD algorithm can also be parallelized, allowing two levels of parallelization. The paper presents the algorithm, the optimization problem formulation, and numerical examples. The test results were obtained using two supercomputers, Cray XC40 Okeanos and HPE Apollo 2000 Karolina. Benchmark tests with speedups and parallelization efficiency are presented in the paper. Numerical tests are conducted for the 2D MoS2 [2] nanostructure, and the LAMMPS [3] software is used for the MD analyses.
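Speedup and parallel efficiency, the two benchmark quantities mentioned above, follow from measured wall times as S(p) = T(p0) / T(p) and E(p) = S(p) / (p / p0), where p0 is the smallest core count measured. A minimal sketch is given below; the function name is hypothetical and the timings are made up for illustration, they are not results from the paper.

```python
def speedup_and_efficiency(timings):
    """Map core count -> (speedup, efficiency) relative to the smallest core count."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    table = {}
    for cores, wall_time in sorted(timings.items()):
        speedup = base_time / wall_time
        efficiency = speedup / (cores / base_cores)
        table[cores] = (speedup, efficiency)
    return table


# Made-up wall times in seconds, for illustration only.
timings = {1: 1200.0, 2: 620.0, 4: 330.0, 8: 190.0}
table = speedup_and_efficiency(timings)
for cores, (speedup, efficiency) in table.items():
    print(cores, round(speedup, 2), round(efficiency, 2))
```

With two levels of parallelization, the same calculation can be applied separately to the number of individuals evaluated at once and to the number of cores given to each MD run.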

[1] T. Burczyński, W. Kuś, W. Beluch, A. Długosz, A. Poteralski, M. Szczepanik, Intelligent Computing in Optimal Design. Solid Mechanics and Its Applications, Springer, 2020.
[2] J. A. Akhter, W. Kuś, A. Mrozek, T. Burczyński, 2020, Mechanical Properties of Monolayer MoS2 with Randomly Distributed Defects, Materials 13(6), 1307, 1-13, https://doi.org/10.3390/ma13061307.
[3] A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in 't Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J. Tranchida, C. Trott, S. J. Plimpton, 2022, LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales, Comp Phys Comm 271 108171, https://doi.org/10.1016/j.cpc.2021.108171.

The work is partially financed from the research subsidy of the Mechanical Engineering Faculty, Silesian University of Technology.

Optimizations were performed in part at the Interdisciplinary Centre for Mathematical and Computational Modelling at the University of Warsaw under grant GB80-16. We acknowledge EuroHPC JU for awarding us access to the Karolina supercomputer hosted by IT4Innovations, National Supercomputing Center, Ostrava, Czechia.

ID 37

Real-time Operational Load Monitoring of a composite aerostructure using FPGA-based computing system

Waldemar Mucha1

1 Department of Computational Mechanics and Engineering, Silesian University of Technology, Poland


Operational Load Monitoring is an industrial process that involves measuring and recording the number and character of load cycles that a structure has withstood in its operating environment. The purpose is to estimate the remaining in-service life of the structure with respect to fatigue failure. The process is applied mostly in the aerospace industry to monitor the usage of aircraft structures. Operational Load Monitoring is a sensor-based technique that usually relies on strain sensors. Artificial intelligence techniques can be used to reduce the number of required sensors. If all possible load cases of the structure are identified, a prediction model can be trained to estimate the current state of the structure from a relatively small number of sensor measurements. A cyber-physical system for real-time Operational Load Monitoring is proposed. The system is based on a modern real-time multicore microcontroller equipped with an FPGA (field-programmable gate array). The advantages of FPGAs are truly parallel data processing (e.g. dozens of independent circuits running simultaneously), high efficiency, and reliability. In the presented work, an artificial intelligence-based prediction model for Operational Load Monitoring of an example composite aerostructure (a lightweight structure for aerospace applications) was built from numerically generated reference data. The model was then deployed to the microcontroller, where it was tested in a series of real-time experiments. During the experiments, the microcontroller acquired data from strain gauges mounted on the structure and predicted the current load conditions of the structure in real time. The measured efficiency of those predictions is presented.

ID 223

Matrix-free solver for fluid-structure interaction problems in ALE formulation

Michal Wichrowski1, Piotr Krzyżanowski2, Stanisław Stupkiewicz3, Luca Heltai4

1 Universität Heidelberg, Germany
2 Department of Mathematics and Informatics, Institute of Applied Mathematics, Poland
3 Institute of Fundamental Technological Research, Polish Academy of Sciences, Poland
4 International School for Advanced Studies, Italy


The solution of fluid-structure interaction (FSI) problems is required in a wide range of applications, from the study of micro-scale biomechanics to the design of offshore platforms. The complexity of the phenomena calls for accurate computational methods, which leads to growing problem sizes. The resulting problems may involve billions of unknowns, which can only be handled on massively parallel, distributed-memory supercomputers, and such large-scale problems require special algorithms. In this work, we develop a new method of solving time-dependent FSI problems using the Finite Element Method in the Arbitrary Lagrangian-Eulerian (ALE) frame of reference. We derive a monolithic predictor-corrector time integration scheme by adopting the Geometry-Convective Explicit scheme for the interaction between an incompressible hyperelastic solid and an incompressible fluid. To improve the conservation of the volume of the solid, we modify the mass conservation equation by introducing volumetric stabilization. The proposed algorithm consists of several sub-steps at each time step. The most time-consuming of them is the solution of the generalized Stokes problem with discontinuous variable coefficients, which has to be done once or twice per time step, depending on the variant of the predictor-corrector scheme. We introduce a new multilevel preconditioner that is robust with respect to both problem size and coefficient jumps. We test our implementation on the Turek-Hron benchmark problem. All building blocks of the solver are designed to be matrix-free, allowing both speed and memory optimizations. The implementation, based on the deal.II library, supports parallel computations in both 2D and 3D.
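The matrix-free idea mentioned above can be illustrated on a much simpler model problem: the operator is never assembled as a matrix, and the iterative solver only needs a function that applies it to a vector. The sketch below is an assumption-laden toy, not the authors' solver: it uses a 1D finite-difference discretization of -(a u')' = 1 with a discontinuous coefficient a and zero Dirichlet boundary values, solved by unpreconditioned conjugate gradients in pure Python. All names are hypothetical, and no multilevel preconditioner or deal.II functionality is represented.

```python
def apply_operator(u, coeff, h):
    """Return A u for -(a u')' on a uniform 1D grid with zero Dirichlet ends.

    u holds the n interior values; coeff has n + 1 entries, one per grid cell.
    No matrix is stored: each entry of the result is computed on the fly.
    """
    n = len(u)
    y = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        y[i] = (coeff[i] * (u[i] - left) + coeff[i + 1] * (u[i] - right)) / (h * h)
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator given as a function."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


n = 15
h = 1.0 / (n + 1)
# Coefficient jumps from 1 to 100 in the middle of the domain.
coeff = [1.0 if k < (n + 1) // 2 else 100.0 for k in range(n + 1)]
b = [1.0] * n
u = conjugate_gradient(lambda v: apply_operator(v, coeff, h), b)
print(max(u))
```

The memory footprint is a handful of vectors of length n, which is the property that makes the approach attractive at large scale; robustness with respect to the coefficient jump is what a preconditioner would have to add.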

ID 277

Cellular automata based multiscale simulations on low power microcomputers in edge architecture

Piotr Hajder1, Lukasz Rauch1

1 Applied Computer Science and Modelling, AGH University of Science and Technology, Poland


Numerical computations are usually associated with High Performance Computing. Nevertheless, both industry and science increasingly involve lower-power devices in computations. This is especially true when the data-collecting devices are able to partially process the data in place, thus increasing system reliability. This paradigm is known as Edge Computing. In this paper, we propose the use of devices at the edge, with lower computing power, for multiscale modelling calculations. A system was created consisting of a high-power device (a two-processor workstation), 8 Raspberry Pi 4B microcomputers, and 8 NVIDIA Jetson Nano units equipped with GPUs. As part of this research, benchmarking was performed, on the basis of which the computational capabilities of the devices were classified. Two parameters were considered: the number and performance of computing units (CPUs and GPUs) and the energy consumption of the loaded machines. Then, using the calculated weak scalability and energy consumption, a min-max-based load optimization algorithm was proposed. The system was tested in laboratory conditions, giving similar computation times at the same power consumption for 24 physical workstation cores versus 8 Raspberry Pi 4B and 8 Jetson Nano devices. The work ends with a proposal to use this solution in industrial processes, using the example of hot rolling of flat products.
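One simple reading of min-max load distribution is to split identical work units among heterogeneous devices so that the maximum finish time is as small as possible. The sketch below illustrates only that idea and is not the authors' algorithm, which also accounts for energy consumption: the function name, device names, and processing rates are hypothetical. Each unit is given greedily to the device that would finish it earliest.

```python
import heapq


def distribute(n_units, rates):
    """Assign n_units identical work units to devices with given rates (units/s).

    Returns the per-device unit counts and the resulting maximum finish time.
    """
    counts = {name: 0 for name in rates}
    # Heap entries: (finish time if the device receives one more unit, device name).
    heap = [(1.0 / rate, name) for name, rate in rates.items()]
    heapq.heapify(heap)
    for _ in range(n_units):
        _, name = heapq.heappop(heap)
        counts[name] += 1
        heapq.heappush(heap, ((counts[name] + 1) / rates[name], name))
    makespan = max(counts[name] / rates[name] for name in rates)
    return counts, makespan


# Hypothetical benchmark rates in work units per second.
rates = {"workstation": 8.0, "jetson_nano": 2.0, "raspberry_pi": 1.0}
counts, makespan = distribute(110, rates)
print(counts, makespan)
```

For these rates the split is proportional to device throughput, so all devices finish at the same time; an energy-aware variant would add a per-device cost to the quantity being minimized.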