ID 6: Accelerating computation of reduced order model of a structural system using GPU programming – P. Gorecki, M. Kalinowski, Ł. Jeziorek, J. Broniszewski, T. Koziara
ID 17: Parallel optimization of automotive shock absorber – P. Sebastjan, W. Kuś
ID 33: Parallel approach to the design of nanostructures – W. Kuś
ID 37: Real-time Operational Load Monitoring of a composite aerostructure using FPGA-based computing system – W. Mucha
ID 223: Matrix-free solver for fluid-structure interaction problems in ALE formulation – M. Wichrowski, P. Krzyżanowski, S. Stupkiewicz, L. Heltai
ID 277: Cellular automata based multiscale simulations on low power microcomputers in edge architecture – P. Hajder, Ł. Rauch
ID 6  
Accelerating computation of reduced order model of a structural system using GPU programming  
Piotr Gorecki^{1}, Miłosz Kalinowski^{2}, Łukasz Jeziorek^{2}, Jakub Broniszewski^{1}, Tomasz Koziara^{2}  
^{1} General Electric Poland, Poland ^{2} Warsaw Institute of Aviation, Poland 

piotr.mi.gorecki@gmail.com  
The Craig-Bampton (CB) method is a well-known substructuring technique that uses a set of vibration modes to reduce the size of a finite element model (FEM) of a system in which two or more subsystems are connected. The reduction process can be computationally expensive, since it requires algebraic operations on FEM mode shapes and on the sparse FEM system matrices. In this paper, we investigate the potential of GPU parallel processing to speed up the construction of the CB reduced order model. A high-level Python approach employing the CuPy library on the GPU is compared with a CPU reference implementation using the SciPy library, as well as with optimized Fortran code. In side-by-side comparisons employing the same inputs, the Python-GPU code is run on a single GPU device, while the Python-CPU and Fortran codes are run on a multi-core compute node. The CB reduction process was split into several parts, each dealing with a different kind of computational problem: the sparse generalized eigenvalue problem, the sparse linear system solve, sparse/dense matrix multiplications, and the modal assurance criterion (MAC) computation. The performance of each part is compared in terms of relative compute times for different problem sizes across all of the implemented approaches. 
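The reduction steps listed above can be sketched on the CPU with NumPy/SciPy, the reference stack named in the abstract. This is a minimal illustration under our own assumptions, not the authors' implementation: the interface (boundary) DOF indices are passed in explicitly, the transformation matrix T is kept dense, the fixed-interface modes come from shift-invert `eigsh`, and the helper names `craig_bampton` and `mac` are hypothetical. The constraint modes reuse a single sparse LU factorization for all right-hand sides, and the MAC matrix is formed from one dense product.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh
from scipy.sparse.linalg import eigsh, splu


def craig_bampton(K, M, boundary, n_modes):
    """Hypothetical helper: CB reduction of sparse K, M.

    boundary: indices of retained interface DOFs; n_modes: fixed-interface modes kept.
    Returns reduced K_r, M_r and the dense transformation T (n x (n_b + n_modes)).
    """
    K, M = sp.csr_matrix(K), sp.csr_matrix(M)
    n = K.shape[0]
    b = np.asarray(boundary)
    i = np.setdiff1d(np.arange(n), b)                 # interior DOFs
    nb = len(b)

    K_ii = K[i][:, i].tocsc()
    K_ib = K[i][:, b].toarray()
    M_ii = M[i][:, i].tocsc()

    # Static constraint modes Psi = -K_ii^{-1} K_ib: one sparse LU, many right-hand sides.
    Psi = -splu(K_ii).solve(K_ib)

    # Fixed-interface modes: lowest n_modes of K_ii phi = lam M_ii phi (shift-invert at 0).
    lam, Phi = eigsh(K_ii, k=n_modes, M=M_ii, sigma=0.0, which="LM")
    Phi = Phi[:, np.argsort(lam)]
    Phi = Phi / np.sqrt(np.einsum("ij,ij->j", Phi, M_ii @ Phi))  # mass-normalize

    # T maps generalized coordinates [u_b, eta] to all physical DOFs.
    T = np.zeros((n, nb + n_modes))
    T[b, :nb] = np.eye(nb)
    T[i, :nb] = Psi
    T[i, nb:] = Phi
    return T.T @ (K @ T), T.T @ (M @ T), T


def mac(phi_a, phi_b):
    """MAC matrix between the columns (mode shapes) of phi_a and phi_b."""
    cross = np.abs(phi_a.conj().T @ phi_b) ** 2
    norm_a = np.sum(np.abs(phi_a) ** 2, axis=0)
    norm_b = np.sum(np.abs(phi_b) ** 2, axis=0)
    return cross / np.outer(norm_a, norm_b)


# Smoke test: 10-DOF spring-mass chain, grounded at one end, tip DOF kept as interface.
n = 10
main = np.full(n, 2.0)
main[-1] = 1.0
K = sp.diags([main, -np.ones(n - 1), -np.ones(n - 1)], [0, 1, -1], format="csr")
M = sp.identity(n, format="csr")
K_r, M_r, T = craig_bampton(K, M, boundary=[n - 1], n_modes=5)
lam_full, phi_full = eigh(K.toarray(), M.toarray())
lam_red, q_red = eigh(K_r, M_r)
print(lam_full[:3], lam_red[:3])
print(np.diag(mac(phi_full[:, :3], T @ q_red[:, :3])))
```

The spring chain is only a smoke test: the reduced eigenvalues bound the full-model ones from above (Rayleigh-Ritz), and the MAC diagonal compares full and expanded reduced mode shapes. CuPy mirrors much of the NumPy API and ships `cupyx.scipy.sparse`, so a GPU port would mostly swap array modules, although not every SciPy routine used here (for example shift-invert `eigsh` with a mass matrix) is guaranteed to have a direct CuPy counterpart.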
ID 17  
Parallel optimization of automotive shock absorber  
Przemysław Sebastjan^{1}, Wacław Kuś^{1}  
^{1} Faculty of Mechanical Engineering, Silesian University of Technology, Poland 

przemyslaw.sebastjan@polsl.pl  
The paper presents a method for selecting the number of processing units for a constrained structural optimization problem. The optimization of the automotive component uses Finite Element Method (FEM) results to compute the values of the objective function. The FEM analyses are the most time-consuming part of the optimization, so the goal is to parallelize the FEM-based calculations as efficiently as possible. A hybrid optimization method is used, combining gradient-based and evolutionary methods [1]. During each iteration, the objective function can be computed in parallel, depending on the number of individuals in the evolutionary algorithm and the available computing units. The optimization is parallelized on two levels: the optimization algorithm level and the FEM-based direct solution level [2]. The numerical example is an automotive shock absorber subjected to excessive compressive loads, taking into account the nonlinearities resulting from its unstable behavior [3]. Such loads can occur during misuse events at the vehicle level, such as driving over a curb at high velocity; therefore, special care must be taken to create a design strong enough to withstand such impacts. The optimization aims to reduce the mass of the forged bracket that supports the shock absorber and connects it to the control arm. At the same time, the minimum buckling force is constrained to ensure that the vehicle strength requirements are fulfilled. The paper presents the results of tests performed with different amounts of computing resources, on the basis of which a method for optimal resource management was formulated. The research presented in this paper was co-financed under grant no. DWD/3/7/2019 supported by the Ministry of Science and Higher Education in Poland and by the research subsidy of the Mechanical Engineering Faculty, Silesian University of Technology.
References: [1] Sebastjan, P. and Kuś, W., 2022. Hybrid shape optimization of automotive spring seat. International Journal of Automotive Technology, in press. [2] Burczyński, T., Kuś, W., Beluch, W., Długosz, A., Poteralski, A. and Szczepanik, M., 2020. Intelligent Computing in Optimal Design. Springer International Publishing. [3] Sebastjan, P. and Kuś, W., 2021. Optimization of material distribution for forged automotive components using hybrid optimization techniques. Computer Methods in Material Science, 21(2). 
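The trade-off between evaluating many individuals at once and giving each FEM run more cores can be illustrated with a small search. This is a hypothetical sketch, not the authors' resource-management method: it assumes an Amdahl-type timing model for one FEM run (`fem_time`, with a made-up `parallel_fraction`), and it assumes evaluations run in "waves" of `total_cores // cores_per_job` concurrent jobs. In practice the timing model would be replaced by measured run times.

```python
import math


def fem_time(t_serial, parallel_fraction, cores):
    """Hypothetical Amdahl-type model of one FEM run; replace with measured timings."""
    return t_serial * ((1.0 - parallel_fraction) + parallel_fraction / cores)


def best_split(total_cores, population, t_serial, parallel_fraction):
    """Choose cores per FEM job minimizing the wall time of one generation.

    Jobs run in waves of total_cores // cores_per_job concurrent evaluations.
    Returns (cores_per_job, concurrent_jobs, generation_time).
    """
    best = None
    for cores_per_job in range(1, total_cores + 1):
        concurrent = total_cores // cores_per_job
        waves = math.ceil(population / concurrent)
        t = waves * fem_time(t_serial, parallel_fraction, cores_per_job)
        if best is None or t < best[2]:  # strict '<' keeps the smallest job size on ties
            best = (cores_per_job, concurrent, t)
    return best


print(best_split(total_cores=16, population=10, t_serial=100.0, parallel_fraction=0.9))
```

With these made-up inputs the search picks 3 cores per job and 5 concurrent jobs (about 80 time units per generation), leaving one of the 16 cores idle rather than adding a third wave of evaluations.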
ID 33  
Parallel approach to the design of nanostructures  
Wacław Kuś^{1}  
^{1} Department of Computational Mechanics and Engineering, Silesian University of Technology, Poland 

waclaw.kus@polsl.pl  
The goal of the paper is to present a method for designing 2D nanostructures with material properties defined a priori by the designer, and to show the importance of parallel computing in this process. The method is based on a global, bio-inspired optimization algorithm, with molecular dynamics (MD) used to solve the direct problem. Bio-inspired algorithms operate on a pool of individuals in each iteration and can be easily parallelized [1]. The objective function for each individual is computed from the direct problem solution for the 2D structure. The approach uses the stretching of the nanostructure to obtain its stress-strain relation, which requires considerable computational effort. Fortunately, the MD algorithm can also be parallelized, allowing two levels of parallelization. The paper presents the algorithm, the optimization problem formulation, and numerical examples. The test results were obtained on two supercomputers, Cray XC40 Okeanos and HPE Apollo 2000 Karolina. Benchmark tests with speedups and parallelization efficiency are presented in the paper. Numerical tests are conducted for the 2D MoS2 [2] nanostructure, and the LAMMPS [3] software is used for the MD analyses. [1] T. Burczyński, W. Kuś, W. Beluch, A. Długosz, A. Poteralski, M. Szczepanik, Intelligent Computing in Optimal Design. Solid Mechanics and Its Applications, Springer, 2020. [2] J. A. Akhter, W. Kuś, A. Mrozek, T. Burczyński, 2020, Mechanical Properties of Monolayer MoS2 with Randomly Distributed Defects, Materials 13(6), 1307, https://doi.org/10.3390/ma13061307. [3] A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in 't Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J. Tranchida, C. Trott, S. J. Plimpton, 2022, LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales, Comp Phys Comm 271, 108171, https://doi.org/10.1016/j.cpc.2021.108171. 
The work is partially financed from the research subsidy of the Mechanical Engineering Faculty, Silesian University of Technology. Optimizations were performed in part at the Interdisciplinary Centre for Mathematical and Computational Modelling at the University of Warsaw under grant GB8016. We acknowledge EuroHPC JU for awarding us access to the supercomputer Karolina hosted by IT4Innovations, National Supercomputing Center, Ostrava, Czechia. 
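The first level of parallelism described above (evaluating the whole pool of individuals concurrently) can be sketched with a small elitist evolutionary loop. This is a hypothetical illustration, not the author's algorithm: the operators (tournament selection, arithmetic crossover, Gaussian mutation, (mu + lambda) survival) are generic choices, and `toy_objective` is an assumed stand-in for an MD run that would compare computed and target properties. A thread pool is used on the assumption that each real evaluation would launch an external MD process, so the Python GIL is not the bottleneck; the second level (parallel MD) would live inside each such run.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def tournament(rng, fitness, size=2):
    """Index of the best of `size` randomly drawn individuals (minimization)."""
    return min(rng.sample(range(len(fitness)), size), key=lambda k: fitness[k])


def evolve(objective, dim, bounds, pop_size=16, generations=50, workers=4, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    sigma = 0.05 * (hi - lo)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Level 1 of parallelism: the whole population is evaluated concurrently.
        fitness = list(pool.map(objective, pop))
        for _ in range(generations):
            children = []
            for _ in range(pop_size):
                a, b = tournament(rng, fitness), tournament(rng, fitness)
                w = rng.random()
                child = [min(hi, max(lo, w * x + (1 - w) * y + rng.gauss(0.0, sigma)))
                         for x, y in zip(pop[a], pop[b])]
                children.append(child)
            child_fitness = list(pool.map(objective, children))
            # (mu + lambda) elitist survival: keep the pop_size best of parents and children.
            ranked = sorted(zip(fitness + child_fitness, pop + children), key=lambda t: t[0])
            fitness = [f for f, _ in ranked[:pop_size]]
            pop = [x for _, x in ranked[:pop_size]]
    return pop[0], fitness[0]


def toy_objective(x):
    # Hypothetical stand-in for an MD tensile test returning the squared mismatch
    # between computed and designer-specified properties.
    return sum((xi - 0.3) ** 2 for xi in x)


best_x, best_f = evolve(toy_objective, dim=2, bounds=(0.0, 1.0))
print(best_x, best_f)
```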
ID 37  
Real-time operational load monitoring of a composite aerostructure using FPGA-based computing system  
Waldemar Mucha^{1}  
^{1} Department of Computational Mechanics and Engineering, Silesian University of Technology, Poland 

waldemar.mucha@polsl.pl  
Operational Load Monitoring is an industrial process that involves measuring and recording the number and character of the load cycles that a structure has withstood in its operating environment. The purpose is to estimate the remaining in-service life of the structure with respect to fatigue failure. The process is applied primarily in the aerospace industry to monitor the usage of aircraft structures. Operational Load Monitoring is a sensor-based technique that usually relies on strain sensors. To reduce the number of required sensors, artificial intelligence techniques can be used: if all possible load cases of the structure are identified, a prediction model can be trained to estimate the current state of the structure from a relatively small number of sensor measurements. A cyber-physical system for real-time Operational Load Monitoring has been proposed. The system is based on a modern real-time multi-core microcontroller equipped with an FPGA (field-programmable gate array). The advantages of using FPGAs are truly parallel data processing (e.g. dozens of independent circuits running simultaneously), high efficiency, and reliability. In the presented work, an artificial intelligence-based prediction model for Operational Load Monitoring of an example composite aerostructure (a lightweight structure for aerospace applications) was built from numerically generated reference data. The model was then deployed to the microcontroller and tested in a series of real-time experiments, during which the microcontroller acquired data from strain gauges mounted on the structure and predicted the current load conditions of the structure in real time. The measured efficiency of these predictions is presented. 
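The idea of training a model on numerically generated data and then predicting loads from a few strain readings can be sketched with a linear baseline. The abstract does not specify the AI model, so this is a swapped-in technique: ridge-regularized least squares, motivated by the fact that for a linear elastic structure strains depend linearly on loads. The sensor count, load count, and synthetic data below are hypothetical. Each prediction reduces to one small matrix-vector product, the kind of fixed, independent multiply-accumulate work that maps naturally onto parallel FPGA logic.

```python
import numpy as np


def fit_load_model(strains, loads, ridge=1e-8):
    """Ridge least-squares map from strains (n_samples x n_sensors) to
    loads (n_samples x n_loads); the last row of the returned W is the bias."""
    X = np.hstack([strains, np.ones((strains.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ loads)


def predict_loads(W, strain_sample):
    """One prediction is a single small matrix-vector product."""
    return np.append(strain_sample, 1.0) @ W


# Hypothetical synthetic training data: 4 strain gauges, 2 load components.
rng = np.random.default_rng(0)
W_true = rng.standard_normal((5, 2))
strains = rng.standard_normal((200, 4))
loads = np.hstack([strains, np.ones((200, 1))]) @ W_true + 1e-3 * rng.standard_normal((200, 2))
W = fit_load_model(strains, loads)
print(predict_loads(W, strains[0]), loads[0])
```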
ID 223  
Matrix-free solver for fluid-structure interaction problems in ALE formulation  
Michal Wichrowski^{1}, Piotr Krzyżanowski^{2}, Stanisław Stupkiewicz^{3}, Luca Heltai^{4}  
^{1} Universität Heidelberg, Germany ^{2} Department of Mathematics and Informatics, Institute of Applied Mathematics, Poland ^{3} Institute of Fundamental Technological Research, Polish Academy of Sciences, Poland ^{4} International School for Advanced Studies, Italy 

mtwichrowski@gmail.com  
The solution of fluid-structure interaction (FSI) problems is required in a wide range of applications, from the study of microscale biomechanics to the design of offshore platforms. The complexity of the phenomena requires accurate computational methods, which leads to growing problem sizes. The resulting problems may involve billions of unknowns and can only be handled on massively parallel, distributed-memory supercomputers, which requires special algorithms. In this work, we develop a new method for solving time-dependent FSI problems using the Finite Element Method in the Arbitrary Lagrangian-Eulerian (ALE) frame of reference. We derive a monolithic predictor-corrector time integration scheme by adopting the Geometry-Convective Explicit scheme for the problem of interaction between an incompressible hyperelastic solid and an incompressible fluid. To improve the conservation of the solid volume, we modify the mass conservation equation by introducing volumetric stabilization. The proposed algorithm consists of several substeps in each time step. Among them, the most time-consuming is the solution of the generalized Stokes problem with discontinuous variable coefficients, which has to be done once or twice per time step, depending on the variant of the predictor-corrector scheme. We introduce a new multilevel preconditioner which is robust with respect to both problem size and coefficient jumps. We test our implementation on the Turek-Hron benchmark problem. All building blocks of the solver are designed to be matrix-free, allowing optimizations of both speed and memory. The implementation, based on the deal.II library, supports parallel computations in both 2D and 3D. 
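What "matrix-free" means in practice can be shown on a toy problem: the solver only ever needs the action of the operator on a vector, computed element by element, never an assembled matrix. This sketch is not the authors' deal.II solver or their multilevel preconditioner. It assumes a 1D linear finite element discretization of -(c u')' = 1 with a discontinuous coefficient c (mimicking the coefficient jumps mentioned above), homogeneous Dirichlet conditions, and a simple Jacobi preconditioner whose diagonal is also computed without assembly; `make_operator` and `pcg` are hypothetical names.

```python
import numpy as np


def make_operator(c, h):
    """Matrix-free action of the 1D P1 stiffness operator for -(c u')' with u = 0 at both ends.
    c: per-element coefficient (may jump), h: element size. No matrix is ever stored."""
    def apply(u):
        full = np.concatenate(([0.0], u, [0.0]))   # add the Dirichlet boundary nodes
        flux = c / h * np.diff(full)                # element-wise c * du/dx
        y = np.zeros_like(full)
        y[:-1] -= flux                              # scatter element contributions to nodes
        y[1:] += flux
        return y[1:-1]
    diag = (c[:-1] + c[1:]) / h                     # operator diagonal, also without assembly
    return apply, diag


def pcg(apply_A, b, apply_Minv, tol=1e-12, maxiter=1000):
    """Preconditioned conjugate gradients that needs only operator applications."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, maxiter + 1):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


n_el = 64
h = 1.0 / n_el
mid = (np.arange(n_el) + 0.5) * h
c = np.where(mid < 0.5, 1.0, 1000.0)                # discontinuous coefficient
apply_A, diag = make_operator(c, h)
b = h * np.ones(n_el - 1)                           # unit load, lumped to the nodes
u, iters = pcg(apply_A, b, lambda r: r / diag)      # Jacobi preconditioner
print(iters, u.max())
```

Jacobi is used here only because it is the simplest preconditioner that stays matrix-free; it is not robust to coefficient jumps and problem size in the way the multilevel preconditioner described in the abstract is intended to be.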
ID 277  
Cellular automata based multiscale simulations on low power microcomputers in edge architecture  
Piotr Hajder^{1}, Lukasz Rauch^{1}  
^{1} Applied Computer Science and Modelling, AGH University of Science and Technology, Poland 

lrauch@agh.edu.pl  
Numerical computations are usually associated with High Performance Computing. Nevertheless, both industry and science tend to involve lower-power devices in computations. This is especially true when the data-collecting devices are able to partially process the data on site, thus increasing system reliability. This paradigm is known as Edge Computing. In this paper, we propose using lower-power devices at the edge for multiscale modelling calculations. A system was created consisting of a high-power device (a two-processor workstation), 8 Raspberry Pi 4B microcomputers, and 8 NVidia Jetson Nano units equipped with GPUs. As part of this research, benchmarking was performed, on the basis of which the computational capabilities of the devices were classified. Two parameters were considered: the number and performance of the computing units (CPUs and GPUs) and the energy consumption of the loaded machines. Then, using the calculated weak scalability and energy consumption, a min-max-based load optimization algorithm was proposed. The system was tested under laboratory conditions, yielding similar computation times at the same power consumption for 24 physical workstation cores versus 8 Raspberry Pi 4B and 8 Jetson Nano units. The work concludes with a proposal to use this solution in industrial processes, using the hot rolling of flat products as an example. 
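One possible reading of a min-max load optimization is to split work so that the maximum completion time over all devices (the makespan) is as small as possible. The sketch below assumes that reading; it is not the authors' algorithm. It greedily assigns equal work units (e.g. cellular automata subdomains, an assumption) to whichever device would finish its next unit earliest, then estimates energy under the simplification that each device draws a fixed power while busy and nothing when idle. All throughputs and power figures are hypothetical placeholders for benchmark results.

```python
import heapq


def minmax_allocate(n_tasks, speed):
    """Greedy min-max assignment of n_tasks equal work units to heterogeneous devices.

    speed[d] = work units per second of device d (e.g. from weak-scaling benchmarks).
    Each unit goes to the device that would finish it earliest, which keeps the
    maximum completion time (makespan) low. Returns (counts, makespan).
    """
    counts = {d: 0 for d in speed}
    heap = [(1.0 / s, d) for d, s in speed.items()]  # finish time if d takes one more unit
    heapq.heapify(heap)
    for _ in range(n_tasks):
        _, d = heapq.heappop(heap)
        counts[d] += 1
        heapq.heappush(heap, ((counts[d] + 1) / speed[d], d))
    makespan = max(counts[d] / speed[d] for d in speed)
    return counts, makespan


def energy(counts, speed, power):
    """Energy in joules, assuming power[d] watts while busy and zero when idle (simplification)."""
    return sum(power[d] * counts[d] / speed[d] for d in speed)


# Hypothetical throughputs (units/s) and power draws (W) for the device mix in the text.
speed = {"workstation": 24.0}
speed.update({f"rpi{k}": 1.0 for k in range(8)})
speed.update({f"jetson{k}": 2.0 for k in range(8)})
power = {d: 250.0 if d == "workstation" else 6.0 if d.startswith("rpi") else 10.0 for d in speed}
counts, makespan = minmax_allocate(480, speed)
print(counts["workstation"], makespan, energy(counts, speed, power))
```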