New Chinese algorithm speeds up peridynamics simulations on Nvidia GPUs by up to 800 times
Kyiv • UNN
Chinese scientists have created PD-General, an algorithm for Nvidia GPUs that runs calculations up to 800 times faster than traditional serial programs. The technology makes it possible to perform complex material calculations on conventional consumer GPUs in minutes instead of days.

The new algorithm, developed by a team led by Associate Professor Yang Yang, is based on peridynamics (PD), a theory used to model fractures and structural damage.
Reported by UNN with reference to South China Morning Post.
Chinese researchers have developed a high-performance algorithm that can solve complex material design problems on consumer GPUs, achieving a revolutionary 800-fold speedup over traditional methods.
The new algorithm improves the computational efficiency of peridynamics (PD), an advanced, nonlocal theory for solving complex physical problems such as cracks, damage, and faults. This opens up new possibilities for solving complex mechanical problems in a variety of industries, including aerospace and the military, using inexpensive, widely available chips.
Up to 800x faster performance on NVIDIA GPUs
Peridynamics has proven superior at modeling fracture and damage, but its high computational complexity has traditionally made large-scale simulations inefficient, owing to high memory usage and slow processing speeds.
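To see where that cost comes from, here is a minimal sketch of the textbook bond-based peridynamics formulation in one dimension. It is written in Python as an illustrative assumption; the article does not describe PD-General's internals. Each particle interacts with every neighbor inside a finite radius called the horizon, not just its immediate neighbors, and a bond breaks permanently once its stretch exceeds a critical value. The fraction of broken bonds at each particle serves as a local damage measure. The function names `build_bonds`, `internal_forces` and `damage`, and the parameters `c`, `s_crit`, `horizon` and `volume`, are hypothetical and chosen only for illustration.

```python
# Minimal 1D bond-based peridynamics sketch (illustrative assumption,
# not the PD-General implementation). Units and parameter values are arbitrary.


def build_bonds(x, horizon):
    """Return every pair (i, j), i < j, whose reference distance is within the horizon."""
    n = len(x)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(x[j] - x[i]) <= horizon}


def internal_forces(x, u, bonds, c, s_crit, volume):
    """Sum pairwise bond forces on each particle and remove bonds stretched past s_crit.

    x: reference positions, u: displacements, bonds: mutable set of intact bonds,
    c: bond stiffness (hypothetical), s_crit: critical stretch, volume: particle volume.
    """
    f = [0.0] * len(x)
    broken = []
    for i, j in bonds:
        xi = x[j] - x[i]          # bond vector in the reference configuration
        eta = u[j] - u[i]         # relative displacement of the two particles
        s = (abs(xi + eta) - abs(xi)) / abs(xi)   # bond stretch
        if s > s_crit:
            broken.append((i, j))  # brittle failure: the bond never carries load again
            continue
        direction = 1.0 if xi + eta > 0 else -1.0  # unit vector from i to j (1D)
        pair = c * s * direction * volume
        f[i] += pair  # a stretched bond pulls i toward j ...
        f[j] -= pair  # ... and j toward i (equal and opposite)
    for bond in broken:
        bonds.discard(bond)       # remove after the loop, not while iterating
    return f


def damage(n, bonds, initial_bonds):
    """Local damage per particle: fraction of its original bonds that have broken."""
    total = [0] * n
    intact = [0] * n
    for i, j in initial_bonds:
        total[i] += 1
        total[j] += 1
    for i, j in bonds:
        intact[i] += 1
        intact[j] += 1
    return [1.0 - intact[k] / total[k] if total[k] else 0.0 for k in range(n)]


if __name__ == "__main__":
    x = [float(k) for k in range(11)]         # 11 particles, unit spacing
    initial = build_bonds(x, horizon=3.0)
    bonds = set(initial)
    # Open a gap between particles 5 and 6, as a crack would.
    u = [0.0 if k <= 5 else 0.5 for k in range(11)]
    internal_forces(x, u, bonds, c=1.0, s_crit=0.1, volume=1.0)
    print([round(d, 2) for d in damage(len(x), bonds, initial)])
```

In the demo, every bond spanning the gap breaks, so damage peaks at 0.5 for particles 5 and 6 and is zero far from the gap. The loop over all bonds is what makes PD costly: work and memory grow with the number of particles times the number of neighbors within the horizon. Because each bond force depends only on its two endpoints, the loop parallelizes naturally, which is why GPUs suit this kind of workload; how PD-General maps it onto CUDA threads is not described in the article.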
To address these challenges, the development team, led by Associate Professor Yang Yang, used Nvidia's CUDA programming platform to build the PD-General framework. After an in-depth analysis of the GPU's architecture, the team optimized the algorithm design and memory management, yielding significant performance gains. Their research was published in the Chinese Journal of Computational Mechanics on January 8.
This computing efficiency lets researchers cut calculations that typically take days down to a few hours, or even minutes, on an ordinary home GPU, a significant advance for PD research.
In tests on an NVIDIA RTX 4070 GPU, PD-General ran 800 times faster than traditional serial programs and 100 times faster than OpenMP-based parallel programs.
In a large-scale simulation involving millions of particles, the algorithm performed 4000 iterative steps in just five minutes.
For large-scale two-dimensional uniaxial tensile problems, it processed 69.85 million iterations in less than two minutes with single-axis accuracy.
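The reported figures can be cross-checked with simple arithmetic. The sketch below (Python; the helper names `implied_ratio` and `ms_per_step` are hypothetical) derives two quantities the article does not state directly, assuming both speedups were measured on the same benchmark: the implied advantage of the OpenMP version over the serial one, and the average time per iteration step in the five-minute run. It also extrapolates, purely as an assumption, how long that run might take serially if the 800-fold ratio applied to it.

```python
# Back-of-the-envelope checks on the reported figures. Assumes both speedups
# were measured on the same benchmark; the serial extrapolation is hypothetical.


def implied_ratio(gpu_vs_serial, gpu_vs_openmp):
    """If GPU = A x serial speed and GPU = B x OpenMP speed, OpenMP = A / B x serial speed."""
    return gpu_vs_serial / gpu_vs_openmp


def ms_per_step(total_seconds, steps):
    """Average wall-clock time per iteration step, in milliseconds."""
    return total_seconds * 1000 / steps


openmp_vs_serial = implied_ratio(800, 100)   # 8.0
step_ms = ms_per_step(5 * 60, 4000)          # 75.0 ms per step
# Hypothetical: the five-minute run at 1/800 of the speed, converted to days.
serial_days = 5 * 800 / 60 / 24

print(f"OpenMP vs serial: {openmp_vs_serial:.0f}x")
print(f"Time per step:    {step_ms:.0f} ms")
print(f"Serial estimate:  {serial_days:.1f} days")
```

The roughly three-day serial estimate is consistent with the claim that days of computation shrink to minutes, but it is an extrapolation, not a measured result.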
Such progress could dramatically reduce research costs in fields that depend on modeling complex materials, democratizing access to this kind of research and accelerating the development of new technologies.