The model runs, but the simulation is slow? Dymola warns of stiffness? This blog explains what stiff equation systems are, how to recognize them and which approaches can be used to make the simulation more stable and efficient.
ChatGPT
The simulation model is set up correctly, the simulation is running, but it is calculating much slower than expected? Is Dymola displaying the warning “Probably the communication interval is too large or the system is stiff”? One reason for this could be a stiff system of equations. What does that mean? How can I find the reasons for the stiffness? And how can I make my simulation run more efficiently? You can find out all this in this blog post!
If changes occur simultaneously on very fast and very slow time scales in a dynamic simulation model, this is referred to as stiffness or stiff systems. Such systems pose problems for solution algorithms, slow down the simulation, and can even lead to incorrect results or the simulation being terminated.
Time constants are a measure of the rate of change of states in our simulation model. We have covered everything related to time constants in our blog article on time constants. In the following, we will build on this and explain how time constants can be used to determine the stiffness of systems and solve the problems that arise as a result.
As an example, let us consider a simulation model consisting of two thermal RC elements. A temperature source with periodically fluctuating temperature is connected to each thermal resistance R with a heat capacity C. The resistance and capacity of the first RC element are large compared to the second RC element. The second RC element therefore has a much smaller capacity with a smaller resistance.
A comparable system would be, for example, a sauna that we want to heat up with a stove. The small heat capacity is represented by a metal stove surround that is directly connected to the temperature source—i.e., the stove—via a small thermal resistor. The large capacity is the concrete floor, which heats up slowly and has a significantly higher thermal resistance because it is not directly connected to the stove. We will refer back to this example repeatedly throughout the article.
As mentioned in the blog post on time constants, in this simple system, the time constant can be calculated as the product of the capacitance C and the resistance R (time constant: \(\tau\) = RC).
So if we have a very small and a very large resistance, this leads to a very small and a very large time constant and thus to both fast and slow dynamics in the system.
What happens when we simulate this system in Dymola?
The aim of the simulation is to find out how long it takes for a steady state to be reached in the system. The stiff system is therefore simulated until both heat capacities have reached stable temperatures. Since the source temperature fluctuates between 75 °C and 80 °C, we simulate until both capacities, which start at 25 °C, reach a temperature above 77 °C for the first time.
The simulation with the CVode solver in the Dymola simulation software shows that the capacities reached the limit temperature of 77 °C after approximately 646 simulated hours. The solver requires approximately 2.9 seconds for this simulation and performs 1,578,812 simulation steps.
This shows us one of the major disadvantages of stiff systems: they are very computationally intensive and therefore very slow.
Before we get to debugging stiff systems, we first need to shed some light on a few background details. The basis for this is the following state space representation of a linear differential equation system:
$$\frac{dx}{dt} = Ax + Bu$$
The system matrix A is particularly interesting for us here. It indicates how the state variables x change over time without the influence of external factors u. This is also referred to as intrinsic dynamics.
In linear equation systems, system matrix A consists exclusively of constant entries.
The state variables are the unknowns in a system of equations from which all other outputs can be calculated.
The main task of a solver in a simulation is to determine the state variables x in the next time step.
System matrix A can be used to derive valuable information about system behavior. To do this, it is necessary to look at the eigenvalues of this matrix. Eigenvalues are a scalar characteristic of matrices that are arranged in the complex number plane.
In principle, key information about system behavior can be derived from the eigenvalues of the system matrix A. This applies to the following three areas:
Overall, we are talking here about the intrinsic motion of the system.
A system of equations is considered stable if, as time progresses, all state variables tend toward a stable final value. If a state variable grows infinitely, this is referred to as instability.
We can determine stability based on the real part of an eigenvalue: if the real parts of all eigenvalues are negative, then the system is stable. If the real part of at least one eigenvalue is positive, then the system is unstable.
Information about the oscillation behavior of the system can be derived from the imaginary part of the eigenvalues. If there is no imaginary part (i.e., the eigenvalues lie on the Re axis), this means that the system does not oscillate. If, on the other hand, the eigenvalues lie in the imaginary range, this means that oscillations occur in the system.
Finally, the eigenvalues provide information about the time scales in the system. The time constant is a measure of the rates of change of states.
In a non-oscillating system, the time constant corresponds to the reciprocal of the negative real part of the eigenvalues.
In an oscillating system (i.e., a system with an imaginary part), the negative reciprocal of the real part is referred to as damping and provides information about the time scale of the oscillation decay. The angular frequency of the oscillation corresponds to the imaginary part of the eigenvalue and can be easily converted to the oscillation frequency by dividing by 2π.
It is important to note that these derivations of system behavior only apply to linear systems, as only then is the system matrix constant over the entire simulation duration.
In non-linear systems of equations, the system matrix A varies over time, which can lead to variable time constants, oscillations, and stability issues.
If you want to analyze the system behavior of nonlinear systems using eigenvalues, this is still possible: For individual points in time, the respective system can be linearized and then analyzed accordingly at that moment.
But how do we infer stiffness from this information?
There is no standard definition of stiffness. This article examines two different definitions of stiffness: one refers only to the system of equations itself, and the other to the interaction between the system of equations and the solution algorithm.
We have already explained the first definition using the introductory example and the mathematical background. A system is considered stiff if the order of magnitude of the minimum and maximum time constants \(\tau\) varies greatly. Basically, it can be said that:
The greater the variation in the magnitude of the time constants, the stiffer the system.
This definition does not provide a clear distinction as to when a system is stiff. Mathematically speaking, this means:
$$\frac{\tau_{Max}}{\tau_{Min}} >>1$$
The stiffness therefore depends solely on the system of equations.
We now know how statements about system behavior can be made by interpreting the system matrix A. Returning to our example and comparing the orders of magnitude of the time constants, we see the following:
$$\frac{\tau_{Max}}{\tau_{Min}}=\frac{500.000}{1}>>1$$
The magnitude of the time constants in the example therefore varies greatly. According to the first definition, this is a stiff system. The following can be said in principle:
The maximum time constant determines how long the slow capacity takes to heat up and thus the simulation duration. The minimum time constant determines the required step size of the solver for correct results. A large quotient of the two extends the simulation time.
In the case of such a simple system, it is still possible to check the time constants and thus the stiffness itself – but what do you do with significantly more complex simulation models with many states and time constants?
Dymola offers a handy debugging tool: Full Linear Analysis.
The debugging tool for full linear analysis can be found in Dymola under the Tools tab under Linear Analysis.
First, we perform the analysis using the default settings for our example system. A pop-up window now displays the eigenvalues of our system in the complex state space:
Eigenvalue analysis of our example system:
Additional information is provided in the Commands window. This window displays a breakdown of our system into the aforementioned state space representation, a statement about system stability, and an eigenvalue analysis—in other words, all the information we can extract from system matrix A. For our example system, the eigenvalue analysis looks as follows:
We have two eigenvalues, each of which can be assigned 100 % to the respective temperature states of the capacitances. Both eigenvalues have a negative real part (meaning the system is stable) and no imaginary part (meaning the system does not oscillate). In addition, the time constants have values of 1 s and 500,000 s, which corresponds to a quotient of 500,000. The fast capacitance therefore changes its temperature within seconds, while the slow capacitance requires several days for a comparable change – this is therefore a good example of the stiffness of a system.
Note: So-called “non-linear” systems in Dymola do not necessarily correspond to non-linear systems of equations in the mathematical sense.
The question now is how to increase the computational efficiency of such a system. One approach is to reduce the stiffness—in other words, we need to reduce the quotient of the time constants.
There are two options for doing this:
Both adjustments would lead to a change in the physical behavior of the system, but:
A reduction in the large resistance (and thus also in the large time constant) significantly affects the simulation time to reach a steady state (our simulation result). The situation is different with the small resistance: due to its fast dynamics, the temperature of the fast capacitance remains almost identical to the source temperature (e.g., oven). A moderate increase in the small resistance slows down the coupling, but only minimally changes the overall behavior. Therefore, this adjustment is well suited for reducing stiffness.
Solution: The small resistance is increased by a factor of 20 to \(R_{After}=0.2 K/W\)..
The simulation is performed with the same settings and yields the following result: After another 646 hours, both capacities reach the target temperature. The solver requires approximately 1.4 seconds and 585,293 steps to do so. The simulation time has thus been reduced by about half, while the number of steps is less than a third.
The simulation has therefore become significantly more efficient while producing virtually the same results.
A look at the full linear analysis shows that the quotient of the time constants in this new case is only 25,000, which means it has also been reduced by a factor of 20. While these relationships are still quite clear in such a simple system, the eigenvalues in complex simulation models can be much more difficult to calculate, which makes full linear analysis a useful tool for debugging simulation models.
As we know, eigenvalue analysis is only possible for linear systems, since the system matrix in non-linear systems changes over time. However, Dymola's full linear analysis also offers the option of linearizing non-linear systems at a specific point in time and then analyzing them. To do this, select the time point for linearization in the “simulationSetup” tab. The analysis is then performed and the eigenvalues can be interpreted.
If you want to know more, read our blog article about nonlinear systems and debugging in Dymola.
However, the equation system is not the only factor that can influence the stiffness of a simulation. The interaction between the solver and the equation system can also lead to stiff systems. This brings us to the second definition of stiffness. But first, let's take a closer look at what solvers (solution algorithms) actually do.
In numerical simulations, the state variables of a system are not calculated directly, but rather integrated over time. Special solution algorithms, known as numerical solvers, are used for this purpose. They determine the states in the next time step by solving the underlying differential equations, which is often done using iterative solutions to the systems of equations.
One of the fundamental differences between various solvers concerns the step size: some algorithms work with a fixed step size, while others adjust it dynamically to optimize accuracy and efficiency. Solvers with a fixed step size in particular can quickly reach their limits in stiff systems—our next example clearly illustrates why this is the case.
Let's look at our example model with two parallel thermal RC elements. Based on what we have learned so far in this blog article, this system is considered stiff but stable, as the eigenvalues only have negative real parts. Now let's simulate the system using the explicit Euler method with a fixed step size of 3 s.
Contrary to expectations, the system appears to be unstable. The temperature of the first capacitance fluctuates wildly to completely unrealistic values, even though we would not expect any fluctuations based on the eigenvalues, as there is no imaginary part.
The problem arises from the interaction between the solver and time constants, which causes the system to become numerically unstable. The solver's fixed step size of 3 s exceeds the smallest time constant of 1 s. As a result, the solver cannot accurately capture the fast dynamics of the system, leading to inaccurate or even completely erroneous results.
To address this issue, we change the solver step size to, for example, 0.5 s, which is less than the smallest time constant. This adjustment gives us the expected stable result.
The stability of a solution algorithm therefore depends on the interaction between step size and eigenvalues and varies from solver to solver.
To graphically represent the stability of Solver, so-called stability regions are defined.
Stability regions are arranged in a complex plane and show the stability range of a solver depending on the eigenvalues and step sizes. For this purpose, the eigenvalues λ are multiplied by the step size h and entered into a stability diagram. A solver is considered stable for a system of equations if all eigenvalues λ are within the solver's stability region after scaling with h. Let's look at the example from earlier. The stability region of the explicit Euler method is a circle with a radius of 1 around the point -1 on the real axis.
In the first case, the step size h = 3 s and the real parts of the eigenvalues are -1 and -2 e-6, while the imaginary parts are both 0. The two points to be entered in the diagram are therefore -3 and -6 e-6 on the real axis. This means that the point at -3 on the x-axis lies outside the stability region, which is reflected in the unstable behavior. After the adjustment, the critical eigenvalue after multiplication with the new step size is \(-0.5\cdot 1=-0.5\) and thus lies within the stability region. The result is therefore stable.
The example shows that the step size should be adjusted to the time constants: solvers encounter stability issues when the fixed step size is too large. On the other hand, solver step sizes that are too small in systems with large time constants lead to an unnecessary number of steps and thus to excessive computing effort and inefficient simulations.
For this reason, solvers with step size adjustment are useful in dynamic simulation—that is, algorithms that adjust their current step size to the conditions at that moment. Specifying an error tolerance gives the solver a guideline for step size adjustment. Using certain control methods, the solver can determine the local error for each step and adjust the step size accordingly, which leads to efficient simulations overall.
The solver must therefore repeat the integration steps with a smaller step size if it cannot maintain the specified tolerance. The local error slows down the simulation. However, the condition for maintaining the stability regions can also slow down the solver and force it to select other step sizes. This brings us to the second definition of stiffness:
A system is considered stiff if the solver is forced to adjust the step size to ensure stability. Stiffness therefore depends (in contrast to the first definition) on the interaction between the system of equations and the solver.
Thanks to large stability regions and the ability of modern solvers to adjust their step size to comply with stability limits, stability problems occur less frequently (as long as the equation system itself is also stable). However, even if stiffness does not necessarily lead to instability, it can still cause major efficiency problems with modern solvers.
Dymola offers a practical tool for debugging: Simulation Analysis
In addition to Full Linear Analysis, we present another tool that facilitates the debugging of stiff systems: Simulation Analysis, which outputs the states that dominate errors and lead to step size limitations.
By selecting the “Simulation Analysis” tab and checking the box next to “Which states that dominate error,” we get a helpful overview. For all state variables contained in the system, the following is specified:
This tool provides us with information about the condition that most slows down the solver, allowing us to adjust the corresponding parameters (and thus the eigenvalues) to increase efficiency.
Let's look at an example: We perform the simulation analysis for our example system with the two parallel RC elements and the widely differing time constants. The result for the complete simulation duration of 650 hours is as follows:
The temperature state of the fast capacity limits the solver step size during the simulation a total of over 77,000 times. From the complete linear analysis, we know that the small time constant of 1 s belongs to this state variable. The fast behavior of the variable means that the solver has to repeatedly correct the step size downwards – i.e., it has to abort steps that have already been taken and reintegrate them. This costs the simulation a lot of time.
If we increase the time constant – for example, by a factor of 50 – by increasing the thermal resistance and simulate again, the analysis shows us the following result:
The step size is only reduced by around 15,000 times, which is still a lot, but nevertheless reduces this point to around 20%.
As we have already seen in the first example, this adjustment leads to a significant reduction in simulation time and thus to a significant increase in efficiency.
The simulation analysis for step size limitation is therefore a useful complementary tool for identifying problematic conditions in complex models. Together with the complete linear analysis, it is then possible to adjust the relevant parameters (and thus the eigenvalues) in order to increase system efficiency.
The two definitions of stiffness shown above are both valid. The stiffness of a simulation model can be caused solely by the system of equations. Similarly, an unfavorable combination of eigenvalues of the system of equations and the selected solver is one way in which stiff systems can arise and become inefficient or even unstable.
The tools presented in Dymola enable a better understanding of the systems. This allows the optimal interaction of large and small time constants and the appropriate solver to be found in order to achieve an efficiently computing, stable system that meets the requirements for the simulation results.