Statistical design of experiments is a valuable tool for engineers and other users. We give an overview of the most important basics of design of experiments without diving too deeply into mathematics and explain how a design of experiments can be created in practice.

In many technical and also non-technical fields, experiments are used to quantitatively record and mathematically describe causal relationships between variables. This mathematical description of reality is called a model.

In experiments, certain variables are set in a controlled manner (e.g. air temperature). Other variables arise as a result of the experiment and are measured but not actively controlled.

In experimental design, the terms independent variables or factors have become established for the controlled variables. The idea behind this is that the other variables arise as a result of these adjusted variables and are thus dependent variables. Fittingly, the mathematical models are usually set up as functions that compute the dependent variables from the independent variables.

An experimental design specifies how the independent variables are to be set in one or more experiments. For example, an experimental design could specify that 3 experiments are to be carried out at the air temperatures -10°C, 0°C and +10°C.

Since experiments usually cost a lot of time and money, it is worthwhile to think carefully about which experiments are to be carried out in detail. With a good experimental plan, the time and effort required for measurements can often be drastically reduced, while the accuracy of the results remains the same.

But what is a good experimental design? This is exactly what statistical design of experiments deals with.

The term "Design of Experiments" and its abbreviation DoE is almost always used for a specific type of statistical design of experiments, namely the use of polynomials as a mathematical model. These polynomial models are therefore also called DoE models.

Design of experiments is always specific to a particular mathematical model. A design of experiments that is optimal for one model (e.g. a straight-line equation) is not optimal for another (e.g. a quadratic curve).

The general procedure for statistical experimental design is:

1. Define the mathematical model
2. Determine the optimal experimental design for the model
3. Carry out the experiments
4. Fit the model to the measurement data

Step 2 is of course easier said than done. You need quite a lot of mathematics, statistics to be precise. A good introduction to the theory can be found in the sources linked below. For a basic understanding, the following illustrative explanation is sufficient.

As with any optimization, we need to define an optimization objective. In optimal experimental design this is:

Achieve the greatest possible amount of information with the least possible effort.

The effort can be described quite simply by the number of experiments, i.e. the size of the experimental design. Statistics are needed for the information content. If the measurement accuracy is given, it should be possible to estimate the unknown parameter values as accurately and independently as possible from the measurement data. In fact, the demand for independence (as little correlation as possible) and the accuracy of the individual parameter estimates are different goals that can also be in conflict with each other. In statistical experimental design, there are various objective functions that always represent a compromise of both objectives. The most widespread objective function in practice is the determinant of the information matrix. Experimental designs that maximize this criterion are called **D-optimal**.
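To make this concrete: for a model that is linear in its parameters, the information matrix can be computed directly from the design matrix. The following numpy sketch assumes a straight-line model and unit measurement variance; it compares two designs with the same number of experiments and shows how the determinant (D-criterion) and the variance of the slope estimate relate.

```python
import numpy as np

def information_matrix(x_points):
    # Straight-line model y = a + b*x; each row of the design matrix is (1, x_i).
    # With unit measurement variance, the information matrix is M = X^T X,
    # and the parameter covariance is proportional to M^{-1}.
    X = np.column_stack([np.ones_like(x_points), x_points])
    return X.T @ X

# Two candidate designs with two experiments each:
spread = np.array([-10.0, 10.0])  # measurement points far apart
narrow = np.array([-1.0, 1.0])    # measurement points close together

for name, x in [("spread", spread), ("narrow", narrow)]:
    M = information_matrix(x)
    cov = np.linalg.inv(M)
    print(name, "det(M) =", np.linalg.det(M), "var(slope) =", cov[1, 1])

# The spread-out design has the larger determinant (D-criterion)
# and, at the same time, the smaller variance of the slope estimate.
```

Here the two goals happen to align; for models with several parameters, parameter variances and correlations can pull in different directions, which is exactly what the scalar D-criterion compresses into one number.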

Below we show concrete examples of D-optimal experimental designs for simple models.

The classical DoE models are constructed as polynomials. This has the advantage that the unknown parameters (the polynomial coefficients) enter the model equation exclusively **linearly**. For parameters that enter linearly, the optimal experimental design is independent of the concrete parameter values. This means that the experimental design depends only on the structure of the DoE model and not on the results of the experiments, so it can be created completely in advance. Only if the evaluation of the results shows that the chosen model structure does not fit the data must the model, and thus the experimental design, be adapted for further experiments.

In contrast to the purely data-driven polynomial models, it is often the case with physics-based models, as we create and use them in many projects, that unknown parameters enter the model non-linearly. An example is a compressor model with internal pressure losses as unknown model parameters.

In the case of non-linear parameters, the numerical value of these parameters has an influence on the optimal experimental design. This means that estimated values must be assumed for the experimental design, since the parameter values can only be determined after the evaluation of the experiment.

With these estimated values, the same methods for selecting the best experimental design can then be used as with the classic DoE models. The mathematics behind it is the same.

Especially with non-linear models, a multi-step approach makes sense:

- Select estimated values for model parameters
- Create screening experimental plan with few points
- Conduct experiment
- Identify parameter values from measured data
- Create complete experimental plan
- Conduct remaining experiments
- Identify parameter values from complete measured data

In mathematical terms, an experimental design is a set of points in multidimensional space. This is because an experimental point is defined by certain values for all independent variables (factors). This means that the number of factors determines the number of dimensions of the experimental space. If there are 3 factors, you can represent the entire experimental space in a graph and draw the individual experimental points in it.

If there are more than 3 dimensions, a so-called **pairplot** can be used. Here, the distribution of the experimental points over each pair of factors is shown in a combined representation. The axes are shared row- and column-wise across the individual subplots. In addition, the frequency distribution of the experimental points over the respective factor is shown on the main diagonal. The experimental plan from the 3D representation above would thus look like this:

Two basic questions for a good experimental design can be answered qualitatively very quickly with these plots:

- Are there any areas that are not covered?
- Are the points distributed as evenly as possible?

The most common way to create an experimental design is to divide the independent variables into discrete values. The independent variables are also called **factors** and the individual discrete values of the factors are called **levels**.

For example, if the experiment involves 3 different factors, each of which is divided into 4 different levels, there are 4³ = 64 individual experimental points.
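This combinatorics can be checked with a few lines of standard Python; the factor names are made up for illustration:

```python
from itertools import product

# 3 factors with 4 levels each (factor names are hypothetical)
levels = {"temperature": 4, "pressure": 4, "flow_rate": 4}

# All combinations of level indices: one tuple per experimental point
points = list(product(*(range(n) for n in levels.values())))
print(len(points))  # 4**3 = 64
```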

An experimental design that includes all these points is called a **full factorial experimental design**. It is easy to imagine that the experimental effort explodes with an increasing number of factors or levels. In practice, therefore, so-called **fractional factorial** experimental designs, which contain only a limited selection of the possible experimental points, are much more common.

Which experimental points such a fractional factorial experimental design should ideally consist of is the central question of statistical experimental design. The aim is always to achieve the maximum information content with the minimum experimental effort.

Maximum information content means being able to identify the parameters of a certain mathematical model with the smallest possible uncertainties on the basis of the measurement data. Therefore, an optimal experimental design always belongs to a very specific model. If you do not know much about the observed process in advance and therefore do not have a specific mathematical model in mind, the following two general experimental designs are suitable.

Plackett-Burman experimental designs are particularly useful if you want to find out at an early stage of research or development which factors have an influence on the result in the first place. They serve to identify the most important influencing factors and can be used as a starting point for further experimental investigations, in which interactions between the factors are then also taken into account.

Plackett-Burman experimental designs assume that the data can be described with a simple linear model without interactions. Since it is a linear model, the optimal experimental design is universal and does not depend on the parameter values that are unknown in advance. Rather, it depends solely on the number of factors to be investigated. For example, 3 factors result in 4 experimental points. This means that thanks to optimal experimental design, it is possible to determine relatively precisely with only 4 experiments which of 3 factors have an influence on a certain result.

Another frequently used type of experimental design is the Latin hypercube experimental design. In contrast to the plans described so far, the factors are not divided into a few fixed levels. Instead, each factor's range is divided into as many intervals as there are experimental points, and one value is drawn randomly within each interval. The aim is an efficient and even exploration of the experimental space. They are particularly suitable for considering a wider range of factors and interactions.

In contrast to Plackett-Burman experimental designs, Latin Hypercube experimental designs can better capture interactions between factors. Due to the uniform and random arrangement of the factors, complex relationships between the variables can be detected.

Latin Hypercube experimental designs are not limited to a specific number of factors or levels. To create them, you simply specify the number of factors and the desired number of measurement points.

In summary, Latin Hypercube experimental designs allow for a broader exploration of the factor space, including possible interactions, while still using comparatively fewer experiments. This makes them a powerful method for complex experiments where a more comprehensive analysis of factors and their interactions is required.

Python has become the standard for data analysis. With the **Pandas** library, there is a very powerful tool for dealing with all kinds of data, including experimental data. And the library **Seaborn** offers very convenient ways to visualize data in the Pandas format.

There is a very useful Python package for creating standard experimental designs: pyDOE2, a fork of the no-longer-maintained package pyDOE.

A full factorial experimental design for 3 factors with 3 levels each can be created for example like this:

Result is a 2D array with integer values for the factors in the columns. Each row corresponds to an experiment to be performed (experimental point):

It makes sense to convert this array into a Pandas DataFrame. In this way, names can be defined for the individual factors and a clear pair diagram can be created with Seaborn:

A Plackett-Burman experimental design for 3 factors can also be created with one line:

Or a Latin hypercube experimental design for 3 factors with 10 experimental points:

In addition to these general experimental designs, Python can also be used to create experimental designs for special application-specific mathematical models. For this purpose, we at TLK have developed our own Python package with the somewhat uncreative name **doe_tool**. With it, optimal experimental designs can be determined, graphically displayed and compared even for complex physical models. The models can either be in FMI format (Functional Mock-up Interface) or as Python functions.

As a minimal example, let's look at the experimental design for fitting a straight line to measurement data. First, the model is defined as a Python function:
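The doe_tool API itself is not reproduced here, but the model function is plain Python. For a straight line it could look like this (function and parameter names are assumptions):

```python
def line_model(x, slope, offset):
    """Straight-line model: returns the dependent variable y as a
    function of the factor x and the two model parameters."""
    return slope * x + offset

print(line_model(2.0, slope=1.5, offset=0.5))  # 3.5
```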

From this, a corresponding object for the doe_tool is created:

The model parameters to be identified are defined:

The dependent variable together with its standard deviation as a measure of measurement uncertainties:

and the independent variables or factors:

We now want to determine an optimal experimental design for this model. First, we create a full factorial experimental design with 10 levels per factor and calculate all internally required data:

Now we want to determine an optimized experimental design with only 2 experimental points. Using an exchange algorithm (DETMAX), the 2 best points are selected from the 10 candidates:
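The doe_tool internals are not shown here, but the selection can be reproduced with a brute-force numpy sketch: from 10 equidistant candidate levels, pick the pair that maximizes the determinant of the information matrix of the straight-line model.

```python
import numpy as np
from itertools import combinations

# 10 candidate levels for the single factor x
candidates = np.linspace(0.0, 1.0, 10)

def d_criterion(x_points):
    # Design matrix of the line model y = a + b*x: one row (1, x_i) per point
    X = np.column_stack([np.ones_like(x_points), x_points])
    return np.linalg.det(X.T @ X)

# Brute force instead of DETMAX: evaluate all pairs of candidates
best = max(combinations(candidates, 2),
           key=lambda pts: d_criterion(np.array(pts)))
print(best[0], best[1])  # 0.0 1.0 – the two extreme points win
```

For two points, the determinant reduces to (x₁ - x₂)², so the pair with the greatest distance is optimal, in line with the result described below.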

And these are precisely the two extreme points. This is also intuitively plausible: when fitting a straight line, the measurement errors have the least effect on the estimated slope when the two measurement points are as far apart as possible.

What happens now if we want to afford 4 measurements?

The statistical experimental design suggests that we should rather repeat the two experiments at the extreme points instead of placing additional measuring points in the middle.
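This, too, can be checked with a brute-force sketch: allowing repeated experiments, we search all multisets of 4 points from the 10 candidate levels for the one with the largest D-criterion.

```python
import numpy as np
from itertools import combinations_with_replacement

candidates = np.linspace(0.0, 1.0, 10)

def d_criterion(x_points):
    # Design matrix of the line model y = a + b*x: one row (1, x_i) per point
    X = np.column_stack([np.ones_like(x_points), x_points])
    return np.linalg.det(X.T @ X)

# Allow repetitions: all multisets of 4 points from the candidates
best = max(combinations_with_replacement(candidates, 4),
           key=lambda pts: d_criterion(np.array(pts)))
print(best[0], best[1], best[2], best[3])  # 0.0 0.0 1.0 1.0
```

The optimum places two repeated measurements at each extreme rather than spending any of the budget on points in the middle.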

These repetitions are a typical result of the experimental design. Let me remind you again: The experimental design always belongs to a very specific model, in this case a straight line equation. We therefore assume that the data can be described by a straight line. Whether this prerequisite is fulfilled in practice or whether it is better to measure a point in the middle to be on the safe side is left to the user of these mathematical methods.

Thanks to the FMI interface, much more complex models from other modelling environments can also be imported into doe_tool. We use this, for example, to create experimental designs for refrigerant compressors, based on physics-based models from the Modelica libraries of the TIL Suite.

The problem with statistical experimental design is that it relies very heavily on mathematics (statistics). Many sources are too theoretical, mathematical and are therefore difficult for most users (e.g. engineers) to understand. Other sources, on the other hand, oversimplify so that the basic concepts and assumptions are not clear. Both lead to experimental design methods not being understood in practice and therefore not being applied.

An exception to this is the very readable article Fisher Matrix for Beginners by D. Wittman. It explains the central element of experimental design with a simple example: the Fisher information matrix. The mathematics necessary for understanding it is described in a comprehensible way without diving into too much theory.

Another highly recommended source is the NIST/SEMATECH e-Handbook of Statistical Methods. This is a very comprehensive source on all statistics topics that concern engineers and other users. Here, too, the balancing act between mathematical depth and comprehensibility is achieved very well. This is achieved through many examples and a focus on graphical/visual evaluations instead of formulas and numbers. There is a separate section on the topic of experimental design.