Statistical design of experiments is a valuable tool for engineers and other users. We give an overview of the most important basics without diving too deeply into mathematics and explain how a good experimental design can be created in practice.

In many technical and also non-technical fields, experiments are used to quantitatively record and mathematically describe causal relationships between variables. This mathematical description of reality is called a model.

In experiments, certain variables are set in a controlled manner (e.g. air temperature). Other variables are a result of the experiment and are measured but not actively controlled.

In experimental design, the terms **independent variables** or **factors** have become established for the controlled variables. The idea is that the other variables result from these controlled variables and are therefore **dependent variables**. Accordingly, the mathematical models are usually set up as functions that give the dependent variables in terms of the independent variables.

An experimental design now specifies how the independent variables must be adjusted in one or more experiments. For example, a specific design might specify that 3 experiments be performed at air temperatures of -10°C, 0°C and +10°C.

Since experiments usually cost a lot of time and money, it is worthwhile to think carefully about which experiments to perform. With a good experimental design, the time and effort required for measurements can often be drastically reduced, while the accuracy of the results remains the same.

The term "Design of Experiments" and its abbreviation DoE are almost always used for a specific type of statistical design of experiments, namely the use of polynomials as a mathematical model. These polynomial models are therefore also called DoE models.

When talking or writing about DoE, there is often no clear distinction between the experimental design itself and the mathematical modeling based on polynomials. Rather, the boundaries between model, experimental design and evaluation of the results are blurred. Of course this makes sense, since polynomial models have nice mathematical properties that allow mixing and generalization. But for beginners, like I was a few years ago, this can be very confusing.

For me, by far the most important realization at the time was this:

Design of experiments is always specific to a particular mathematical model. A design that is optimal for a particular model (e.g. a straight line equation) is not optimal for another model (e.g. a quadratic curve).

The general procedure for statistical experimental design is:

- Define a mathematical model
- Determine the optimal experimental design for that model
- Perform experiments
- Fit the model to the measurement data
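The four steps can be sketched for the simplest case, a straight line. This is only an illustration under stated assumptions: the design is picked by hand rather than optimized, the "experiment" is simulated with made-up true parameters and noise, and the names `model`, `fit_line`, `true_a`, `true_b` and `sigma` are hypothetical.

```python
import random

# Step 1: define the mathematical model (a straight line with parameters a, b).
def model(x, a, b):
    return a + b * x

# Step 2: choose a design, i.e. the x values at which to measure.
# Assumption: picked by hand here, not computed by an optimization.
design = [0.0, 0.0, 1.0, 1.0]

# Step 3: perform the experiments. Simulated here with assumed "true"
# parameters and Gaussian measurement noise; real data would replace this.
rng = random.Random(0)
true_a, true_b, sigma = 1.0, 2.0, 0.05
measurements = [model(x, true_a, true_b) + rng.gauss(0.0, sigma) for x in design]

# Step 4: fit the model to the data (closed-form least squares for a line).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a_hat, b_hat = fit_line(design, measurements)
print(f"estimated a = {a_hat:.3f}, estimated b = {b_hat:.3f}")
```

The estimates should land close to the assumed true values; how close depends on the noise level and on the design.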

Step 2 is of course easier said than done. You need quite a lot of mathematics – statistics to be precise. A good introduction to the theory can be found in the sources linked below. For a basic understanding, the following illustrative explanation is sufficient.

As with any optimization, we need to define an optimization objective. In optimal experimental design this is:

Achieve the greatest possible amount of information with the least possible effort.

The effort can be described quite simply by the number of experiments, i.e. the size of the experimental design. But the information content requires statistics. For given measurement accuracies, we want to estimate the unknown parameter values from the measurement data as accurately and independently as possible. In fact, the demand for independence (as little correlation as possible) and the accuracy of the individual parameter estimates are different goals that can even conflict with each other. In statistical experimental design, there are various objective functions that always represent a compromise between the two goals. The most widely used objective function in practice is the determinant of the Fisher information matrix. Experimental designs that maximize this criterion are called **D-optimal**.
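To make the criterion concrete, here is a small sketch for a straight line y = a + b·x with independent measurement errors of equal standard deviation `sigma`. For this model, the Fisher information matrix is the sum of [[1, x], [x, x²]] over all design points, divided by sigma². The function names and the two example designs are assumptions chosen for illustration.

```python
def information_matrix(xs, sigma=1.0):
    """Fisher information matrix for y = a + b*x with i.i.d. Gaussian noise."""
    weight = 1.0 / sigma**2
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    return [[weight * n, weight * sum_x],
            [weight * sum_x, weight * sum_xx]]

def d_criterion(xs, sigma=1.0):
    """Determinant of the 2x2 information matrix (larger is better)."""
    (f11, f12), (f21, f22) = information_matrix(xs, sigma)
    return f11 * f22 - f12 * f21

spread = [0.0, 1.0]      # two points far apart
clustered = [0.4, 0.6]   # two points close together

print(d_criterion(spread))      # 1.0
print(d_criterion(clustered))   # approximately 0.04
```

Both designs need two experiments, but the spread design carries about 25 times more information in the sense of the determinant.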

Below we show concrete examples of D-optimal experimental designs for simple models.

The classical DoE models are constructed as polynomials. This has the advantage that unknown parameters (polynomial coefficients) enter the model equation only linearly. And for linear model parameters, optimal designs are independent of the actual parameter values. This means that the experimental design depends only on the structure of the DoE model and not on the results of the experiments. The experimental design can be created completely in advance. Only if the analysis of the results shows that the basic model approach does not fit the data do the model and thus the experimental design need to be adapted for further experimentation.

Unlike purely data-driven polynomial models, it is often the case with physics-based models, as we create and use in many projects, that unknown parameters enter the model nonlinearly. An example is a compressor model with internal pressure losses as unknown model parameters.

In the case of non-linear parameters, the numerical value of these parameters will affect the optimal experimental design. In other words, the design of experiments must be based on estimates, since the parameter values can only be determined after the evaluation of the experiment.
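A toy example (not the compressor model mentioned above) shows this dependence. Assume a hypothetical decay model y = exp(-k·t) with a single unknown parameter k. The information gained from one measurement at time t is proportional to the squared sensitivity (∂y/∂k)² = (t·exp(-k·t))², so the most informative t depends on the value assumed for k. The model, the candidate grid and the function names are all assumptions for illustration.

```python
import math

def sensitivity_squared(t, k):
    """Squared sensitivity of y = exp(-k*t) with respect to k."""
    return (t * math.exp(-k * t)) ** 2

# Candidate measurement times t = 0.1, 0.2, ..., 5.0
candidates = [i / 10 for i in range(1, 51)]

def best_time(k_estimate):
    """Most informative single measurement time for an assumed k."""
    return max(candidates, key=lambda t: sensitivity_squared(t, k_estimate))

print(best_time(0.5))  # 2.0
print(best_time(2.0))  # 0.5
```

For a model that is linear in its parameter, such as y = k·t, the sensitivity is simply t and the best point would be the same for every value of k.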

With these estimates, the same experimental design selection methods can be used as with classical DoE models. The mathematics is the same.

Especially with nonlinear models, a multistep approach is useful:

- Select estimated values for model parameters
- Create screening design with few points
- Conduct experiments
- Identify parameter values from measured data
- Create complete experimental plan
- Conduct remaining experiments
- Identify parameter values from complete measurement data

Mathematically, an experimental design is a set of points in a multidimensional space. This is because an experimental point is defined by certain values for all independent variables (factors). The number of factors determines the number of dimensions of the experimental space. With up to 3 factors, you can represent the entire experimental space on a single graph and draw the individual design points in it.

If there are more than 3 dimensions, a so-called **pairplot** can be used. Several individual 2D plots with the distribution of test points over two factors are combined so that the axes of the various plots are identical in rows and columns. In addition, the distribution of test points over the respective factors is shown on the main diagonal. The experimental design from the 3D representation above would look like this:

Two basic questions for a good experimental design can be answered qualitatively very quickly with these plots:

- Are there any areas that are not covered?
- Are the points distributed as evenly as possible?
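The same two questions can also be checked roughly in numbers. The sketch below counts, for each factor, how many design points fall into equal-width bins, which is similar in spirit to the histograms on the diagonal of a pairplot. It assumes factors scaled to the range [0, 1]; the function name, the bin count and the small example design are hypothetical.

```python
def coverage_counts(design, n_bins=4):
    """For each factor, count the design points per equal-width bin on [0, 1]."""
    n_factors = len(design[0])
    counts = [[0] * n_bins for _ in range(n_factors)]
    for point in design:
        for j, x in enumerate(point):
            k = min(int(x * n_bins), n_bins - 1)  # x == 1.0 goes into the last bin
            counts[j][k] += 1
    return counts

# Hypothetical design with 4 points and 2 factors
design = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.1), (0.9, 0.2)]

for j, factor_counts in enumerate(coverage_counts(design)):
    empty = [k for k, c in enumerate(factor_counts) if c == 0]
    print(f"factor {j}: counts per bin = {factor_counts}, empty bins = {empty}")
```

Empty bins indicate regions that are not covered, and strongly unequal counts indicate an uneven distribution. The check only looks at one factor at a time, so it does not replace the 2D plots.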

The most common way to create an experimental design is to divide the independent variables into discrete values. The independent variables are also called **factors** and the individual discrete values of the factors are called **levels**.

For example, if the experiment involves 3 different factors, each of which is divided into 4 different levels, there are $4^{3} = 64$ individual experimental points.

An experimental design that includes all of these points is called a **full factorial design**. It is easy to imagine that the experimental effort explodes with an increasing number of factors or levels. In practice, it is much more common to find so-called **fractional factorial designs**, which include only a limited number of possible experimental points.
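A full factorial design is simply the Cartesian product of all factor levels, so it can be enumerated with the standard library. The factor names and level values below are made up for illustration.

```python
from itertools import product

# Hypothetical factors with 4 levels each
levels = {
    "temperature_C": [-10, 0, 10, 20],
    "pressure_bar": [1, 2, 3, 4],
    "speed_rpm": [1000, 2000, 3000, 4000],
}

# Every combination of levels is one experimental point
full_factorial = list(product(*levels.values()))
print(len(full_factorial))  # 64

# The number of points grows exponentially with the number of factors
for n_factors in (3, 5, 8):
    print(n_factors, "factors with 4 levels:", 4**n_factors, "points")
```

With 8 factors at 4 levels each, the full factorial design already contains 65536 points, which is why fractional designs are used in practice.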

The central question of statistical design is which experimental points such a fractional factorial design should contain. Achieving maximum information with minimum experimental effort is always the goal.

Maximum information content means being able to identify the parameters of a particular mathematical model based on measured data with the smallest possible uncertainties. An optimal experimental design always belongs to a very specific model. If you do not know much about the observed process in advance and therefore do not have a specific mathematical model in mind, the following two general experimental designs are suitable.

Plackett-Burman experimental designs are particularly useful when you want to identify early in a research or development project which factors are influencing the outcome at all. They serve to identify the most important influencing factors and can be used as a starting point for further experimental investigations that take into account interactions between factors.

Plackett-Burman designs assume that the data can be described by a simple linear model with no interactions. Because it is a linear model, the optimal design is universal and does not depend on parameter values that are unknown in advance. Rather, it depends only on the number of factors to be studied. For example, 3 factors result in 4 experimental points. With only 4 experiments, it is therefore possible to determine with reasonable accuracy which of the 3 factors influence a given result.
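For 3 factors, such a 4-run two-level design can be written down by hand (it is a half fraction of the 2×2×2 full factorial). The sketch below checks that the factor columns are orthogonal and then recovers the main effects from 4 responses. The responses are synthetic, generated from an assumed linear model without noise, and the helper names are hypothetical.

```python
# Two-level design with 4 runs for 3 factors (-1 = low level, +1 = high level)
design = [
    (-1, -1, +1),
    (+1, -1, -1),
    (-1, +1, -1),
    (+1, +1, +1),
]

def column(j):
    return [row[j] for row in design]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Orthogonality: every pair of factor columns has a zero dot product
orthogonal = all(
    dot(column(i), column(j)) == 0 for i in range(3) for j in range(i + 1, 3)
)

# Synthetic responses from the assumed model y = 10 + 2*x1 + 0*x2 - 3*x3
responses = [10 + 2 * x1 + 0 * x2 - 3 * x3 for x1, x2, x3 in design]

# Because the columns are orthogonal and balanced, each coefficient follows
# from a single dot product with the responses.
coefficients = [dot(column(j), responses) / len(design) for j in range(3)]

print(orthogonal)     # True
print(coefficients)   # [2.0, 0.0, -3.0]
```

The second factor is correctly identified as having no influence. If the real process contains interactions between factors, they are mixed into these estimates, which is the price of using so few runs.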

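Another commonly used type is the Latin Hypercube design. In contrast to the designs described so far, the factors are not divided into a few fixed levels. Instead, the range of each factor is split into as many equal intervals as there are experimental points, and the points are placed randomly so that every interval of every factor contains exactly one point. The goal is an efficient and uniform exploration of the experimental space. These designs are particularly suitable for considering a larger number of factors and interactions.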

As opposed to Plackett-Burman designs, Latin Hypercube designs are better able to capture interactions between factors. Due to the uniform and random arrangement of the points, complex relationships between variables can be detected.

Latin Hypercube designs are not limited to a specific number of factors or levels. To create them, you simply specify the number of factors and the number of points you want to measure.
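A basic Latin Hypercube sampler needs only a few lines. In this sketch, the range [0, 1] of each factor is split into as many equal intervals as there are points, the intervals are shuffled independently per factor, and one random value is drawn inside each interval. The function name and the `seed` parameter are assumptions; library implementations offer additional options such as optimizing the spacing between points.

```python
import random

def latin_hypercube(n_factors, n_points, seed=None):
    """Return n_points tuples in [0, 1)^n_factors, one point per interval and factor."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        intervals = list(range(n_points))
        rng.shuffle(intervals)  # random assignment of intervals to points
        # one uniformly random value inside each interval
        columns.append([(k + rng.random()) / n_points for k in intervals])
    return [tuple(col[i] for col in columns) for i in range(n_points)]

design = latin_hypercube(n_factors=3, n_points=10, seed=1)
for point in design:
    print(tuple(round(x, 3) for x in point))
```

The values are in the unit interval and still have to be scaled to the physical range of each factor, for example `low + x * (high - low)`.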

In summary, Latin Hypercube designs allow for a broader exploration of the factor space, including possible interactions, with comparatively few experiments. This makes them a powerful method for complex experiments where a more comprehensive analysis of factors and their interactions is required.

Python has become the standard language for data analysis. The Pandas library is a very powerful tool for dealing with all kinds of data, including experimental data. And the Seaborn library provides very convenient ways to visualize data in the Pandas format.

There is a very useful Python package for creating standard designs: pyDOE2, a fork of the no longer maintained package pyDOE.

For example, pyDOE2 can be used to create a full factorial design for 3 factors with 3 levels each:

The result is a 2D array with integer values for the factors in the columns. Each row corresponds to one of the experiments to be performed (an experimental point):

It makes sense to convert this array into a Pandas DataFrame. This way you can define names for each factor and create a nice pairplot diagram with Seaborn as an overview:

It is also possible to create a Plackett-Burman design for 3 factors with a single line of code:

Or a Latin Hypercube design for 3 factors with 10 experimental points:

In addition to these general designs, Python can also be used to create designs for special application-specific mathematical models. For this purpose, we at TLK have developed our own Python package with the somewhat uncreative name **doe_tool**. Even for complex physical models, optimal experimental designs can be determined, graphically displayed and compared. The models can either be imported in FMI format (Functional Mock-up Interface) or defined as Python functions.

As a minimal example, let's look at the experimental design for fitting a straight line to measurement data. First, the model is defined as a Python function:

This function is used to create a corresponding object for the doe_tool:

Define the model parameters to be identified:

the dependent variable, together with its standard deviation, used to define measurement uncertainty:

and the independent variables or factors:

We now want to determine an optimal design for this model. First, we create a full factorial design with 10 levels per factor and compute all the internally required data:

Finally, we want to determine an optimized design with only 2 experimental points. Using the DETMAX algorithm, the 2 best points are selected from the 10 candidates:

And these are the two extremes. This is also easy to understand. When fitting a straight line, the measurement errors will have the least effect on the estimated value of the slope if the distance between two measurement points is as large as possible.

What happens if we want to afford 4 measurements?

The statistical design suggests that we should repeat the two experiments at the extreme points rather than adding additional points in the middle; these repetitions are a typical result of the experimental design.
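The behaviour described for 2 and 4 points can be reproduced without doe_tool. The sketch below is not the DETMAX algorithm. It simply tries every possible selection (repetitions allowed) from 10 equally spaced candidate levels on an assumed range [0, 1] and keeps the selection with the largest determinant of the information matrix for a straight line y = a + b·x. All names are hypothetical, and such a brute-force search is only feasible for very small problems.

```python
from itertools import combinations_with_replacement

# 10 candidate levels: 0, 1/9, 2/9, ..., 1
candidates = [i / 9 for i in range(10)]

def det_information(xs):
    """Determinant of the information matrix for y = a + b*x (unit noise)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    return n * sum_xx - sum_x * sum_x

def best_design(n_points):
    """Exhaustive search over all selections of n_points candidates."""
    selections = combinations_with_replacement(candidates, n_points)
    return max(selections, key=det_information)

print(best_design(2))  # (0.0, 1.0)
print(best_design(4))  # (0.0, 0.0, 1.0, 1.0)
```

With 4 points, the search also places two measurements at each end of the range instead of using any of the interior candidates.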

Let me remind you again: The design is always part of a very specific model, in this case a linear equation. We therefore assume that the data can be described by a straight line. Whether this assumption is met in practice, or whether it is better to measure a point in the middle to be on the safe side, is up to the user of these mathematical methods.

In doe_tool, much more complex models can be imported from other modeling environments thanks to the FMI interface. We use it, for example, to design experiments for laboratory tests of refrigerant compressors. The underlying mathematical models are physics-based compressor models from the Modelica library TIL Suite.

The problem with statistical design of experiments is that it relies heavily on mathematics (statistics). Many sources are too theoretical and mathematical, and therefore difficult for most users (e.g., engineers) to understand. Other sources oversimplify, so that the basic concepts and assumptions are not clear. In both cases, the result is that experimental design methods are not understood and therefore not applied in practice.

One exception is the very readable article Fisher Matrix for Beginners by D. Wittman. It explains the central element of experimental design, the Fisher information matrix, using a simple example. The necessary mathematics is described in an understandable way without getting too theoretical.

Another highly recommended resource is the NIST/SEMATECH e-Handbook of Statistical Methods. This is a very comprehensive resource on all statistical topics of interest to engineers and other users. Again, the balancing act between mathematical depth and ease of understanding is done very well here. This is achieved through many examples and a focus on graphical/visual evaluations rather than formulas and numbers. There is a separate section on experimental design.