3. Fitting mechanistic models to data

In the previous tutorial, The mechanistic model, we saw how to implement and simulate treatment response models in Chi. For example, we can simulate the time course of drug concentration levels by 1. implementing a model in SBML; 2. setting the route of administration and the dosing regimen; and 3. simulating the drug concentration for a given set of model parameters.

import os

import chi
import numpy as np
import plotly.graph_objects as go


# Implement the model
directory = os.path.dirname(__file__)
filename = os.path.join(directory, 'one_compartment_pk_model.xml')
model = chi.PKPDModel(sbml_file=filename)
model.set_outputs(['global.drug_concentration'])

# Set administration and dosing regimen
model.set_administration(compartment='global', direct=False)
model.set_dosing_regimen(dose=2, period=1)

# Simulate treatment response
parameters = [0, 0, 10, 1, 2]
times = np.linspace(start=0, stop=3, num=200)
conc = model.simulate(parameters=parameters, times=times)[0]

# Plot results
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=times,
    y=conc,
    mode='lines',
))
fig.update_layout(
    xaxis_title='Time',
    yaxis_title='Drug concentration',
    template='plotly_white'
)
fig.show()

This functionality to simulate treatment responses is pretty cool in its own right! For example, it can help us to study nonlinearities in the treatment response dynamics and to optimise dosing regimens to target a desired treatment response.

However, at this point, the simulated treatment responses have little to do with real treatment responses. To describe real treatment responses that we may observe in clinical practice, we need to somehow connect our model to reality.

The most common approach to relate models to real treatment responses is to compare the model predictions to measurements. Below, we have prepared an example dataset with drug concentration measurements. These drug concentrations were measured after repeatedly administering 2 mg of a drug every 24 hours. You can download the dataset from the following link: Dataset_1.

Drug concentration measurements

ID | Time | Time unit | Observable         | Value | Observable unit | Duration | Dose | Dose unit
---|------|-----------|--------------------|-------|-----------------|----------|------|----------
1  | 0.5  | Day       | Drug concentration | 0.198 | ng/mL           |          |      |
1  | 1.0  | Day       | Drug concentration | 0.123 | ng/mL           |          |      |
1  | 1.5  | Day       | Drug concentration | 0.305 | ng/mL           |          |      |
1  | 2.0  | Day       | Drug concentration | 0.184 | ng/mL           |          |      |
1  | 2.5  | Day       | Drug concentration | 0.421 | ng/mL           |          |      |
1  | 3.0  | Day       | Drug concentration | 0.306 | ng/mL           |          |      |
1  | 0.0  | Day       |                    |       |                 | 0.01     | 2.0  | mg
1  | 1.0  | Day       |                    |       |                 | 0.01     | 2.0  | mg
1  | 2.0  | Day       |                    |       |                 | 0.01     | 2.0  | mg

The dataset contains one column that identifies measured individuals (ID), two columns that specify measurement times and dose administration times (Time and Time unit), three columns that specify measured values (Observable, Value, Observable unit), and three columns that specify dose administrations (Dose, Duration, Dose unit).
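
The measurement rows and the dose administration rows can be separated with standard pandas filtering on the Observable and Dose columns. The sketch below illustrates this; it assumes the file has been saved as dataset_1.csv next to the script, and the column names are those shown in the table above.

import os

import pandas as pd


# Import data (assumes dataset_1.csv sits next to this script)
directory = os.path.dirname(__file__)
data = pd.read_csv(os.path.join(directory, 'dataset_1.csv'))

# Rows with an entry in the Observable column are measurements
measurements = data[data['Observable'].notna()]
print(measurements[['ID', 'Time', 'Observable', 'Value']])

# Rows with an entry in the Dose column are dose administrations
doses = data[data['Dose'].notna()]
print(doses[['ID', 'Time', 'Dose', 'Duration']])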

After downloading the file and saving it in the same directory as the Python script, we can visualise the measurements by executing the script below.

import pandas as pd


# Import data
directory = os.path.dirname(__file__)
data = pd.read_csv(os.path.join(directory, 'dataset_1.csv'))

# Plot data
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=data.Time,
    y=data.Value,
    mode='markers',
))
fig.update_layout(
    xaxis_title='Time in day',
    yaxis_title='Drug concentration in ng/mL',
    template='plotly_white'
)
fig.show()

The figure shows the drug concentration measurements as blue scatter points. The treatment response dynamics indicated by the measurements are similar to the simulated treatment response in the previous code block. Looking more closely at the magnitude of the values, however, it appears that the measured values are much smaller than the simulated ones. We can therefore conclude that, at this point, our model does not provide an accurate description of the measured treatment response.

To find a better description of the treatment response, we have two options: 1. we can try to find parameter values that improve the proximity of the model output to the measurements; or 2. we can define a new mechanistic model and see whether this new model is able to describe the measurements better. This tutorial will be about the former and will detail how we can find better model parameters for a given model structure.

3.1. Estimating model parameters from data: Background

Before we can try to find parameter values that describe the observed treatment response most closely, we first need to agree on what we mean by “most closely” for the relationship between the mechanistic model output and the measurements. An intuitive way to define this notion of closeness is to use the distance between the measurements and the model output, i.e. the difference between the measured values and the simulated values. Then the model parameters that most closely describe the measurements would be those parameter values that make the mechanistic model output perfectly match the measurements, resulting in distances of 0 ng/mL between the model output and the measurements at all measured time points. However, as outlined in Sections 1.3 and 1.4 of the Quick overview, measurements of treatment responses are imprecise and noisy, and will therefore not perfectly represent the treatment response dynamics. Consequently, if we were to match the model outputs to measurements perfectly, we would end up with an inaccurate description of the treatment response as our model would be paying too much attention to the measurement noise.

One way to improve our notion of closeness is to incorporate the measurement process into our computational model of the treatment response, thereby explicitly stating that we do not expect the mechanistic model output to match the measurements perfectly. In Chi, this can be done using chi.ErrorModel instances. Error models promote the single value output of mechanistic model simulations to a distribution of values. This distribution characterises a range of values around the mechanistic model output where measurements may be expected. For simulation, this distribution can be used to sample measurement values and imitate the measurement process of real treatment responses, see Section 1.3 in the Quick overview for an example. For parameter estimation, the distribution can be used to quantify the likelihood with which the observed measurements would have been generated by our model, see Section 1.4 in the Quick overview. To account for measurement noise during the parameter estimation, we therefore choose to quantify the closeness between the model output and the measurements using likelihoods.

Formally, we denote the measurement distribution by \(p(y | \psi, t, r)\), where \(y\) denotes the measurement value, \(\psi\) denotes the model parameters, \(t\) denotes the time point of the measurement, and \(r\) denotes the administered dosing regimen. With this measurement distribution in place, we can quantify the likelihood with which any given set of measurements would have been generated by the model. For example, the likelihood of a measurement \(y_1\) at time \(t_1\) under dosing regimen \(r^*\) is defined by the value of the probability density of the measurement distribution evaluated at the measurement, \(p(y_1 | \psi, t_1, r^*)\). Note that this likelihood depends on the choice of model parameters, \(\psi\). The model parameters with the maximum likelihood are the parameter values that most closely describe the measurements.

Note

The measurement distribution, \(p(y | \psi, t, r)\), is defined by the mechanistic model output and the error model. To illustrate this, let us denote the simulated drug concentration values of the 1-compartment PK model by \(c(\psi, t, r)\). By definition of the model, the drug concentration values, \(c\), are a function of the model parameters, \(\psi = (a_0, k_a, k_e, v)\), the time, \(t\), and the dosing regimen, \(r\).

1. If we choose a chi.GaussianErrorModel to describe the difference between the model output and the measurements, we assume that measurements are distributed according to a Normal distribution around the model output

\[p(y | \psi, t, r) = \frac{1}{\sqrt{2\pi \sigma ^2}}\mathrm{e}^{-\big(y - c(\psi, t, r)\big)^2 / 2\sigma ^2},\]

where \(\sigma\) defines the width of the distribution. For ease of notation, we extend the definition of the model parameters to include \(\sigma\), \(\psi = (a_0, k_a, k_e, v, \sigma)\).

We can see that the model output defines the mean, or expectation value, of the measurement distribution.

2. If we choose a chi.LogNormalErrorModel to describe the difference between the model output and the measurements, we assume that measurements are distributed according to a lognormal distribution around the model output

\[p(y | \psi, t, r) = \frac{1}{\sqrt{2\pi \sigma ^2}}\frac{1}{y}\mathrm{e}^{-\big(\log y - \log c(\psi, t, r) + \sigma ^2 / 2\big)^2 / 2\sigma ^2}.\]

One can show that for this distribution, too, the model output defines the mean, or expectation value, of the measurement distribution.

The main difference between the two distributions is their shape. The chi.GaussianErrorModel is symmetrically distributed around the model output, while the chi.LogNormalErrorModel is skewed in such a way that measurements can never assume negative values. To visualise these differences, we recommend simulating many measurements with different error models, similar to Section 1.3 in the Quick overview. But instead of choosing different times, sample all measurements at the same time. You can then histogram the samples, using for example go.Histogram, as used in Section 1.4.2 in the Quick overview, to visualise the shape of the probability density.
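
To give an impression of what such histograms might look like, the sketch below samples many hypothetical measurements around a fixed model output with the two error structures described above. It uses plain numpy rather than the chi error model classes, and the model output of 0.3 ng/mL and noise scale of 0.2 are made-up values for illustration only.

import numpy as np
import plotly.graph_objects as go

# Hypothetical model output and noise scale (illustrative values only)
model_output = 0.3   # ng/mL
sigma = 0.2
rng = np.random.default_rng(1)
n_samples = 10000

# Gaussian error model: measurements are normally distributed around the output
gaussian_samples = rng.normal(loc=model_output, scale=sigma, size=n_samples)

# Lognormal error model: measurements are lognormally distributed; shifting the
# log-mean by -sigma^2/2 keeps the mean of the distribution at the model output
lognormal_samples = rng.lognormal(
    mean=np.log(model_output) - sigma**2 / 2, sigma=sigma, size=n_samples)

fig = go.Figure()
fig.add_trace(go.Histogram(
    name='Gaussian error model', x=gaussian_samples,
    histnorm='probability density'))
fig.add_trace(go.Histogram(
    name='Lognormal error model', x=lognormal_samples,
    histnorm='probability density'))
fig.update_layout(
    xaxis_title='Measured drug concentration in ng/mL',
    yaxis_title='Probability density',
    barmode='overlay',
    template='plotly_white'
)
fig.update_traces(opacity=0.75)
fig.show()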

Assuming independence of measurements, the likelihood for a dataset with \(n\) measurements, \(\mathcal{D}=((t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n), r)\), is given by the product of the individual likelihoods

\[p(\mathcal{D}| \psi ) = \prod _{j=1}^n p(y_j | \psi, t_j, r),\]

where independence refers to the assumption that measurements are distributed independently of each other according to our model of the measurement process (this does not have to be the case, and is especially unlikely to be the case when our model fails to describe the measurement process accurately).

With this likelihood in place, Chi makes it straightforward to run numerical optimisation algorithms in order to derive the maximum likelihood estimates that best describe the measurements, see Section 1.4.1 in the Quick overview.
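
To make the product-of-likelihoods idea concrete, the sketch below evaluates the log-likelihood of a small toy dataset under a Gaussian error model and maximises it numerically. The exponential-decay model, the data values and the use of scipy.optimize.minimize are illustrative choices and not part of Chi's API; Chi's own interface for maximum likelihood estimation is described in Section 1.4.1 of the Quick overview.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Toy measurements (times in day, concentrations in ng/mL); values are made up
times = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
values = np.array([0.80, 0.62, 0.51, 0.39, 0.32, 0.25])


def neg_log_likelihood(parameters):
    """Negative log-likelihood of an exponential decay with Gaussian noise."""
    c0, k, sigma = parameters
    if (c0 <= 0) or (k <= 0) or (sigma <= 0):
        return np.inf
    model_output = c0 * np.exp(-k * times)
    # Independence: the log-likelihood is the sum of the individual log-densities
    return -np.sum(norm.logpdf(values, loc=model_output, scale=sigma))


# Maximise the likelihood by minimising its negative
result = minimize(neg_log_likelihood, x0=[1.0, 0.5, 0.1], method='Nelder-Mead')
print('Maximum likelihood estimates (c0, k, sigma):', result.x)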

Alternatively, Chi also supports Bayesian inference of the model parameters: Bayesian inference is conceptually different from maximum likelihood estimation as it does not seek to find a single set of model parameters that “best” describes the observed measurements. Instead, Bayesian inference acknowledges the fact that noisy measurements leave uncertainty about which model parameters best describe the data, and therefore focuses instead on deriving a distribution of parameter values which are all consistent with the observed measurements, see Section 1.4.2 in Quick overview for a more detailed discussion. This distribution is derived from the likelihood using Bayes’ rule

\[p(\psi| \mathcal{D} ) = \frac{p(\mathcal{D}| \psi )\, p(\psi)}{p(\mathcal{D} )},\]

where \(p(\psi)\) denotes the prior distribution of the model parameters. The prior distribution quantifies the baseline distribution of parameter values before the parameter estimation, making it possible to inform the inference results with otherwise hard-to-quantify knowledge of feasible or likely parameter values prior to the inference.

3.2. Defining the log-posterior

In Chi, we can derive posterior distributions from measurements using Markov chain Monte Carlo (MCMC) algorithms implemented in the open source Python package Pints. In Sections 1.4.1 and 1.4.2 in the Quick overview, we showed in some detail how we can define (log-)posterior distributions, chi.LogPosterior, for this purpose. Here, we want to show how we can use the chi.ProblemModellingController to simplify and automate this process as much as possible.

The tricky bit when implementing log-posteriors for treatment response models is often that log-posteriors depend not only on the treatment response measurements, \(((t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n))\), but also on the administered dosing regimen, \(r\). This can make it tedious to define log-posteriors, especially when parameters are inferred from measurements of multiple individuals with different dosing regimens, as the dosing regimens of the model have to be manually set to the dosing regimens administered during the measurement process. A mismatch between the dosing regimens would render the inference results invalid.

To eliminate this source of error, we have implemented the chi.ProblemModellingController. The chi.ProblemModellingController facilitates the construction of log-posteriors and reduces the workflow to a simple 4-step approach:

    1. Definition of the mechanistic model

    2. Definition of the error model

    3. Definition of the measurements

    4. Definition of the prior distribution

In the below code block, we illustrate this workflow for the above drug concentration dataset, Dataset_1.

import pints


# Define mechanistic model
directory = os.path.dirname(__file__)
filename = os.path.join(directory, 'one_compartment_pk_model.xml')
model = chi.PKPDModel(sbml_file=filename)
model.set_administration(compartment='global', direct=False)
model.set_outputs(['global.drug_concentration'])

# Define error model
error_model = chi.LogNormalErrorModel()

# Define data
directory = os.path.dirname(__file__)
data = pd.read_csv(os.path.join(directory, 'dataset_1.csv'))

# Define prior distribution
log_prior = pints.ComposedLogPrior(
    pints.GaussianLogPrior(mean=10, sd=2),           # absorption rate
    pints.GaussianLogPrior(mean=6, sd=2),            # elimination rate
    pints.LogNormalLogPrior(log_mean=0, scale=1),    # volume of distribution
    pints.LogNormalLogPrior(log_mean=-2, scale=0.5)  # scale of meas. distrib.
)

# Define log-posterior using the ProblemModellingController
problem = chi.ProblemModellingController(
    mechanistic_model=model, error_models=[error_model])
problem.set_data(
    data=data,
    output_observable_dict={'global.drug_concentration': 'Drug concentration'}
)
problem.fix_parameters(name_value_dict={
    'dose.drug_amount': 0,
    'global.drug_amount': 0,
})
problem.set_log_prior(log_prior=log_prior)
log_posterior = problem.get_log_posterior()

The first four blocks in the code define the individual components of the log-posterior: the mechanistic model, the error model, the data, and the prior distribution. Note that the route of administration is set before the mechanistic model is passed to the ProblemModellingController.

The prior distribution defines marginal distributions for the parameters, and is implemented using Pints. In Bayesian inference, we can use the prior distribution to bias the inference results towards feasible areas of the parameter space. In this case, we have little prior knowledge about feasible parameter ranges, so we choose relatively uninformative prior distributions (the prior distributions are illustrated in the figure further below). Note that we seem to be specifying marginal prior distributions for only 4 of the 6 model parameters. This is because we fix the values of the initial drug amounts to 0 prior to the inference, using problem.fix_parameters in the final block of the code, reflecting our knowledge that the subject had no prior exposure to the drug before starting the trial. This reduces the number of model parameters from 6 to 4. The fixing of model parameters is optional, but can sometimes improve the inference results when some model parameters are already well understood.

For the remaining 4 parameters, only positive values make biological sense, so we choose prior distributions that focus on positive values. For two model parameters, the volume of distribution and the scale parameter, negative or zero values are particularly bad as they will break the simulation (1. a volume of zero causes a division by zero error; and 2. the lognormal distribution is only defined for positive sigma). We therefore use pints.LogNormalLogPrior to constrain those parameters to strictly positive values.
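
Before committing to a prior, it can be useful to sample from it and check that the implied parameter ranges look sensible. The sketch below does this for the composed prior defined above, using the sample method of Pints log-priors; the choice of 1000 samples and of the 95% range is arbitrary.

# Sanity-check the prior: sample parameter values and inspect their ranges
prior_samples = log_prior.sample(n=1000)

names = ['Absorption rate', 'Elimination rate', 'Volume of distribution', 'Sigma']
for index, name in enumerate(names):
    low, high = np.percentile(prior_samples[:, index], q=[2.5, 97.5])
    print('%s: 95%% prior range [%.2f, %.2f]' % (name, low, high))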

In the final block of the code, we define the log-posterior. In the first line, we specify the mechanistic model and the error model. In the next line, we set the dataset. Note that we need to use the output_observable_dict to map the output variable of the model, global.drug_concentration, to the Observable name in the dataset, Drug concentration. Other specifications are not required; the dosing regimens are set automatically when the dosing regimen-related columns, Dose and Duration, are present in the dataset. In the following line, we fix the initial drug amounts to 0 using ProblemModellingController.fix_parameters(). You can use this method to fix any parameters of the model. In the last two lines, we set the log-prior and implement the log-posterior using the ProblemModellingController.get_log_posterior method.
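
A quick way to check that the log-posterior has been assembled as intended is to inspect its parameters and evaluate it at a test point. The sketch below does this for the log-posterior defined above; it assumes that the chi.LogPosterior exposes get_parameter_names and n_parameters, and that, like other pints.LogPDF objects, it can be called with a parameter vector.

# Inspect the parameters of the log-posterior (the fixed drug amounts are gone)
print(log_posterior.get_parameter_names())
print(log_posterior.n_parameters())

# Evaluate the log-posterior at a test point sampled from the prior
test_parameters = log_prior.sample(n=1)[0]
print(log_posterior(test_parameters))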

3.3. Inferring the posterior distribution

With this chi.LogPosterior in place, we can infer the posterior distribution using any MCMC algorithm of our choice. Recall that inference, in this context, means the reconstruction of the posterior distribution from the chi.LogPosterior, which defines the log-pdf of the posterior distribution up to an unknown constant shift

\[\log p(\psi| \mathcal{D} ) = \log p(\mathcal{D}| \psi ) + \log p(\psi) + \text{constant}.\]

By comparison with the above Bayes' rule, you will find that \(\text{constant} = -\log p(\mathcal{D})\), so the constant shift may seem not so 'unknown' after all. However, unknown here does not mean that we cannot write down an expression for it; rather, it refers to the fact that for most treatment response models \(p(\mathcal{D})\) is (practically) impossible to evaluate, as evaluating \(p(\mathcal{D})\) requires the numerical integration of the likelihood-prior product over the full parameter space, \(p(\mathcal{D}) = \int \mathrm{d} \psi \, p(\mathcal{D}, \psi ) = \int \mathrm{d} \psi \, p(\mathcal{D}| \psi )\, p(\psi)\). This renders the value of the constant shift for all intents and purposes unknown.

The unknown shift makes it impossible to make statements about the absolute probability of parameter values. However, it does allow for relative comparisons of probabilities – a fact exploited by MCMC algorithms to circumvent the limitation of the partially known log-posterior. MCMC algorithms use the relative comparison of parameter probabilities to generate random samples from the posterior distribution, opening a gateway to reconstruct the distribution. The more random samples are generated, the closer the histogram over the samples will approximate the posterior distribution. In fact, one can show that the histogram will converge to the posterior distribution as the number of samples approaches infinity. This makes it possible for MCMC algorithms to reconstruct any posterior distribution from a chi.LogPosterior.
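
To illustrate the idea of sampling via relative probability comparisons, the sketch below implements a bare-bones random-walk Metropolis sampler for a one-dimensional toy target (a standard normal distribution). This is a didactic toy, not the pints.HaarioBardenetACMC algorithm used below, and all tuning choices (step size, number of iterations, starting point) are arbitrary.

import numpy as np


def log_target(x):
    """Unnormalised log-density of a standard normal toy target."""
    return -0.5 * x ** 2


rng = np.random.default_rng(2)
n_toy_iterations = 20000
step_size = 1.0

x = 5.0  # deliberately poor starting point, far from the bulk of the target
log_p = log_target(x)
mcmc_samples = np.empty(n_toy_iterations)

for iteration in range(n_toy_iterations):
    # Propose a new point conditional on the current one
    proposal = x + step_size * rng.normal()
    log_p_proposal = log_target(proposal)

    # Accept with probability min(1, p(proposal) / p(current)); only the ratio
    # is needed, so the unknown normalising constant cancels
    if np.log(rng.uniform()) < log_p_proposal - log_p:
        x, log_p = proposal, log_p_proposal
    mcmc_samples[iteration] = x

# The histogram over the samples (after discarding a warm-up period)
# approximates the target distribution, here with mean 0 and std. dev. 1
print(np.mean(mcmc_samples[10000:]), np.std(mcmc_samples[10000:]))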

To illustrate this, let us run an MCMC algorithm to infer the above defined posterior distribution of the 1-compartment PK model.

# Run MCMC algorithm
n_iterations = 20000
controller = chi.SamplingController(log_posterior=log_posterior, seed=1)
controller.set_n_runs(n_runs=3)
controller.set_parallel_evaluation(False)
controller.set_sampler(pints.HaarioBardenetACMC)
samples = controller.run(n_iterations=n_iterations, log_to_screen=True)

In the code block, we use an MCMC algorithm implemented in Pints, called pints.HaarioBardenetACMC. For technical reasons that we will discuss below, we run the algorithm three times for 20,000 iterations. Note that we use the chi.SamplingController to set the number of runs, the number of iterations, and to run the sampling algorithm. The chi.SamplingController can also do other things, such as running the chains in parallel, but we will not go into this additional functionality in this tutorial, and refer instead to the API reference.

Executing the above code block will produce output of the form shown below. The left-most column indicates the current iteration of the MCMC algorithm. The other columns show the total number of log-posterior evaluations and the fraction of accepted MCMC proposals of each chain. The target acceptance ratio depends on the details of the MCMC algorithm. For the pints.HaarioBardenetACMC the target is 0.23.

Using Haario-Bardenet adaptive covariance MCMC
Generating 3 chains.
Running in sequential mode.
Iter. Eval. Accept.   Accept.   Accept.   Time m:s
0     3      0         0         0          0:00.0
1     6      0         0.5       0.5        0:00.0
2     9      0.333     0.667     0.333      0:00.0
3     12     0.25      0.75      0.5        0:00.0
20    63     0.714     0.571     0.476      0:00.0
40    123    0.756     0.61      0.561      0:00.0
60    183    0.738     0.475     0.59       0:00.0
80    243    0.716     0.407     0.642      0:00.1
100   303    0.693     0.406     0.653      0:00.1
120   363    0.736     0.421     0.686      0:00.1
.
.
.

When the three runs of the algorithm terminate, the generated samples are returned in the form of an xarray.Dataset. We can visualise the samples using the code block documented at the end of this section (we move the code block to the end to avoid disrupting the flow of the tutorial with less relevant code snippets).
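
Before plotting, it can help to inspect the returned xarray.Dataset directly. The brief sketch below assumes, consistent with the code later in this section, that each parameter is stored as a data variable with one row per run and one column per iteration.

# Inspect the structure of the returned samples
print(samples)

# Each parameter can be accessed by its name in the model; the values have
# shape (n_runs, n_iterations)
print(samples['dose.absorption_rate'].values.shape)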

The left column of the figure shows the histogram over the samples across the three runs in orange, as well as the probability density of the prior distribution in black. The first row shows the samples of the absorption rate, the second row shows the samples of the elimination rate, the third row shows the samples of the volume of distribution, and the fourth row shows the samples of the scale parameter of the error model, sigma. The right column of the figure shows the same samples, this time visualised over the iterations at which they were drawn. The samples from the different runs are illustrated in different colours: run 1 (green); run 2 (red); run 3 (blue).

3.3.1. The posterior distribution

The orange distribution is the result of the inference – the posterior distribution. It contains all parameter values that are consistent with the drug concentration measurements and our prior knowledge, assigning each set of parameter values a probability of being the data-generating parameter values. Notably, the figure shows that our prior knowledge and Dataset_1 are insufficient to conclude on a single set of parameter values (see orange distribution). Instead, the measurements only allow us to refine our understanding of feasible parameter values. For example, we can see in the second row of the figure that the marginal posterior distribution substantially differs from the marginal prior distribution. This is because the drug concentration measurements contain important information about the elimination rate, rendering rates above 1.5 1/day or below 0.25 1/day extremely unlikely for the model of the treatment response. This is in stark contrast to the relatively wide range of model parameters that we deemed feasible prior to the inference (see black line). However, the measurements are not conclusive enough to reduce the distribution of feasible elimination rates to a single value. Similarly, for the volume of distribution (row 3) and the error scale parameter (row 4), the measurements lead to substantial updates relative to the prior distribution. In comparison, the measurements appear less informative about the absorption rate (see row 1), given that the marginal posterior distribution of the absorption rate is almost identical to its prior distribution. We will have a closer look at an intuitive understanding of why the measurements contain little information about the absorption rate below. The take-away from this discussion is that inferring distributions of parameter values consistent with the measurements is in many ways more informative than just estimating a single set of model parameters, as it can help to quantify the feasibility of different parameter values and the overall remaining level of uncertainty. In a Bayesian setting, inferring posterior distributions also allows us to supplement the information contained in the measurements with prior knowledge of feasible model parameters, facilitating the inference of parameter values when treatment response measurements are limited.

3.3.2. Assessing convergence

Similar to most numerical inference algorithms, MCMC algorithms have practical limitations, making them not equally well suited to all inference problems. Some MCMC algorithms are, for example, only well suited to estimate model parameters when the total number of parameters is small, while other MCMC algorithms only work when the curvature of the posterior surface is not too extreme. One way to test the suitability of an MCMC algorithm for a particular inference problem is the use of the \(\hat{R}\) metric. We will motivate the metric below and provide an intuitive explanation. Technical details are left to interested readers to explore on their own.

Let us begin this section by revisiting the right column in the figure above. The column shows the samples from the three MCMC algorithm runs at each iteration. For early iterations of the algorithm, the samples from the MCMC runs look quite distinct – each run appears to sample from a different area of the parameter space. In contrast, the MCMC runs seem to converge and sample from the same area of the parameter space at later iterations. Intuitively, it does not really make sense for the samples from the MCMC runs to look different – after all, we use the same MCMC algorithm to sample from the same posterior distribution. The histogram over the samples should therefore be identical within the limits of sampling variation. However, if we plot the samples from each of the three runs separately, we find that the histograms actually look quite different. For illustrative purposes, we focus on the absorption rate in the code block.

from plotly.colors import qualitative


fig = go.Figure()

# Plot histograms
fig.add_trace(
    go.Histogram(
        name='Run 1',
        x=samples['dose.absorption_rate'].values[0, ::n_iterations//1000],
        histnorm='probability density',
        showlegend=True,
        xbins=dict(size=0.5),
        marker_color=qualitative.Plotly[2],
    )
)
fig.add_trace(
    go.Histogram(
        name='Run 2',
        x=samples['dose.absorption_rate'].values[1, ::n_iterations//1000],
        histnorm='probability density',
        showlegend=True,
        xbins=dict(size=0.5),
        marker_color=qualitative.Plotly[1],
    )
)
fig.add_trace(
    go.Histogram(
        name='Run 3',
        x=samples['dose.absorption_rate'].values[2, ::n_iterations//1000],
        histnorm='probability density',
        showlegend=True,
        xbins=dict(size=0.5),
        marker_color=qualitative.Plotly[0],
    )
)

fig.update_layout(
    template='plotly_white',
    xaxis_title='Absorption rate in 1/day',
    yaxis_title='Probability density',
    barmode='overlay'
)
fig.update_traces(opacity=0.75)
fig.show()

We select the samples of the absorption rate from the xarray.Dataset using the name of the parameter in the mechanistic model, samples['dose.absorption_rate'].values. The returned object is a numpy array of shape (n_runs, n_iterations). We can select the samples from run n using the index n-1, e.g. for run 1: samples['dose.absorption_rate'].values[0]. In addition, we subsample the MCMC samples in the above code block, which we can, for now, regard as a cosmetic choice.

We can see that the histograms over the samples from the three runs look very different. This seems to contradict the orange distribution which we presented above as the histogram over the MCMC samples across the runs. But also, how can it make sense for the three histograms to differ when the posterior distribution is the same? – It does not, and in fact, it tells us that the three histograms in the figure above do not represent the posterior distribution. Although we did not disclose it until now, the orange distribution, representing the posterior distribution, is only based on the second half of each MCMC run!

So why is it crucial that we only choose the second half of each MCMC run, and is the particular choice of the second half important? The answer comes back to a common limitation of all MCMC algorithms which we can see in the right column of the figure presented earlier: MCMC algorithms generate samples from the posterior distribution conditional on the latest generated sample. For some MCMC algorithms, this conditioning has little influence on subsequent samples because the internal sampling strategy is advanced enough to sufficiently decorrelate them. But for many MCMC algorithms, the sample that is conditioned on substantially influences the next sampled value. That means that if the latest samples of two MCMC runs are far away from each other, the following samples of the runs will also differ substantially from each other, see for example the first 5000 iterations of the elimination rate in the second row of the figure in the previous section.

At the start of an MCMC run, there is no “last sampled” parameter value, and we have to manually provide a starting point to the MCMC algorithm. The chi.SamplingController automatically samples these starting points from the prior distribution to reduce the number of manual steps during the inference in Chi. This means that the above runs of the MCMC algorithm start from three different positions in parameter space. These starting points have little to do with the posterior distribution, and are potentially far away from the area of the parameter space that is relevant for the posterior distribution. It therefore makes sense that during the early iterations of the runs, the sampled values of the MCMC runs do not agree. Fortunately, MCMC algorithms are constructed in such a way that their samples are, at least in theory, guaranteed to converge to the posterior distribution, meaning that runs of MCMC algorithms starting from different areas in parameter space can be expected to converge to the same area. In practice, numerical limitations may nevertheless make it impossible, or at least very time-consuming, for the samples of an MCMC algorithm that is not suited to the inference problem to converge to the posterior distribution. Starting multiple runs from different areas in parameter space therefore provides an effective way of testing whether a given MCMC algorithm is suited to the inference problem at hand. If the runs do not converge, the MCMC algorithm either needs more iterations to converge, or is not suited for the inference problem, in which case the MCMC samples cannot be expected to represent the posterior distribution. If the runs do converge, we can be confident that the inference results are correct (although there is always a chance for edge cases where this convergence test fails to identify the limitations of an algorithm).

So, starting multiple MCMC runs from different locations in parameter space is a good idea to gain confidence in the inference results. We recommend running the MCMC algorithm from 3-5 different starting points, randomly sampled from the prior distribution. A larger number of starting points tests the suitability of the MCMC algorithm more thoroughly, but each run also comes at a computational cost; 3-5 runs therefore provide a good tradeoff. In addition, starting points randomly sampled from the prior usually ensure that the starting points are not too close together, so that the suitability of the MCMC algorithm is indeed tested. At the same time, it also ensures that the starting points are not extremely far away from each other, located in areas of the parameter space that we do not even deem relevant for the inference result. So, even if the MCMC algorithm were to encounter problems in these extreme areas of parameter space, it may not be very relevant for our inference problem. Randomly sampling from the prior distribution therefore provides a good tradeoff between well-dispersed and not too extreme starting points.

In conclusion, the initial iterations of the MCMC runs can be used to establish the validity of the inference results when the starting points of different runs are sufficiently dispersed. At the same time, these initial samples have little to do with the posterior distribution, as outlined above. It is therefore common practice to exclude these early samples from the inference result. To identify which samples should be discarded and which should be included in the result, the earlier presented “trace plot”, visualising the samples of the different runs at each iteration, can be useful. We can see, for example, that somewhere around the 8000th iteration the three runs of the MCMC algorithm converge. We therefore choose the 10,000th iteration as the cut-off for the “warm-up” (2,000 extra iterations for good measure). Below, we plot the histogram over the samples from the three chains, this time using only the samples from the second half of each run.

fig = go.Figure()

# Plot histograms
fig.add_trace(
    go.Histogram(
        name='Run 1',
        x=samples['dose.absorption_rate'].values[0, n_iterations//2::(n_iterations//2)//1000],
        histnorm='probability density',
        showlegend=True,
        xbins=dict(size=0.5),
        marker_color=qualitative.Plotly[2],
    )
)
fig.add_trace(
    go.Histogram(
        name='Run 2',
        x=samples['dose.absorption_rate'].values[1, n_iterations//2::(n_iterations//2)//1000],
        histnorm='probability density',
        showlegend=True,
        xbins=dict(size=0.5),
        marker_color=qualitative.Plotly[1],
    )
)
fig.add_trace(
    go.Histogram(
        name='Run 3',
        x=samples['dose.absorption_rate'].values[2, n_iterations//2::(n_iterations//2)//1000],
        histnorm='probability density',
        showlegend=True,
        xbins=dict(size=0.5),
        marker_color=qualitative.Plotly[0],
    )
)

fig.update_layout(
    template='plotly_white',
    xaxis_title='Absorption rate in 1/day',
    yaxis_title='Probability density',
    barmode='overlay'
)
fig.update_traces(opacity=0.75)
fig.show()

We can see that the histograms over the samples are now in much better agreement!

This visual assessment of the convergence already gets us very far, but for those who would prefer more quantitative metrics to assess the convergence of MCMC runs, we recommend the open source library ArviZ. ArviZ implements a number of widely established metrics to assess the convergence of MCMC algorithms, including the \(\hat{R}\) metric and the effective sample size. We can estimate both values (and many more) using the az.summary function. In the below code block, we first estimate the values for all MCMC samples, and then estimate the values only for the samples post warm-up.

import arviz as az


# Summary for all samples
summary1 = az.summary(samples)
print(summary1)

# Summary for the samples post warmup
summary2 = az.summary(samples.sel(draw=slice(n_iterations//2, n_iterations)))
print(summary2)

az.summary returns a pandas.DataFrame containing the estimated values, as illustrated below

Summary: all samples

                        | mean  | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat
------------------------|-------|-------|--------|---------|-----------|---------|----------|----------|------
dose.absorption_rate    | 8.898 | 3.516 | 0.399  | 13.007  | 1.503     | 1.125   | 10.0     | 11.0     | 1.21
global.elimination_rate | 1.525 | 1.959 | 0.207  | 6.662   | 0.938     | 0.714   | 9.0      | 11.0     | 1.25
global.volume           | 6.018 | 2.529 | 0.47   | 9.233   | 1.05      | 0.783   | 8.0      | 11.0     | 1.26
Sigma log               | 0.185 | 0.072 | 0.091  | 0.295   | 0.004     | 0.003   | 502.0    | 264.0    | 1.01

Summary: samples post warm-up

                        | mean   | sd    | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat
------------------------|--------|-------|--------|---------|-----------|---------|----------|----------|------
dose.absorption_rate    | 10.014 | 1.957 | 6.456  | 13.778  | 0.046     | 0.033   | 1802.0   | 3110.0   | 1.0
global.elimination_rate | 0.795  | 0.261 | 0.351  | 1.301   | 0.006     | 0.005   | 1850.0   | 2121.0   | 1.0
global.volume           | 6.862  | 1.605 | 3.923  | 9.947   | 0.038     | 0.027   | 1757.0   | 2295.0   | 1.0
Sigma log               | 0.178  | 0.053 | 0.095  | 0.278   | 0.001     | 0.001   | 1651.0   | 2381.0   | 1.0

The summary output contains the estimates of the mean value, the standard deviation, and many other values. For the assessment of the convergence, we would like to focus on the \(\hat{R}\) values in the right-most column of the dataframes, the r_hat columns. We can see that the summary provides one estimate for each parameter. For the samples post warm-up, the estimates are all 1.0, while the estimates for all samples assume larger values. Loosely speaking, the \(\hat{R}\) metric compares the variance of samples across runs to the variance of samples within runs. If samples from different runs are very different, the variance across runs is much larger than the variance of samples within the same run, leading to an estimate \(>1\). On the flip side, if the inter-run variance is the same as the intra-run variance, the samples across runs look the same and the \(\hat{R}\) metric returns a value of \(1\). As a result, \(\hat{R}\) estimates close to \(1\) indicate convergence, while estimates \(>1\) indicate that the runs have not yet fully converged.
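
For intuition, the sketch below computes a basic version of the \(\hat{R}\) statistic by hand from the post warm-up samples, comparing the between-run and within-run variances. Note that az.summary uses a more refined rank-normalised, split-chain estimator, so the numbers will not match exactly.

# Basic R-hat by hand: compare between-run and within-run variance
chains = samples['dose.absorption_rate'].values[:, n_iterations//2:]
n_chains, n_draws = chains.shape

within_variance = np.mean(np.var(chains, axis=1, ddof=1))
between_variance = n_draws * np.var(np.mean(chains, axis=1), ddof=1)
pooled_variance = \
    (n_draws - 1) / n_draws * within_variance + between_variance / n_draws
print(np.sqrt(pooled_variance / within_variance))

For completeness, the code block below reproduces the trace and histogram figure discussed earlier in this section.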

from plotly.colors import qualitative
from plotly.subplots import make_subplots


# Plot results
fig = make_subplots(
    rows=4, cols=2, vertical_spacing=0.15, horizontal_spacing=0.1)

# Plot traces and histogram of parameter
fig.add_trace(
    go.Histogram(
        name='Posterior samples',
        legendgroup='Group 1',
        x=samples['dose.absorption_rate'].values[:, n_iterations//2::(n_iterations//2)//1000].flatten(),
        histnorm='probability density',
        showlegend=True,
        xbins=dict(size=0.5),
        marker_color=qualitative.Plotly[4],
    ),
    row=1,
    col=1
)
fig.add_trace(
    go.Scatter(
        name='Run 1',
        legendgroup='Group 2',
        x=np.arange(1, n_iterations+1),
        y=samples['dose.absorption_rate'].values[0],
        mode='lines',
        line_color=qualitative.Plotly[2],
    ),
    row=1,
    col=2
)
fig.add_trace(
    go.Scatter(
        name='Run 2',
        legendgroup='Group 3',
        x=np.arange(1, n_iterations+1),
        y=samples['dose.absorption_rate'].values[1],
        mode='lines',
        line_color=qualitative.Plotly[1],
    ),
    row=1,
    col=2
)
fig.add_trace(
    go.Scatter(
        name='Run 3',
        legendgroup='Group 4',
        x=np.arange(1, n_iterations+1),
        y=samples['dose.absorption_rate'].values[2],
        mode='lines',
        line_color=qualitative.Plotly[0],
    ),
    row=1,
    col=2
)

# Plot traces and histogram of parameter
fig.add_trace(
    go.Histogram(
        name='Posterior samples',
        legendgroup='Group 1',
        x=samples['global.elimination_rate'].values[:, n_iterations//2::(n_iterations//2)//1000].flatten(),
        histnorm='probability density',
        showlegend=False,
        xbins=dict(size=0.2),
        marker_color=qualitative.Plotly[4],
    ),
    row=2,
    col=1
)
fig.add_trace(
    go.Scatter(
        name='Run 1',
        legendgroup='Group 2',
        x=np.arange(1, n_iterations+1),
        y=samples['global.elimination_rate'].values[0],
        mode='lines',
        line_color=qualitative.Plotly[2],
        showlegend=False
    ),
    row=2,
    col=2
)
fig.add_trace(
    go.Scatter(
        name='Run 2',
        legendgroup='Group 3',
        x=np.arange(1, n_iterations+1),
        y=samples['global.elimination_rate'].values[1],
        mode='lines',
        line_color=qualitative.Plotly[1],
        showlegend=False
    ),
    row=2,
    col=2
)
fig.add_trace(
    go.Scatter(
        name='Run 3',
        legendgroup='Group 4',
        x=np.arange(1, n_iterations+1),
        y=samples['global.elimination_rate'].values[2],
        mode='lines',
        line_color=qualitative.Plotly[0],
        showlegend=False
    ),
    row=2,
    col=2
)

# Plot traces and histogram of parameter
fig.add_trace(
    go.Histogram(
        name='Posterior samples',
        legendgroup='Group 1',
        x=samples['global.volume'].values[:, n_iterations//2::(n_iterations//2)//1000].flatten(),
        histnorm='probability density',
        showlegend=False,
        xbins=dict(size=0.5),
        marker_color=qualitative.Plotly[4],
    ),
    row=3,
    col=1
)
fig.add_trace(
    go.Scatter(
        name='Run 1',
        legendgroup='Group 2',
        x=np.arange(1, n_iterations+1),
        y=samples['global.volume'].values[0],
        mode='lines',
        line_color=qualitative.Plotly[2],
        showlegend=False
    ),
    row=3,
    col=2
)
fig.add_trace(
    go.Scatter(
        name='Run 2',
        legendgroup='Group 3',
        x=np.arange(1, n_iterations+1),
        y=samples['global.volume'].values[1],
        mode='lines',
        line_color=qualitative.Plotly[1],
        showlegend=False
    ),
    row=3,
    col=2
)
fig.add_trace(
    go.Scatter(
        name='Run 3',
        legendgroup='Group 4',
        x=np.arange(1, n_iterations+1),
        y=samples['global.volume'].values[2],
        mode='lines',
        line_color=qualitative.Plotly[0],
        showlegend=False
    ),
    row=3,
    col=2
)

# Plot traces and histogram of parameter
fig.add_trace(
    go.Histogram(
        name='Posterior samples',
        legendgroup='Group 1',
        x=samples['Sigma log'].values[:, n_iterations//2::(n_iterations//2)//1000].flatten(),
        histnorm='probability density',
        showlegend=False,
        xbins=dict(size=0.02),
        marker_color=qualitative.Plotly[4],
    ),
    row=4,
    col=1
)
fig.add_trace(
    go.Scatter(
        name='Run 1',
        legendgroup='Group 2',
        x=np.arange(1, n_iterations+1),
        y=samples['Sigma log'].values[0],
        mode='lines',
        line_color=qualitative.Plotly[2],
        showlegend=False
    ),
    row=4,
    col=2
)
fig.add_trace(
    go.Scatter(
        name='Run 2',
        legendgroup='Group 3',
        x=np.arange(1, n_iterations+1),
        y=samples['Sigma log'].values[1],
        mode='lines',
        line_color=qualitative.Plotly[1],
        showlegend=False
    ),
    row=4,
    col=2
)
fig.add_trace(
    go.Scatter(
        name='Run 3',
        legendgroup='Group 4',
        x=np.arange(1, n_iterations+1),
        y=samples['Sigma log'].values[2],
        mode='lines',
        line_color=qualitative.Plotly[0],
        showlegend=False
    ),
    row=4,
    col=2
)

# Visualise prior distribution
parameter_values = np.linspace(4, 16, num=200)
pdf_values = np.exp([
    log_prior._priors[0]([parameter_value])
    for parameter_value in parameter_values])
fig.add_trace(
    go.Scatter(
        name='Prior distribution',
        legendgroup='Group 5',
        x=parameter_values,
        y=pdf_values,
        mode='lines',
        line_color='black',
    ),
    row=1,
    col=1
)

parameter_values = np.linspace(0, 12, num=200)
pdf_values = np.exp([
    log_prior._priors[1]([parameter_value])
    for parameter_value in parameter_values])
fig.add_trace(
    go.Scatter(
        name='Prior distribution',
        legendgroup='Group 5',
        x=parameter_values,
        y=pdf_values,
        mode='lines',
        line_color='black',
        showlegend=False
    ),
    row=2,
    col=1
)

parameter_values = np.linspace(0, 12, num=200)
pdf_values = np.exp([
    log_prior._priors[2]([parameter_value])
    for parameter_value in parameter_values])
fig.add_trace(
    go.Scatter(
        name='Prior distribution',
        legendgroup='Group 5',
        x=parameter_values,
        y=pdf_values,
        mode='lines',
        line_color='black',
        showlegend=False
    ),
    row=3,
    col=1
)

parameter_values = np.linspace(0, 0.6, num=200)
pdf_values = np.exp([
    log_prior._priors[3]([parameter_value])
    for parameter_value in parameter_values])
fig.add_trace(
    go.Scatter(
        name='Prior distribution',
        legendgroup='Group 5',
        x=parameter_values,
        y=pdf_values,
        mode='lines',
        line_color='black',
        showlegend=False
    ),
    row=4,
    col=1
)

fig.update_layout(
    xaxis_title='k_a',
    yaxis_title='p',
    yaxis2_title='k_a',
    xaxis3_title='k_e',
    yaxis3_title='p',
    yaxis4_title='k_e',
    xaxis5_title='v',
    yaxis5_title='p',
    yaxis6_title='v',
    xaxis7_title='sigma',
    yaxis7_title='p',
    xaxis8_title='Number of iterations',
    yaxis8_title='sigma',
    template='plotly_white',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    margin=dict(t=0, r=0, l=0)
)
fig.show()

3.4. Assessing convergence: Summary

  1. Run the MCMC algorithm multiple times from different starting points. The chi.SamplingController automatically samples starting points from the prior distribution. Recommended number of runs is 3-5.

  2. Visually assess the convergence of the runs to the same area of parameter space using trace plots, i.e. plots with the sampled values on the y-axis and the iteration at which each sample was obtained on the x-axis.

  3. Discard the samples where the runs have not converged as “warm-up”. If the runs do not converge, the MCMC algorithm has to run for more iterations until they converge, or the algorithm may not be suited for the inference problem.

  4. Calculate the \(\hat{R}\) value using the remaining samples. We recommend the az.rhat function implemented in ArviZ. Values \(<1.01\) indicate good convergence. Larger values may indicate problems with the inference.

  5. (Optional) For MCMC runs that require a large number of iterations for convergence, it can make sense to subsample the samples. Often, 1000-3000 samples are sufficient to represent a distribution. For n_iterations samples post warm-up, keeping every (n_iterations // 1000)-th sample may therefore be sufficient.

For more details, please refer to the previous section.

3.5. Comparing model fits to data

Let us conclude this tutorial by comparing our model fit to the drug concentration measurements. The simplest way to do this is to just focus on the means of the posterior distributions, which we can extract from the summary dataframes presented in Section 3.3.2.

# Fix model parameters
model = chi.ReducedMechanisticModel(mechanistic_model=model)
model.fix_parameters(name_value_dict={
    'dose.drug_amount': 0,
    'global.drug_amount': 0,
})

# Set dosing regimen
dosing_regimen = problem.get_dosing_regimens()['1']
model.set_dosing_regimen(dosing_regimen)

# Extract model parameters from summary
parameters = [
    summary2['mean'].loc[parameter] for parameter in model.parameters()
]

# Simulate result
times = np.linspace(0.2, 3, 200)
simulation = model.simulate(parameters=parameters, times=times)[0]

# Plot results
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=times,
    y=simulation,
    mode='lines',
    name='Fit'
))
fig.add_trace(go.Scatter(
    x=data.Time,
    y=data.Value,
    mode='markers',
    name='Measurements'
))
fig.update_layout(
    xaxis_title='Time',
    yaxis_title='Drug concentration',
    template='plotly_white'
)
fig.show()

In the above code block, we first fix the initial amount parameters of the mechanistic model using the chi.ReducedMechanisticModel class. This class is a wrapper for mechanistic models that implements the ReducedMechanisticModel.fix_parameters() method. This method works analogously to ProblemModellingController.fix_parameters() and fixes model parameters to constant values. In the next lines, we extract the dosing regimen from the ProblemModellingController. The controller formats the dosing regimens from the dataset, Dataset_1, into a format suitable for the chi.PKPDModel, returning the dosing regimens in the form of a dictionary that uses the IDs of the individuals as keys. This makes it possible to access the dosing regimen of the individual with ID 1 in Dataset_1 using the key '1'. In the remaining lines leading up to the plotting, we extract the estimates of the parameter means from the summary dataframe.

We can see that our fit describes the measurements reasonably well. However, it is a little bit unsatisfying that the uncertainty of the parameter estimates is not reflected in the model fit at all. To resolve this shortcoming of just using the parameter means to plot the model fit, Chi implements the chi.PosteriorPredictiveModel, which defines the posterior predictive distribution of the measurement process

\[p(y | \mathcal{D}, r, t) = \int \mathrm{d} \psi \, p(y | \psi, r, t)\, p(\psi | \mathcal{D}).\]

The posterior predictive distribution averages our model of the measurement distribution over the parameter values that we found to be consistent with the data. Before discussing this in more detail, let us take a look at how the posterior predictive distribution compares to the model fit where we just used the means of the parameters. To this end, we sample measurements repeatedly from \(p(y | \mathcal{D}, r, t)\) and estimate the mean and some of its percentiles at each time point.

# Define posterior predictive distribution
predictive_model = problem.get_predictive_model()
samples = samples.sel(draw=slice(n_iterations//2, n_iterations))
posterior_model = chi.PosteriorPredictiveModel(
    predictive_model=predictive_model, posterior_samples=samples)

# Approximate distribution using sampling
n_samples = 1000
conc_samples = posterior_model.sample(
    times=times, n_samples=n_samples, seed=1)

# Reshape samples, so we can calculate mean and percentiles at the different
# time points
reshaped_samples = np.empty(shape=(n_samples, len(times)))
for sample_idx, sample_id in enumerate(conc_samples.ID.unique()):
    reshaped_samples[
        sample_idx] = conc_samples[conc_samples.ID == sample_id].Value.values

# Calculate mean, 5th and 95th percentile of the distribution at each time
# point
means = np.mean(reshaped_samples, axis=0)
lower = np.percentile(reshaped_samples, q=5, axis=0)
upper = np.percentile(reshaped_samples, q=95, axis=0)

In the first few lines of the code block, we implement the chi.PosteriorPredictiveModel by first retrieving the chi.PredictiveModel from the problem modelling controller, and then combining it with the inferred posterior samples. The chi.PredictiveModel defines the measurement distribution, \(p(y | \psi, r, t)\), for the purpose of simulating measurements. Note that \(p(y | \psi, r, t)\) is the same distribution as the one we used to construct the likelihood in Section 3.1. The important difference is that for the likelihood the values of \((y, t, r)\) are fixed to the measurements, while for the chi.PredictiveModel, the times and the dosing regimen can be chosen at will to simulate any measurements of interest.

In the following 2 lines of the code block, we sample measurements from the posterior predictive distribution – 1000 samples at each simulated time point. The samples are returned as a pandas.DataFrame, which we reshape to a numpy array of shape (n_samples, n_times) in the next few lines. This reshaping makes it very easy to calculate the mean value and the percentiles of the samples at each time point, as demonstrated in the final lines of the code block.

The figure visualises the estimated mean and percentiles of the posterior predictive distribution on top of our earlier fit. The code to reproduce the figure is documented below. The area between the percentile estimates marks the region of drug concentration values which can be expected to be occupied by 90% of all measurements, assuming, of course, that our model correctly describes the measurement process. The estimated percentiles are more jittery than the plot with the mean parameter values because we use sampling to estimate the percentiles. As a result, the estimates are subject to sampling error, which can be reduced by increasing the number of samples, n_samples. However, for the purpose of this analysis, the jittering is not substantial enough to warrant a larger number of samples.

Plotting the posterior predictive distribution provides two important insights that the fit with the means of the parameters cannot provide: 1. it can quantify the uncertainty associated with the treatment response. This uncertainty is partially due to measurement noise, but also due to the residual uncertainty in the model parameters, as quantified by the posterior distribution. 2. It better reflects nonlinearities between model parameters and the model output, which are otherwise neglected when plotting the fit with the means of the model parameters. We can see an indication of this in the figure by noting that the fit with the parameter means is not the same as the mean of the posterior predictive distribution. This difference is a result of the, in general, nonlinear dependence of treatment response models on their parameters, meaning that the average treatment response under parametric uncertainty is not the same as the treatment response predicted with the mean parameters. In fact, these two predictions can differ substantially from each other, especially when predictions are made for future times or for previously unexplored dosing regimens.
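
A tiny numerical illustration of this last point: for a nonlinear function of the parameters, the average of the predictions differs from the prediction at the average parameter. The sketch below uses an exponential decay and made-up numbers, unrelated to the model fitted above.

# E[f(k)] differs from f(E[k]) for nonlinear f: toy illustration with an
# exponential decay evaluated at t = 3 under parameter uncertainty
rng = np.random.default_rng(3)
k_samples = rng.normal(loc=1.0, scale=0.3, size=100000)

mean_of_predictions = np.mean(np.exp(-k_samples * 3))
prediction_at_mean_parameter = np.exp(-np.mean(k_samples) * 3)
print(mean_of_predictions, prediction_at_mean_parameter)  # the two values differ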

# Plot results
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=times,
    y=simulation,
    mode='lines',
    name='Fit: mean parameters'
))
fig.add_trace(go.Scatter(
    x=data.Time,
    y=data.Value,
    mode='markers',
    name='Measurements'
))
fig.add_trace(go.Scatter(
    x=times,
    y=means,
    mode='lines',
    name='Fit: mean posterior pred. distr.',
    line=dict(color='black')
))
fig.add_trace(go.Scatter(
    x=times,
    y=lower,
    mode='lines',
    name='Fit: 5th-95th perc. posterior pred. distr.',
    line=dict(color='black', dash='dash')
))
fig.add_trace(go.Scatter(
    x=times,
    y=upper,
    mode='lines',
    name='Fit: 5th-95th perc. posterior pred. distr.',
    line=dict(color='black', dash='dash'),
    showlegend=False
))
fig.update_layout(
    xaxis_title='Time',
    yaxis_title='Drug concentration',
    template='plotly_white',
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
)
fig.show()

This concludes this tutorial. If you have any feedback or suggestions for improvement, or would like to report any typos, mistakes or bugs, please do reach out to us, for example by creating an Issue. We are looking forward to hearing from you!

3.6. Reference to ErrorModel, LogPDF and PredictiveModel API

chi.ErrorModel()

A base class for error models for the one-dimensional output of MechanisticModel instances.

chi.GaussianErrorModel()

An error model which assumes that the model error follows a Gaussian distribution.

chi.LogNormalErrorModel()

An error model which assumes that the model error follows a Log-normal distribution.

chi.MultiplicativeGaussianErrorModel()

An error model which assumes that the model error is a Gaussian heteroscedastic noise.

chi.ConstantAndMultiplicativeGaussianErrorModel()

An error model which assumes that the model error is a mixture between a Gaussian base-level noise and a Gaussian heteroscedastic noise.

chi.ReducedErrorModel(error_model)

A class that can be used to permanently fix model parameters of an ErrorModel instance.

chi.LogLikelihood(mechanistic_model, ...[, ...])

A log-likelihood that quantifies the likelihood of parameter values to capture the measurements within the model approximation of the data-generating process.

chi.LogPosterior(log_likelihood, log_prior)

A log-posterior constructed from a log-likelihood and a log-prior.

chi.ProblemModellingController(...[, outputs])

A problem modelling controller which simplifies the model building process of a pharmacokinetic and pharmacodynamic problem.

chi.SamplingController(log_posterior[, seed])

Sets up a sampling routine that attempts to find the posterior distribution of parameters defined by a pints.LogPosterior.

chi.PredictiveModel(mechanistic_model, ...)

Implements a model of a data-generating process.

chi.PosteriorPredictiveModel(...[, param_map])

Implements the posterior predictive model of the modelled data-generating process and the associated parameter posterior distribution.