3. Fitting mechanistic models to data
In the previous tutorial, The mechanistic model, we have seen how we can implement and simulate treatment response models in Chi. For example, we can simulate the time course of drug concentration levels by 1. implementing a model in SBML; 2. setting the route of administration and the dosing regimen; 3. simulating the drug concentration for a given set of model parameters.
import os
import chi
import numpy as np
import plotly.graph_objects as go
# Implement the model
directory = os.path.dirname(__file__)
filename = os.path.join(directory, 'one_compartment_pk_model.xml')
model = chi.PKPDModel(sbml_file=filename)
model.set_outputs(['global.drug_concentration'])
# Set administration and dosing regimen
model.set_administration(compartment='global', direct=False)
model.set_dosing_regimen(dose=2, period=1)
# Simulate treatment response
parameters = [0, 0, 10, 1, 2]
times = np.linspace(start=0, stop=3, num=200)
conc = model.simulate(parameters=parameters, times=times)[0]
# Plot results
fig = go.Figure()
fig.add_trace(go.Scatter(
x=times,
y=conc,
mode='lines',
))
fig.update_layout(
xaxis_title='Time',
yaxis_title='Drug concentration',
template='plotly_white'
)
fig.show()
This functionality to simulate treatment responses is pretty cool in its own right! For example, it can help us to study nonlinearities in the treatment response dynamics and to optimise dosing regimens to target a desired treatment response.
However, at this point, the simulated treatment responses have little to do with real treatment responses. To describe real treatment responses that we may observe in clinical practice, we need to somehow connect our model to reality.
The most common approach to relate models to real treatment responses is to compare the model predictions to measurements. Below, we have prepared an example dataset with drug concentration measurements. These drug concentrations were measured after repeatedly adminstering 2 mg of a drug every 24 hours. You can download the dataset from the following link: Dataset_1.
ID |
Time |
Time unit |
Observable |
Value |
Observable unit |
Duration |
Dose |
Dose unit |
---|---|---|---|---|---|---|---|---|
1 |
0.5 |
Day |
Drug concentration |
0.198 |
ng/mL |
|||
1 |
1.0 |
Day |
Drug concentration |
0.123 |
ng/mL |
|||
1 |
1.5 |
Day |
Drug concentration |
0.305 |
ng/mL |
|||
1 |
2.0 |
Day |
Drug concentration |
0.184 |
ng/mL |
|||
1 |
2.5 |
Day |
Drug concentration |
0.421 |
ng/mL |
|||
1 |
3.0 |
Day |
Drug concentration |
0.306 |
ng/mL |
|||
1 |
0.0 |
Day |
0.01 |
2.0 |
mg |
|||
1 |
1.0 |
Day |
0.01 |
2.0 |
mg |
|||
1 |
2.0 |
Day |
0.01 |
2.0 |
mg |
The dataset contains one column that identifies measured individuals (ID
),
two columns that specify measurement times and dose administration times
(Time
and Time unit
), three columns that specify measured values
(Observable
, Value
, Observable unit
), and three columns that specify
dose administrations (Dose
, Duration
, Dose unit
).
Downloading the file and saving it in the same directory as the Python script, we can visualise the measurements by executing the below script
import pandas as pd
# Import data
directory = os.path.dirname(__file__)
data = pd.read_csv(directory + '/dataset_1.csv')
# Plot data
fig = go.Figure()
fig.add_trace(go.Scatter(
x=data.Time,
y=data.Value,
mode='markers',
))
fig.update_layout(
xaxis_title='Time in day',
yaxis_title='Drug concentration in ng/mL',
template='plotly_white'
)
fig.show()
The figure shows the drug concentration measurements as blue scatter points. The treatment response dynamics indicated by the measurements is similar to the simulated treatment response in the previous code block. But looking more closely at the magnitude of the values, it appears that the measured values are much smaller than the simulated ones. We can therefore conclude that, at this point, our model does not provide an accurate description of the measured treatment response.
To find a better description of the treatment response, we have two options: 1. we can try to find parameter values that improve the proximity of the model output to the measurements; or 2. we can define a new mechanistic model and see whether this new model is able to describe the measurements better. This tutorial will be about the former and will detail how we can find better model parameters for a given model structure.
3.1. Estimating model parameters from data: Background
Before we can try to find parameter values that describe the observed treatment response most closely, we first need to agree on what we mean by “most closely” for the relationship between the mechanistic model output and the measurements. An intuitive way to define this notion of closeness is to use the distance between the measurements and the model output, i.e. the difference between the measured values and the simulated values. Then the model parameters that most closely describe the measurements would be those parameter values that make the mechanistic model output perfectly match the measurements, resulting in distances of 0 ng/mL between the model output and the measurements at all measured time points. However, as outlined in Sections 1.3 and 1.4 of the Quick overview, measurements of treatment responses are imprecise and noisy, and will therefore not perfectly represent the treatment response dynamics. Consequently, if we were to match the model outputs to measurements perfectly, we would end up with an inaccurate description of the treatment response as our model would be paying too much attention to the measurement noise.
One way to improve our notion of closeness is to incorporate the measurement
process into our computational model of the treatment response, thereby
explicitly stating that we do not expect the mechanistic model output to match
the measurements perfectly. In Chi, this can be done
using chi.ErrorModel
s. Error models promote the single value output
of mechanistic model simulations to a distribution of
values. This distribution characterises a
range of values around the mechanistic model output where measurements may be
expected.
For simulation, this distribution can be used to sample measurement values and
imitate the measurement process of real treatment responses, see
Section 1.3 in the Quick overview for an example. For parameter estimation,
the distribution can be used to quantify the likelihood with which the observed
measurements would have been generated by our model,
see Section 1.4 in the Quick overview. To account for measurement noise
during the parameter estimation, we therefore
choose to quantify the closeness between the model output an the measurements
using likelihoods.
Formally, we denote the measurement distribution by \(p(y | \psi, t, r)\), where \(y\) denotes the measurement value, \(\psi\) denotes the model parameters, \(t\) denotes the time point of the measurement, and \(r\) denotes the adminstered dosing regimen. With this measurement distribution in place, we can quantify the likelihood with which any given set of measurements would have been generated by the model. For example, the likelihood of a measurement \(y_1\) at time \(t_1\) under dosing regimen \(r^*\) is defined by the value of the probability density of the measurement distribution evaluated at the measurement, \(p(y_1 | \psi, t_1, r^*)\). Note that this likelihood depends on the choice of model parameters, \(\psi\). The model parameters with the maximum likelihood are the parameter values that most closely describe the measurements.
Note
The measurement distribution, \(p(y | \psi, t, r)\), is defined by the mechanistic model output and the error model. To illustrate this, let use denote the simulated drug concentration values of the the 1-compartment PK model by \(c(\psi, t, r)\). By definition of the model, the drug concentration values, \(c\), are a function of the model parameters, \(\psi = (a_0, k_a, k_e, v)\), the time, \(t\), and the dosing regimen, \(r\).
1. If we choose a chi.GaussianErrorModel
to describe the difference
between the model output and the measurements, we assume that measurements
are distributed according to a Normal distribution around the model output
where \(\sigma\) defines the width of the distribution. For ease of notation, we extend the definition of the model parameters to include \(\sigma\), \(\psi = (a_0, k_a, k_e, v, \sigma)\).
We can see that the model output defines the mean or Expectation Value of the measurement distribution.
2. If we choose a chi.LogNormalErrorModel
to describe the difference
between the model output and the measurements, we assume that measurements
are distributed according to a lognormal distribution around the model output
One can show that also for this distribution the model output defines the mean or Expectation Value of the measurement distribution.
The main difference between the two distributions is the shape. The
chi.GaussianErrorModel
is symmetrically distributed around the
model output, while chi.LogNormalErrorModel
is scewed in such a
way that measurements can never assume negative values. To visualise these
differences, we recommend simulating many measurements with different
error models, similar to Section 1.3 in Quick overview. But instead
of choosing different times, sample all measurements at the same time. You
can then histogram the samples, using for example go.Histogram
, as used
in Section 1.4.2 in Quick overview, to visualise the shape of
the probability density.
Assuming independence of measurements, the likelihood for a dataset with \(n\) measurements, \(\mathcal{D}=((t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n), r)\), is given by the product of the individual likelihoods
where independence refers to the assumption that measurements are independently and identically distributed according to our model of the measurement process (this does not have to be the case, and is especially unlikely to be the case when our model fails to describe the measurement process accurately).
With this likelihood in place, Chi makes it straightforward to run numerical optimisation algorithms in order to derive the maximum likelihood estimates that best describe the measurements, see Section 1.4.1 in the Quick overview.
Alternatively, Chi also supports Bayesian inference of the model parameters: Bayesian inference is conceptually different from maximum likelihood estimation as it does not seek to find a single set of model parameters that “best” describes the observed measurements. Instead, Bayesian inference acknowledges the fact that noisy measurements leave uncertainty about which model parameters best describe the data, and therefore focuses instead on deriving a distribution of parameter values which are all consistent with the observed measurements, see Section 1.4.2 in Quick overview for a more detailed discussion. This distribution is derived from the likelihood using Bayes’ rule
where \(p(\psi)\) denotes the prior distribution of the model parameters. The prior distribution quantifies the baseline distribution of parameter values before the parameter estimation, making it possible to inform the inference results with otherwise hard-to-quantify knowledge of feasible or likely parameter values prior to the inference.
3.2. Defining the log-posterior
In Chi, we can derive posterior distributions from measurements
using Markov chain
Monte Carlo (MCMC) algorithms implemented in the open source Python package
Pints.
In Sections 1.4.1 and 1.4.2 in the Quick overview,
we showed in some detail how we can define (log-)posterior distributions,
chi.LogPosterior
, for this purpose. Here, we want to show
how we can use the chi.ProblemModellingController
to simplify and automate
this process as much as possible.
The tricky bit when implementing log-posteriors for treatment response models is often that log-posteriors do not only depend on the treatment response measurements, \(((t_1, y_1), (t_2, y_2), \ldots, (t_n, y_n))\), but also on the administered dosing regimen, \(r\). This can make it tedious to define log-posteriors, especially when parameters across measurements of multiple individuals with different dosing regimens are inferred, as dosing regimens of the model will have to be manually set to the dosing regimens administered during the measurement process. A mismatch between the dosing regimens would render the inference results invalid.
To eliminate this source of error, we have implemented the chi.ProblemModellingController
.
The chi.ProblemModellingController
facilitates the construction of
log-posteriors and reduces the workflow to a simple 4-step approach:
Definition of the mechanistic model
Definition of the error model
Definition of the measurements
Definition of the prior distribution
In the below code block, we illustrate this workflow for the above drug concentration dataset, Dataset_1.
import pints
# Define mechanistic model
directory = os.path.dirname(__file__)
filename = os.path.join(directory, 'one_compartment_pk_model.xml')
model = chi.PKPDModel(sbml_file=filename)
model.set_administration(compartment='global', direct=False)
model.set_outputs(['global.drug_concentration'])
# Define error model
error_model = chi.LogNormalErrorModel()
# Define data
directory = os.path.dirname(__file__)
data = pd.read_csv(directory + '/dataset_1.csv')
# Define prior distribution
log_prior = pints.ComposedLogPrior(
pints.GaussianLogPrior(mean=10, sd=2), # absorption rate
pints.GaussianLogPrior(mean=6, sd=2), # elimination rate
pints.LogNormalLogPrior(log_mean=0, scale=1), # volume of distribution
pints.LogNormalLogPrior(log_mean=-2, scale=0.5) # scale of meas. distrib.
)
# Define log-posterior using the ProblemModellingController
problem = chi.ProblemModellingController(
mechanistic_model=model, error_models=[error_model])
problem.set_data(
data=data,
output_observable_dict={'global.drug_concentration': 'Drug concentration'}
)
problem.fix_parameters(name_value_dict={
'dose.drug_amount': 0,
'global.drug_amount': 0,
})
problem.set_log_prior(log_prior=log_prior)
log_posterior = problem.get_log_posterior()
The first four blocks in the code define the individual components of the
log-posterior: the mechanistic model, the error model, the data, and the prior
distribution. Note that the administration of the dosing regimen is set
before passing the mechanistic model to the ProblemModellingController
.
The prior distribution defines marginal distributions for the parameters, and
is implemented using Pints. In Bayesian inference, we can use the prior
distribution to bias the inference
results towards feasible areas of the parameter space. In this
case, we have little prior knowledge about feasible parameter ranges, so we
choose relatively uninformative prior
distributions (below will be an illustration of the prior distribution).
Note that we seem to be specifying marginal prior distributions for only
4 of the 6 model parameters. This is because we fix the values of the initial drug
amounts to 0
prior to the inference
in the lines below, reflecting our knowledge that the subject had no prior
exposure to the drug before starting the trial. This reduces the number of model
parameters from 6 to 4. The fixing of model parmaters
is optional, but can sometimes improve the inference results when some model
parameters are already well understood.
For the remaining 4 parameters, only positive values make biological sense, so
we choose prior distributions that focus on positive values. For
two model parameters, the volume of distribution and the scale parameter,
negative or zero values are particularly bad as they will break the simulation
(1. a volume of zero causes a division by zero error; and 2. the lognormal distribution
is only defined for positive sigma). We therefore use pints.LogNormalLogPrior
to constrain those parameters to strictly positive values.
In the final block of the code, we define the log-posterior. In the first line,
we specify the mechanistic model and the error model. In the next line, we set
the dataset. Note that we need to use the output_observable_dict
to map the
output variable of the model, global.drug_concentration
, to the Observable name
in the dataset, Drug concentration
. Other specifications are not required, and
dosing regimens are automatically set, when the dosing regimen related columns,
Dose
and Duration
, are present in the dataset. In the following line, we
fix the initial drug amounts to 0
using ProblemModellingController.fix_parameters()
.
You can use this method to fix any parameters of the model.
In the last two lines, we set the log-prior and implement the log-posterior using
the ProblemModellingController.get_log_posterior
method.
3.3. Inferring the posterior distribution
With this chi.LogPosterior
in place, we can infer the posterior
distribution using any MCMC algorithm of our choice. Recall that inference, in this context,
means the reconstruction of the posterior distribution from the
chi.LogPosterior
, which defines the log-pdf of the posterior distribution
up to an unknown constant shift
By comparison to above Bayes rule, you will find that \(\text{constant} = -\log p(\mathcal{D})\), so the constant shift may seem not so ‘unkown’ after all. However, unknown here does not mean that we cannot write down an expression for it, it refers to the fact that for most treatment response models \(p(\mathcal{D})\) is (practically) impossible to evaluate, as evaluating \(p(\mathcal{D})\) requires the numerical integration of the likelihood-prior product over the full parameter space, \(p(\mathcal{D}) = \int \mathrm{d} \psi \, p(\mathcal{D}, \psi ) = \int \mathrm{d} \psi \, p(\mathcal{D}| \psi )\, p(\psi)\). This renders the value of the constant shift for all intents and purposes unknown.
The unknown shift makes it impossible to make statements about the absolute probability
of parameter values. However, it does allow for relative comparisons of
probabilities – a fact exploited by MCMC algorithms to circumvent the limitation
of the partially known log-posterior. MCMC algorithms use
the relative comparison of parameter probabilities to generate random samples from the
posterior distribution, opening a gateway to reconstruct the distribution. The
more random samples are generated, the closer the histogram over the samples will
approximate the posterior distribution. In fact, one can show that the histogram
will converge to the posterior distribution as the number of samples approaches
infinity. This makes it possible for MCMC algorithms
to reconstruct any posterior distribution from a chi.LogPosterior
.
To illustrate this, let us run an MCMC algorithm to infer the above defined posterior distribution of the 1-compartment PK model.
# Run MCMC algorithm
n_iterations = 20000
controller = chi.SamplingController(log_posterior=log_posterior, seed=1)
controller.set_n_runs(n_runs=3)
controller.set_parallel_evaluation(False)
controller.set_sampler(pints.HaarioBardenetACMC)
samples = controller.run(n_iterations=n_iterations, log_to_screen=True)
In the code block, we use an MCMC algorithm implemented in Pints, called
pints.HaarioBardenetACMC
. For technical reasons that we will discuss below,
we run the algorithm three times for 20,000 iterations.
Note that we use the chi.SamplingController
to set the number of
runs, the number of iterations, and to run the sampling algorithm. The
chi.SamplingController
can also do other things, such as running the
chains in parallel, but we will not go into this additional functionality in this
tutorial, and refer instead to the API reference.
Executing the above code block will spawn a response of the below form.
The left most column indicates the current
iteration of the MCMC algorithm. The other columns show the total number of
log-posterior evaluations and the fraction of accepted MCMC proposals of each
chain. The target acceptance ratio depends on the details of the MCMC algorithm.
For the pints.HaarioBardenetACMC
the target is 0.23
.
Using Haario-Bardenet adaptive covariance MCMC
Generating 3 chains.
Running in sequential mode.
Iter. Eval. Accept. Accept. Accept. Time m:s
0 3 0 0 0 0:00.0
1 6 0 0.5 0.5 0:00.0
2 9 0.333 0.667 0.333 0:00.0
3 12 0.25 0.75 0.5 0:00.0
20 63 0.714 0.571 0.476 0:00.0
40 123 0.756 0.61 0.561 0:00.0
60 183 0.738 0.475 0.59 0:00.0
80 243 0.716 0.407 0.642 0:00.1
100 303 0.693 0.406 0.653 0:00.1
120 363 0.736 0.421 0.686 0:00.1
.
.
.
When the three runs of the algorithm terminate, the generated samples are returned in
form of an xarray.Dataset
.
We can visualise the samples using the code
block documented at the end of this section (we move the code block to the end
to avoid disrupting the flow of the tutorial with less relevant code snippets).
The left column of the figure shows the histogram over the samples across the three runs in orange, as well as the probability density of the prior distribution in black. The first row shows the samples of the absorption rate, the second row shows the samples of the elimination rate, the third row shows the samples of the volume of distribution, and the fourth row shows the samples of the scale parameter of the error model, sigma. The right column of the figure shows the same samples, this time visualised over the iterations at which they were drawn. The samples from the different runs are illustrated in different colours: run 1 (green); run 2 (red); run 3 (blue).
3.3.1. The posterior distribution
The orange distribution is the result of the inference – the posterior distribution. It contains all parameter values that are consistent with the drug concentration measurements and our prior knowledge, assigning each set of parameter values with a probability of being the data-generating parameter values. Notably, the figure shows that our prior knowledge and Dataset_1 are insufficient to conclude on a single set of parameter values (see orange distribution). Instead, the measurements only allow us to refine our understanding of feasible parameter values. For example, we can see in the second row of the figure that the marginal posterior distribution substantially differs from the marginal prior distribution. This is because the drug concentration measurements contain important information about the elimination rate, rendering rates above 1.5 1/day or below 0.25 1/day as extremely unlikely for the model of the treatment response. This in in stark contrast to the relatively wide range of model parameters that we deemed feasible prior to the inference (see black line). However, the measurements are not conclusive enough to reduce the distribution of feasible elimination rates to a single value. Similarly, for the volume of distribution (row 3) and the error scale parameter (row 4), the measurements lead to substaintial updates relative to the prior distribution. In comparison, the measurements appear less informative about the absorption rate (see row 1), given that the marginal posterior distribution of the absorption rate is almost identical to its prior distribution. We will have a closer look at an intuitive understanding of why the measurements contain little information about the absorption rate below. The take-away from this discussion is that inferring distributions of parameter values consistent with the measurements is in many ways more informative than just estimating a single set of model parameters, as it can help to quantify the feasibility of different parameter values and the overall remaining level of uncertainty. In a Bayesian setting, inferring posterior distributions also allows us to supplement the information contained in the measurements by prior knowledge of feasible model parameters, facilitating the inference of parameter values when treatment response measurements are limited.
3.3.2. Assessing convergence
Similar to most numerical inference algorithms, MCMC algorithms have practical limitations, making them not equally well suited to all inference problems. Some MCMC algorithms are, for example, only well suited to estimate model parameters when the total number of parameters is small, while other MCMC algorithms only work when the curvature of the posterior surface is not too extreme. One way to test the suitability of an MCMC algorithm for a particular inference problem is the use of the \(\hat{R}\) metric. We will motivate the metric below and provide an intuitive explanation. Technical details are left to interested readers to explore on their own.
Let us begin this section by revisiting the right column in the figure above. The column shows the samples from the three MCMC algorithm runs at each iteration. For early iterations of the algorithm, the samples from the MCMC runs look quite distinct – each run appears to sample from a different area of the parameter space. In contrast, the MCMC runs seem to converge and sample from the same area of the parameter space at later iterations. Intuitively, it does not really make sense for the samples from the MCMC runs to look different – after all, we use the same MCMC algorithm to sample from the same posterior distribution. The histogram over the samples should therefore be identical within the limits of sampling variation. However, if we plot the samples from each of the three runs separately, we find that the histograms actually look quite different. For illustrative purposes, we focus on the absorption rate in the code block.
fig = go.Figure()
# Plot histograms
fig.add_trace(
go.Histogram(
name='Run 1',
x=samples['dose.absorption_rate'].values[0, ::n_iterations//1000],
histnorm='probability density',
showlegend=True,
xbins=dict(size=0.5),
marker_color=qualitative.Plotly[2],
)
)
fig.add_trace(
go.Histogram(
name='Run 2',
x=samples['dose.absorption_rate'].values[1, ::n_iterations//1000],
histnorm='probability density',
showlegend=True,
xbins=dict(size=0.5),
marker_color=qualitative.Plotly[1],
)
)
fig.add_trace(
go.Histogram(
name='Run 3',
x=samples['dose.absorption_rate'].values[2, ::n_iterations//1000],
histnorm='probability density',
showlegend=True,
xbins=dict(size=0.5),
marker_color=qualitative.Plotly[0],
)
)
fig.update_layout(
template='plotly_white',
xaxis_title='Absorption rate in 1/day',
yaxis_title='Probability density',
barmode='overlay'
)
fig.update_traces(opacity=0.75)
fig.show()
We select the samples of the absorption rate from the xarray.Dataset
using
the name of the parameter in the mechanistic model,
samples['dose.absorption_rate'].values
. The returned object is a numpy array
of shape (n_runs, n_iterations)
. We can select the samples from
run n using the index n-1, e.g. for run 1: samples['dose.absosption_rate'].values[0]
.
In addition, we subsample the MCMC samples in the above code block, which we can,
for now, regard as a cosmetic choice.
We can see that the histograms over the samples from the three runs look very different. This seems contradictory to the orange distribution which we presented above as the histogram over the MCMC samples across the runs. But also, how can it makes sense for the three histograms to differ when the posterior distribution is the same? – It does not, and in fact, it tells us that the three histograms in the figure above do not represent the posterior distribution. Although, we did not disclose it until now, the orange distribution, respresenting the posterior distribution, is only based on the second half of each MCMC run!
So why is it crucial that we only choose the second half of each MCMC run, and is the particular choice of the second half important? The answer comes back to a common limitation of all MCMC algorithm which we can see in the right column of the figure presented earlier: MCMC algorithms generate samples from the posterior distribution conditional on the latest generated sample. For some MCMC algorithms, this conditioning has little influences on sequential samples because the internal sampling strategy is advanced enough to sufficiently decorrelate subsequent samples. But for many MCMC algorithms the conditioned sample substantially influences the sampled value. That means that if the last samples of two MCMC runs are far away from each other, the following samples of the runs will also differ substantially from each other, see for example the first 5000 iterations of the elimination rate in the second row of the figure in the previous section.
At the start of an MCMC run, there is no “last sampled” parameter value,
and we have to manually provide a starting point to the MCMC algorithm.
The chi.SamplingController
automatically samples these starting points from the prior distribution, to
reduce the number of manual steps during the inference in Chi.
This means that the above runs of the MCMC algorithm start from three
different positions in parameter space. These starting points have little
to do with the posterior distribution, and are potentially far away from the area of
the parameter space that is interesting for the posterior distribution.
It therefore makes sense that during the early iterations of the runs, the sampled
values of the MCMC runs do not agree. Fortunately, MCMC algorithms are constructed
in such a way that their samples are, at least in theory, guaranteed to converge
to the posterior distribution, meaning that runs of MCMC algorithms starting from
different areas in parameter space can be expected to converge to the same area.
In practice, numerical limitations may nevertheless
make it impossible, or very time-consuming for samples from MCMC algorithms not
suited to the inference problem to
converge to the posterior distribution. Starting multiple runs from different
areas in parameter space therefore provides an effective way of testing whether
a given MCMC algorithm is suited for the inference of the problem at hand. If
the runs do not converge, the MCMC algorithm either needs more iterations to converge, or
is not suited for the inference problem, in which case the MCMC samples cannot
be expected to represent the posterior distribution. If the runs do converge,
we can be confident that the inference results are correct (although there is always
a chance for edge cases where this convergence test fails to identify the
limitations of an algorithm).
So, starting multiple MCMC runs from different locations in parameter space is a good idea to gain confidence in the inference results. We recommend running the MCMC algorithm from 3-5 different starting points, randomly sampled from the prior distribution. On the one hand, a large number of starting points will test the suitability of the MCMC algorithm more thoroughly, but each run also comes at a computational cost. 3-5 runs therefore provide a good tradeoff. In addition, starting points randomly sampled from the prior usually make sure that the starting points are not too close together, so that the suitability of the MCMC algorithm is indeed tested. At the same time, it also ensures that the starting points are not extremely far away from each other, located in areas of the parameter space that we do not even deem relevant for the inference result. So, even if the MCMC algorithm would encounter problems in these extreme areas of parameter space, it may not be very relevant for our inference problem. Randomly sampling from the prior distribution therefore provides a good tradeoff between well dispersed and not too extreme starting points.
In conclusion, the initial iterations of the MCMC runs can be used to establish the validity of the inference results when the starting points of different runs are sufficiently dispersed. At the same time, these initial samples have little to do with the posterior distribution, as outlined above. It is therefore common practice to exclude these early samples from the inference result. To identify which samples should be discarded, and which should be included in the result, the earlier presented “trace plot”, visualising the samples of the different runs at each iteration can be useful. We can see, for example, that somewhere around the 8000s iteration the three runs of the MCMC algorithm converge. We therefore choose the 10,000s iteration as the cut-off for the “warm-up” (2000 extra iteration for good measure). Below, we plot the histogram over the samples from the three chains, this time using only the samples from the second half of each run.
fig = go.Figure()
# Plot histograms
fig.add_trace(
go.Histogram(
name='Run 1',
x=samples['dose.absorption_rate'].values[0, n_iterations//2::(n_iterations//2)//1000],
histnorm='probability density',
showlegend=True,
xbins=dict(size=0.5),
marker_color=qualitative.Plotly[2],
)
)
fig.add_trace(
go.Histogram(
name='Run 2',
x=samples['dose.absorption_rate'].values[1, n_iterations//2::n_iterations//1000],
histnorm='probability density',
showlegend=True,
xbins=dict(size=0.5),
marker_color=qualitative.Plotly[1],
)
)
fig.add_trace(
go.Histogram(
name='Run 3',
x=samples['dose.absorption_rate'].values[2, n_iterations//2::n_iterations//1000],
histnorm='probability density',
showlegend=True,
xbins=dict(size=0.5),
marker_color=qualitative.Plotly[0],
)
)
fig.update_layout(
template='plotly_white',
xaxis_title='Absorption rate in 1/day',
yaxis_title='Probability density',
barmode='overlay'
)
fig.update_traces(opacity=0.75)
fig.show()
We can see that the histograms over the samples are now in much better agreement!
This visual assessment of the convergence already gets us very far, but for those
that would prefer more quantitative metrics to assess the convergence of MCMC
runs, we recommend the open source library ArViz. ArViz implements a number
of widely established metrics to assess the convergence of MCMC algorithms,
including the \(\hat{R}\) metric and the effective sample size. We can estimate
both values (and many more) using the az.summary
function. In the below code
block, we first estimate the values for all MCMC samples, and then estimate
the values only for the samples post warm-up.
import arviz as az
# Summary for all samples
summary1 = az.summary(samples)
print(summary1)
# Summary for the samples post warmup
summary2 = az.summary(samples.sel(draw=slice(n_iterations//2, n_iterations)))
print(summary2)
The return of az.summary
is a pandas.DataFrame
, containing the estimated
values, as illustrated below
mean |
sd |
hdi_3% |
hdi_97% |
mcse_mean |
mcse_sd |
ess_bulk |
ess_tail |
r_hat |
|
---|---|---|---|---|---|---|---|---|---|
dose.absorption_rate |
8.898 |
3.516 |
0.399 |
13.007 |
1.503 |
1.125 |
10.0 |
11.0 |
1.21 |
global.elimination_rate |
1.525 |
1.959 |
0.207 |
6.662 |
0.938 |
0.714 |
9.0 |
11.0 |
1.25 |
global.volume |
6.018 |
2.529 |
0.47 |
9.233 |
1.05 |
0.783 |
8.0 |
11.0 |
1.26 |
Sigma log |
0.185 |
0.072 |
0.091 |
0.295 |
0.004 |
0.003 |
502.0 |
264.0 |
1.01 |
mean |
sd |
hdi_3% |
hdi_97% |
mcse_mean |
mcse_sd |
ess_bulk |
ess_tail |
r_hat |
|
---|---|---|---|---|---|---|---|---|---|
dose.absorption_rate |
10.014 |
1.957 |
6.456 |
13.778 |
0.046 |
0.033 |
1802.0 |
3110.0 |
1.0 |
global.elimination_rate |
0.795 |
0.261 |
0.351 |
1.301 |
0.006 |
0.005 |
1850.0 |
2121.0 |
1.0 |
global.volume |
6.862 |
1.605 |
3.923 |
9.947 |
0.038 |
0.027 |
1757.0 |
2295.0 |
1.0 |
Sigma log |
0.178 |
0.053 |
0.095 |
0.278 |
0.001 |
0.001 |
1651.0 |
2381.0 |
1.0 |
The summary output contains the estimates of the mean value, the standard deviation,
and many other values. For the assessment of the convergence, we would like to focus
on the \(\hat{R}\) values in the right-most column of the dataframes, the r_hat
columns.
We can see that the summary provides one estimate for each parameter. For the
samples post warm-up, the estimates are all 1.00
, while the estimates for all
samples assume larger values. Loosely speaking, the \(\hat{R}\) metric compares
the variance of samples across runs to the variance of samples within runs. If
samples from different runs are very different, the variance across runs is much larger
than the variance of samples within the same run, leading to an estimate \(>1\).
On the flip side, if the inter-run variance is the same as the intra-run variance,
the samples across runs look the same and the \(\hat{R}\) metric returns a
value of \(1\). As a result, \(\hat{R}\) estimates of \(1\) indicate
convergence, while estimate \(>1\) indicate that the runs have not yet fully converged.
from plotly.colors import qualitative
from plotly.subplots import make_subplots
# Plot results
fig = make_subplots(
rows=4, cols=2, vertical_spacing=0.15, horizontal_spacing=0.1)
# Plot traces and histogram of parameter
fig.add_trace(
go.Histogram(
name='Posterior samples',
legendgroup='Group 1',
x=samples['dose.absorption_rate'].values[:, n_iterations//2::(n_iterations//2)//1000].flatten(),
histnorm='probability density',
showlegend=True,
xbins=dict(size=0.5),
marker_color=qualitative.Plotly[4],
),
row=1,
col=1
)
fig.add_trace(
go.Scatter(
name='Run 1',
legendgroup='Group 2',
x=np.arange(1, n_iterations+1),
y=samples['dose.absorption_rate'].values[0],
mode='lines',
line_color=qualitative.Plotly[2],
),
row=1,
col=2
)
fig.add_trace(
go.Scatter(
name='Run 2',
legendgroup='Group 3',
x=np.arange(1, n_iterations+1),
y=samples['dose.absorption_rate'].values[1],
mode='lines',
line_color=qualitative.Plotly[1],
),
row=1,
col=2
)
fig.add_trace(
go.Scatter(
name='Run 3',
legendgroup='Group 4',
x=np.arange(1, n_iterations+1),
y=samples['dose.absorption_rate'].values[2],
mode='lines',
line_color=qualitative.Plotly[0],
),
row=1,
col=2
)
# Plot traces and histogram of parameter
fig.add_trace(
go.Histogram(
name='Posterior samples',
legendgroup='Group 1',
x=samples['global.elimination_rate'].values[:, n_iterations//2::(n_iterations//2)//1000].flatten(),
histnorm='probability density',
showlegend=False,
xbins=dict(size=0.2),
marker_color=qualitative.Plotly[4],
),
row=2,
col=1
)
fig.add_trace(
go.Scatter(
name='Run 1',
legendgroup='Group 2',
x=np.arange(1, n_iterations+1),
y=samples['global.elimination_rate'].values[0],
mode='lines',
line_color=qualitative.Plotly[2],
showlegend=False
),
row=2,
col=2
)
fig.add_trace(
go.Scatter(
name='Run 2',
legendgroup='Group 3',
x=np.arange(1, n_iterations+1),
y=samples['global.elimination_rate'].values[1],
mode='lines',
line_color=qualitative.Plotly[1],
showlegend=False
),
row=2,
col=2
)
fig.add_trace(
go.Scatter(
name='Run 3',
legendgroup='Group 4',
x=np.arange(1, n_iterations+1),
y=samples['global.elimination_rate'].values[2],
mode='lines',
line_color=qualitative.Plotly[0],
showlegend=False
),
row=2,
col=2
)
# Plot traces and histogram of parameter
fig.add_trace(
go.Histogram(
name='Posterior samples',
legendgroup='Group 1',
x=samples['global.volume'].values[:, n_iterations//2::(n_iterations//2)//1000].flatten(),
histnorm='probability density',
showlegend=False,
xbins=dict(size=0.5),
marker_color=qualitative.Plotly[4],
),
row=3,
col=1
)
fig.add_trace(
go.Scatter(
name='Run 1',
legendgroup='Group 2',
x=np.arange(1, n_iterations+1),
y=samples['global.volume'].values[0],
mode='lines',
line_color=qualitative.Plotly[2],
showlegend=False
),
row=3,
col=2
)
fig.add_trace(
go.Scatter(
name='Run 2',
legendgroup='Group 3',
x=np.arange(1, n_iterations+1),
y=samples['global.volume'].values[1],
mode='lines',
line_color=qualitative.Plotly[1],
showlegend=False
),
row=3,
col=2
)
fig.add_trace(
go.Scatter(
name='Run 3',
legendgroup='Group 4',
x=np.arange(1, n_iterations+1),
y=samples['global.volume'].values[2],
mode='lines',
line_color=qualitative.Plotly[0],
showlegend=False
),
row=3,
col=2
)
# Plot traces and histogram of parameter
fig.add_trace(
go.Histogram(
name='Posterior samples',
legendgroup='Group 1',
x=samples['Sigma log'].values[:, n_iterations//2::(n_iterations//2)//1000].flatten(),
histnorm='probability density',
showlegend=False,
xbins=dict(size=0.02),
marker_color=qualitative.Plotly[4],
),
row=4,
col=1
)
fig.add_trace(
go.Scatter(
name='Run 1',
legendgroup='Group 2',
x=np.arange(1, n_iterations+1),
y=samples['Sigma log'].values[0],
mode='lines',
line_color=qualitative.Plotly[2],
showlegend=False
),
row=4,
col=2
)
fig.add_trace(
go.Scatter(
name='Run 2',
legendgroup='Group 3',
x=np.arange(1, n_iterations+1),
y=samples['Sigma log'].values[1],
mode='lines',
line_color=qualitative.Plotly[1],
showlegend=False
),
row=4,
col=2
)
fig.add_trace(
go.Scatter(
name='Run 3',
legendgroup='Group 4',
x=np.arange(1, n_iterations+1),
y=samples['Sigma log'].values[2],
mode='lines',
line_color=qualitative.Plotly[0],
showlegend=False
),
row=4,
col=2
)
# Visualise prior distribution
parameter_values = np.linspace(4, 16, num=200)
pdf_values = np.exp([
log_prior._priors[0]([parameter_value])
for parameter_value in parameter_values])
fig.add_trace(
go.Scatter(
name='Prior distribution',
legendgroup='Group 5',
x=parameter_values,
y=pdf_values,
mode='lines',
line_color='black',
),
row=1,
col=1
)
parameter_values = np.linspace(0, 12, num=200)
pdf_values = np.exp([
log_prior._priors[1]([parameter_value])
for parameter_value in parameter_values])
fig.add_trace(
go.Scatter(
name='Prior distribution',
legendgroup='Group 5',
x=parameter_values,
y=pdf_values,
mode='lines',
line_color='black',
showlegend=False
),
row=2,
col=1
)
parameter_values = np.linspace(0, 12, num=200)
pdf_values = np.exp([
log_prior._priors[2]([parameter_value])
for parameter_value in parameter_values])
fig.add_trace(
go.Scatter(
name='Prior distribution',
legendgroup='Group 5',
x=parameter_values,
y=pdf_values,
mode='lines',
line_color='black',
showlegend=False
),
row=3,
col=1
)
parameter_values = np.linspace(0, 0.6, num=200)
pdf_values = np.exp([
log_prior._priors[3]([parameter_value])
for parameter_value in parameter_values])
fig.add_trace(
go.Scatter(
name='Prior distribution',
legendgroup='Group 5',
x=parameter_values,
y=pdf_values,
mode='lines',
line_color='black',
showlegend=False
),
row=4,
col=1
)
fig.update_layout(
xaxis_title='k_a',
yaxis_title='p',
yaxis2_title='k_a',
xaxis3_title='k_e',
yaxis3_title='p',
yaxis4_title='k_e',
xaxis5_title='v',
yaxis5_title='p',
yaxis6_title='v',
xaxis7_title='sigma',
yaxis7_title='p',
xaxis8_title='Number of iterations',
yaxis8_title='sigma',
template='plotly_white',
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1
),
margin=dict(t=0, r=0, l=0)
)
fig.show()
3.4. Assessing convergence: Summary
Run the MCMC algorithm multiple times from different starting points. The
chi.SamplingController
automatically samples starting points from the prior distribution. Recommended number of runs is 3-5.Visually assess the convergence of the runs to the same area of parameter space using trace plots i.e. sample values on the y-axis and iteration at which each sample was obtained on the x-axis.
Discard the samples where the runs have not converged as “warm-up”. If the runs do not converge, the MCMC algorithm has to run for more iterations until they converge, or the algorithm may not be suited for the inference problem.
Calculate the \(\hat{R}\) value using the remaining samples. We recommend the
az.rhat
function implemented in ArViz. Values \(<1.01\) indicate good convergence. Larger values may indicate problems with the inference.(Optional) For MCMC runs that require a large number of iterations for convergence, it can make sense to subsample the samples. Often, 1000-3000 samples are sufficient to represent a distribution. For
n_iterations
post warm-up, keeping everyn_iterations // 1000
th sample may therefore be sufficient.
For more details, please refer to the previous section.
3.5. Comparing model fits to data
Let us conclude this tutorial by comparing our model fit to the drug concentration measurements. The simplest way to do this is to just focus on the means of the posterior distributions, which we can extract from the summary dataframes presented in Section 3.3.2.
# Fix model parameters
model = chi.ReducedMechanisticModel(mechanistic_model=model)
model.fix_parameters(name_value_dict={
'dose.drug_amount': 0,
'global.drug_amount': 0,
})
# Set dosing regimen
dosing_regimen = problem.get_dosing_regimens()['1']
model.set_dosing_regimen(dosing_regimen)
# Extract model parameters from summary
parameters = [
summary2['mean'].loc[parameter] for parameter in model.parameters()
]
# Simulate result
times = np.linspace(0.2, 3, 200)
simulation = model.simulate(parameters=parameters, times=times)[0]
# Plot results
fig = go.Figure()
fig.add_trace(go.Scatter(
x=times,
y=simulation,
mode='lines',
name='Fit'
))
fig.add_trace(go.Scatter(
x=data.Time,
y=data.Value,
mode='markers',
name='Measurements'
))
fig.update_layout(
xaxis_title='Time',
yaxis_title='Drug concentration',
template='plotly_white'
)
fig.show()
In the above code block, we first fix the initial amount parameters of the mechanistic
model using the chi.ReducedMechanisticModel
class. This class is a wrapper
for mechanistic models that implements the ReducedMechanisticModel.fix_parameters()
method. This method works analogously to the ProblemModellingController.fix_parameters()
and fixes model parameters to constant values. In the next lines, we extract the
dosing regimen from the ProblemModellingController
. The controller formats
the dosing regimens from the dataset, Dataset_1, into a useful format for the chi.PKPDModel
, returning the
dosing regimens in form of a dictionary, using the
IDs of the individuals as keys. This makes it possible to access the dosing regimen
of the individual with 'ID 1'
in Dataset_1 using the key '1'
.
In the remaining lines leading up to the plotting, we extract the
estimates of the parameter means from the summary dataframe.
We can see that our fit describes the measurements reasonably well. However, it is
a little bit unsatisfying that the uncertainty of the parameter estimates is not
reflected in the model fit at all. To resolve this shortcoming of just using the
parameter means to plot the model fit, Chi implements the chi.PosteriorPredictiveModel
,
which defines the posterior predictive distribution of the measurement
process
The posterior predictive distribution averages our model of the measurement distribution over the parameter values that we found to be consistent with the data. Before discussing this in more detail, let us take a look at how the posterior predictive distribution compares to the model fit where we just used the means of the parameters. To this end, we sample measurements repeatedly from \(p(y | \mathcal{D}, r, t)\) and estimate the mean and some of its percentiles at each time point.
# Define posterior predictive distribution
predictive_model = problem.get_predictive_model()
samples = samples.sel(draw=slice(n_iterations//2, n_iterations))
posterior_model = chi.PosteriorPredictiveModel(
predictive_model=predictive_model, posterior_samples=samples)
# Approximate distribution using sampling
n_samples = 1000
conc_samples = posterior_model.sample(
times=times, n_samples=n_samples, seed=1)
# Reshape samples, so we can calculate mean and percentiles at the different
# time points
reshaped_samples = np.empty(shape=(n_samples, len(times)))
for sample_idx, sample_id in enumerate(conc_samples.ID.unique()):
reshaped_samples[
sample_idx] = conc_samples[conc_samples.ID == sample_id].Value.values
# Calculate mean, 5th and 95th percentile of the distribution at each time
# point
means = np.mean(reshaped_samples, axis=0)
lower = np.percentile(reshaped_samples, q=5, axis=0)
upper = np.percentile(reshaped_samples, q=95, axis=0)
In the first few lines of the code block, we implement the
chi.PosteriorPredictiveModel
by first retrieving the
chi.PredictiveModel
from the problem modelling controller, and then
combining it with the inferred posterior samples.
The chi.PredictiveModel
defines the measurement distribution,
\(p(y | \psi, r, t)\), for the purpose of simulating measurements. Note that
\(p(y | \psi, r, t)\) is the same distribution as the one we used to construct the
likelihood in Section 3.1.
The important difference is that for the likelihood the values of \((y, t, r)\)
are fixed to the measurements, while for the chi.PredictiveModel
, the times
and the dosing regimen can be chosen at will to simulate any measurements of interest.
In the following 2 lines of the code block, we sample measurements from the posterior predictive distribution –
1000 samples at each simulated time point. The samples are returned as a pandas.DataFrame
,
which we reshape to a numpy array of shape (n_samples, n_times)
in the next few lines.
This reshaping makes it very easy to calculate the mean value and the percentiles
of the samples at each time point, as demonstrated in the final lines of the code block.
The figure visualises the estimated mean and percentiles of the posterior predictive
distribution on top of our earlier fit. The code to reproduce the figure is documented below.
The area between the percentile estimates marks the region of drug concentration values
which can be expected to be occupied by 90% of all measurements,
assuming, of course, that our model correctly describes the measurement process.
The estimated percentiles are more jittery than the plot with the mean parameter values
because we use sampling to estimate the percentiles. As a result, the estimates
are subject to sampling error, which can be reduced by increasing the number of
samples, n_samples
. However, for the purpose of this analysis, the jittering
is not substantial enough to warrant a larger number of samples.
Plotting the posterior predictive distribution provides two important insights that the fit with the means of the parameters cannot provide: 1. it can quantify the uncertainty associated with the treatment response. This uncertainty is partially due to measurement noise, but it also is due to the residual uncertainty in the model parameters, as quantified by the posterior distribution. 2. It is able to better reflect nonlinearities between model parameters and the model output which are otherwise neglected when plotting the fit with the means of the model parameters. We can see an indication of this in the figure by noting that the fit with the parameter means is not the same as the mean of the posterior predictive distribution. This difference is a result of the, in general, nonlinear dependence of treatment response models on their parameters, meaning that the average treatment response under parametric uncertainty is not the same as the treatment response predicted with the mean parameters. In fact, these two predictions can differ substantially from each other, especially when predictions are made for future times or for previously unexplored dosing regimens.
# Plot results
fig = go.Figure()
fig.add_trace(go.Scatter(
x=times,
y=simulation,
mode='lines',
name='Fit: mean parameters'
))
fig.add_trace(go.Scatter(
x=data.Time,
y=data.Value,
mode='markers',
name='Measurements'
))
fig.add_trace(go.Scatter(
x=times,
y=means,
mode='lines',
name='Fit: mean posterior pred. distr.',
line=dict(color='black')
))
fig.add_trace(go.Scatter(
x=times,
y=lower,
mode='lines',
name='Fit: 5th-95th perc. posterior pred. distr.',
line=dict(color='black', dash='dash')
))
fig.add_trace(go.Scatter(
x=times,
y=upper,
mode='lines',
name='Fit: 5th-95th perc. posterior pred. distr.',
line=dict(color='black', dash='dash'),
showlegend=False
))
fig.update_layout(
xaxis_title='Time',
yaxis_title='Drug concentration',
template='plotly_white',
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1
),
)
fig.show()
This concludes this tutorial. If you have any feedback or suggestions for improvement, or would like to report any typos, mistakes or bugs, please do reach out to us, for example by creating an Issue. We are looking forward to hearing from you!
3.6. Reference to ErrorModel, LogPDF and PredictiveModel API
A base class for error models for the one-dimensional output of |
|
An error model which assumes that the model error follows a Gaussian distribution. |
|
An error model which assumes that the model error follows a Log-normal distribution. |
|
An error model which assumes that the model error is a Gaussian heteroscedastic noise. |
|
An error model which assumes that the model error is a mixture between a Gaussian base-level noise and a Gaussian heteroscedastic noise. |
|
|
A class that can be used to permanently fix model parameters of an |
|
A log-likelihood that quantifies the likelihood of parameter values to capture the measurements within the model approximation of the data-generating process. |
|
A log-posterior constructed from a log-likelihood and a log-prior. |
|
A problem modelling controller which simplifies the model building process of a pharmacokinetic and pharmacodynamic problem. |
|
Sets up a sampling routine that attempts to find the posterior distribution of parameters defined by a |
|
Implements a model of a data-generating process. |
|
Implements the posterior predictive model of the modelled data-generating process and the associated parameter posterior distribution. |