Population Filters

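Population filters in chi are used for filter inference: a filter estimates the likelihood that simulated measurements come from the same distribution as the measured population data.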

Classes

Detailed API

class chi.ComposedPopulationFilter(population_filters)[source]

A filter composed of multiple filters.

A composed filter takes a list of filters and defines the log-likelihood of simulated measurements as the sum over the individual log-likelihoods of the filters

\[\log p(Y | \tilde{Y}) = \sum _{ij} \log p_j(y_{ij} | \tilde{Y}_j ),\]

where \(\tilde{Y}_j = \{ \tilde{y}_{sj} \}\) are the simulated measurements and \(p_j(\cdot | \tilde{Y}_j )\) is the filter at time point \(t_j\).

The input instances of chi.PopulationFilter may model multiple time points at once. The measurement times are expected to be ordered according to the concatenated measurement times of the individual filters.
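
The sum over filters can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not chi's implementation: `composed_log_likelihood`, `filters` and `n_times_per_filter` are hypothetical names, and each filter is stood in for by a callable that maps its block of simulated measurements to a float.

```python
import numpy as np


def composed_log_likelihood(filters, n_times_per_filter, simulated_obs):
    """Sum the log-likelihoods of several filters (illustrative sketch).

    filters: list of callables, each mapping an array of shape
        (n_sim, n_observables, n_times_k) to a float (hypothetical stand-ins).
    n_times_per_filter: number of time points covered by each filter.
    simulated_obs: array of shape (n_sim, n_observables, n_times) whose time
        axis is ordered as the concatenation of the filters' time points.
    """
    total = 0.0
    start = 0
    for log_likelihood, n_t in zip(filters, n_times_per_filter):
        # Each filter only sees its own block of time points.
        total += log_likelihood(simulated_obs[:, :, start:start + n_t])
        start += n_t
    return total


# Two toy "filters" covering 2 and 3 time points, respectively.
sim = np.arange(4 * 1 * 5, dtype=float).reshape(4, 1, 5)
toy_filters = [lambda x: float(x.sum()), lambda x: float(-x.sum())]
score = composed_log_likelihood(toy_filters, [2, 3], sim)
```

The slicing mirrors the ordering requirement stated above: the time axis of the simulated measurements has to follow the concatenated time points of the individual filters.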

Extends chi.PopulationFilter.

Parameters:

population_filters (List[chi.PopulationFilter]) – List of filters.

compute_log_likelihood(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

float

compute_sensitivities(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter, and the sensitivities of the log-likelihood with respect to the simulated observations.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

Tuple[float, np.ndarray] where the array has shape (n_sim, n_observables, n_times)

n_observables()[source]

Returns the number of observables in the dataset.

n_times()[source]

Returns the number of measurement times in the dataset.

sort_times(order)[source]

Sorts the observations along the time dimension according to the provided indices.

Parameters:

order (np.ndarray of shape (n_times,)) – An array of indices that orders the observations along the time dimension.

class chi.GaussianFilter(observations)[source]

Implements a Gaussian filter.

A Gaussian filter approximates the distribution of measurements at time point \(t_j\) by a Gaussian distribution whose mean and variance are estimated from simulated measurements

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ij} \log \mathcal{N} (y_{ij} | \mu _j, \sigma ^2_j),\]

where the mean \(\mu _j\) and the variance \(\sigma ^2_j\) are given by empirical estimates from the simulated measurements

\[\mu _j = \frac{1}{n_s} \sum _{s=1}^{n_s} \tilde{y}_{sj} \quad \text{and} \quad \sigma ^2 _j = \frac{1}{n_s-1} \sum _{s=1}^{n_s} \left( \tilde{y}_{sj} - \mu _j \right) ^2.\]

Here, we use \(i\) to index measured individuals from the dataset, \(j\) to index measurement time points and \(s\) to index simulated measurements. \(n_s\) denotes the number of simulated measurements per time point.

For multiple measured observables the above expression can be straightforwardly extended to

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ijr} \log \mathcal{N} (y_{ijr} | \mu _{jr}, \sigma ^2_{jr}),\]

where \(r\) indexes observables and \(\mu _{jr}\) and \(\sigma^2 _{jr}\) are the empirical mean and variance over the simulated measurements of the observable \(r\) at time point \(t_j\).
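
As a rough illustration of the expression above, the log-likelihood can be written with NumPy broadcasting. This is a sketch, not chi's code: the function name `gaussian_filter_log_likelihood` is hypothetical, and skipping missing values with `np.nansum` is an assumption about how NaN entries are handled.

```python
import numpy as np


def gaussian_filter_log_likelihood(observations, simulated_obs):
    """Gaussian filter log-likelihood (illustrative sketch).

    observations: shape (n_ids, n_observables, n_times); np.nan marks
        missing measurements.
    simulated_obs: shape (n_sim, n_observables, n_times).
    """
    # Empirical mean and unbiased variance per observable and time point.
    mu = simulated_obs.mean(axis=0)           # (n_observables, n_times)
    var = simulated_obs.var(axis=0, ddof=1)   # (n_observables, n_times)

    # Gaussian log-density of every measurement; mu and var broadcast
    # over the individuals axis.
    log_pdf = -0.5 * (np.log(2 * np.pi * var) + (observations - mu) ** 2 / var)

    # ASSUMPTION: missing measurements (NaN) simply do not contribute.
    return float(np.nansum(log_pdf))


rng = np.random.default_rng(0)
data = rng.normal(size=(10, 2, 3))    # synthetic measurements
sim = rng.normal(size=(100, 2, 3))    # synthetic simulated measurements
score = gaussian_filter_log_likelihood(data, sim)
```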

Extends chi.PopulationFilter.

Parameters:

observations (np.ndarray of shape (n_ids, n_observables, n_times)) – Measurements.

compute_log_likelihood(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

float

compute_sensitivities(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter, and the sensitivities of the log-likelihood with respect to the simulated observations.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

Tuple[float, np.ndarray] where the array has shape (n_sim, n_observables, n_times)
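
For the Gaussian filter, the sensitivities follow from the chain rule through \(\mu _{jr}\) and \(\sigma ^2_{jr}\). The sketch below is one way to write this down in NumPy; it is not taken from chi, the function name is hypothetical, and it assumes complete data without NaN entries.

```python
import numpy as np


def gaussian_filter_sensitivities(observations, simulated_obs):
    """Log-likelihood and its gradient w.r.t. simulated_obs (sketch).

    ASSUMPTION: no missing values. Shapes follow the documentation:
    observations (n_ids, n_observables, n_times),
    simulated_obs (n_sim, n_observables, n_times).
    """
    n_sim = simulated_obs.shape[0]
    mu = simulated_obs.mean(axis=0)
    var = simulated_obs.var(axis=0, ddof=1)
    resid = observations - mu                 # (n_ids, n_obs, n_times)

    score = float(np.sum(-0.5 * (np.log(2 * np.pi * var) + resid ** 2 / var)))

    # Partial derivatives w.r.t. mu and var, summed over individuals.
    dscore_dmu = np.sum(resid, axis=0) / var
    dscore_dvar = np.sum(-0.5 / var + 0.5 * resid ** 2 / var ** 2, axis=0)

    # Chain rule: d mu / d sim_s = 1 / n_sim and
    # d var / d sim_s = 2 (sim_s - mu) / (n_sim - 1).
    dvar_dsim = 2 * (simulated_obs - mu) / (n_sim - 1)
    sens = dscore_dmu / n_sim + dscore_dvar * dvar_dsim
    return score, sens


rng = np.random.default_rng(1)
data = rng.normal(size=(5, 2, 3))
sim = rng.normal(size=(8, 2, 3))
score, sens = gaussian_filter_sensitivities(data, sim)
```

A finite-difference comparison on a single entry of `sim` is a cheap way to check gradients of this kind.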

n_observables()

Returns the number of observables in the dataset.

n_times()

Returns the number of measurement times in the dataset.

sort_times(order)

Sorts the observations along the time dimension according to the provided indices.

Parameters:

order (np.ndarray of shape (n_times,)) – An array of indices that orders the observations along the time dimension.

class chi.GaussianKDEFilter(observations)[source]

Implements a Gaussian kernel density estimation filter.

A Gaussian KDE filter approximates the distribution of measurements at time point \(t_j\) by a Gaussian KDE approximation of the simulated measurements. The Gaussian KDE approximation is defined by the average over Gaussian probability densities whose means are equal to the simulated measurements and the standard deviation (or bandwidth) is a hyperparameter. By default the bandwidth is chosen by an adapted rule of thumb.

The log-likelihood of the simulated measurements with respect to the measurements and the filter is defined as

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ij} \log \left( \frac{1}{n_s} \sum _{s=1}^{n_s} \mathcal{N} (y_{ij} | \tilde{y}_{sj}, \tilde{\sigma} ^2_j) \right).\]

Here, we use \(i\) to index measured individuals from the dataset, \(j\) to index measurement time points and \(s\) to index simulated measurements. \(n_s\) denotes the number of simulated measurements per time point.

An adapted rule of thumb is used to estimate an appropriate bandwidth for each time point \(t_j\)

\[\tilde{\sigma} _j = \left( \frac{4}{3n_s}\right) ^ {1/5} \sqrt{\frac{1}{n_s - 1} \sum _s (\tilde{y}_{sj} - \tilde{\mu} _j)^2},\]

where \(\tilde{\mu} _j = \sum _s \tilde{y}_{sj} / n_s\) is the empirical mean over the simulated measurements at time \(t_j\).

For multiple measured observables the above expression can be straightforwardly extended to

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ijr} \log \left( \frac{1}{n_s} \sum _{s=1}^{n_s} \mathcal{N} (y_{ijr} | \tilde{y}_{sjr}, \tilde{\sigma} ^2_{jr}) \right),\]

where \(r\) indexes observables and \(\tilde{\sigma} _{jr}\) is the bandwidth for observable \(r\) at time point \(t_j\).
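
The KDE log-likelihood and the rule-of-thumb bandwidth above can be sketched as follows. This is an illustrative NumPy version, not chi's implementation: the function name is hypothetical, and ignoring NaN entries via `np.nansum` is an assumption.

```python
import numpy as np


def gaussian_kde_filter_log_likelihood(observations, simulated_obs):
    """Gaussian KDE filter log-likelihood (illustrative sketch)."""
    n_sim = simulated_obs.shape[0]

    # Rule-of-thumb bandwidth per observable and time point.
    bandwidth = (4 / (3 * n_sim)) ** 0.2 * simulated_obs.std(axis=0, ddof=1)

    # Log-density of each measurement under each kernel:
    # shape (n_ids, n_sim, n_observables, n_times).
    diff = observations[:, np.newaxis] - simulated_obs[np.newaxis]
    log_kernels = -0.5 * (
        np.log(2 * np.pi * bandwidth ** 2) + (diff / bandwidth) ** 2)

    # Log of the kernel average, computed with the log-sum-exp trick
    # to avoid underflow for measurements far from all kernels.
    max_log = np.max(log_kernels, axis=1, keepdims=True)
    log_mean = max_log[:, 0] + np.log(
        np.mean(np.exp(log_kernels - max_log), axis=1))

    # ASSUMPTION: missing measurements (NaN) do not contribute.
    return float(np.nansum(log_mean))


rng = np.random.default_rng(0)
data = rng.normal(size=(6, 2, 3))
sim = rng.normal(size=(50, 2, 3))
score = gaussian_kde_filter_log_likelihood(data, sim)
```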

Extends chi.PopulationFilter.

Parameters:

observations (np.ndarray of shape (n_ids, n_observables, n_times)) – Measurements.

compute_log_likelihood(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

float

compute_sensitivities(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter, and the sensitivities of the log-likelihood with respect to the simulated observations.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

Tuple[float, np.ndarray] where the array has shape (n_sim, n_observables, n_times)

n_observables()

Returns the number of observables in the dataset.

n_times()

Returns the number of measurement times in the dataset.

sort_times(order)

Sorts the observations along the time dimension according to the provided indices.

Parameters:

order (np.ndarray of shape (n_times,)) – An array of indices that orders the observations along the time dimension.

class chi.GaussianMixtureFilter(observations, n_kernels=2)[source]

Implements a Gaussian mixture filter.

A Gaussian mixture filter approximates the distribution of measurements at time point \(t_j\) by a Gaussian mixture distribution whose kernel means and variances are estimated from simulated measurements

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ij} \log \sum_m \frac{1}{M} \mathcal{N} (y_{ij} | \mu _{jm}, \sigma ^2_{jm}),\]

where the mean \(\mu _{jm}\) and the variance \(\sigma ^2_{jm}\) of the mth Gaussian distribution are given by the empirical estimates from the mth subset of the simulated measurements

\[\mu _{jm} = \frac{1}{n} \sum _{s=1}^{n} \tilde{y}_{sjm} \quad \text{and} \quad \sigma ^2 _{jm} = \frac{1}{n-1} \sum _{s=1}^{n} \left( \tilde{y}_{sjm} - \mu _{jm} \right) ^2.\]

Here, we use \(i\) to index measured individuals from the dataset, \(j\) to index measurement time points, \(m\) to index the \(M\) Gaussian kernels and \(s\) to index the simulated measurements within the mth subset. \(n\) denotes the number of simulated measurements per subset at each time point.

For multiple measured observables the above expression can be straightforwardly extended to

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ijr} \log \sum_m \frac{1}{M} \mathcal{N} (y_{ijr} | \mu _{jrm}, \sigma ^2_{jrm}),\]

where \(r\) indexes observables and \(\mu _{jrm}\) and \(\sigma^2 _{jrm}\) are the empirical mean and variance over the mth subset of simulated measurements of the observable \(r\) at time point \(t_j\).
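
A minimal NumPy sketch of the mixture log-likelihood is given below. It is not chi's implementation. In particular, how the simulated measurements are assigned to the \(M\) subsets is not specified here, so the sketch assumes that the mth subset is the mth contiguous block along the simulation axis; the function name is hypothetical as well.

```python
import numpy as np


def gaussian_mixture_filter_log_likelihood(
        observations, simulated_obs, n_kernels=2):
    """Equal-weight Gaussian mixture log-likelihood (illustrative sketch)."""
    # ASSUMPTION: subset m is the m-th contiguous block of simulations.
    # np.split raises a ValueError if n_sim is not divisible by n_kernels.
    subsets = np.split(simulated_obs, n_kernels, axis=0)

    # Kernel means and unbiased variances: shape (M, n_observables, n_times).
    mu = np.stack([s.mean(axis=0) for s in subsets])
    var = np.stack([s.var(axis=0, ddof=1) for s in subsets])

    # Density of each measurement under each kernel:
    # shape (n_ids, M, n_observables, n_times).
    diff = observations[:, np.newaxis] - mu
    pdf = np.exp(-0.5 * diff ** 2 / var) / np.sqrt(2 * np.pi * var)

    # Equal weights 1 / M correspond to a plain average over kernels.
    mixture = pdf.mean(axis=1)

    # ASSUMPTION: missing measurements (NaN) do not contribute.
    return float(np.nansum(np.log(mixture)))


rng = np.random.default_rng(0)
data = rng.normal(size=(6, 2, 3))
sim = rng.normal(size=(40, 2, 3))
score = gaussian_mixture_filter_log_likelihood(data, sim, n_kernels=2)
```

With `n_kernels=1` the sketch reduces to a single Gaussian whose mean and variance are estimated from all simulated measurements.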

Extends chi.PopulationFilter.

Parameters:

observations (np.ndarray of shape (n_ids, n_observables, n_times)) – Measurements.

compute_log_likelihood(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

float

compute_sensitivities(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter, and the sensitivities of the log-likelihood with respect to the simulated observations.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

Tuple[float, np.ndarray] where the array has shape (n_sim, n_observables, n_times)

n_observables()

Returns the number of observables in the dataset.

n_times()

Returns the number of measurement times in the dataset.

sort_times(order)

Sorts the observations along the time dimension according to the provided indices.

Parameters:

order (np.ndarray of shape (n_times,)) – An array of indices that orders the observations along the time dimension.

class chi.LogNormalFilter(observations)[source]

Implements a lognormal filter.

A lognormal filter approximates the distribution of measurements at time point \(t_j\) by a lognormal distribution whose location and scale are estimated from simulated measurements

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ij} \log \mathrm{LN} (y_{ij} | \mu _j, \sigma _j),\]

where the location \(\mu _j\) and the scale \(\sigma _j\) are given by empirical estimates of the log-mean and the log-standard deviation of the simulated measurements

\[\mu _j = \frac{1}{n_s} \sum _{s=1}^{n_s} \log \tilde{y}_{sj} \quad \text{and} \quad \sigma _j = \sqrt{\frac{1}{n_s-1} \sum _{s=1}^{n_s} \left( \log \tilde{y}_{sj} - \mu _j \right) ^2}.\]

Here, we use \(i\) to index measured individuals from the dataset, \(j\) to index measurement time points and \(s\) to index simulated measurements. \(n_s\) denotes the number of simulated measurements per time point.

For multiple measured observables the above expression can be straightforwardly extended to

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ijr} \log \mathrm{LN} (y_{ijr} | \mu _{jr}, \sigma _{jr}),\]

where \(r\) indexes observables and \(\mu _{jr}\) and \(\sigma _{jr}\) are the empirical mean and standard deviation over the simulated log-measurements of the observable \(r\) at time point \(t_j\).
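
The lognormal case can be sketched in the same way as the Gaussian one, working on the log scale. The code below is an illustration rather than chi's implementation: the function name is hypothetical, all values are assumed to be strictly positive, and NaN entries are assumed to be skipped.

```python
import numpy as np


def lognormal_filter_log_likelihood(observations, simulated_obs):
    """Lognormal filter log-likelihood (illustrative sketch).

    ASSUMPTION: all measurements and simulations are strictly positive.
    """
    # Location and scale: mean and standard deviation of the log-simulations.
    log_sim = np.log(simulated_obs)
    mu = log_sim.mean(axis=0)              # (n_observables, n_times)
    sigma = log_sim.std(axis=0, ddof=1)    # (n_observables, n_times)

    # log LN(y | mu, sigma)
    #   = -log y - log sigma - 0.5 log(2 pi) - (log y - mu)^2 / (2 sigma^2)
    log_obs = np.log(observations)
    log_pdf = (
        -log_obs - np.log(sigma) - 0.5 * np.log(2 * np.pi)
        - 0.5 * ((log_obs - mu) / sigma) ** 2)

    # ASSUMPTION: missing measurements (NaN) do not contribute.
    return float(np.nansum(log_pdf))


rng = np.random.default_rng(0)
data = rng.lognormal(size=(10, 2, 3))
sim = rng.lognormal(size=(100, 2, 3))
score = lognormal_filter_log_likelihood(data, sim)
```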

Extends chi.PopulationFilter.

Parameters:

observations (np.ndarray of shape (n_ids, n_observables, n_times)) – Measurements.

compute_log_likelihood(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

float

compute_sensitivities(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter, and the sensitivities of the log-likelihood with respect to the simulated observations.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

Tuple[float, np.ndarray] where the array has shape (n_sim, n_observables, n_times)

n_observables()

Returns the number of observables in the dataset.

n_times()

Returns the number of measurement times in the dataset.

sort_times(order)

Sorts the observations along the time dimension according to the provided indices.

Parameters:

order (np.ndarray of shape (n_times,)) – An array of indices that orders the observations along the time dimension.

class chi.LogNormalKDEFilter(observations, bandwidth=None)[source]

Implements a lognormal kernel density estimation filter.

A lognormal KDE filter approximates the distribution of measurements at time point \(t_j\) by a lognormal KDE approximation of the simulated measurements. The lognormal KDE approximation is defined by the average over lognormal probability densities whose locations are equal to the simulated measurements and the scale (or bandwidth) is a hyperparameter. By default the bandwidth is chosen by an adapted rule of thumb.

The log-likelihood of the simulated measurements with respect to the measurements and the filter is defined as

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ij} \log \left( \frac{1}{n_s} \sum _{s=1}^{n_s} \mathrm{LN} (y_{ij} | \tilde{y}_{sj}, \sigma _j) \right).\]

Here, we use \(i\) to index measured individuals from the dataset, \(j\) to index measurement time points and \(s\) to index simulated measurements. \(n_s\) denotes the number of simulated measurements per time point.

An adapted rule of thumb is used to estimate an appropriate bandwidth for each time point \(t_j\)

\[\sigma _j = \left( \frac{4}{3n_s}\right) ^ {1/5} \sqrt{\frac{1}{n_j - 1}\sum _i (\log y_{ij} - \mu _j)^2},\]

where \(\mu _j = \sum _i \log y_{ij} / n_j\) is the empirical mean over the log-measurements and \(n_j\) is the number of measurements at time \(t_j\). Note that this deviates from the standard definition of the rule of thumb, where the empirical variance would be estimated from the simulated measurements.
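
The bandwidth rule above depends only on the measurements and the number of simulations, so it can be sketched on its own. The helper name `lognormal_kde_bandwidth` is hypothetical, and using `np.nanstd` to handle a varying number of measurements \(n_j\) per time point is an assumption, not a statement about chi's internals.

```python
import numpy as np


def lognormal_kde_bandwidth(observations, n_sim):
    """Adapted rule-of-thumb bandwidth per observable and time point (sketch).

    observations: shape (n_ids, n_observables, n_times), strictly positive,
        with np.nan marking missing measurements.
    n_sim: number of simulated measurements per time point.
    """
    log_obs = np.log(observations)
    # nanstd ignores missing values, so n_j may differ between time points.
    spread = np.nanstd(log_obs, axis=0, ddof=1)   # (n_observables, n_times)
    return (4 / (3 * n_sim)) ** 0.2 * spread


# Three individuals at the first time point, two at the second.
data = np.exp(np.array([[[0.0, 1.0]], [[1.0, 3.0]], [[2.0, np.nan]]]))
bandwidth = lognormal_kde_bandwidth(data, n_sim=100)
```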

For multiple measured observables the above expression can be straightforwardly extended to

\[\log p(\mathcal{D} | \tilde{Y}) = \sum _{ijr} \log \left( \frac{1}{n_s} \sum _{s=1}^{n_s} \mathrm{LN} (y_{ijr} | \tilde{y}_{sjr}, \sigma _{jr}) \right),\]

where \(r\) indexes observables and \(\sigma _{jr}\) is the bandwidth for observable \(r\) at time point \(t_j\).

Extends chi.PopulationFilter.

Parameters:

observations (np.ndarray of shape (n_ids, n_observables, n_times)) – Measurements.

compute_log_likelihood(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

float

compute_sensitivities(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter, and the sensitivities of the log-likelihood with respect to the simulated observations.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

Tuple[float, np.ndarray] where the array has shape (n_sim, n_observables, n_times)

n_observables()

Returns the number of observables in the dataset.

n_times()

Returns the number of measurement times in the dataset.

sort_times(order)

Sorts the observations along the time dimension according to the provided indices.

Parameters:

order (np.ndarray of shape (n_times,)) – An array of indices that orders the observations along the time dimension.

Base classes

Detailed API

class chi.PopulationFilter(observations)[source]

A base class for filters.

A filter estimates the likelihood with which simulated observations \(\tilde{y}_{sj}\) come from the same distribution as the measurements \(y_{ij}\), where \(s\) indexes simulated individuals, \(j\) time points and \(i\) measured individuals.

Formally the log-likelihood of the simulated observations with respect to the filter is defined as

\[\log p(Y | \tilde{Y}) = \sum _{ij} \log p(y_{ij} | \tilde{Y}_j ),\]

where \(\tilde{Y}_j = \{ \tilde{y}_{sj} \}\) are the simulated observations at time point \(t_j\).

The measurements are expected to be arranged into a 3-dimensional NumPy array of shape (n_ids, n_observables, n_times), where n_ids is the number of measured individuals at a given time point, n_observables is the number of unique observables that were measured, and n_times is the number of unique time points. The filter expects the simulated measurements to be ordered in the same way, so no record of the measurement times is needed.

If varying numbers of individuals were measured at different time points, or not all observables were measured for each individual, the missing values can be filled with np.nan to be able to shape the observations into the format (n_ids, n_observables, n_times). The missing values will be filtered internally.
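
As an illustration of this layout, the snippet below pads an unbalanced dataset with np.nan. The measurement values and variable names are made up for the example.

```python
import numpy as np

# Hypothetical measurements of one observable: three individuals at the
# first time point, two at the second.
measurements_per_time = [[1.2, 0.9, 1.1], [0.7, 0.8]]

n_ids = max(len(values) for values in measurements_per_time)
n_times = len(measurements_per_time)

# Start from an all-NaN array of shape (n_ids, n_observables, n_times)
# and fill in the values that were actually measured.
observations = np.full((n_ids, 1, n_times), np.nan)
for j, values in enumerate(measurements_per_time):
    observations[:len(values), 0, j] = values

# Number of available measurements per observable and time point.
n_measured = np.sum(~np.isnan(observations), axis=0)
```

The resulting array has shape (3, 1, 2), with a single np.nan marking the individual that is missing at the second time point.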

Parameters:

observations (np.ndarray of shape (n_ids, n_observables, n_times)) – Measurements.

compute_log_likelihood(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

float

compute_sensitivities(simulated_obs)[source]

Returns the log-likelihood of the simulated observations with respect to the data and filter, and the sensitivities of the log-likelihood with respect to the simulated observations.

Parameters:

simulated_obs (np.ndarray of shape (n_sim, n_observables, n_times)) – Simulated measurements.

Return type:

Tuple[float, np.ndarray] where the array has shape (n_sim, n_observables, n_times)

n_observables()[source]

Returns the number of observables in the dataset.

n_times()[source]

Returns the number of measurement times in the dataset.

sort_times(order)[source]

Sorts the observations along the time dimension according to the provided indices.

Parameters:

order (np.ndarray of shape (n_times,)) – An array of indices that orders the observations along the time dimension.
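
A suitable index array can be obtained with np.argsort. The snippet below is a sketch with made-up times and values; it assumes that sorting amounts to indexing the last axis of the observations with `order`, which is an interpretation of the description above and not taken from the source code.

```python
import numpy as np

# Hypothetical, unordered measurement times and matching observations
# of shape (n_ids=2, n_observables=1, n_times=3).
times = np.array([24.0, 1.0, 8.0])
observations = np.array([[[2.4, 0.1, 0.8]], [[2.5, 0.2, 0.9]]])

# Indices that sort the times in ascending order.
order = np.argsort(times)

# ASSUMPTION: reordering means indexing the time axis with these indices.
sorted_obs = observations[:, :, order]
```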