# Likelihood

The likelihood module contains functions for calculating and visualizing log-likelihood surfaces for models and data in Curveball. Functions that use growth data expect a pandas.DataFrame generated by the ioutils module. Functions that use the results of model fitting require lmfit.model.ModelResult objects.

## Members

curveball.likelihood.loglik(t, y, y_sig, f, penalty=None, \*\*params)

Computes the log-likelihood of seeing the data given a model, assuming normally distributed observation/measurement errors:

$\log{L(y | \theta)} = -\frac{1}{2} \sum_i \left[ \log{(2 \pi \sigma_{i}^{2})} + \frac{(y_i - f(t_i; \theta))^2}{\sigma_{i}^{2}} \right]$

which is the log-likelihood of seeing the data points $$t_i, y_i$$ with measurement error $$\sigma_i$$ given the model function $$f$$, the model parameters $$\theta$$, and that the measurement error at time $$t_i$$ has a normal distribution with mean 0.

Parameters
t : np.ndarray

one-dimensional array of time points

y : np.ndarray

one-dimensional array of the means of the observations

y_sig : np.ndarray

one-dimensional array of the standard deviations of the observations

f : callable

a function that calculates the expected observations (f(t)) from t and any parameters in params

penalty : callable, optional

a function that calculates a scalar penalty from the parameters in params, to be subtracted from the log-likelihood

params : floats, optional

model parameters

Returns
float

the log-likelihood result
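
The formula above can be sketched in a few lines of NumPy. This is an illustrative re-implementation written for this page, not Curveball's source code; `gaussian_loglik` and the `logistic` model are hypothetical names, and the data are synthetic and noise-free.

```python
import numpy as np

def gaussian_loglik(t, y, y_sig, f, penalty=None, **params):
    """Gaussian log-likelihood of (t, y) under model f (illustrative sketch)."""
    t, y, y_sig = (np.asarray(a, dtype=float) for a in (t, y, y_sig))
    residuals = y - f(t, **params)   # y_i - f(t_i; theta)
    var = y_sig ** 2                 # sigma_i^2
    ll = -0.5 * np.sum(np.log(2 * np.pi * var) + residuals ** 2 / var)
    if penalty is not None:
        ll -= penalty(**params)      # the penalty is subtracted from the log-likelihood
    return float(ll)

def logistic(t, y0, K, r):
    # Hypothetical logistic growth model, used only for this example.
    return K / (1 + (K / y0 - 1) * np.exp(-r * t))

# Synthetic, noise-free observations with a constant measurement error.
t = np.linspace(0, 12, 25)
y = logistic(t, y0=0.1, K=1.0, r=0.8)
y_sig = np.full_like(t, 0.05)

ll_true = gaussian_loglik(t, y, y_sig, logistic, y0=0.1, K=1.0, r=0.8)
ll_off = gaussian_loglik(t, y, y_sig, logistic, y0=0.1, K=1.0, r=0.4)
```

With noise-free data the residual term vanishes at the generating parameters, so `ll_true` reduces to the normalization term and is larger than `ll_off`.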

curveball.likelihood.loglik_r_nu(r_range, nu_range, df, f=baranyi_roberts_function, penalty=None, \*\*params)

Estimates the log-likelihood surface for $$r$$ and $$\nu$$ given data and a model function.

Parameters
r_range, nu_range : numpy.ndarray

vectors of floats of $$r$$ and $$\nu$$ values on which to compute the log-likelihood

df : pandas.DataFrame

data frame with Time and OD columns

f : callable, optional

model function, defaults to curveball.baranyi_roberts_model.baranyi_roberts_function()

penalty : callable, optional

if given, the result of penalty will be subtracted from the log-likelihood for each parameter set

params : floats

values for the model parameters used by f

Returns
np.ndarray

two-dimensional array of log-likelihood calculations; value at index i, j will have the value for r_range[i] and nu_range[j]
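
The indexing convention of the returned array can be illustrated with a small grid evaluation. The sketch below is not Curveball's implementation: it takes plain arrays instead of a data frame, and `richards`, `gaussian_loglik`, and `surface_r_nu` are hypothetical helpers defined only for this example.

```python
import numpy as np

def richards(t, y0, K, r, nu):
    # Hypothetical stand-in model (Richards growth curve), for illustration only.
    return K / (1 - (1 - (K / y0) ** nu) * np.exp(-r * nu * t)) ** (1 / nu)

def gaussian_loglik(t, y, y_sig, f, **params):
    # Gaussian log-likelihood of the observations under model f.
    resid = y - f(t, **params)
    return -0.5 * np.sum(np.log(2 * np.pi * y_sig ** 2) + (resid / y_sig) ** 2)

def surface_r_nu(r_range, nu_range, t, y, y_sig, f, **params):
    # surface[i, j] holds the log-likelihood for r_range[i] and nu_range[j].
    surface = np.empty((len(r_range), len(nu_range)))
    for i, r in enumerate(r_range):
        for j, nu in enumerate(nu_range):
            surface[i, j] = gaussian_loglik(t, y, y_sig, f, r=r, nu=nu, **params)
    return surface

# Synthetic, noise-free data generated with r = 0.5 and nu = 1.0.
t = np.linspace(0, 24, 49)
y = richards(t, y0=0.1, K=1.0, r=0.5, nu=1.0)
y_sig = np.full_like(t, 0.05)

rs = np.linspace(0.1, 1.0, 10)   # 0.1, 0.2, ..., 1.0
nus = np.linspace(0.5, 2.0, 4)   # 0.5, 1.0, 1.5, 2.0
surface = surface_r_nu(rs, nus, t, y, y_sig, richards, y0=0.1, K=1.0)

# Locate the grid point with the highest log-likelihood.
i_max, j_max = np.unravel_index(np.argmax(surface), surface.shape)
```

Because rows follow `r_range` and columns follow `nu_range`, the maximum falls at the grid point closest to the generating values, `rs[i_max]` and `nus[j_max]`.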

curveball.likelihood.loglik_r_q0(r_range, q0_range, df, f=baranyi_roberts_function, penalty=None, \*\*params)

Estimates the log-likelihood surface for $$r$$ and $$q_0$$ given data and a model function.

Parameters
r_range, q0_range : numpy.ndarray

vectors of floats of $$r$$ and $$q_0$$ values on which to compute the log-likelihood

df : pandas.DataFrame

data frame with Time and OD columns

f : callable, optional

model function, defaults to curveball.baranyi_roberts_model.baranyi_roberts_function()

penalty : callable, optional

if given, the result of penalty will be subtracted from the log-likelihood for each parameter set

params : floats

values for the model parameters used by f

Returns
np.ndarray

two-dimensional array of log-likelihood calculations; value at index i, j will have the value for r_range[i] and q0_range[j]

curveball.likelihood.plot_loglik(Ls, xrange, yrange, xlabel=None, ylabel=None, columns=4, fig_title=None, normalize=True, ax_titles=None, cmap='viridis', colorbar=True, ax_width=4, ax_height=4, ax=None)

Plots one or more log-likelihood surfaces.

Parameters
Ls : sequence of numpy.ndarray

list or tuple of two-dimensional log-likelihood arrays; if a single array is given, it will be converted to a list of length 1

xrange, yrange : np.ndarray

values on x-axis and y-axis of the plot (rows and columns of Ls, respectively)

xlabel, ylabel : str, optional

strings for x and y labels

columns : int, optional

number of columns in the figure when Ls has more than one matrix

fig_title : str, optional

a title for the whole figure

normalize : bool, optional

if True, all matrices will be plotted using a single color scale

ax_titles : list or tuple of str, optional

titles corresponding to the different matrices in Ls

cmap : str, optional

name of a matplotlib colormap (to see list, call matplotlib.pyplot.colormaps()), defaults to viridis

colorbar : bool, optional

if True, a colorbar will be added to the plot

ax_width, ax_height : int, optional

width and height of each panel (one for each matrix in Ls)

ax : matplotlib axes or numpy.ndarray of axes, optional

if given, will plot into ax, otherwise will create a new figure

Returns
fig : matplotlib.figure.Figure

figure object

ax : numpy.ndarray

array of axis objects

Examples

>>> L = loglik_r_nu(rs, nus, df, y0=y0, K=K, q0=q0, v=v)
>>> plot_loglik(L, rs, nus, normalize=False, fig_title=fig_title, xlabel=r'$r$', ylabel=r'$\nu$', colorbar=False)

curveball.likelihood.plot_model_loglik(m, df, fig_title=None)

Plots the log-likelihood surfaces for $$\nu$$ over $$r$$ and $$q_0$$ over $$r$$ for the given data and model fitting result.

Parameters
m : lmfit.model.ModelResult

model for which to plot the log-likelihood surface

df : pandas.DataFrame

data frame with Time and OD columns used to fit the model

fig_title : str, optional

title for the plot

Returns
fig : matplotlib.figure.Figure

figure object

ax : numpy.ndarray

array of axis objects

Examples

>>> m = curveball.models.fit_model(df)
>>> curveball.likelihood.plot_model_loglik(m, df)

curveball.likelihood.ridge_regularization(lam, \*\*center)

Creates a penalty function that employs the ridge regularization method:

$P = \lambda ||\theta - \theta_0||_2$

where $$\lambda$$ is the regularization scale, $$\theta$$ is the vector of model parameters, and $$\theta_0$$ is the vector of model parameter guesses. This is similar to using a multivariate Gaussian prior distribution on the model parameters, with the Gaussian centered at $$\theta_0$$ and scaled by $$\lambda$$.

Parameters
lam : float

the penalty factor or regularization scale

center : floats, optional

guesses of model parameters

Returns
callable

the penalty function, accepts model parameters as float keyword arguments and returns a float penalty to the log-likelihood

Examples

>>> penalty = ridge_regularization(1, y0=0.1, K=1, r=1)
>>> loglik(t, y, y_sig, logistic, penalty=penalty, y0=0.12, K=0.98, r=1.1)
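
The penalty formula $P = \lambda ||\theta - \theta_0||_2$ can be sketched as a closure over the guesses. This is a minimal illustration, not Curveball's source: `ridge_penalty` is a hypothetical name, and skipping parameters that have no guess is an assumption of this sketch.

```python
import numpy as np

def ridge_penalty(lam, **center):
    """Return a function computing lam * ||theta - theta_0||_2 (illustrative sketch)."""
    def penalty(**params):
        # Assumption of this sketch: only parameters that have a guess contribute.
        diffs = [value - center[name] for name, value in params.items() if name in center]
        return lam * float(np.linalg.norm(diffs))
    return penalty

penalty = ridge_penalty(2.0, y0=0.1, K=1.0, r=1.0)
p_at_guess = penalty(y0=0.1, K=1.0, r=1.0)   # parameters equal to the guesses
p_far = penalty(y0=0.1, K=4.0, r=5.0)        # differences of (0, 3, 4)
```

At the guesses the penalty is zero, and it grows with the Euclidean distance from them: differences of (0, 3, 4) have norm 5, so with a scale of 2 the penalty `p_far` is 10.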