Reading the data

The first thing we do to start working with the data is import it!

First let's just look at the data using the readtable function.

df = readtable('PNAS2011_fold_change.xlsx', 'Sheet', 'tidy')

df =
     Strain        O3        O3_err        O2          O2_err         O1          O1_err         Oid         Oid_err      lacI    lacI_err
    _________    _______    ________    _________    __________    _________    __________    __________    __________    ____    ________

    'HG104'      0.79851     0.19812      0.17556      0.014744     0.038858     0.0049209     0.0075545    0.00069117     11       2    
    'RBS1147'    0.65236     0.15702     0.045195      0.002421    0.0089022     0.0014933     0.0016201    0.00022775     30      10    
    'RBS446'     0.62103     0.13827     0.021512     0.0040026    0.0047207     0.0012129     0.0010037      0.000243     62      15    
    'RBS1027'    0.47735    0.048084     0.012406     0.0013887    0.0027741    0.00049222    0.00062351    0.00010108    130      20    
    'RBS1'       0.24943    0.077998    0.0057002     0.0010577    0.0015083    0.00047283    0.00029141    6.3418e-05    610      80    
    '1I'         0.17431    0.016457    0.0046264    0.00050946    0.0011003    0.00021999     0.0001799    5.1989e-05    870     170    

We can now plot the fold-change as a function of the repressor copy number on a log-log plot.

Note that we didn't use the function loglog for this plot. Unfortunately, Live Scripts in MATLAB are still in their early days and cannot yet handle log-log or semi-log plots. But if you make this plot from your terminal you will get a nice log-log plot!

hold on

plot(log10(df.lacI), log10(df.Oid), 'd')

plot(log10(df.lacI), log10(df.O1), 'd')

plot(log10(df.lacI), log10(df.O2), 'd')

plot(log10(df.lacI), log10(df.O3), 'd')

xlabel('$\log_{10}R$', 'Interpreter', 'Latex', 'fontsize', 20)

ylabel('$\log_{10}\mbox{(fold-change)}$', 'Interpreter', 'Latex', 'fontsize', 20)

legend('Oid', 'O1', 'O2', 'O3', 'Location', 'southwest')

legend('boxoff')

hold off
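For reference, here is what the loglog version could look like if you run it from the command window instead of a Live Script; it shows the same data, just letting MATLAB draw the logarithmic axes directly.

% Log-log plot of fold-change vs. repressor copy number (run outside the Live Script)
loglog(df.lacI, df.Oid, 'd')
hold on
loglog(df.lacI, df.O1, 'd')
loglog(df.lacI, df.O2, 'd')
loglog(df.lacI, df.O3, 'd')
hold off
xlabel('$R$', 'Interpreter', 'Latex', 'fontsize', 20)
ylabel('fold-change', 'fontsize', 20)
legend('Oid', 'O1', 'O2', 'O3', 'Location', 'southwest')
legend('boxoff')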

Introduction to Bayesian Regression

This section is based on Justin Bois' fantastic data analysis course at Caltech.

We highly recommend reading his introduction to Bayesian parameter estimation to learn in more detail about what we will present here. Due to time constraints we will just go through a short primer on the topic.

As we mentioned during lecture, for Bayesians probability is a measure of the plausibility of any statement. For parameter estimation we want to find the probability distribution of the values of our parameters of interest, i.e., as Bayesians we will never say something like "this is the value of the parameter we measured in our experiment, with 100% certainty"; rather, we report the probability distribution of the values that these parameters can take.

For example, in our specific case we are interested in finding the value of the repressor binding energy $\Delta\varepsilon_r$ for the different operator sequences. We have data on the expression fold-change for different repressor copy numbers $R$. As derived in lecture and in Garcia & Phillips 2011, the fold-change under the weak promoter approximation is given by

$$\text{fold-change} = \left(1 + \frac{R}{N_{NS}}\, e^{-\beta \Delta\varepsilon_r}\right)^{-1},$$

where $N_{NS}$ is the number of non-specific binding sites for the repressor ($\approx$ the size of the E. coli genome) and $\beta = (k_BT)^{-1}$ is the inverse of the Boltzmann constant times the temperature.
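For later use, here is a minimal sketch of how such a fold-change function could be coded in MATLAB (the course materials provide a function fc used further below; this sketch assumes $N_{NS} = 4.6\times 10^6$ and energies measured in units of $k_BT$, so $\beta = 1$):

% Theoretical fold-change as a function of repressor copy number R and
% binding energy epsilon (in kBT). Assumes N_NS = 4.6e6 non-specific sites.
fc = @(R, epsilon) 1 ./ (1 + (R ./ 4.6e6) .* exp(-epsilon));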

Since the parameter we are trying to infer given some data is the binding energy, this can be written in terms of Bayes' theorem as

$$P(\Delta\varepsilon_r \mid D, I) = \frac{P(D \mid \Delta\varepsilon_r, I)\, P(\Delta\varepsilon_r \mid I)}{P(D \mid I)},$$

where $D$ represents our data set and $I$ represents all the prior information we have available. Recall that each term in Bayes' theorem has a name commonly used in data analysis:

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}.$$

But note that for our purpose of finding the posterior probability of the binding energy given the data, the evidence doesn't depend on this parameter. So for this case it is just a normalization constant that we can ignore, giving us

$$P(\Delta\varepsilon_r \mid D, I) \propto P(D \mid \Delta\varepsilon_r, I)\, P(\Delta\varepsilon_r \mid I).$$

As researchers we have to make decisions on what functional forms the likelihood and the prior will take. It is common to use a Gaussian likelihood, given that the powerful central limit theorem tells us that processes which involve many subprocesses tend to be Gaussian distributed (we are giving a very superficial explanation of this; for more information we highly recommend Sivia's outstanding book). Therefore, assuming each data point is independent of the others, our likelihood will be given by

$$P(D \mid \Delta\varepsilon_r, \sigma, I) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{\left(fc_i - fc(R_i, \Delta\varepsilon_r)\right)^2}{2\sigma^2}\right],$$

where the product runs over the $n$ data points in our data set, $fc_i$ is the experimentally measured fold-change for a strain with $R_i$ repressors, and $fc(R_i, \Delta\varepsilon_r)$ is the equation we previously wrote for the theoretical fold-change as a function of the repressor copy number and the binding energy.

Notice that we added an extra parameter $\sigma$ for this Gaussian likelihood. This is a nuisance parameter that describes how the data deviate from the theoretical model; since we are not interested in its actual value, we can integrate it out of our posterior probability at the end.

The prior, which now also includes the extra parameter $\sigma$, captures the information we had about these parameters before performing the experiment. Without going too much into the details of how to properly choose the functional form of a prior, we will assume for simplicity a so-called uniform prior for the binding energy and a Jeffreys prior for the standard deviation $\sigma$. This means that we expect the energy to fall within a reasonable range with equal probability for all values. Also, we know that by definition the standard deviation cannot be negative. Mathematically this can be expressed as

$$P(\Delta\varepsilon_r, \sigma \mid I) \propto \frac{1}{\sigma}, \quad \text{for } \Delta\varepsilon_r^{\min} \le \Delta\varepsilon_r \le \Delta\varepsilon_r^{\max} \text{ and } \sigma > 0.$$

Again, we encourage you to read Justin's notes and Sivia's book to get more details on this choice of prior probabilities.

Putting our likelihood and our prior together, the posterior probability looks like

$$P(\Delta\varepsilon_r, \sigma \mid D, I) \propto \frac{1}{\sigma^{n+1}} \exp\left[-\sum_{i=1}^{n} \frac{\left(fc_i - fc(R_i, \Delta\varepsilon_r)\right)^2}{2\sigma^2}\right].$$

We leave it as an exercise for you to show that this is in fact the functional form of the posterior probability.

To get rid of the nuisance parameter $\sigma$ we can integrate it out as

$$P(\Delta\varepsilon_r \mid D, I) = \int_0^\infty P(\Delta\varepsilon_r, \sigma \mid D, I)\, d\sigma.$$

Performing this integral can be a very nasty task. Nevertheless, it can be evaluated, giving us a posterior of the form

$$P(\Delta\varepsilon_r \mid D, I) \propto \left[\sum_{i=1}^{n} \left(fc_i - fc(R_i, \Delta\varepsilon_r)\right)^2\right]^{-n/2},$$

where $n$ is the number of data points.

Many times when dealing with very small numbers (such as probabilities) it is much easier to work on a log scale. In our example, if we take the log of this posterior we find

$$\log P(\Delta\varepsilon_r \mid D, I) = -\frac{n}{2} \log\left[\sum_{i=1}^{n} \left(fc_i - fc(R_i, \Delta\varepsilon_r)\right)^2\right] + \text{constant}.$$

Now that we have the theory in place we can just compute the posterior probability for the binding energy. Let us use the O1 operator as a first illustrative example. We have included a function log_post that implements this previous equation (a minimal sketch of such a function is shown after the following list). The function takes three arguments:

  1. epsilon: $\Delta \varepsilon_r$. Repressor binding energy.
  2. R_arr: Number of repressors.
  3. fc_arr: Experimental fold-change measured for those given values of repressors.
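The following is only a minimal sketch of what such a function might look like when saved as log_post.m, assuming the fc expression above and the log posterior we just derived; the log_post file provided with the course may differ in its details.

function lp = log_post(epsilon, R_arr, fc_arr)
    % Theoretical fold-change at each repressor copy number (assumes N_NS = 4.6e6)
    fc_theory = 1 ./ (1 + (R_arr ./ 4.6e6) .* exp(-epsilon));
    % Number of data points
    n = length(fc_arr);
    % Log posterior up to an additive constant
    lp = -n / 2 * log(sum((fc_arr - fc_theory).^2));
end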

We will compute the posterior probability for energies between $-17$ and $-14$ $k_BT$.

% Define energies to compute the posterior probability on.

energies = -17:0.01:-14;

% Initialize array to save the log posterior probability

log_posterior = zeros(1, length(energies));

% Loop through the energies computing the log probability at each step.

for i=1:length(log_posterior)

log_posterior(i) = log_post(energies(i), df.lacI', df.O1');

end
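If you prefer to avoid the explicit loop, an equivalent way to fill the array (assuming log_post accepts a scalar energy, as in the sketch above) is:

% Same computation without the explicit for loop
log_posterior = arrayfun(@(e) log_post(e, df.lacI', df.O1'), energies);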

We can now plot and see the whole distribution! This is part of the power of the Bayesian approach: we don't have to stare at only one fixed value for our parameter; instead we get a complete distribution of the possible values that the parameter can take.

% Plot the energies vs probability

% remember to exponentiate since we computed the log posterior!

plot(energies, exp(log_posterior))

% Label the axes

xlabel('$\Delta \varepsilon_r$ $[k_BT]$', 'Interpreter', 'Latex', 'fontsize', 20)

ylabel('$P(\Delta \varepsilon_r \mid D, I)$', 'Interpreter', 'Latex', 'fontsize', 20)

Remember that the scale of the y axis is meaningless since we are not taking into account the normalization factor.
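If you do want a properly normalized curve, a quick numerical fix is to exponentiate the log posterior (subtracting its maximum first to avoid underflow) and divide by the numerical integral; trapz does the job here:

% Normalize the posterior numerically so it integrates to 1
posterior = exp(log_posterior - max(log_posterior));
posterior = posterior / trapz(energies, posterior);
plot(energies, posterior)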

Even though we can plot the whole distribution, sometimes it is good to report a "summary". For posterior distributions that are nicely behaved, as in our case, people use the so-called Gaussian approximation to report a mean and a variance for the parameter.

This approximation consists of the following procedure: expand the logarithm of the posterior in a Taylor series around the maximum, up to second order. In our case that would be

$$\log P(\Delta\varepsilon_r \mid D, I) \approx \log P(\Delta\varepsilon_r^* \mid D, I) + \frac{1}{2} \left.\frac{d^2 \log P}{d\Delta\varepsilon_r^2}\right|_{\Delta\varepsilon_r^*} \left(\Delta\varepsilon_r - \Delta\varepsilon_r^*\right)^2,$$

where $\Delta\varepsilon_r^*$ denotes the location of the maximum,

where the first order derivative was equal to zero since the expansion was done around the global maximum.

If we exponentiate both sides of the equation we obtain

$$P(\Delta\varepsilon_r \mid D, I) \approx A \exp\left[-\frac{\left(\Delta\varepsilon_r - \Delta\varepsilon_r^*\right)^2}{2\sigma_{\Delta\varepsilon}^2}\right],$$

where $A$ is just a normalization constant. This is exactly the functional form of a Gaussian distribution! If this is not clear to you, note that the variance would be given by

$$\sigma_{\Delta\varepsilon}^2 = -\left(\left.\frac{d^2 \log P}{d\Delta\varepsilon_r^2}\right|_{\Delta\varepsilon_r^*}\right)^{-1}.$$

Don't get lost with the math; this is indeed a very simple concept. Let us recapitulate once again what we just did:

From the log posterior probability we performed a Taylor expansion around the maximum-probability value, keeping terms up to the quadratic one. We then simply exponentiated this approximation, finding that we end up with a function of the same form as a Gaussian distribution, where the negative inverse of the second derivative with respect to the parameter plays the role of the variance.

The benefit of using a numerical environment such as MATLAB is that we don't have to worry about evaluating these nasty derivatives. We can simply ask MATLAB to first find the value of the energy that maximizes the posterior probability and then compute, also numerically, the second derivative around this point!

Finding the energy that maximizes the posterior probability

Finding the parameter that maximizes our posterior probability amounts to an optimization problem in which we find the arguments (parameters) for which a function (the posterior distribution) is maximal. If we look at the form of the posterior, we see that the posterior is maximal when the sum

$$\sum_{i=1}^{n} \left(fc_i - fc(R_i, \Delta\varepsilon_r)\right)^2$$

is minimal. So we need to find the value of $\Delta\varepsilon_r$ that minimizes this sum. For this we define a function called resid that computes the residuals whose sum of squares we want to minimize.

resid = @(epsilon) df.O1' - fc(df.lacI', epsilon);

Then we use the function lsqnonlin, which uses the Levenberg-Marquardt algorithm to minimize the sum of the squares of a function. For this we must give an initial guess epsilon_o, which we choose based on the plot of the posterior distribution.

% select initial guess for the algorithm

epsilon_o = -15;

% run the non linear optimization

epsilon_O1 = lsqnonlin(resid, epsilon_o);

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the optimality tolerance.

<stopping criteria details>

sprintf('energy with the max probability: %e kBT', epsilon_O1)

ans = energy with the max probability: -1.552064e+01 kBT
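As a quick sanity check, we can also maximize the log posterior directly; here is one way to do it with fminsearch (minimizing the negative log posterior), which is not part of the original analysis but should land on essentially the same value:

% Cross-check: maximize the log posterior directly
epsilon_check = fminsearch(@(e) -log_post(e, df.lacI', df.O1'), -15)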

Once we have the local maximum we can compute the second derivative around this value using the hessian function.

syms e_r

% Evaluate log_post on the symbolic variable to build a symbolic expression,
% then take its second derivative (this assumes log_post only uses operations
% the Symbolic Math Toolbox understands).
fun = @(e_r) log_post(e_r, df.lacI', df.O1');

H = hessian(fun(e_r), e_r)
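To actually get the Gaussian-approximation error bar, we still have to evaluate this second derivative at the most probable energy and take the negative inverse of it. A sketch of that last step (under the same symbolic-evaluation assumption as above) would be:

% Evaluate the second derivative at the most probable energy and convert it
% to a standard deviation via the Gaussian approximation.
d2logpost = double(subs(H, e_r, epsilon_O1));
sigma_epsilon = sqrt(-1 / d2logpost);
sprintf('Gaussian-approximation error bar: %0.3f kBT', sigma_epsilon)

Finally, we can compare the data with the theoretical fold-change evaluated at the most probable binding energy.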

plot(log10(df.lacI), log10(df.O1), 'd')

hold on

plot(log10(10:1:1000), log10(fc(10:1:1000, epsilon_O1)))

hold off
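As with the earlier figure, you can label the axes and add a legend, for example:

% Label the comparison between data and the best-fit theory
xlabel('$\log_{10}R$', 'Interpreter', 'Latex', 'fontsize', 20)
ylabel('$\log_{10}\mbox{(fold-change)}$', 'Interpreter', 'Latex', 'fontsize', 20)
legend('O1 data', 'best fit', 'Location', 'southwest')
legend('boxoff')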