Questions from TK 2020-04-01

★ Replies are in green.

The calculations are repeated for 8 random sets for each sub-sample size.

My current understanding is that the mean $m_i$ and correlation $C_{i,j}$ are more universal (over different samplings of the system) than the inferred $h_i$ and $J_{i,j}$ in determining the thermodynamic behavior of the system.

Can you show evidence and analysis to support this claim? Also, how does the average $m$ for your sub-samples compare with the original sample? Do we see this effect in all the data sets we got from them? Those data sets have different numbers of ROIs. If this assumption is proved, it will establish the fact that even though we may not have single-neuron resolution as Bialek does, the data is not sensitive to it.

This comes from sub-sampling with different sizes. The $m_i$ and $C_{i,j}$ distributions stay close to the original, while the $h_i$ and $J_{i,j}$ distributions show more deviation (see fss-0425-3t1_dists.svg). Also, compare with the case of shuffling $J_{i,j}$ (shuffle-hj-0425-3t1_dists.svg): we see identical distributions of $h_i$ and $J_{i,j}$ but significantly different $m_i$ and $C_{i,j}$ distributions. At the same time, the specific heat curve also changes dramatically upon shuffling, as shown in 0425-3t1-sd_shuffled_thermo1.svg.
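As a minimal sketch of the shuffling step (my own construction, assuming the couplings are stored as a symmetric NumPy matrix and that "shuffling $J_{i,j}$" means permuting the off-diagonal couplings among the pairs):

```python
import numpy as np

def shuffle_couplings(J, rng=None):
    """Permute the off-diagonal couplings J_ij among the pairs, keeping
    the matrix symmetric and the diagonal untouched.  The set of coupling
    values (hence their distribution) is exactly preserved; only the
    network structure changes."""
    rng = np.random.default_rng(rng)
    iu = np.triu_indices_from(J, k=1)        # upper-triangle pair indices
    Js = np.zeros_like(J)
    Js[iu] = rng.permutation(J[iu])          # shuffled coupling values
    Js += Js.T                               # restore symmetry
    np.fill_diagonal(Js, np.diag(J))
    return Js
```

This is exactly why the $J_{i,j}$ distribution is identical after the shuffle while the thermodynamics can change: the multiset of values is untouched, but which pair carries which coupling is randomized.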

The PCA in your fig. 1 shows good conservation across different sizes for both cases, although the two cases have different slopes. So could these two mice be in different states?

It seems the slopes in the scaling regimes of both cases are about $1/2$, which is also the value quoted in Bialek's talk. However, the two cases do deviate from each other in the large-eigenvalue limit of the plots. Maybe these largest eigenvalues are related to the functional states of the mice and thus differ from each other, while the scaling regime is related to some fundamental dynamics of the mice and thus stays the same.
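To make the slope comparison concrete, here is a rough sketch (my own construction, not the actual analysis code) of estimating the log-log slope of the ranked covariance eigenvalues over a middle rank window, deliberately skipping the largest eigenvalues that may reflect the global state:

```python
import numpy as np

def scaling_slope(X, lo_frac=0.1, hi_frac=0.5):
    """Fit the log-log slope of the ranked covariance eigenvalues of the
    data matrix X (variables x samples) over a middle rank window,
    excluding the largest eigenvalues, which may track global state."""
    lam = np.linalg.eigvalsh(np.cov(X))[::-1]     # descending eigenvalues
    rank = np.arange(1, len(lam) + 1)
    lo, hi = int(len(lam) * lo_frac), int(len(lam) * hi_frac)
    slope, _ = np.polyfit(np.log(rank[lo:hi]), np.log(lam[lo:hi]), 1)
    return slope
```

The window fractions `lo_frac`/`hi_frac` are hypothetical knobs; in practice they would be chosen by eye from the eigenvalue plot.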

However, whether the distributions of $m_i$ and $C_{i,j}$ are enough, or whether the network topology also plays an important role, still needs to be checked. Shuffling or re-sampling $m_i$ and $C_{i,j}$ from the observed distributions is one way of checking this. However, both are tricky, since the result may not be a valid combination, i.e., one that can be generated by some distribution of system configurations. This is mentioned in the paper; they coped with it by checking the consistency of all marginal distributions of spin pairs and redrawing from the distributions whenever any marginal distribution was invalid. I have improved their method so that redrawing is not necessary and all marginal distributions of spin pairs are valid. However, this does not guarantee that triplet or higher-order marginal distributions are all valid, so the shuffled $m_i$ and $C_{i,j}$ can still be unphysical. This makes me wonder if this is one of the reasons why that paper was not published. We also need to think of different ways of perturbing the system to find the minimal criteria for when the collective properties of the system are preserved.

What do you mean by marginal distribution? I thought you just arbitrarily select 64 or 32 neurons but keep the experimental data of $C_{i,j}$ and $m_i$. I thought that is what Kay did by cutting off the first 8 neurons or the last 4 neurons.

Randomly selecting neurons and keeping their $m_i$ and $C_{i,j}$ is a kind of coarse-graining of the network that preserves its structure: for any selected neurons $i$ and $j$, their $m_i$, $m_j$, and $C_{i,j}$ are all preserved. But if we generate new values from the distributions of $m$ and $C$, then the original network structure is lost.
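A minimal sketch of this kind of sub-sampling (assuming $m$ is a length-$N$ vector and $C$ an $N\times N$ matrix of the observed statistics):

```python
import numpy as np

def subsample(m, C, size, rng=None):
    """Randomly select `size` neurons, keeping their experimentally
    observed m_i and C_ij.  The statistics among the selected neurons,
    and hence the network structure they form, are preserved exactly."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(m), size=size, replace=False)
    return m[idx], C[np.ix_(idx, idx)]
```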

The “marginal distribution” for a spin pair $\langle i,j\rangle$ is the joint probability distribution of $\sigma_i$ and $\sigma_j$, $P(\sigma_i,\sigma_j)$, obtained from the full state distribution by summing out all the other spins, \[ P(\sigma_i,\sigma_j) = \sum_{\{\sigma_k \,:\, k\neq i,j\}} P(\sigma_1,\sigma_2,\dots,\sigma_N). \] From the marginal distribution $P(\sigma_i,\sigma_j)$ (with every entry $\geq 0$, since probabilities cannot be negative), we get the values $m_i=P(1,0)+P(1,1)$, $m_j=P(0,1)+P(1,1)$, and $C_{i,j}=P(1,1)$. Since these values are not fully independent, random drawing can produce combinations for which no valid $P(\sigma_i,\sigma_j)$ exists. For example, if $m_i=1$ and $m_j=1$, then $C_{i,j}$ cannot be anything other than $1$.
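The validity check described above can be sketched directly, using the conventions from the text ($\sigma\in\{0,1\}$ and $C_{i,j}=P(1,1)$):

```python
def pair_marginal(mi, mj, cij):
    """Reconstruct the pair marginal P(s_i, s_j) from m_i, m_j and
    C_ij = P(1,1), using m_i = P(1,0) + P(1,1), m_j = P(0,1) + P(1,1),
    and normalization."""
    return {
        (1, 1): cij,
        (1, 0): mi - cij,
        (0, 1): mj - cij,
        (0, 0): 1.0 - mi - mj + cij,
    }

def is_valid_pair(mi, mj, cij, tol=1e-12):
    """A drawn (m_i, m_j, C_ij) triple is realizable by some pair
    marginal iff all four probabilities are non-negative."""
    return all(p >= -tol for p in pair_marginal(mi, mj, cij).values())
```

For the example in the text, `is_valid_pair(1, 1, c)` is `True` only for $c=1$.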

In the paper you quoted, they enlarge the system to 120 neurons, as quoted on page 10: “We thus generated several synthetic networks of 120 neurons by randomly choosing once more out of the distribution of $m_i$ and $C_{i,j}$ observed experimentally.” Have you done this to study larger systems?

I have attempted this. However, after the drawing, the Boltzmann learning that is performed to obtain $h_i$ and $J_{i,j}$ does not converge as well as for the original system, so I have left the runs going on the cluster. I will check the results after I am back in my office next week.

However, the fact that they are not converging well may be an indication that the generated $m_i$ and $C_{i,j}$ are not fully valid, since we only ensure validity for spin pairs and not for higher-order spin combinations. When the values $m_i$ and $C_{i,j}$ are not valid, no solution for $h_i$ and $J_{i,j}$ can exactly reproduce them, and there will be a non-zero minimum for the error.
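For a small system, the mechanism can be sketched with exact enumeration (a toy version of Boltzmann learning, not the production code): the gradient steps are $\Delta h_i \propto m_i^{\mathrm{obs}} - m_i^{\mathrm{model}}$ and $\Delta J_{i,j} \propto C_{i,j}^{\mathrm{obs}} - C_{i,j}^{\mathrm{model}}$, so for unrealizable targets the residual can never reach zero.

```python
import itertools
import numpy as np

def model_stats(h, J):
    """Exact m_i and C_ij = <s_i s_j> for P(s) ~ exp(h.s + 0.5 s'Js),
    with s_i in {0, 1}, by enumerating all 2^N states."""
    N = len(h)
    states = np.array(list(itertools.product([0, 1], repeat=N)), dtype=float)
    E = states @ h + 0.5 * np.einsum('si,ij,sj->s', states, J, states)
    p = np.exp(E - E.max())
    p /= p.sum()
    return p @ states, np.einsum('s,si,sj->ij', p, states, states)

def boltzmann_learn(m_obs, C_obs, steps=5000, lr=0.3):
    """Gradient ascent on the likelihood; converges to the unique
    maximum-entropy solution when (m_obs, C_obs) are realizable."""
    N = len(m_obs)
    h, J = np.zeros(N), np.zeros((N, N))
    for _ in range(steps):
        m, C = model_stats(h, J)
        h += lr * (m_obs - m)
        dJ = lr * (C_obs - C)
        np.fill_diagonal(dJ, 0.0)   # diagonal of C is fixed by m (s^2 = s)
        J += dJ
    return h, J
```

The learning rate and step count here are arbitrary illustrative choices; the point is only that the fixed point reproduces $(m, C)$ exactly when and only when the targets are physically realizable.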

We can still try to follow our previous method: keeping the rank order and shuffling $C_{i,j}$ only within a small interval, and similarly for $m_i$.

If we increase the size of the interval, we eventually get a complete reshuffle. Then we can check how sensitive our results are to the network structure.
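A sketch of the rank-preserving shuffle (my own reading of the method: sort the values, then permute only within consecutive rank windows; for $C_{i,j}$ one would apply this to the flattened upper-triangle values):

```python
import numpy as np

def windowed_shuffle(values, window, rng=None):
    """Shuffle values only among entries whose ranks fall in the same
    window, approximately preserving the rank order.  window=1 is the
    identity; window=len(values) is a complete reshuffle."""
    rng = np.random.default_rng(rng)
    order = np.argsort(values)               # indices in rank order
    out = np.asarray(values, dtype=float).copy()
    for start in range(0, len(values), window):
        block = order[start:start + window]  # indices of one rank window
        out[block] = out[rng.permutation(block)]
    return out
```

Growing `window` from 1 to the full length interpolates between no perturbation and a complete reshuffle, which is exactly the sensitivity sweep described above.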

This should be OK. But, as above, we need to check the validity of the resulting combinations of $m_i$ and $C_{i,j}$.

In this Bialek paper with 40 neurons, there is a statement different from what I saw before: “when we randomize the fields $h_i$ and interactions $J_{ij}$, we find that the distribution of mean spike probabilities $\sigma_i$ changes dramatically, and as a result everything else about the network also changes (heat capacity, entropy, …).”

This is the same as our result, that the distribution alone is not good enough! But I thought the 78-neuron paper only cares about the distribution, not the network. I cannot find where I saw this, but I should have mentioned it to you? Do you know where to find that statement?


I couldn't find a corresponding statement in the 78-neuron paper, either.