[FRIAM] Universal and Accessible Entropy Estimation Using a Compression Algorithm

glen∈ℂ gepropella at gmail.com
Mon Dec 9 19:04:47 EST 2019


I love it when papers answer the questions that first pop into my head. My first question was about the encoding and how it targets the particular compression algorithm. And sure enough, they answer my question in the supplement:

> Compression algorithms are designed to minimally represent a dataset’s alphabet (finite set of symbols, of which a sequence is composed). However, often data is recorded as continuous variables which contain insignificant digits that are effectively random and independent, due to noise or numerical inaccuracy. Such digits render the dataset’s alphabet enormous. In principle, S_A asymptotically converges for any sized alphabet; however, for practical purposes, the required sample size increases dramatically. To treat the issue, we approximate a system’s entropy by the entropy of a projected system with discretized degrees of freedom (i.e., coarse-graining), for which S_A converges at much smaller sample size.
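For concreteness, here's what that coarse-graining amounts to in practice. A minimal sketch in Python (my own illustration, not the authors' code; the name coarse_grain and the 32-level equal-width binning are arbitrary choices):

import numpy as np

def coarse_grain(traj, n_bins=32):
    """Project continuous coordinates onto a finite alphabet of bin indices.

    Equal-width binning throws away the noisy insignificant digits, so the
    compressor sees an alphabet of size n_bins rather than ~2^52 distinct
    floating-point values.
    """
    edges = np.linspace(traj.min(), traj.max(), n_bins + 1)
    idx = np.digitize(traj, edges) - 1          # shift 1..n_bins down to 0..n_bins-1
    return np.clip(idx, 0, n_bins - 1).astype(np.uint8)

With 32 levels each symbol fits in a byte, so the discretized trajectory can be handed straight to a byte-oriented compressor.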



On 12/9/19 10:55 AM, Roger Critchlow wrote:
> What is the relationship between physical entropy and information?
> 
> Well, according to this, https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.123.178102 (also available as a preprint, https://arxiv.org/pdf/1709.10164.pdf, and via a Tel Aviv University press release turning up here and there), you can compute an accurate estimate of the physical entropy of a molecular dynamics simulation by running zip compression on the coordinate trajectories of the simulation and looking at the compression achieved.
> 
> This makes perfect sense and it's amazing it took us this long to figure it out.
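The recipe Roger describes is easy to sketch, too. Below is a rough Python version; again my own illustration, not a verified reimplementation. The normalization between a constant (zero-entropy) reference and a shuffled (correlation-free) reference is in the spirit of the paper's S_A estimator, but I haven't checked it against their exact calibration, and entropy_per_symbol is a name I made up.

import zlib
import numpy as np

def entropy_per_symbol(symbols):
    """Rough entropy estimate (nats/symbol) from zlib's compression ratio.

    Bracket the compressed size of the data between that of a constant
    sequence (zero entropy) and a shuffled copy (correlations destroyed),
    then scale by the log of the alphabet size.
    """
    data = bytes(symbols)
    c_data = len(zlib.compress(data, 9))
    c_const = len(zlib.compress(bytes(len(data)), 9))       # all-zero reference
    arr = np.frombuffer(data, dtype=np.uint8)
    shuffled = bytes(np.random.default_rng(0).permutation(arr))
    c_shuf = len(zlib.compress(shuffled, 9))                # shuffled reference
    eta = (c_data - c_const) / max(c_shuf - c_const, 1)     # roughly in [0, 1]
    return eta * np.log(max(len(set(data)), 2))

rng = np.random.default_rng(1)
walk = np.cumsum(rng.standard_normal(200_000))              # correlated toy "trajectory"
edges = np.linspace(walk.min(), walk.max(), 33)             # 32-level coarse-graining
sym = np.clip(np.digitize(walk, edges) - 1, 0, 31).astype(np.uint8)
print(entropy_per_symbol(sym))                              # well below log(32): the walk compresses

The point of the two references is calibration: zlib's output includes header and dictionary overhead, so the raw compression ratio only becomes an entropy estimate once you pin down where "perfectly ordered" and "maximally disordered" land for the same alphabet and sequence length.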



