Information Theory Basics

Notes about information theory basics.

Entropy

In information theory, the entropy of a distribution $p$ is captured by the following equation:

$$H[p] = \sum_j -p(j) \log p(j)$$

Or, for a single probability $y$:

$$H[y] = -y \log y$$

Entropy is the level of surprise experienced by an observer who knows the true probability.
Entropy describes the uncertainty of a variable.
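
As a quick numeric illustration of the formula above, here is a minimal NumPy sketch (the `entropy` helper and the example distributions are illustrative assumptions, not part of the original notes):

```python
import numpy as np

def entropy(p):
    """Entropy H[p] = -sum_j p(j) log p(j), measured in nats (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # terms with p(j) = 0 contribute 0 by convention
    return -np.sum(p * np.log(p))

# A uniform distribution is maximally uncertain:
print(entropy([0.25, 0.25, 0.25, 0.25]))   # log(4) ≈ 1.386 nats
# A nearly deterministic distribution carries little surprise:
print(entropy([0.97, 0.01, 0.01, 0.01]))   # ≈ 0.168 nats
```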

Nat and Bit

In order to encode data drawn randomly from the distribution $p$, we need at least $H[p]$ nats to encode it.

  • Nat
    A nat is the equivalent of a bit, but when using a code with base $e$ rather than one with base $2$ (a quick numeric check follows this list).

    $1 \, \text{nat} = \frac{1}{\log(2)} \approx 1.44 \, \text{bit}$

  • $\frac{H[p]}{\log 2}$, the entropy measured in bits rather than nats, is often also called the binary entropy.
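
The nat/bit conversion can be checked numerically. A small sketch (the distribution is an arbitrary example): the entropy computed with the natural log and with the base-2 log differ exactly by the factor $1/\log(2)$.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])

h_nats = -np.sum(p * np.log(p))    # natural log -> nats
h_bits = -np.sum(p * np.log2(p))   # base-2 log  -> bits

print(h_nats)           # ≈ 1.2130 nats
print(h_bits)           # 1.75 bits
print(h_bits / h_nats)  # ≈ 1.4427 = 1 / log(2), i.e. 1 nat ≈ 1.44 bit
```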

Cross-Entropy

The cross-entropy from $p$ to $q$, denoted $H(p, q)$, is the expected surprisal of an observer with subjective probabilities $q$ upon seeing data that was actually generated according to probabilities $p$.
Cross-entropy describes the difference between probability distributions.

$$H(p, q) = \sum_j -p(j) \log q(j)$$
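
A minimal sketch of this definition (the distributions `p` and `q` are illustrative assumptions; `q` plays the role of the observer's subjective beliefs):

```python
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_j p(j) log q(j), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

p = np.array([0.7, 0.2, 0.1])   # true data-generating distribution
q = np.array([0.4, 0.4, 0.2])   # observer's subjective beliefs

print(cross_entropy(p, p))  # ≈ 0.8018, which equals H[p]: no extra surprise
print(cross_entropy(p, q))  # ≈ 0.9856, larger because q mismatches p
```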

Kullback-Leibler Divergence

The Kullback-Leibler divergence, also called KL divergence or relative entropy, is the most common way to measure how much one distribution differs from another. It is simply the difference between the cross-entropy and the entropy.

$$D(p \| q) = H(p, q) - H[p] = \sum_j p(j) \log \frac{p(j)}{q(j)}$$
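
To check this identity numerically, a small sketch (it assumes strictly positive probabilities so all logs are defined; `p` and `q` are the same illustrative distributions as above):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    """D(p || q) = sum_j p(j) log(p(j) / q(j)), in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

print(kl_divergence(p, q))               # ≈ 0.1838
print(cross_entropy(p, q) - entropy(p))  # same value: D(p||q) = H(p,q) - H[p]
print(kl_divergence(p, p))               # 0.0 for identical distributions
```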
