Notes on information theory basics.

# Entropy

In information theory, the entropy of a **distribution** $p$ is given by the following equation:

$H[p] = \sum_j -p(j) \log p(j)$

Or, for a single probability:

$H[y] = - y \log y$

Entropy is the expected level of surprise experienced by an observer who knows the true probabilities.

Entropy describes the uncertainty of a variable.
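A minimal NumPy sketch of the entropy formula above (the helper name `entropy` and the example distributions are my own, not from the source):

```python
import numpy as np

def entropy(p):
    """Entropy H[p] in nats of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                     # 0 * log(0) is treated as 0
    return float(-np.sum(p * np.log(p)))

# A fair coin is maximally uncertain over two outcomes: log(2) ≈ 0.693 nats.
print(entropy([0.5, 0.5]))   # ≈ 0.6931
print(entropy([0.9, 0.1]))   # ≈ 0.3251, less uncertainty means less expected surprise
```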

## Nat and Bit

To encode data drawn randomly from the distribution $p$, we need at least $H[p]$ *nats*.

- Nat

Nat is the equivalent of bit, but when using a code with base $e$ rather than one with base $2$ (see the sketch after this list).

$1 \, \text{nat} = \frac{1}{\log(2)} \approx 1.44 \, \text{bits}$

- $\frac{H[p]}{\log(2)}$ is often also called the *binary entropy* (the entropy measured in bits).
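As a sanity check on the nat/bit relationship, here is a small sketch (assuming entropy is measured in nats via the natural logarithm, as above):

```python
import math

H_nats = math.log(2)              # entropy of a fair coin, in nats
H_bits = H_nats / math.log(2)     # dividing by log(2) converts nats to bits
print(H_bits)                     # 1.0 bit, as expected for a fair coin
print(1 / math.log(2))            # 1 nat ≈ 1.4427 bits
```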

# Cross-Entropy

*Cross-Entropy* from $p$ to $q$, denoted $H(p,q)$, is the __expected surprisal of an observer with subjective probabilities $q$ upon seeing data that was actually generated according to probabilities $p$__.

Cross-entropy describes the difference between probability distributions.

$H(p,q) = \sum_j -p(j) \log q(j)$
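A minimal sketch of the cross-entropy formula (the helper name `cross_entropy` and the example distributions are assumptions of mine):

```python
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats: expected surprisal under beliefs q of data drawn from p."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                     # terms with p(j) = 0 contribute nothing
    return float(-np.sum(p[mask] * np.log(q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(cross_entropy(p, p))   # ≈ 0.6931, equals H[p] when the subjective beliefs are correct
print(cross_entropy(p, q))   # ≈ 1.2040, larger because q mismatches the true p
```

Note that $H(p, p) = H[p]$: when the observer's beliefs match the true distribution, cross-entropy reduces to entropy.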

# Kullback-Leibler Divergence

*Kullback-Leibler Divergence*, or *KL Divergence*, also called **Relative Entropy**, is the most common way to measure the distance between two distributions. It is simply __the difference between the cross-entropy and the entropy__.

$D(p||q) = H(p,q) - H[p] = \sum_j p(j) \log \frac{p(j)}{q(j)}$
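A minimal sketch of the KL divergence, which can be checked against the two helpers above since it equals the cross-entropy minus the entropy (names and example distributions are my own):

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, p))   # 0.0, a distribution is at zero "distance" from itself
print(kl_divergence(p, q))   # ≈ 0.5108, which matches cross_entropy(p, q) - entropy(p)
```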