Notes about information theory basics.
In information theory, the entropy of a distribution $P$ is captured by the following equation:

$$H[P] = \sum_j -P(j) \log P(j)$$

Or, for a single probability $p$ (a Bernoulli distribution):

$$H(p) = -p \log p - (1 - p) \log (1 - p)$$
Entropy is the expected level of surprise experienced by an observer who knows the true probability distribution.
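The definition above translates directly into code. A minimal sketch (the `entropy` helper name and the example distributions are my own, not from the notes):

```python
import numpy as np

def entropy(p):
    """Entropy of a discrete distribution p, in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability outcomes contribute nothing
    return float(-np.sum(p * np.log(p)))

# A uniform distribution over 4 outcomes is maximally surprising: H = log(4) nats.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # ≈ 1.386
# A certain outcome carries no surprise at all.
print(entropy([1.0, 0.0, 0.0, 0.0]))  # 0.0
```

Note that terms with $P(j) = 0$ are dropped before taking the logarithm, following the usual convention $0 \log 0 = 0$.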
In order to encode data drawn randomly from the distribution $P$, we need at least $H[P]$ nats to encode it. A nat is the equivalent of a bit, but when using a code with base $e$ rather than one with base $2$.
- The entropy of a single probability $p$ is often also called the binary entropy.
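The nats-vs-bits distinction is just a change of logarithm base, so one nat equals $1/\log 2 \approx 1.44$ bits. A small sketch of the binary entropy illustrating the conversion (the function name is my own):

```python
import math

def binary_entropy_nats(p):
    """Entropy of a coin with heads-probability p, in nats (natural log)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

h_nats = binary_entropy_nats(0.5)
h_bits = h_nats / math.log(2)  # one nat equals 1/log(2) bits
print(h_nats)  # log(2) ≈ 0.693 nats
print(h_bits)  # 1.0 — a fair coin flip carries exactly one bit
```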
Cross-Entropy from $P$ to $Q$, denoted $H(P, Q)$, is the expected surprisal of an observer with subjective probabilities $Q$ upon seeing data that was actually generated according to probabilities $P$:

$$H(P, Q) = \sum_j -P(j) \log Q(j)$$
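A sketch of cross-entropy on two hypothetical distributions (both `p` and `q` here are made-up examples). It also shows that an observer is least surprised on average when their subjective probabilities match the true ones, i.e. $H(P, Q) \geq H(P, P) = H[P]$:

```python
import numpy as np

# Hypothetical example: true distribution p, observer's subjective distribution q.
p = np.array([0.5, 0.25, 0.25])
q = np.array([0.4, 0.4, 0.2])

def cross_entropy(p, q):
    """Expected surprisal of an observer believing q when data follows p (nats)."""
    return float(-np.sum(p * np.log(q)))

print(cross_entropy(p, q))  # H(P, Q)
print(cross_entropy(p, p))  # H(P, P) = H[P], the smaller of the two
```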
Kullback-Leibler Divergence (KL Divergence, or Relative Entropy) is the most common way to measure the distance between two distributions. It is simply the difference between the cross-entropy and the entropy:

$$D_{\mathrm{KL}}(P \,\|\, Q) = H(P, Q) - H[P] = \sum_j P(j) \log \frac{P(j)}{Q(j)}$$
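The "cross-entropy minus entropy" view can be checked numerically against the direct sum $\sum_j P(j) \log \big(P(j)/Q(j)\big)$. A minimal sketch on made-up distributions:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])  # true distribution (hypothetical example)
q = np.array([0.4, 0.4, 0.2])    # approximating distribution

entropy = -np.sum(p * np.log(p))        # H[P]
cross_entropy = -np.sum(p * np.log(q))  # H(P, Q)

# KL divergence as the gap between cross-entropy and entropy ...
kl_from_gap = cross_entropy - entropy
# ... agrees with the direct formula sum_j p_j * log(p_j / q_j).
kl_direct = np.sum(p * np.log(p / q))
print(kl_from_gap, kl_direct)
```

The gap is always non-negative and is zero exactly when $P = Q$, which is why it behaves like a (non-symmetric) distance.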