The Nature of Information IV

George Hrabovsky



In this article I will explore two theorems attributed, at least in part, to Claude Shannon.


What do we mean by the word code, or coding? A code is a method for converting symbols from one set of symbols (called an alphabet) into symbols of another alphabet (or perhaps into different elements of the same alphabet). In this way a code can be seen, mathematically, as a transformation or operator.
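As a minimal sketch of a code as a transformation between alphabets, here is an invented mapping from a four-symbol alphabet to binary codewords (the alphabet and codewords are illustrative assumptions, not from the article):

```python
# A code as a mapping (transformation) from one alphabet to another.
# The source alphabet {A, B, C, D} and its binary codewords are invented
# purely for illustration.
codebook = {"A": "00", "B": "01", "C": "10", "D": "11"}

def encode(message):
    """Map each symbol of the source alphabet to its codeword."""
    return "".join(codebook[s] for s in message)

print(encode("ABBA"))  # -> 00010100
```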

Essentially there are two broad categories of codes. In the first, you attempt to represent the message with as few bits as possible, squeezing out redundancy so that the average length per symbol approaches the entropy of the string of symbols. This is called source coding, or data compression.
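To see how a source code approaches the entropy, here is a small sketch with an invented four-symbol distribution and a hand-built prefix code; because the probabilities are powers of 1/2, the average code length exactly matches the entropy:

```python
import math

# Toy source-coding sketch: a biased four-symbol source with a hand-built
# prefix-free code. The symbols, probabilities, and codewords are invented
# for illustration.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code  = {"A": "0", "B": "10", "C": "110", "D": "111"}  # prefix-free

# Entropy in bits per symbol: H = -sum p log2 p
entropy = -sum(p * math.log2(p) for p in probs.values())

# Average codeword length in bits per symbol.
avg_len = sum(probs[s] * len(code[s]) for s in probs)

print(entropy, avg_len)  # -> 1.75 1.75, versus 2 bits for a fixed-length code
```

For this dyadic source the code hits the entropy exactly; in general the average length can be made to approach the entropy but not beat it.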

The second kind is where you are trying to reduce the effect of noise over the communications path (or channel). You actually add redundant bits in order to make it more likely that no errors survive decoding (see the Shannon-McMillan theorem, below). This is called channel coding, or error-correction.
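As a minimal example of channel coding, here is a sketch of a triple-repetition code with majority-vote decoding (an assumed, deliberately simple error-correcting code, not one named in the article); it adds bits but corrects any single flipped bit per block:

```python
from collections import Counter

# Channel-coding sketch: a 3x repetition code. Each data bit is sent three
# times; the decoder takes a majority vote over each block of three.
def rep_encode(bits):
    return "".join(b * 3 for b in bits)

def rep_decode(received):
    out = []
    for i in range(0, len(received), 3):
        block = received[i:i + 3]
        # Majority vote: the most common bit in the block wins.
        out.append(Counter(block).most_common(1)[0][0])
    return "".join(out)

coded = rep_encode("101")   # '111000111'
noisy = "110000111"         # the channel flipped one bit in the first block
print(rep_decode(noisy))    # -> '101', the error is corrected
```

The price of this protection is a tripled message length; better codes achieve the same protection with far less added redundancy.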

Shannon's First Coding Theorem

This is essentially the basis of source coding. It exploits the Asymptotic Equipartition Property to state that a string of N samples drawn from some distribution can, on average, be described using about N H(x) bits rather than the N log₂|A| bits needed to index every possible string over an alphabet A. In fact it is the difference between these that allows you to code the string of samples using N H(x) bits.
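A quick numerical sketch of the savings, assuming a biased binary source (the bias p = 0.9 and N = 1000 are invented values for illustration):

```python
import math

# First-coding-theorem sketch: N samples from a biased binary source need
# about N*H bits on average, versus N*log2|A| bits for naive enumeration.
p = 0.9  # assumed probability of symbol '0'
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

N = 1000
naive_bits = N * math.log2(2)  # 1000 bits: one bit per binary symbol
coded_bits = N * H             # roughly 469 bits on average

print(round(H, 3), naive_bits, round(coded_bits))
```

The gap between 1000 and roughly 469 bits is exactly the redundancy that source coding removes.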

Shannon-McMillan Theorem

By exploiting the Law of Large Numbers, you can reduce the chance of a coding error by using longer strings. The longer the string, the smaller the chance of an error. The entropy,

H(x) = -∑ p(x) log₂ p(x),

gives the average number of bits required to describe a string drawn from some distribution. The information in a single event is

I(x) = -log₂ p(x),

where I(x) is the information in seeing the event x and the entropy, H(x), is the expected value of that information.
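The relationship between the information of each event and the entropy as its expectation can be sketched directly (the three-outcome distribution below is an invented example):

```python
import math

# Entropy as the expected value of the information -log2 p(x).
# The distribution is invented for illustration.
probs = [0.5, 0.25, 0.25]

# I(x) = -log2 p(x): the information, in bits, of seeing each event.
info = [-math.log2(p) for p in probs]

# H(x) = E[I(x)]: the probability-weighted average of the information.
H = sum(p * i for p, i in zip(probs, info))

print(info, H)  # -> [1.0, 2.0, 2.0] 1.5
```

Rarer events carry more information (2 bits versus 1 bit here), and the entropy averages those contributions by how often each event occurs.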

Created with Wolfram Mathematica 8.0