The Nature of Information III

George Hrabovsky

MAST

Introduction

In this article we examine some odds and ends about the nature of information.

Some Rules About Entropy

We will examine some more of the mathematics of entropy. We still do not have a good grasp of what entropy is, other than that it is some function of the number of pieces of information. The more ways the information can be combined, the higher the entropy. A good example of this idea is the letter tiles in a Scrabble game. If you throw the tiles into a bag, what is the chance that you will not get a meaningful word by drawing them at random? With a small number of tiles the chance of disorder is lower than with a larger number of tiles. In other words, it is harder to make sense out of a large amount of random information than out of a small amount. Another way of saying it is that with a large quantity of information there is more of a chance of getting gibberish.

Having said that, here are some principles about the entropy of information:

H(p) is continuous in p. This means that small changes in a probability distribution will produce small changes in the entropy.

H(p) >= 0. This means that the entropy is never negative. An observation can only give you information; it gives none at all when you already know what will happen, in which case H(p) = 0.

H(p) ≤ c. This means that the entropy is bounded above by a constant c that depends only on how many values the distribution can take; for M possible values, c = log M. The entropy equals this bound exactly when all values are equally likely. Another way of interpreting this is that when there are many equally good options, you are least able to predict what will happen next. The sketch after this list checks the last two properties numerically.
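Here is that check, a minimal sketch in Python (the distributions are made-up examples, and NumPy is assumed to be available). It computes the entropy in bits for a uniform, a skewed, and a completely certain distribution over M = 4 values:

import numpy as np

def entropy(p):
    # Shannon entropy in bits, written as the sum of p * log2(1/p);
    # zero-probability terms are skipped because 0 * log(0) is taken to be 0.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.sum(p * np.log2(1.0 / p))

M = 4
uniform = np.full(M, 1.0 / M)              # all values equally likely
skewed = np.array([0.7, 0.1, 0.1, 0.1])    # some values more likely than others
certain = np.array([1.0, 0.0, 0.0, 0.0])   # the outcome is already known

print(entropy(uniform))   # 2.0, which equals the bound c = log2(M)
print(entropy(skewed))    # about 1.357, strictly below the bound
print(entropy(certain))   # 0.0, the smallest possible value

Every distribution stays between 0 and log2(M), and only the equally likely case reaches the top of that range.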

Bits and Nats

I previously stated that the bit is the fundamental unit of information. More precisely, the bit is the unit of the entropy of information when the logarithm is taken in base 2. This is usually interpreted as on or off, so there are two possible states for any piece of information to be in: true or false, yes or no, and so on. There is another base we can use, called the natural base, or base e. In that base the unit of entropy is the nat.
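To make the relationship concrete, here is a small sketch in Python (the distribution is a made-up example, and NumPy is assumed to be available) that computes the same entropy in bits and in nats; the two differ only by a factor of ln 2:

import numpy as np

p = np.array([0.5, 0.25, 0.25])     # a made-up distribution

H_bits = -np.sum(p * np.log2(p))    # 1.5 bits (logarithm base 2)
H_nats = -np.sum(p * np.log(p))     # about 1.0397 nats (natural logarithm)

print(H_bits, H_nats)
print(H_bits * np.log(2))           # equals H_nats: 1 bit = ln(2) nats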

Strings of Samples

Let’s say that you have a string of N samples of some variable, say x, each drawn from some probability distribution p(x). This string is denoted,

x_1, x_2, …, x_N

In that case you will discover that each possible value appears in the string a certain number of times. The probability of the sample x_i is p(x_i), and the samples are drawn independently of each other. As such the probability of seeing the whole string is given by the product,

P(x_1, x_2, …, x_N) = p(x_1) p(x_2) ⋯ p(x_N)

this can be rewritten by grouping equal factors together. Assuming that there are M different values, where the j-th value has probability p_j and appears N_j times in the string, we have

P = p_1^(N_1) p_2^(N_2) ⋯ p_M^(N_M)
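Here is a short sketch in Python of the two ways of writing the same probability (the distribution and the string are made-up examples, and NumPy is assumed to be available): once as a product over the individual samples, and once grouped by how many times each of the M values occurs:

from collections import Counter
import numpy as np

p = {'a': 0.5, 'b': 0.3, 'c': 0.2}        # M = 3 possible values
string = ['a', 'b', 'a', 'c', 'a', 'b']   # N = 6 independent samples

# the product over the individual samples
P_product = np.prod([p[x] for x in string])

# the same probability grouped by counts: p_1^(N_1) * p_2^(N_2) * ... * p_M^(N_M)
counts = Counter(string)                  # {'a': 3, 'b': 2, 'c': 1}
P_grouped = np.prod([p[v] ** n for v, n in counts.items()])

print(P_product, P_grouped)               # both equal 0.5^3 * 0.3^2 * 0.2 = 0.00225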

Asymptotic Equipartition Property

One useful analysis technique is to look at the behavior of an expression when N becomes extremely large. Since the factors are raised to powers that grow with N, this product quickly becomes unwieldy. Let’s simplify it, without changing its behavior, by taking its logarithm,

log P = log [ p_1^(N_1) p_2^(N_2) ⋯ p_M^(N_M) ]

log P = N_1 log p_1 + N_2 log p_2 + ⋯ + N_M log p_M

We can then divide both sides by N,

(1/N) log P = (N_1/N) log p_1 + (N_2/N) log p_2 + ⋯ + (N_M/N) log p_M

-(1/N) log P = -[(N_1/N) log p_1 + (N_2/N) log p_2 + ⋯ + (N_M/N) log p_M]

Previously we defined the entropy as,

H = -(p_1 log p_1 + p_2 log p_2 + ⋯ + p_M log p_M)

The law of large numbers from probability theory tells us that each ratio N_j/N approaches the probability p_j as N grows, so the right-hand side above approaches the entropy,

-(1/N) log P → H as N → ∞

Using base 2 for the logarithm and inverting the previous result, we get

P ≈ 2^(-N H)
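We can watch this happen numerically. The following is a minimal sketch in Python (the distribution is a made-up example, and NumPy is assumed to be available): for longer and longer random strings, the quantity -(1/N) log_2 P settles down to H, so that P behaves like 2^(-N H):

import numpy as np

rng = np.random.default_rng(0)
values = np.array([0, 1, 2])                  # M = 3 possible values
p = np.array([0.5, 0.3, 0.2])                 # their made-up probabilities

H = -np.sum(p * np.log2(p))                   # about 1.485 bits
print(H)

for N in (10, 100, 10_000):
    string = rng.choice(values, size=N, p=p)  # one random string of N samples
    log2_P = np.sum(np.log2(p[string]))       # log2 of the product of the probabilities
    print(N, -log2_P / N)                     # approaches H as N grows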

How do we interpret the result P ≈ 2^(-N H)? For large N, the probability of a typical string depends only on its length N and the entropy H, not on the particular elements that appear in it! This is the Asymptotic Equipartition Property, also called the AEP. The inverse of this,

2^(N H)

is the effective number of strings of the given length. The actual number of strings of length N, when each sample can take one of M values, is

M^N = 2^(N log_2 M)

The difference between M^N and 2^(N H) is the reason data compression schemes work: since H ≤ log_2 M, the typical strings form an exponentially smaller set than the set of all possible strings, so a typical string can be encoded using about N H bits instead of N log_2 M bits.
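The gap is easy to see with numbers. Here is a brief sketch in Python (the distribution is a made-up example, and NumPy is assumed to be available) comparing the exponents of the two counts for strings of length N = 100 over M = 4 values:

import numpy as np

M, N = 4, 100
p = np.array([0.7, 0.1, 0.1, 0.1])   # a skewed made-up distribution over M values

H = -np.sum(p * np.log2(p))          # about 1.357 bits per sample

print(N * H)                         # log2 of the effective number of strings, about 135.7
print(N * np.log2(M))                # log2 of the actual number of strings, 200.0

# A typical string can therefore be stored in roughly N*H bits
# instead of the N*log2(M) bits needed to list an arbitrary string.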
