Below are three different "languages." Each one is the set of outcomes
of a series of "experiments" in one of the following scenarios:
- You have an unbalanced coin that is twice as likely to come up heads
as tails.
- You have a six-sided die (one of a pair of dice) that is
lopsided. Rolls of 1 and 2 are equally likely; so are rolls of 3
and 4; and so are rolls of 5 and 6. However, the die rolls a 1
twice as often as a 3, and rolls a 5 twice as often as a 1.
- You work as a programmer for the Goldilocks Porridge Polling
Company. Company pollsters ask the local bear population about
porridge preferences, both in variety (sweetened or unsweetened) and
serving temperature (hot, cold, just-right). Thus, any respondent can
have one of six preference profiles. Surprisingly, preferences in
variety and preferences in temperature are entirely independent of one
another. But sweetened porridge is twice as popular as unsweetened.
Hot and cold are equally popular, but just-right is twice as popular
as either of the others.
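As a starting point, the ratios in the three scenarios can be turned into probabilities by normalizing relative weights. The following is only a sketch: the helper name `normalize`, the symbol labels, and the choice to treat each single outcome as one symbol are assumptions made for illustration, not part of the exercise.

```python
from fractions import Fraction

def normalize(weights):
    """Turn relative weights into exact probabilities that sum to 1."""
    total = sum(weights.values())
    return {symbol: Fraction(w, total) for symbol, w in weights.items()}

# Coin: heads is twice as likely as tails.
coin = normalize({"H": 2, "T": 1})

# Die: a 1 is twice as likely as a 3, and a 5 is twice as likely as a 1,
# so the weights for the pairs (1,2), (3,4), (5,6) are 2, 1, 4.
die = normalize({1: 2, 2: 2, 3: 1, 4: 1, 5: 4, 6: 4})

# Porridge: variety and temperature are independent, so the probability
# of a profile is the product of the two marginal probabilities.
variety = normalize({"sweetened": 2, "unsweetened": 1})
temperature = normalize({"hot": 1, "cold": 1, "just-right": 2})
porridge = {
    (v, t): pv * pt
    for v, pv in variety.items()
    for t, pt in temperature.items()
}

if __name__ == "__main__":
    for name, dist in [("coin", coin), ("die", die), ("porridge", porridge)]:
        print(name, {s: str(p) for s, p in dist.items()})
```

Using `Fraction` keeps the probabilities exact, which makes it easy to confirm that each distribution sums to 1 before going on to the questions.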
The goal is to come up with encodings for each of these languages
which are streaming, lossless and uniquely decodable. For each
language, answer the following six questions:
- What are the symbols in the language?
- Compute the entropy of this language. (Please write down the
instance of the formula; you don't have to compute a final numeric
value.)
- What is the naive encoding for this language?
- Devise an encoding that has the required properties and does better
on average than the naive encoding.
- Can your encoding ever be worse than the naive encoding? Explain.
- Provide a rigorous argument that your encoding is better on
average than the naive encoding.
Also, try this one: Your genetic code (DNA) uses a language of four
bases (A, C, T, G). Each of twenty amino acids is coded with a
sequence of three bases. The encoding is redundant since various
possible sequences code for the same amino acid. Assume that all
bases and encodings are equally likely. What is the entropy of this
language? (I haven't done this one, so I'm not sure how it comes
out.)
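To check answers to the entropy and encoding questions numerically, a few small helpers may be useful. This is a minimal sketch, not a solution: the function names are invented here, and the four-symbol `toy_probs` language and its `toy_code` are a hypothetical example unrelated to the three scenarios. The "naive" encoding is assumed to mean a fixed-width code with the same number of bits for every symbol.

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def naive_length(probs):
    """Bits per symbol for a fixed-width code over all symbols."""
    return math.ceil(math.log2(len(probs)))

def average_length(probs, code):
    """Expected bits per symbol when each symbol uses its own codeword."""
    return sum(p * len(code[symbol]) for symbol, p in probs.items())

def is_prefix_free(code):
    """True if all codewords are distinct and none is a prefix of another.

    A prefix-free code is uniquely decodable and can be decoded as the
    bits stream in, without waiting for the end of the message.
    """
    words = list(code.values())
    if len(set(words)) != len(words):
        return False
    return not any(a != b and b.startswith(a) for a in words for b in words)

# Hypothetical toy language (not one of the exercise scenarios).
toy_probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
toy_code = {"a": "0", "b": "10", "c": "110", "d": "111"}

if __name__ == "__main__":
    print("entropy:       ", entropy(toy_probs))
    print("naive length:  ", naive_length(toy_probs))
    print("average length:", average_length(toy_probs, toy_code))
    print("prefix-free:   ", is_prefix_free(toy_code))
```

For the toy language the average length equals the entropy, which is the best any uniquely decodable code can do. In general the average length of such a code is at least the entropy, so comparing the two numbers shows how much room for improvement remains.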