Entropy and Information Theory: First Edition
This book is devoted to the theory of probabilistic information measures and their application to coding theorems for information sources and noisy channels. The eventual goal is a general development of Shannon’s mathematical theory of communication, but much of the space is devoted to the tools and methods required to prove the Shannon coding theorems. These tools form an area common to ergodic theory and information theory and comprise several quantitative notions of the information in random variables, random processes, and dynamical systems.
Examples are entropy, mutual information, conditional entropy, conditional information, and relative entropy (discrimination, Kullback-Leibler information), along with the limiting normalized versions of these quantities such as entropy rate and information rate. When considering multiple random objects, in addition to information we will be concerned with the distance or distortion between the random objects, that is, the accuracy of the representation of one random object by another. Much of the book is concerned with the properties of these quantities, especially the long-term asymptotic behavior of average information and distortion, where both sample averages and probabilistic averages are of interest.
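For orientation only, the simplest of these quantities have closed forms when the random variables take values in finite alphabets. The sketch below assumes finite alphabets, probability mass functions given as Python lists, and base-2 logarithms (so results are in bits). The function names are hypothetical and illustrative. It does not attempt the general standard-alphabet definitions, nor the limiting rates such as entropy rate.

```python
import math
from typing import Sequence

# Hypothetical helpers for FINITE alphabets only; logarithms are base 2 (bits).

def entropy(p: Sequence[float]) -> float:
    """H(X) = -sum_x p(x) log2 p(x); terms with p(x) = 0 contribute 0."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def relative_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)); infinite if q(x) = 0 where p(x) > 0."""
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf
            total += px * math.log2(px / qx)
    return total

def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """I(X;Y) as the relative entropy between the joint pmf joint[x][y]
    and the product of its marginals."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    # Flatten both distributions in the same (x, y) order.
    flat_joint = [pxy for row in joint for pxy in row]
    flat_product = [a * b for a in px for b in py]
    return relative_entropy(flat_joint, flat_product)

def conditional_entropy(joint: Sequence[Sequence[float]]) -> float:
    """H(X|Y) = H(X,Y) - H(Y) for a joint pmf joint[x][y]."""
    py = [sum(col) for col in zip(*joint)]
    return entropy([pxy for row in joint for pxy in row]) - entropy(py)

if __name__ == "__main__":
    print(entropy([0.5, 0.5]))                                 # fair coin
    print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))        # Y = X
    print(conditional_entropy([[0.25, 0.25], [0.25, 0.25]]))   # X, Y independent
```

Writing mutual information as the relative entropy between the joint distribution and the product of the marginals is a standard identity. It is chosen here because it shows how the notions listed above are tied together through relative entropy.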
The book has been strongly influenced by M. S. Pinsker’s classic Information and Information Stability of Random Variables and Processes and by the seminal work of A. N. Kolmogorov, I. M. Gelfand, A. M. Yaglom, and R. L. Dobrushin on information measures for abstract alphabets and their convergence properties. Many of the results herein are extensions of their generalizations of Shannon’s original results.
The mathematical models used here are more general than those of traditional treatments in that nonstationary and nonergodic information processes are considered. The models are somewhat less general than those of the Soviet school of information theory in the sense that standard alphabets rather than completely abstract alphabets are considered. This restriction, however, permits many stronger results as well as the extension to nonergodic processes. In addition, the assumption of standard spaces simplifies many proofs, and such spaces include virtually all examples of engineering interest.