Desurvire E. Classical and Quantum Information Theory: An Introduction for the Telecom Scientist


Appendix E (Chapter 6) From discrete to continuous entropy

In this appendix, we show how the two entropy definitions, for the discrete-source and continuous-source cases, are connected.

For the discrete case, we have, by definition:

$$H(X) = -\sum_{x_i \in X} p(x_i)\log p(x_i), \qquad \text{(E1)}$$

and for the continuous case:

$$H(X') = -\int p'(x)\log p'(x)\,dx, \qquad \text{(E2)}$$

where the discrete and continuous distributions, which relate to the sources $X$ (discrete) and $X'$ (continuous), are called $p(x_i)$ and $p'(x)$, respectively.

Considering the continuous function $p'(x)$, we can decompose its integration domain into small bins of width $\Delta$, which we label with the index $j$. The variable $x$ belongs to bin $j$ if the condition $j\Delta \le x < (j+1)\Delta$ is satisfied. Assuming that $p'(x)$ is continuous, the mean-value theorem guarantees that in any bin $j$ there exists a value $x_j$ for which

$$\int_{j\Delta}^{(j+1)\Delta} p'(x)\,dx \equiv p'(x_j)\,\Delta \qquad \text{(E3)}$$

(note that there is no functional relation between the discrete distribution $p(x_i)$ and the continuous distribution $p'(x_j)$ at points $x_j$ of the integration domain). Equation (E3), however, defines a discrete distribution, which we shall call $p^{\Delta}(x_j) = p'(x_j)\Delta$. Such a distribution is defined over the discrete set $X^{\Delta} = \{x_j\}$, for which Eq. (E3) is satisfied.

Summing Eq. (E3) over all bins, according to the additivity property of integrals, this distribution satisfies:

$$\sum_{x_j \in X^{\Delta}} p^{\Delta}(x_j) = \sum_{x_j \in X^{\Delta}} p'(x_j)\,\Delta = \int p'(x)\,dx = 1 \qquad \text{(E4)}$$

(we note that the sum $\sum_{j} p'(x_j)\Delta$ is a Riemann sum of the continuous function $p'(x)$). We have, thus, obtained a strict equivalence between the continuous distribution $p'(x)$ and a discrete distribution $p^{\Delta}(x_j)$. We, thus, expect that their respective entropies become very nearly equal if the bin interval $\Delta$ is chosen sufficiently small. The surprise is that this is not at all the case, as I show next.

See also: T. M. Cover and J. A. Thomas, Elements of Information Theory (New York: John Wiley & Sons, 1991); D. Feldmann, A Brief Tutorial On: Information Theory, Excess Entropy and Statistical Complexity (2002), available online at http://hornacek.coa.edu/dave/Tutorial/index.html.

The entropy $H(X^{\Delta})$ of the discrete distribution $p^{\Delta}$ related to the source $X^{\Delta}$ is defined according to Eq. (E1):

$$H(X^{\Delta}) = -\sum_{x_j \in X^{\Delta}} p^{\Delta}(x_j)\log p^{\Delta}(x_j) \qquad \text{(E5)}$$

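Substituting in Eq. (E5) the definition $p^{\Delta}(x_j) = p'(x_j)\Delta$, and using the property in Eq. (E4), we obtain:

$$\begin{aligned}
H(X^{\Delta}) &= -\sum_{x_j \in X^{\Delta}} p'(x_j)\Delta\,\log\left[p'(x_j)\Delta\right] \\
&= -\sum_{x_j \in X^{\Delta}} p'(x_j)\Delta\,\log p'(x_j) - \log\Delta \sum_{x_j \in X^{\Delta}} p'(x_j)\Delta \\
&\equiv -\sum_{x_j \in X^{\Delta}} p'(x_j)\Delta\,\log p'(x_j) - \log\Delta. \qquad \text{(E6)}
\end{aligned}$$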

In Eq. (E6), the first term is a Riemann sum of the function $-p'(x)\log p'(x)$. This Riemann sum converges to the integral $-\int p'(x)\log p'(x)\,dx = H(X')$, which is the continuous entropy, as the bin size is made to vanish ($\Delta \to 0$). But in such a limit, the second term, $-\log\Delta$, becomes infinite! This divergence reflects the fact that what we have done is an $n$-bit quantization of the continuous variable $x$. In such a quantization (with logarithms taken in base 2), the number of bits is $n = -\log\Delta$ (or $\Delta = 2^{-n}$). Thus, the result in Eq. (E6) shows that, for small $\Delta$, the relation between the discrete entropy $H(X^{\Delta})$ and what we have defined to be the continuous entropy $H(X')$ is actually

$$H(X^{\Delta}) \simeq H(X') + n. \qquad \text{(E7)}$$
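Equation (E7) can be checked numerically on a source whose continuous entropy is known in closed form. The sketch below assumes (this is not in the text) a zero-mean Gaussian source $X'$ with standard deviation σ = 1, for which $H(X') = \tfrac{1}{2}\log_2(2\pi e\sigma^2) \approx 2.05$ bits. It computes the exact bin probabilities $p'(x_j)\Delta$ of Eq. (E3) from the Gaussian cumulative distribution and truncates the domain to ±10σ; the function names are illustrative only.

```python
import math


def gaussian_cdf(x: float, sigma: float = 1.0) -> float:
    """CDF of a zero-mean Gaussian; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-x / (sigma * math.sqrt(2.0)))


def quantized_entropy(n: int, sigma: float = 1.0, span: float = 10.0) -> float:
    """Entropy H(X^Delta), in bits, of the Gaussian quantized with bin width Delta = 2**-n.

    Each bin probability is the exact integral of p'(x) over the bin, i.e. the
    quantity p'(x_j) * Delta of Eq. (E3). Bins only cover [-span*sigma, span*sigma];
    the probability mass outside that range is negligible for span = 10.
    """
    delta = 2.0 ** (-n)
    j_max = math.ceil(span * sigma / delta)
    h = 0.0
    for j in range(-j_max, j_max):
        p = gaussian_cdf((j + 1) * delta, sigma) - gaussian_cdf(j * delta, sigma)
        if p > 0.0:
            h -= p * math.log2(p)
    return h


def continuous_entropy(sigma: float = 1.0) -> float:
    """Closed-form continuous entropy H(X') = 0.5*log2(2*pi*e*sigma^2), in bits."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * sigma ** 2)


for n in range(7):
    print(f"n={n}:  H(X^Delta) = {quantized_entropy(n):.4f} bits,"
          f"  H(X') + n = {continuous_entropy() + n:.4f} bits")
```

As n grows, the two printed columns should agree ever more closely, while both increase by one bit per extra bit of accuracy without bound, which is the divergence of −log Δ discussed above.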

The quantity $H(X') + n$ can be interpreted as the number of bits required (on average) to describe the continuous random variable $X'$ with $n$-bit accuracy. To understand what “n-bit accuracy” means, consider that $\Delta = 1$, or $n = 0$, corresponds to integers. Then $\Delta = 1/2$, or $n = 1$, corresponds to numbers increasing by steps of 0.5. It takes 1 bit of extra accuracy to specify whether $x$ is rounded to an integer $I$ or to $I + 0.5$. Then $\Delta = 1/4$, or $n = 2$, corresponds to numbers increasing by steps of 0.25. It takes 2 bits of extra accuracy to specify the value of $x$ out of the four rounded cases $I$, $I + 0.25$, $I + 0.5$, or $I + 0.75$. Thus, $n$-bit accuracy corresponds to the number of extra bits required to be accurate within a step $\Delta = 2^{-n}$. The divergence in $-\log\Delta$, which makes the discrete-case entropy infinite, reflects the fact that it takes an infinite number of bits to represent a continuous variable with arbitrary accuracy.
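The “n-bit accuracy” reading can be made concrete with a small helper that rounds $x$ down to the grid of step $\Delta = 2^{-n}$ and returns the integer part $I$ together with the $n$ extra bits. Rounding down, rather than to the nearest grid point, is an assumption of this sketch, chosen so that $x$ falls in bin $j$ exactly as in $j\Delta \le x < (j+1)\Delta$; the function name is hypothetical.

```python
import math


def n_bit_quantize(x: float, n: int):
    """Round x down to the grid of step Delta = 2**-n.

    Returns (x_q, I, bits) with x_q = I + int(bits, 2) * 2**-n, where I = floor(x)
    is the integer part and `bits` holds the n extra bits that locate x_q among
    the 2**n sub-steps of the unit interval starting at I.
    """
    j = math.floor(x * 2 ** n)              # bin index: j*Delta <= x < (j+1)*Delta
    integer_part = math.floor(x)
    offset = j - integer_part * 2 ** n      # always in 0 .. 2**n - 1
    bits = format(offset, f"0{n}b") if n > 0 else ""
    return j / 2 ** n, integer_part, bits


for n in range(4):
    print(n, n_bit_quantize(3.8, n))
```

For x = 3.8, each additional bit halves the step: n = 1 selects I + 0.5 (bit “1”), and n = 2 selects I + 0.75 (bits “11”), matching the four rounded cases listed above.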

Appendix F (Chapter 8) Kraft–McMillan inequality

Assume a prefix code made of a set of $N$ codewords $c_1, c_2, \ldots, c_N$ of various lengths $l_1, l_2, \ldots, l_N$ satisfying $l_1 \le l_2 \le \cdots \le l_N$.

If the code is binary, the Kraft–McMillan inequality is

$$\sum_{k=1}^{N} 2^{-l_k} \le 1, \qquad \text{(F1)}$$

and for an M-ary code:

$$\sum_{k=1}^{N} M^{-l_k} \le 1. \qquad \text{(F2)}$$
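Checking whether a given set of codeword lengths satisfies Eq. (F1) or (F2) is a one-line computation. The sketch below uses exact rational arithmetic (Python's standard fractions module) to avoid rounding issues; `kraft_sum` and `satisfies_kraft` are illustrative names, not library functions.

```python
from fractions import Fraction


def kraft_sum(lengths, m: int = 2) -> Fraction:
    """Left-hand side of Eq. (F1) (m = 2) or Eq. (F2) (m = M), computed exactly."""
    return sum((Fraction(1, m ** l) for l in lengths), Fraction(0))


def satisfies_kraft(lengths, m: int = 2) -> bool:
    """True if the lengths obey the Kraft-McMillan inequality for an m-ary code."""
    return kraft_sum(lengths, m) <= 1


print(kraft_sum([1, 2, 3, 4, 4]))   # 1   (lengths of the N = 5 example used in the proof)
print(kraft_sum([1, 1, 2]))         # 5/4 (no prefix code can have these lengths)
print(kraft_sum([1, 1, 1], m=3))    # 1   (ternary code with three one-digit codewords)
```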

To prove this inequality, I shall use the demonstration of Proakis (2001), one of several variations to be found in the literature. For simplicity, we consider the case M = 2, but the arguments developed are valid for all M.

We first construct a binary tree of order $l = l_N$, as shown in Fig. F1 (assuming here, for instance, $l = 4$).

As the figure illustrates, the tree has $2^l$ terminal nodes or “leaves” (here $2^4 = 16$ terminal nodes). The tree also has $2^l - 1 = 15$ branching nodes. Each node is located on a uniquely defined path, labeled by the 0 and 1 bits at each splitting. For instance, the location of terminal node A is defined by the path labeled 1011. We call the node order the length of its path label (e.g., node A is of order four). The idea is to assign each codeword $c_k$ (of length $l_k$) to any available node of the tree (branching or terminal) of order $l_k$. Such an assignment is continued until there is no codeword left.

Assume that there are only five codewords (N = 5), with lengths $l_1 = 1$, $l_2 = 2$, $l_3 = 3$, and $l_4 = l_5 = 4$. As shown in Fig. F2, the codeword $c_1$ could, thus, be assigned (for instance) to the order-1 node defined by the path 1.

J. G. Proakis, Digital Communications, 4th edn. (New York: McGraw Hill, 2001).

Figure F1 Binary tree of order $l = l_N$ (legend: branching node, terminal node).

Figure F2 Binary tree for codewords $c_1, \ldots, c_5$ of the example.

Because the code is a prefix code, $c_1$ cannot be the prefix of any other codeword, meaning that this choice eliminates all the subsequent nodes found on the paths beginning with 1 (connected in the figure by dashed lines). We continue by assigning $c_2$ to any available order-2 node, and so on until $c_5$, as illustrated in the figure. Based on this example, it is easy to observe that each assignment of codeword $c_k$ with length $l_k$ eliminates $2^{l-l_k}$ terminal nodes. Because of the prefix condition, the sets of terminal nodes eliminated by different codewords never overlap, and together they cannot exceed the $2^l$ terminal nodes available. The total number of terminal nodes eliminated is, therefore,

$$\sum_{k=1}^{N} 2^{l-l_k} \le 2^l, \qquad \text{(F3)}$$

where $2^l$ (to recall) is the total number of terminal nodes. Dividing both sides of (F3) by $2^l$ gives

$$\sum_{k=1}^{N} 2^{-l_k} \le 1, \qquad \text{(F4)}$$

which is the Kraft–McMillan inequality.
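The tree argument can also be run as a construction. The sketch below assumes a specific assignment rule that is not stated in the text: codewords are placed shortest first, each on the leftmost free node of its order (the example above puts $c_1$ on path 1 instead, which is the mirror-image choice). With this rule, a free node is always found when the lengths satisfy Eq. (F4), and the function returns None otherwise; both function names are hypothetical.

```python
def eliminated_leaves(lengths):
    """Terminal nodes eliminated in the tree of order l = max(lengths), as in Eq. (F3).

    Returns (eliminated, total) = (sum of 2**(l - l_k), 2**l).
    """
    l = max(lengths)
    return sum(2 ** (l - lk) for lk in lengths), 2 ** l


def assign_to_tree(lengths):
    """Assign codewords, shortest first, to the leftmost free node of order l_k.

    Returns the list of path labels, or None when some length finds no free node.
    Lengths are assumed to be >= 1.
    """
    paths, node, prev = [], 0, 0
    for lk in sorted(lengths):
        node <<= lk - prev    # descend from the next free node of order prev to its leftmost descendant of order lk
        if node >= 2 ** lk:   # every node of order lk is already used or eliminated
            return None
        paths.append(format(node, f"0{lk}b"))
        node += 1             # next free node at the same order
        prev = lk
    return paths


print(eliminated_leaves([1, 2, 3, 4, 4]))   # (16, 16)
print(assign_to_tree([1, 2, 3, 4, 4]))      # ['0', '10', '110', '1110', '1111']
print(assign_to_tree([1, 1, 2]))            # None
```

In the five-codeword example every terminal node ends up eliminated, so (F3) holds with equality and no further codeword could be added.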

Appendix G (Chapter 9) Overview of data compression standards

This appendix provides a brief overview of common data compression standards used for sounds, texts, files, images, and videos. The description is just meant to be introductory and makes no pretense of comprehensively defining the actual standards and their current updated versions. The list of selected standards is also indicative, and does not reflect the full diversity of those available in the market, as freeware, shareware, or under license.

It is a tricky endeavor to describe, in a few pages, a subject that would fill entire bookshelves. The hope is that the reader will get a flavor and will be enticed to learn more about this seemingly endless, yet fascinating subject. Why put this whole matter into an appendix, and not a fully fledged chapter? This is because this set of chapters is primarily focused on information theory, not on information standards. While the former provides a universal and slowly evolving background reference, like science, the latter is practically the reverse. As we shall see through this appendix, however, information standards are extremely sophisticated and “intellectually smart,” despite being just an application field of the former. No telecom engineer or scientist can afford to ignore this essential fact, or fail to benefit from it!

Sounds

Speech was historically the first type of information to be subject to coding and compression. The need for speech coding first came from the development of telephony, with digitally sampled voice progressively replacing the old analog telephone service. The benefits of digital voice are essentially twofold: (a) a better sound quality for the users, perceived as free from background noise and interference, and (b) owing to the high fidelity of the transmission, and in particular to error-correction coding, the possibility for the telephone operator to compress several voice channels and multiplex them onto the same line, each in its own time slot (referred to as time-division multiplexing, or TDM). But there are even more important benefits for the telephone operator in managing the voice traffic, in terms of switching, multiplexing, provisioning, and servicing. Another key application is the possibility to mix voice and computer data on the same telephone line, which first appeared under the name of ISDN (integrated services digital network), as a precursor of our current Internet.


Figure G1 Schematic representation of standard audio-CD frame: 2-channel audio (2 × 2 bytes per stereo sample, 24 bytes), CRC (8 bytes), and subcode (1 byte), i.e., 24 + 8 + 1 = 33 bytes.

The standard of digital voice for telephony, which was released by the CCITT (now ITU-T) in 1972, is referred to as G.711. Analog-to-digital (A–D) voice conversion is based on the technique of pulse-code modulation (PCM). It is an uncompressed, lossless code, which converts 8000 samples per second into eight-bit codewords, resulting in a channel rate of 8 × 8000 = 64 kbit/s. The two main companding (quantization) algorithms used in PCM are the A-law, as used in Europe, and the µ-law, as used in North America and Japan. My earlier work gives more details about these algorithms and their elaborated variants.

That book also describes the digital-voice multiplexing (TDM) standards used by telephone operators, from the early plesiochronous digital hierarchy (PDH) to the current synchronous digital hierarchy (SDH or SONET). While the original bit rate has been set to 64 kbit/s, voice channels can also be encoded into 16–32 kbit/s, with the same perceived sound quality, and, using more sophisticated algorithms, even down to some 2 kbit/s, with reasonably acceptable quality. There exist many algorithms to perform digital-voice compression, for instance removing the silences in phone conversation (a 40% bandwidth saving!), or predictive coding of voice patterns. Nowadays, the majority of developed countries use digital telephony, whether based on wireline or wireless network systems. With today’s Internet, digital telephony is almost a mere commodity, which a number of operators offer for free, regardless of connection time and reach.
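The 64 kbit/s figure and the idea of companding can be illustrated with a short sketch. It assumes the continuous µ-law characteristic $F(x) = \mathrm{sgn}(x)\,\ln(1 + \mu|x|)/\ln(1 + \mu)$ with µ = 255, followed by a plain uniform 8-bit quantizer on the companded value. G.711 itself specifies a segmented, piecewise-linear approximation of this curve, so the codes produced here are not bit-compatible with G.711; all names are illustrative.

```python
import math

MU = 255               # mu-law parameter (continuous-curve assumption, not the G.711 tables)
SAMPLE_RATE = 8000     # samples per second
BITS_PER_SAMPLE = 8


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Continuous mu-law characteristic for |x| <= 1; output also lies in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse characteristic: |x| = ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def encode_8bit(x: float) -> int:
    """Quantize the companded value uniformly onto 256 levels (codes 0..255)."""
    y = mu_law_compress(x)
    return min(255, int((y + 1.0) / 2.0 * 256))


def decode_8bit(code: int) -> float:
    """Map a code to the centre of its quantization cell, then expand."""
    y = (code + 0.5) / 256 * 2.0 - 1.0
    return mu_law_expand(y)


print("PCM channel rate:", SAMPLE_RATE * BITS_PER_SAMPLE, "bit/s")   # 64000 bit/s
for x in (0.001, 0.01, 0.1, 0.9):
    xq = decode_8bit(encode_8bit(x))
    print(f"x={x:<6} decoded={xq:.5f} relative error={abs(xq - x) / x:.2%}")
```

The point of the logarithmic curve shows up in the printed relative errors: above the small-signal region they stay at roughly 2% whatever the amplitude, whereas a uniform 8-bit quantizer would give large relative errors on quiet speech.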

Music represents a second major application area of digital coding. The revolution, called by some the “big bang” in digital audio, came in the late 1970s with the audio CD (compact disk), which originated from a joint standardization effort of Sony and Philips. The resulting audio-CD standard has been published in the Red Book.

See, for instance: http://en.wikipedia.org/wiki/G.711.

See, for instance: http://en.wikipedia.org/wiki/Pulse-code_modulation.

See, for instance: http://en.wikipedia.org/wiki/A-law_algorithm.

See, for instance: http://en.wikipedia.org/wiki/Mu-law_algorithm.

E. Desurvire, Wiley Survival Guides in Global Telecommunications, Signaling Principles, Network Protocols, and Wireless Systems (New York: J. Wiley & Sons, 2004).

See, for instance: http://en.wikipedia.org/wiki/Compact_disk; http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci503642,00.html; www.answers.com/topic/compact-disc-2.

See, for instance: www.mpeg.org/MPEG/DVD/Red_Book/CD.html.

As with speech, the A–D conversion is achieved with PCM, here with two channels (for the stereo effect), each made of 16-bit or two-byte codewords, at a sampling rate of 44.1 kHz. The first level of block code is called a frame. As illustrated in Fig. G1, the audio-CD frame includes six stereo samples (2 × 2 × 6 = 24 bytes), an eight-byte error-correction field (labeled CRC in the figure; the CD actually uses cross-interleaved Reed–Solomon coding, or CIRC), and a “subcode” byte for control and display purposes, e.g., telling the CD player which song or track is currently being read. This makes the frame 24 + 8 + 1 = 33 bytes long altogether, as seen from the figure. The frame is then processed as follows. Each byte is first converted into a 14-bit EFM codeword (eight-to-fourteen modulation, also called 8/14 code). The EFM code expansion ensures that any two successive 1 bits physically recorded on the CD are separated by at least two 0 bits (and at most ten 0 bits), which is a necessary condition for mechanical tracking, phase-locking, and synchronization purposes. Successive 14-bit EFM codewords are then separated by three-bit merging words, and the frame is finally appended with a unique 24-bit synchronization codeword (27 bits including its merging bits), acting as a frame delimiter. The whole frame conversion results in 33 × (14 + 3) + 27 = 588 bits. Like any computer disk, the CD is organized into sectors, each sector containing 98 frames. The aforementioned one-byte subcode, thus, provides eight channels with 98 bit/sector to ensure various functions, like track monitoring and timing or indexing information within tracks. The standard reading speed being set to 75 sectors per second, we obtain the channel rate:

588 bit/frame × 98 frame/sector × 75 sector/second = 4.3218 Mbit/s.

After EFM demodulation and decoding, overhead removal, and error correction, the user (or payload) channel rate is reduced to 2 × 16 bit × 44.1 kHz = 1.4112 Mbit/s.
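The frame bookkeeping and the EFM run-length rule described above can be checked in a few lines. The constants are the figures quoted in the text (taking 27 sync bits, i.e., 24 bits plus merging bits). The run-length test, with d = 2 and k = 10, is a simplified reading that inspects a single bit string and ignores how runs carry across merging bits; the function name is hypothetical.

```python
AUDIO_BYTES = 2 * 2 * 6          # six stereo samples, two bytes per channel
PARITY_BYTES = 8                 # error-correction field
SUBCODE_BYTES = 1
FRAME_BYTES = AUDIO_BYTES + PARITY_BYTES + SUBCODE_BYTES                  # 33

EFM_BITS, MERGING_BITS, SYNC_BITS = 14, 3, 27
FRAME_CHANNEL_BITS = FRAME_BYTES * (EFM_BITS + MERGING_BITS) + SYNC_BITS  # 588

FRAMES_PER_SECTOR, SECTORS_PER_SECOND = 98, 75
channel_rate = FRAME_CHANNEL_BITS * FRAMES_PER_SECTOR * SECTORS_PER_SECOND  # bit/s
user_rate = 2 * 16 * 44_100                                               # bit/s


def satisfies_efm_run_length(bits: str, d: int = 2, k: int = 10) -> bool:
    """True if every run of 0s lying between two 1s has a length in [d, k],
    and the leading and trailing runs of 0s are no longer than k."""
    runs = bits.split("1")
    if any(len(run) > k for run in runs):
        return False
    return all(len(run) >= d for run in runs[1:-1])   # inner runs only


print(FRAME_BYTES, FRAME_CHANNEL_BITS)                 # 33 588
print(channel_rate, "bit/s channel rate")              # 4321800
print(user_rate, "bit/s user rate")                    # 1411200
print(satisfies_efm_run_length("01001000100000"))      # True
print(satisfies_efm_run_length("01100100100000"))      # False: two adjacent 1 bits
```

The ratio of the two rates, about 3.06, measures the total cost of EFM, merging bits, synchronization, parity, and subcode on the disc.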

The physically recorded program area is 86.05 cm², with a track pitch of p = 1.5–1.6 µm. Thus, the full length of the track (referred to as the “recordable spiral”) is l = 86.05 cm²/(p × 10⁻⁴ cm) = 5.74–5.38 km. With a standard scanning velocity of v = 1.2 m/s, the corresponding playing time is t = l/v ≈ 80–75 minutes. At the user rate, and taking t = 75–80 minutes, this corresponds to 1.4112 (Mbit/s) × t(s) = 6350–6774 Mbit, or 794–847 Mbyte, which, after some extra sector or error-correction overhead (representing 13%), yields 691–737 Mbytes. This is quite close to the storage capacity offered by current CD-ROM vendors (usually quoted as 650–700 Mbytes, in units of 2²⁰ bytes). The CD-ROM standard will be described later under the heading “Files.” Why this 75–80 minute duration for audio CDs? One alleged explanation is that CDs had to be able to play Beethoven’s entire 9th Symphony, which took exactly 74 minutes in the slowest recording at the time.
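The track-length and capacity arithmetic can be reproduced from the physical parameters. The values below are those quoted in the text; applying the 13% overhead as a flat multiplicative factor is an assumption of this sketch. Because the code uses the unrounded playing times (about 79.7 and 74.7 minutes) rather than 80 and 75 minutes, its Mbyte figures come out a few Mbytes lower than those above.

```python
PROGRAM_AREA_CM2 = 86.05
SCAN_VELOCITY_M_S = 1.2
USER_RATE_BIT_S = 2 * 16 * 44_100
OVERHEAD = 0.13      # extra sector / error-correction overhead quoted in the text


def cd_figures(track_pitch_um: float):
    """Return (track length in m, playing time in min, raw Mbyte, Mbyte after overhead)."""
    track_length_m = PROGRAM_AREA_CM2 * 1e-4 / (track_pitch_um * 1e-6)   # cm^2 -> m^2, um -> m
    playing_time_s = track_length_m / SCAN_VELOCITY_M_S
    raw_mbyte = USER_RATE_BIT_S * playing_time_s / 8 / 1e6
    return track_length_m, playing_time_s / 60, raw_mbyte, raw_mbyte * (1 - OVERHEAD)


for pitch in (1.5, 1.6):
    length, minutes, raw, net = cd_figures(pitch)
    print(f"p={pitch} um: l={length / 1000:.2f} km, t={minutes:.1f} min, "
          f"{raw:.0f} Mbyte raw, {net:.0f} Mbyte after overhead")
```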

See, for instance: http://en.wikipedia.org/wiki/Eight-to-Fourteen_Modulation.

See, for instance: www.physics.udel.edu/~watson/scen103/efm.html.

See, for instance: http://en.wikipedia.org/wiki/A-law_algorithm.

http://en.wikipedia.org/wiki/Compact_disk; http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci503642,00.html; www.answers.com/topic/compact-disc-2.

Any program that performs digital encoding and decoding is referred to as a codec (short for coder–decoder, compressor–decompressor, or compression–decompression algorithm). Thus, audio codecs represent the family of codes and algorithms that generate digital audio files and restore the sound to the human ear. As we have seen earlier, the audio codec used in CDs is based on PCM with 44.1 kHz sampling rate and 2 × 16-bit codewords (the factor of two being for stereo-effect purposes). To recall, PCM is an uncompressed, lossless code, which yields maximum sound quality (referred to as CD-quality) for professionals or audiophiles, but necessarily comes with relatively large sizes for the digital-audio files. Microsoft and IBM have adapted this uncompressed-PCM audio codec for use in home computers, under the brand name WAV (short for waveform). The data are recognized by the .wav filename extension. Any audio file can be compressed using lossy codecs, typically up to an 80% compression rate, with the original sound quality being lost forever, yet producing acceptable sound restitution, depending on the music, audience, and utilization (see further on for this topic). In contrast, lossless audio compression fully preserves the original CD quality. Two key examples of lossless compression codes that can be used for music are the popular ZIP (see further), which can achieve 10–20% compression, and FLAC (short for free lossless audio codec), which can achieve over 40% compression.

In short, FLAC is based on two algorithms: linear predictive coding (LPC) and run-length encoding (RLE). The first makes it possible to decompose acoustic spectra into a reduced list of parameters, which in the reverse implementation can faithfully reproduce the original sounds. Linear predictive coding was originally developed for speech analysis, low bit-rate compression, and re-synthesis. Key applications of LPC include cellular telephony (GSM), speech recognition, and electronically synthesized music. The principle of RLE is to replace sequences of repeated codewords (called “runs”) with the codewords preceded by their counts. For instance, the sequence XXXXXXXXXXYYYYYY is readily compressed into 10X6Y. This provides additional sound compression, for instance with silent or monotone passages. The LPC prediction residuals are encoded by means of Rice–Golomb codes, which are described in Chapter 10. Next to FLAC, there actually exists a wealth of lossless audio codecs with compression rates near 40% or better, such as: WavPack, ALAC, Monkey’s Audio, OptimFROG, Shorten, WMA Lossless, LA, TTA, LPAC, MPEG-4 ALS, Real Lossless, MUSICompress/WaveZIP, AudioZip, WaveArc, Pegasus SPS (ELS-Ultra), Sonarc, and RKAU. Several comparative-merit lists for these different codecs are available.
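The RLE principle can be sketched on the character example given above. This is a toy, text-level version (counts written in decimal, symbols assumed never to be digits), not the bitstream format actually used by FLAC; the function names are illustrative.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated symbol by its count followed by the symbol."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Inverse of rle_encode; assumes the symbols are never decimal digits."""
    return "".join(symbol * int(count) for count, symbol in re.findall(r"(\d+)(\D)", encoded))


print(rle_encode("XXXXXXXXXXYYYYYY"))   # 10X6Y
print(rle_decode("10X6Y"))              # XXXXXXXXXXYYYYYY
```

Note that RLE expands inputs without repetitions (“ABC” becomes “1A1B1C”), which is why it pays off mainly on silent or monotone passages.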

Meaning that the size S of the compressed data is 20% of the size U of the uncompressed, source data. Note that the general convention is to define the compression as the ratio S/U, not 1 − S/U, which scales the opposite way.

The compression rate is in fact dependent on the type of music source. It can be 30–40% for pop, rock, techno, and other loud, noisy music, and 40–60% for quieter choral and orchestral pieces, see: www.firstpr.com.au/audiocomp/lossless/#Links.

There exist many Internet sites and tutorials for LPC, see for instance: www.data-compression.com/speech.shtml; www.answers.com/topic/linear-predictive-coding; http://cnx.org/content/m12473/latest/.

See: http://flac.sourceforge.net/features.html; www.answers.com/topic/flac; http://en.wikipedia.org/wiki/FLAC.

http://wiki.hydrogenaudio.org/index.php?title=Lossless_comparison; http://flac.sourceforge.net/comparison.html; www.compression-links.info/Lossless_Audio_Coding; http://members.home.nl/w.speek/comparison.htm.

http://en.wikipedia.org/wiki/Lossy_data_compression.

Lossy data compression turns out to be most useful in any application where the full integrity of the original information does not have to be preserved. The key benefits that outweigh integrity are manifold: reduced file sizes (or fuller use of available storage or disk space); faster file transmission (or up- and downloading on the Internet); faster encoding and decoding processing for real-time or streaming applications. Key applications of lossy compression concern audio, images, and video files. Concerning audio files, an underlying principle of lossy compression is based on psychoacoustic analysis, which capitalizes on the limitations of the human ear and the subjective perception and interpretation of sounds. For instance, sounds can obscure or mask each other in both time (forward or backward time masking) and frequency (frequency masking). Also, weak sounds in the frequency spectrum may be masked by louder sounds at other frequencies. In all cases, the information concerning masked sounds can safely be removed without any audible loss of quality. The compression algorithm may also give priority to the sounds situated well within the audible range, which conveys an idea of the power of psychoacoustic modeling. Most lossy audio codecs, for instance MP3, Ogg Vorbis, WMA, Musicam, LAME, and ATRAC, are based on such principles. The resulting compressed files typically represent 10–12% of the original audio recordings. Such an unparalleled feature opened new perspectives for broadcasting, downloading, and peer-to-peer sharing of music over the Internet (overlooking, here, tricky issues of copyright, illegal copying, and piracy!), and in consumer electronics, such as portable music players. The key differences between these codecs are measured not only in compression and quality performance, or by their operating-system compatibility, but also by the fact that they are either “proprietary,” to be implemented under license, or alternatively offered as open-source programs, left to any manufacturer or private user to implement freely or even to modify.

Figure G2 MP3 frame sequence: ID3/APEv2 metadata, followed by MP3 frames, each made of a header (4 bytes) and data (418 bytes).

http://en.wikipedia.org/wiki/Psychoacoustic_model; www.sfu.ca/sonic-studio/handbook/Psychoacoustics.html; www.binaural.com/serendipity/index.php?/archives/62-Tutorial-The-Psychoacoustics-of-Multichannel-Audio.html.

See, for instance: http://en.wikipedia.org/wiki/MP3; www.mp3-tech.org/; www.mpeg.org/MPEG/mp3.html#overview; www.pcmag.com/encyclopedia_term/0,,t=mp3&i=47286,00.asp.

As of today (2006), the patented and standardized MP3, which appeared in the mid 1990s, is still by far the most popular. The name is short for MPEG-1/2 audio layer 3. The encoded MP3 files (having the filename extension .mp3) are made of a sequence of independent MP3 frames, as illustrated in Fig. G2. A given sequence may be “encapsulated” by a heading metadata tag (whose format is referred to as ID3 or APEv2). This tag contains information such as the music title, author, artist, and track number. Each frame has an MP3 header field and an MP3 data field. The four-byte header contains a sync-marker word (12 bits), the MPEG version and layer numbers (3 bits), the bit rate (4 bits, e.g., 1010 = 160 kbit/s), the sampling frequency (2 bits, e.g., 00 = 44.1 kHz), and other size information. There are lots of available bit rates in the two MPEG-1/2 standards, with B = 128 kbit/s or B = 192 kbit/s being most often chosen as de facto values. The bit rate can also be varied from one MP3 frame to the next, which allows one to allocate more bits to the most dynamic music segments (i.e., with more complex spectral movements) and fewer bits to the less dynamic ones. Thus, a rate of 224 kbit/s would be used for a symphonic orchestra, while 48 kbit/s is sufficient for music made of pure frequency tones. These figures are to be compared with the bit rate of uncompressed CD-quality recording, which, as we have seen, is 1.411 Mbit/s; this explains the aforementioned 10× to 12× average compression improvement. As for the MP3 frame, its size depends on the bit rate B and on the sampling rate S. Each frame carries exactly 1152 audio samples (per channel), so that its byte size is given by the formula 1152 × B/(8 × S), or about 418 bytes with B = 128 kbit/s and S = 44.1 kHz. However, the data are then compressed through Huffman coding (see Chapter 9), which is optimized for each individual frame’s payload. This does not allow one to predict the actual byte size of the resulting MP3 data field, or the actual compression rate, which is payload dependent.
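The header layout and the frame-size formula can be illustrated by decoding one four-byte header. The sketch handles MPEG-1 Layer III only: the bit-rate and sampling-frequency tables below are those of that case, while MPEG-2 and MPEG-2.5 use other tables (and MPEG-2.5 an 11-bit sync), which this sketch does not support. The size formula is 1152 × B/(8 × S) = 144 × B/S from the text, plus the optional padding byte; the function name and the returned dictionary are illustrative.

```python
# MPEG-1 Layer III tables: bit rate in kbit/s indexed by the 4-bit field (0 = "free format",
# None = invalid), and sampling frequency in Hz indexed by the 2-bit field.
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, None]


def parse_mp3_header(header: bytes) -> dict:
    """Decode the fields of a 4-byte MPEG-1 Layer III frame header."""
    if len(header) < 4:
        raise ValueError("need 4 header bytes")
    word = int.from_bytes(header[:4], "big")
    if (word >> 20) != 0xFFF:                    # 12-bit sync marker
        raise ValueError("no sync marker")
    version_id = (word >> 19) & 0b1              # 1 = MPEG-1
    layer = (word >> 17) & 0b11                  # 0b01 = Layer III
    if version_id != 1 or layer != 0b01:
        raise ValueError("this sketch handles MPEG-1 Layer III only")
    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    padding = (word >> 9) & 0b1
    if not bitrate or sample_rate is None:
        raise ValueError("free-format or invalid bit rate / sampling frequency")
    # 1152 samples per frame: 1152 * B / (8 * S) = 144 * B / S bytes, plus one padding byte
    frame_bytes = 144 * bitrate * 1000 // sample_rate + padding
    return {"bitrate_kbps": bitrate, "sample_rate_hz": sample_rate,
            "padding": padding, "frame_bytes": frame_bytes}


print(parse_mp3_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
# {'bitrate_kbps': 128, 'sample_rate_hz': 44100, 'padding': 0, 'frame_bytes': 417}
```

At 128 kbit/s and 44.1 kHz, 144 × B/S is about 417.96, so frames alternate between 417 bytes and 418 bytes (padding bit set) to keep the average bit rate exact.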

Datafiles

Unlike digital audio, image, and video files, which can undergo lossy compression without loss of perceived quality, most computer data require 100% fidelity in compression processing. Hence the importance of lossless compression, with algorithms ideally suited to each type of source for the most efficient file squeezing. We are now talking about file archiving and packaging, to make the best use of our computer memory space and save time in loading and transmitting files, in particular when it comes to email. The inventory of existing licensed or open-source datafile compression standards is quite substantial.

Most home-computer users, regardless of operating system, are familiar with the shareware ZIP (file extension .zip and many other variants), which evolved from a complex IP-litigation history. The 1989-originated root algorithm, PKZIP, is now used in programs with mutually supported formats called WinZip, BOMArchiveHelper, KGB Archiver, PicoZip, Info-Zip, WinRar, 7-Zip, Izarc, and ALZip, to quote only a few. Zooming in further, PKZIP is based on the algorithms DEFLATE (for compression) and INFLATE (for decompression), which both use a combination of Lempel–Ziv coding (LZ77, see Chapter 10) and Huffman coding (see Chapter 9). Based on DEFLATE, the file archiver GZIP (short for GNU ZIP, with filename extensions .gz, .tgz, .tar.gz, not to be confused with ZIP) was developed in the early 1990s as free software, circumventing its patented predecessors. Its format includes a ten-byte header (version, timestamp), some optional extra headers (e.g., to include the original file name), a DEFLATE body of compressed data, and an eight-byte

See for instance: www.compuphase.com/mp3/sta013.htm#MP3FRAMEHDR.

See, for instance: http://en.wikipedia.org/wiki/List_of_archive_formats; for a full list, see: http://en.wikipedia.org/wiki/List_of_file_archivers.

See, for instance: http://en.wikipedia.org/wiki/ZIP_%28file_format%29.

See, for instance: http://en.wikipedia.org/wiki/ZIP_file_format.

See, for instance: http://en.wikipedia.org/wiki/Gzip.