Proceedings of the IEEE, 2000, 63 pp.
During the last decade, CD-quality digital audio has essentially replaced analog audio. Emerging digital audio applications for network, wireless, and multimedia computing systems face a series of constraints such as reduced channel bandwidth, limited storage capacity, and low cost. These new applications have created a demand for high-quality digital audio delivery at low bit rates. In response to this need, considerable research has been devoted to the development of algorithms for perceptually transparent coding of high-fidelity (CD-quality) digital audio. As a result, many algorithms have been proposed, and several have now become international and/or commercial product standards. This paper reviews algorithms for perceptually transparent coding of CD-quality digital audio, including both research and standardization activities.
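The bit-rate pressure described above can be made concrete with simple arithmetic. The minimal Python sketch below computes the raw rate of CD-quality PCM (44.1 kHz sampling, 16 bits per sample, two channels) and the compression ratio a coder must reach for a few lower rates. The target rates and the function names are hypothetical choices for illustration, not figures taken from this paper.

```python
# CD-quality PCM parameters: 44.1 kHz sampling, 16-bit samples, stereo.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bit_rate_kbps(fs_hz: int, bits: int, channels: int) -> float:
    """Raw (uncompressed) PCM bit rate in kb/s, using 1 kb = 1000 bits."""
    return fs_hz * bits * channels / 1000.0


def compression_ratio(source_kbps: float, coded_kbps: float) -> float:
    """How many times smaller the coded stream is than the source stream."""
    return source_kbps / coded_kbps


cd_kbps = pcm_bit_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"CD PCM rate: {cd_kbps:.1f} kb/s")

# Hypothetical target rates, chosen only to illustrate the arithmetic.
for target_kbps in (256, 128, 96):
    ratio = compression_ratio(cd_kbps, target_kbps)
    print(f"{target_kbps:4d} kb/s -> {ratio:.1f}:1")
```

At 128 kb/s, for example, the ratio is about 11:1, so roughly 91% of the PCM bits must be removed; the perceptual coders surveyed here aim to do this without audible degradation by exploiting masking.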
This paper is organized as follows. First, psychoacoustic principles are described, with the MPEG psychoacoustic signal analysis model 1 discussed in some detail. Next, filter bank design issues and algorithms are addressed, with particular emphasis on the modified discrete cosine transform (MDCT), a perfect reconstruction cosine-modulated filter bank that has become of central importance in perceptual audio coding. Then, we review methodologies that achieve perceptually transparent coding of FM- and CD-quality audio signals, including algorithms that manipulate transform components, subband signal decompositions, sinusoidal signal components, and linear prediction parameters, as well as hybrid algorithms that make use of more than one signal model. These discussions concentrate on the architectures and applications of techniques that use psychoacoustic models to efficiently exploit the masking characteristics of the human receiver. Several algorithms that have become international and/or commercial standards receive in-depth treatment, including the ISO/IEC MPEG family (-1, -2, -4), the Lucent Technologies PAC/EPAC/MPAC, the Dolby AC-2/AC-3, and the Sony ATRAC/SDDS algorithms. We then describe subjective evaluation methodologies in some detail, including the ITU-R BS.1116 recommendation on subjective measurements of small impairments. The paper concludes with a discussion of future research directions.
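The perfect reconstruction property attributed to the MDCT above can be checked numerically. The following is a minimal NumPy sketch, assuming a sine window (which satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1) and 50% overlap between 2N-sample blocks. The block length, the helper names, and the direct O(N^2) matrix form are illustrative choices, not the implementation of this paper or of any standard. Each 2N-sample block yields only N coefficients, so the transform is critically sampled; the time-domain aliasing left in each inverse block cancels when adjacent blocks are overlap-added.

```python
import numpy as np


def sine_window(N: int) -> np.ndarray:
    """Length-2N sine window; satisfies w[n]**2 + w[n + N]**2 == 1."""
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))


def mdct_basis(N: int) -> np.ndarray:
    """(N, 2N) kernel cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))


def mdct(block: np.ndarray, w: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Window 2N samples and map them to N coefficients."""
    return C @ (w * block)


def imdct(coeffs: np.ndarray, w: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Map N coefficients back to 2N windowed samples (still aliased)."""
    N = C.shape[0]
    return w * (2.0 / N) * (C.T @ coeffs)


def analysis_synthesis(x: np.ndarray, N: int) -> np.ndarray:
    """Frame x with hop N, transform, inverse-transform, and overlap-add."""
    w, C = sine_window(N), mdct_basis(N)
    L = len(x)
    n_frames = int(np.ceil(L / N)) + 1
    padded = np.zeros((n_frames + 1) * N)
    padded[N:N + L] = x  # leading zeros so every input sample lies in two blocks
    out = np.zeros_like(padded)
    for m in range(n_frames):
        seg = slice(m * N, m * N + 2 * N)
        out[seg] += imdct(mdct(padded[seg], w, C), w, C)
    return out[N:N + L]  # aliasing cancels wherever two blocks overlap


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = analysis_synthesis(x, N=64)
print("max reconstruction error:", np.max(np.abs(x - y)))
```

Practical coders typically compute the MDCT with FFT-based fast algorithms and adapt the window length for pre-echo control; the matrix form is used here only because it makes time-domain aliasing cancellation easy to verify.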
Introduction.
Generic Perceptual Audio Coding Architecture.
Paper Organization.
Psychoacoustic Principles.
Absolute Threshold of Hearing.
Critical Bands.
Simultaneous Masking, Masking Asymmetry, and the Spread of Masking.
Nonsimultaneous Masking.
Perceptual Entropy.
Example Codec Perceptual Model: ISO 11172-3 (MPEG-1) Psychoacoustic Model 1.
Time-Frequency Analysis: Filter Banks and Transforms.
Filter Banks for Audio Coding: Design Considerations.
Cosine Modulated Pseudo-QMF M-Band Banks.
Cosine Modulated PR M-Band Banks and the MDCT.
Pre-Echo Distortion.
Pre-Echo Control Strategies.
Transform Coders.
Optimum Coding in the Frequency Domain (OCF-1, OCF-2, OCF-3).
Perceptual Transform Coder (PXFM).
Brandenburg–Johnston Hybrid Coder.
CNET Coder.
ASPEC.
DPAC.
DFT Noise Substitution.
DCT with Vector Quantization.
MDCT with Vector Quantization.
Subband Coders.
MASCAM.
MUSICAM.
Wavelet Decompositions.
Adapted Wavelet Packet Decompositions.
Hybrid Harmonic/Wavelet Decompositions.
Signal-Adaptive, Nonuniform Filter Bank (NUFB) Decompositions.
IIR Filter Banks.
Sinusoidal Coders.
Analysis/Synthesis Audio Codec.
Harmonic and Individual Lines Plus Noise Coder.
FM Synthesis.
Hybrid Sinusoidal Coders.
Linear-Prediction-Based Coders.
Multipulse Excitation.
Discrete Wavelet Excitation Coding.
Sinusoidal Excitation Coding.
Frequency Warped LP.
Audio Coding Standards.
ISO/IEC 11172-3 (MPEG-1) and ISO/IEC IS13818-3 (MPEG-2 BC).
ISO/IEC IS13818-7 (MPEG-2 NBC/AAC).
ISO/IEC 14496-3 (MPEG-4).
Precision Adaptive Subband Coding.
Adaptive Transform Acoustic Coding.
Sony Dynamic Digital Sound (SDDS).
Lucent Technologies Perceptual Audio Coder (PAC), Enhanced PAC (EPAC), and Multichannel PAC (MPAC).
Dolby AC-2, AC-2A.
Quality Measures for Perceptual Audio Coding.
Subjective Quality Measures.
Confounding Factors in Subjective Evaluations.
Subjective Evaluations of Two-Channel Standardized Codecs.
Subjective Evaluations of 5.1-Channel Standardized Codecs.
Conclusion.
Summary of Applications for Commercial and International Standards.
Summary of Recent Research and Future Research Directions.
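Among the psychoacoustic principles outlined above, the absolute threshold of hearing and critical bands are commonly expressed through closed-form approximations. The sketch below evaluates two widely used ones: Terhardt's threshold-in-quiet formula (in dB SPL) and Zwicker's mapping from frequency in Hz to critical-band rate in Bark. These are standard textbook approximations supplied for illustration; the function names are hypothetical, and a real codec would additionally have to relate dB SPL to its own digital signal levels.

```python
import math


def threshold_in_quiet_db_spl(f_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing.

    Valid for f_hz > 0; the result is in dB SPL.
    """
    f = f_hz / 1000.0  # the formula is written in terms of kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def hz_to_bark(f_hz: float) -> float:
    """Zwicker's approximation of critical-band rate (Bark) versus frequency."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


for f_hz in (100.0, 1000.0, 3300.0, 10000.0, 16000.0):
    print(f"{f_hz:8.0f} Hz  Tq = {threshold_in_quiet_db_spl(f_hz):6.1f} dB SPL  "
          f"z = {hz_to_bark(f_hz):5.2f} Bark")
```

The threshold is lowest around 3 to 4 kHz and rises steeply toward both ends of the audible range, so more quantization noise can go unheard at very low and very high frequencies. A Bark value per spectral line is the kind of mapping a perceptual model uses to group lines into critical bands before computing masking.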