6.1.1 Beat-matching
Certainly, there is more to the art of DJ-ing than technical abilities. In this
first application, however, we are essentially interested in the problem of beat-
matching and cross-fading songs as “smoothly” as possible. This is one of
DJs' most common practices; it is relatively simple to explain but harder
to master. The goal is to select songs with similar tempos, and to align their
beats over the course of a transition while cross-fading their volumes. The beat
markers, as found in section 3.5, are obviously particularly relevant features.
The length of a transition is chosen arbitrarily by the user (or the computer),
ranging from no transition to the length of an entire song; it could also be chosen
through the detection of salient changes in structural attributes, although this is
not currently implemented. We extend the beat-matching principle to downbeat
matching by making sure that downbeats align as well. In our application, the
location of a transition is chosen by selecting the most similar rhythmic pattern
between the two songs as in section 4.6.3. The analysis may be restricted to
finding the best match between specific sections of the songs (e.g., the last 30
seconds of song 1 and the first 30 seconds of song 2).
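This selection step can be sketched as an exhaustive search for the closest pair of patterns. The sketch below assumes that beat-synchronous rhythmic-pattern vectors (stand-ins for the features of section 4.6.3) have already been extracted from the tail of song 1 and the head of song 2; the function name and the Euclidean metric are illustrative choices, not the exact implementation:

```python
import numpy as np

def best_transition(patterns_a, patterns_b):
    """Pick the pair of downbeat-aligned rhythmic patterns with minimal distance.

    patterns_a: pattern vectors from the end of song 1 (e.g., its last 30 seconds)
    patterns_b: pattern vectors from the start of song 2 (e.g., its first 30 seconds)
    Returns (index in song 1, index in song 2, distance).
    """
    best = (None, None, np.inf)
    for i, pa in enumerate(patterns_a):
        for j, pb in enumerate(patterns_b):
            d = np.linalg.norm(pa - pb)  # Euclidean distance between patterns
            if d < best[2]:
                best = (i, j, d)
    return best
```

The pattern pair returned this way fixes the location of the transition; the cross-fade is then centered on those aligned downbeats.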
To ensure a perfect match over the course of long transitions, DJs typically
adjust the playback speed of the music through specialized mechanisms, such
as a “relative-speed” controller on a turntable (specified as a relative posi-
tive/negative percentage of the original speed). Digitally, a similar effect is
implemented by “sampling-rate conversion” of the audio signal [154]. The pro-
cedure, however, distorts the perceptual quality of the music by detuning the
whole sound. To correct this artifact, we implement a time-scaling algorithm
that is capable of speeding up or slowing down the music without affecting
the pitch.
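The detuning introduced by plain sampling-rate conversion is easy to quantify: playing a signal back at a relative speed r transposes it by 12 log2(r) semitones, so even a modest 4% speed-up raises the pitch by roughly two thirds of a semitone. A short illustration (the helper name is ours):

```python
import math

def detune_semitones(speed_ratio):
    """Pitch shift (in semitones) caused by plain sampling-rate conversion.

    speed_ratio: relative playback speed, e.g. 1.04 for a +4% speed-up.
    """
    return 12 * math.log2(speed_ratio)
```

A speed ratio of 2.0 yields exactly 12 semitones (one octave), which is why doubling the playback speed of a record transposes it up an octave.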
6.1.2 Time-scaling
There are three main classes of audio time-scaling (or time-stretching): 1) the
time-domain approach, which involves overlapping and adding small windowed
fragments of the waveform; 2) the frequency-domain approach, which is typi-
cally accomplished through phase-vocoding [40]; and 3) the signal-modeling ap-
proach, which consists of changing the rate of a parametric signal description,
including deterministic and stochastic parameters. A review of these meth-
ods can be found in [16], and implementations for polyphonic music include
[15][94][102][43].
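As a concrete illustration of the first class, a naive time-domain stretcher can be written as a plain overlap-add loop: frames are read from the input at one hop size and written to the output at another. This toy version deliberately omits the waveform-alignment and transient-handling refinements that usable algorithms require, so it is a sketch of the principle rather than the implementation used here:

```python
import numpy as np

def ola_stretch(x, factor, win=2048, hop=512):
    """Naive overlap-add time-stretch of a mono signal x.

    Frames are read every hop*factor samples and written every hop samples,
    so factor > 1 speeds the music up and factor < 1 slows it down.
    """
    window = np.hanning(win)
    ana_hop = int(hop * factor)              # analysis (read) hop
    n_frames = max(1, (len(x) - win) // ana_hop + 1)
    y = np.zeros(n_frames * hop + win)
    norm = np.zeros_like(y)
    for k in range(n_frames):
        frame = x[k * ana_hop : k * ana_hop + win]
        if len(frame) < win:
            break
        y[k * hop : k * hop + win] += frame * window
        norm[k * hop : k * hop + win] += window
    norm[norm == 0] = 1                      # avoid division by zero at the edges
    return y / norm
```

Because consecutive frames are overlapped without any alignment, periodic material suffers audible phase discontinuities; the cross-correlation search of SOLA-type methods, or the phase-vocoder of the second class, exists precisely to remove them.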
We have experimented with both the time-domain and frequency-domain meth-
ods, adding some original refinements to each. For instance, it is suggested in
[16] to preserve transients unprocessed in order to reduce artifacts, due to the
granularity effect of windowing. While the technique has previously been used
with the phase vocoder, we apply it to our time-domain algorithm as well. The