3.5.1 Comparative models
Beat induction models can be categorized by their general approach: top-down
(rule- or knowledge-based) or bottom-up (signal processing). Early techniques
usually operate on quantized and symbolic representations of the signal, for
instance after an onset detection stage. A set of heuristic and gestalt rules
(based on accent, proximity, and grouping) is applied to infer the underlying
metrical structure [99][37][159][45]. More recently, the trend has shifted toward
signal-processing approaches. The scheme typically starts with a front-end
subband analysis of the signal, traditionally using a filter bank [165][141][4]
or a discrete Fourier transform [59][96][91]. Then, a periodicity estimation
algorithm (using oscillators [141], histograms [39], autocorrelations [63], or
probabilistic methods [95]) finds the rate at which signal events occur in concurrent
channels. Finally, an integration procedure combines all channels into
the final beat estimate. Goto’s multiple-agent strategy [61] (also used by
Dixon [38][39]) combines heuristics with correlation techniques, including
a chord change detector and a drum pattern detector. Klapuri’s Bayesian
probabilistic method, applied on top of Scheirer’s bank of resonators, determines
the best metrical hypothesis with constraints on continuity over time [92]. Both
approaches stand out for their concern with explaining a hierarchical organization
of the meter (section 4.6).
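The three stages of the generic bottom-up scheme described above (subband front end, per-channel periodicity estimation, cross-channel integration) can be illustrated with a minimal sketch. It is not taken from any of the cited systems: it assumes the front end has already produced one onset-strength envelope per band, uses plain autocorrelation as the periodicity estimator, and integrates by simple summation. The function name `tempo_from_bands` and all parameters are hypothetical.

```python
import numpy as np

def tempo_from_bands(envelopes, frame_rate, bpm_min=60.0, bpm_max=240.0):
    """Estimate one tempo (BPM) from per-band envelopes, shape (n_bands, n_frames).

    Hypothetical helper: autocorrelation per channel, then a sum across channels.
    """
    # Remove each channel's mean so that silence does not correlate with itself.
    env = envelopes - envelopes.mean(axis=1, keepdims=True)
    # Convert the tempo range into a range of beat periods, in frames.
    lag_min = int(np.ceil(60.0 * frame_rate / bpm_max))
    lag_max = int(np.floor(60.0 * frame_rate / bpm_min))
    lags = np.arange(lag_min, lag_max + 1)
    # Periodicity estimation: autocorrelation of every channel at every lag.
    acf = np.array([[np.dot(ch[:-lag], ch[lag:]) for lag in lags] for ch in env])
    # Integration: combine the channels by summing their periodicity curves.
    combined = acf.sum(axis=0)
    best_lag = lags[np.argmax(combined)]
    return 60.0 * frame_rate / best_lag

# Synthetic input (assumption): three bands pulsing every 0.5 s at 100 frames/s.
frame_rate = 100.0
pulse = np.zeros(1000)
pulse[::50] = 1.0
bands = np.vstack([pulse, 0.5 * pulse, 0.25 * pulse])
estimate = tempo_from_bands(bands, frame_rate)
```

Summation is the simplest possible integration rule; the systems cited above replace it with agents, probabilistic models, or other heuristics.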
3.5.2 Our approach
A causal, bottom-up beat tracker is developed, based on our front-end auditory
spectrogram (25 bands) and Scheirer’s bank of resonators [141]. It
assumes no prior knowledge and outputs a confidence value that reflects
the presence of a beat in the music. The range 60–240 BPM is distributed
logarithmically across a large bank of comb filters, each of which resonates
at a given tempo. The filters are applied to multiple frequency channels of the
auditory spectrogram simultaneously and are tuned to fade out within seconds,
as a way to model short-term memory. At any given time, their internal
energy can be summed across channels by tempo class, which results in a tempo
spectrum as depicted in Figure 3-14 (bottom). Yet, one of the main drawbacks
of the model is its unreliable tempo-peak selection mechanism. Several peaks
of the spectrum may give a plausible answer, and choosing the highest is not
necessarily the best or most stable strategy. A template mechanism is used
to favor the extraction of the fastest tempo in case of ambiguity¹. Section 5.3,
however, introduces a bias-free method that can overcome this stability issue
through top-down feedback control.
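The comb-filter tempo spectrum and the peak-selection problem can be sketched as follows. This is an illustration under stated assumptions, not the implementation used here: the number of filters, the 1.5 s half-life, the use of total output energy over the excerpt (rather than a running internal energy), and the 0.9 threshold are all hypothetical choices. In particular, `pick_fastest_peak` is a simple stand-in for the template mechanism, whose actual form is not specified in the text.

```python
import numpy as np

def tempo_spectrum(envelopes, frame_rate, n_filters=49,
                   bpm_min=60.0, bpm_max=240.0, half_life_s=1.5):
    """Sum comb-filter output energy across channels for each candidate tempo."""
    # Candidate tempi are spaced logarithmically over the tracked range.
    tempi = np.geomspace(bpm_min, bpm_max, n_filters)
    n_frames = envelopes.shape[1]
    spectrum = np.zeros(n_filters)
    for i, bpm in enumerate(tempi):
        delay = int(np.rint(60.0 * frame_rate / bpm))  # beat period in frames
        # Feedback gain set so the filter's memory halves every half_life_s
        # (assumed value, standing in for "fade out within seconds").
        alpha = 0.5 ** (delay / (half_life_s * frame_rate))
        # Comb filter: y[n] = (1 - alpha) * x[n] + alpha * y[n - delay].
        y = (1.0 - alpha) * envelopes.astype(float)
        for n in range(delay, n_frames):
            y[:, n] += alpha * y[:, n - delay]
        # Energy summed over all channels for this tempo class.
        spectrum[i] = np.sum(y ** 2)
    return tempi, spectrum

def pick_fastest_peak(tempi, spectrum, ratio=0.9):
    """Among local peaks within `ratio` of the maximum, return the fastest tempo."""
    padded = np.concatenate(([-np.inf], spectrum, [-np.inf]))
    is_peak = (spectrum >= padded[:-2]) & (spectrum >= padded[2:])
    strong = is_peak & (spectrum >= ratio * spectrum.max())
    return float(tempi[np.flatnonzero(strong)[-1]])  # tempi are ascending

# Synthetic input (assumption): three channels pulsing at 120 BPM, 100 frames/s.
rate = 100.0
clicks = np.zeros(1000)
clicks[::50] = 1.0
channels = np.vstack([clicks, 0.5 * clicks, 0.25 * clicks])
tempi, spectrum = tempo_spectrum(channels, rate)
chosen = pick_fastest_peak(tempi, spectrum)
```

On this synthetic pulse train the filter tuned to half the true tempo resonates about as strongly as the one tuned to the true tempo, which illustrates why taking the highest peak alone is unstable and why some bias toward the faster candidate is needed.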
Figure 3-14 shows an example of beat tracking a polyphonic jazz-fusion piece at
supposedly 143 BPM. A tempogram (middle pane) displays the tempo knowledge
gained over the course of the analysis. The analysis starts with no knowledge,
and the tempo space emerges gradually. Note in the top pane that beat tracking was
¹ It is always possible to down-sample by a tempo octave if necessary.