
Algorithm (see [9] for discussion and references) learns any class consisting of m Boolean functions with mistake bound log_2 m, and would thus provide an attribute-efficient algorithm for such a class C. However, the running time would not be polynomial. Another serious drawback would be that the Halving Algorithm does not tolerate any noise. Interestingly, a multiplicative update similar to (1) has been used in Littlestone and Warmuth's Weighted Majority Algorithm [10], and also Vovk's Aggregating Algorithm [14], to produce a noise-tolerant generalization of the Halving Algorithm.
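To make the quoted mistake bound concrete, here is a minimal Python sketch (an illustration only, not taken from [9]) of the Halving Algorithm over an explicitly enumerated finite class; representing the functions as Python callables and feeding examples as (x, y) pairs are assumptions of this sketch.

```python
# Illustrative sketch of the Halving Algorithm over a finite class C of
# Boolean functions, each given as a callable mapping an input to 0 or 1.
from typing import Callable, List, Sequence, Tuple

def halving_predict(version_space: List[Callable], x) -> int:
    """Predict by majority vote over the functions still consistent."""
    votes = sum(f(x) for f in version_space)
    return 1 if 2 * votes >= len(version_space) else 0

def run_halving(C: List[Callable], stream: Sequence[Tuple[object, int]]) -> int:
    """Run the Halving Algorithm on a stream of (x, y) examples.

    Every mistake removes at least half of the remaining version space,
    so (assuming the target is in C) the mistake count is at most log_2 |C|.
    """
    version_space, mistakes = list(C), 0
    for x, y in stream:
        if halving_predict(version_space, x) != y:
            mistakes += 1
        # Keep only the functions that agree with the revealed label.
        version_space = [f for f in version_space if f(x) == y]
    return mistakes
```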
Attribute-efficient learning has also been studied in learning models other than the mistake-bound model, such as Probably Approximately Correct (PAC) learning [4], learning under the uniform distribution [12], and learning with membership queries [3]. The idea has been further developed into learning with a potentially infinite number of attributes [2].
Applications
Attribute-efficient algorithms for simple function classes
have a potentially interesting application as a component
in learning more complex function classes. For example, any monotone k-term DNF formula over variables x_1, ..., x_n can be represented as a monotone k-literal disjunction over 2^n variables z_A, where z_A = ∏_{i ∈ A} x_i is defined for each A ⊆ {1, ..., n}. Running Winnow with the transformed inputs z ∈ {0, 1}^{2^n} would give a mistake bound O(k log 2^n) = O(kn). Unfortunately the running time would be linear in 2^n, at least for a naive implementation. Khardon et al. [6] provide discouraging computational hardness results for this potential application.
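Purely as an illustration (a sketch under assumptions, not code from the article), the naive reduction can be spelled out as follows: every subset A of {1, ..., n} becomes one conjunction feature z_A, and a standard Winnow update (promotion/demotion by a factor alpha = 2, threshold equal to the number of transformed features) runs over the 2^n-dimensional transformed inputs. The per-example cost is proportional to 2^n, which is exactly the drawback noted above.

```python
# Naive Winnow-over-conjunctions sketch; feasible only for small n because
# the transformed feature space has 2^n coordinates.
from itertools import chain, combinations

def all_subsets(n):
    """All subsets A of {0, ..., n-1}, including the empty set."""
    return list(chain.from_iterable(combinations(range(n), k) for k in range(n + 1)))

def transform(x, subsets):
    """z_A = 1 iff x_i = 1 for every i in A (the empty A gives a constant-1 feature)."""
    return [1 if all(x[i] for i in A) else 0 for A in subsets]

def winnow_over_conjunctions(examples, n, alpha=2.0):
    """Run Winnow on the 2^n transformed features; return the mistake count."""
    subsets = all_subsets(n)
    N = len(subsets)                      # N = 2^n
    w, theta, mistakes = [1.0] * N, float(N), 0
    for x, y in examples:                 # x is a 0/1 list of length n, y in {0, 1}
        z = transform(x, subsets)
        y_hat = 1 if sum(wi * zi for wi, zi in zip(w, z)) >= theta else 0
        if y_hat != y:
            mistakes += 1
            factor = alpha if y == 1 else 1.0 / alpha     # promote or demote
            w = [wi * factor if zi else wi for wi, zi in zip(w, z)]
    return mistakes
```

For example, with n = 10 variables a monotone 2-term DNF becomes a 2-literal disjunction over 1024 transformed features; the mistake bound quoted above is then O(20) up to constants, but every single prediction already touches all 1024 coordinates.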
Online learning algorithms have a natural application
domain in signal processing. In this setting, the sender
emits a true signal y_t at time t, for t = 1, 2, 3, .... At some later time (t + d), a receiver receives a signal z_t, which is a sum of the original signal y_t and various echoes of earlier signals y_{t'}, t' < t, all distorted by random noise. The task is to recover the true signal y_t based on received signals z_t, z_{t-1}, ..., z_{t-l} over some time window l. Currently attribute-efficient algorithms are not used for such tasks, but see [11] for preliminary results.
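As a purely hypothetical sketch of how a multiplicative-update method could be slotted into this setting (the article itself notes such algorithms are not currently used here), the code below applies an EG±-style update, in the spirit of the Exponentiated Gradient algorithm [7], to learn signed filter weights over a sliding window of received samples. It assumes the true signal y_t is eventually revealed as feedback, and the total weight U and learning rate eta are arbitrary placeholder choices.

```python
# Hypothetical sketch only: an EG+/- style filter over a sliding window of
# received samples; assumes the true signal y_t is revealed as feedback.
import math

def eg_filter(received, true_signal, window, U=10.0, eta=0.05):
    """Predict y_t from [z_t, z_{t-1}, ..., z_{t-window+1}] with signed weights
    w = w_plus - w_minus, both updated multiplicatively and rescaled to total U."""
    w_plus = [U / (2 * window)] * window
    w_minus = [U / (2 * window)] * window
    predictions = []
    for t in range(window - 1, len(received)):
        z = received[t - window + 1: t + 1][::-1]       # most recent sample first
        y_hat = sum((p - m) * zi for p, m, zi in zip(w_plus, w_minus, z))
        predictions.append(y_hat)
        g = 2.0 * (y_hat - true_signal[t])              # gradient of the squared loss
        w_plus = [p * math.exp(-eta * g * zi) for p, zi in zip(w_plus, z)]
        w_minus = [m * math.exp(eta * g * zi) for m, zi in zip(w_minus, z)]
        total = sum(w_plus) + sum(w_minus)
        w_plus = [U * p / total for p in w_plus]        # keep total weight at U
        w_minus = [U * m / total for m in w_minus]
    return predictions
```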
Attribute-efficient learning algorithms are similar in
spirit to statistical methods that find sparse models. In par-
ticular, statistical algorithms that use L_1 regularization are closely related to multiplicative algorithms such as Winnow and Exponentiated Gradient. In contrast, more classical L_2 regularization leads to algorithms that are not attribute-efficient [13].
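The sparsity contrast can be seen in a small illustrative experiment; the synthetic data, regularization strength, and solvers below are assumptions of this example rather than anything specified in [13]. With many irrelevant attributes, the L_1 penalty drives most weights exactly to zero, while the L_2 (ridge) penalty only shrinks them.

```python
# Illustrative comparison of L1 and L2 regularization on synthetic data with
# only a few relevant attributes; all parameter choices here are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_relevant = 200, 100, 3
X = rng.standard_normal((n_samples, n_features))
w_true = np.zeros(n_features)
w_true[:n_relevant] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(n_samples)

lam = 5.0

# L2 (ridge): closed-form solution; weights shrink but rarely become exactly 0.
w_l2 = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# L1 (lasso) via proximal gradient (ISTA): soft-thresholding sets most
# weights exactly to 0, i.e. irrelevant attributes are dropped.
w_l1 = np.zeros(n_features)
step = 1.0 / np.linalg.norm(X, ord=2) ** 2      # 1 / Lipschitz constant of the gradient
for _ in range(2000):
    grad = X.T @ (X @ w_l1 - y)
    z = w_l1 - step * grad
    w_l1 = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print("nonzero weights, L2:", int(np.sum(np.abs(w_l2) > 1e-6)))
print("nonzero weights, L1:", int(np.sum(np.abs(w_l1) > 1e-6)))
```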
Cross References
Boosting Textual Compression
Learning DNF Formulas
Recommended Reading
1. Auer, P., Warmuth, M.K.: Tracking the best disjunction. Mach.
Learn. 32(2), 127–150 (1998)
2. Blum, A., Hellerstein, L., Littlestone, N.: Learning in the pres-
ence of finitely or infinitely many irrelevant attributes. J. Comp.
Syst. Sci. 50(1), 32–40 (1995)
3. Bshouty, N., Hellerstein, L.: Attribute-efficient learning in query
and mistake-bound models. J. Comp. Syst. Sci. 56(3), 310–319
(1998)
4. Dhagat, A., Hellerstein, L.: PAC learning with irrelevant at-
tributes. In: Proceedings of the 35th Annual Symposium on
Foundations of Computer Science, Santa Fe, pp. 64–74. IEEE
Computer Society, Los Alamitos (1994)
5. Gentile, C., Warmuth, M.K.: Linear hinge loss and average mar-
gin. In: Kearns, M.J., Solla, S.A., Cohn, D.A. (eds.) Advances in
neural information processing systems 11, pp. 225–231. MIT
Press, Cambridge (1999)
6. Khardon, R., Roth, D., Servedio, R.A.: Efficiency versus conver-
gence of boolean kernels for on-line learning algorithms. J. Ar-
tif. Intell. Res. 24, 341–356 (2005)
7. Kivinen, J., Warmuth, M.K.: Exponentiated gradient versus gra-
dient descent for linear predictors. Inf. Comp. 132(1), 1–64
(1997)
8. Klivans, A.R., Servedio, R.A.: Toward attribute efficient learning
of decision lists and parities. J. Mach. Learn. Res. 7(Apr), 587–
602 (2006)
9. Littlestone, N.: Learning quickly when irrelevant attributes
abound: A new linear threshold algorithm. Mach. Learn. 2(4),
285–318 (1988)
10. Littlestone, N., Warmuth, M.K.: The weighted majority algo-
rithm. Inf. Comp. 108(2), 212–261 (1994)
11. Martin, R.K., Sethares, W.A., Williamson, R.C., Johnson, Jr., C.R.:
Exploiting sparsity in adaptive filters. IEEE Trans. Signal Pro-
cess. 50(8), 1883–1894 (2002)
12. Mossel, E., O’Donnell, R., Servedio, R.A.: Learning functions of
k relevant variables. J. Comp. Syst. Sci. 69(3), 421–434 (2004)
13. Ng, A.Y.: Feature selection, L_1 vs. L_2 regularization, and rotational invariance. In: Greiner, R., Schuurmans, D. (eds.) Proceedings of the 21st International Conference on Machine Learning, pp. 615–622. The International Machine Learning Society, Princeton (2004)
14. Vovk, V.: Aggregating strategies. In: Fulk, M., Case, J. (eds.) Proceedings of the 3rd Annual Workshop on Computational Learning Theory, pp. 371–383. Morgan Kaufmann, San Mateo (1990)
Automated Search Tree Generation
2004; Gramm, Guo, Hüffner, Niedermeier
FALK HÜFFNER
Department of Math and Computer Science,
University of Jena, Jena, Germany