
function approximation,
\[
V_t(s;\theta) = \sum_{i=1}^{T} s^{(i)} \theta_t^{(i)},
\]
where each state has a vector of features, s. A feature-based function approximation was analyzed and demonstrated in [2,15]. The main goal here is designing algorithms which converge to almost optimal policies under realistic assumptions.
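As an illustration, the following is a minimal sketch (in Python) of such a linear value-function approximation, trained with TD(0) on a toy chain MDP; the chain, the feature map, and the step sizes are illustrative assumptions and not the constructions analyzed in [2,15].

```python
import numpy as np

# Minimal sketch of linear value-function approximation:
# V_t(s; theta) = sum_i s^(i) * theta_t^(i), i.e. the dot product of a
# state's feature vector with a weight vector, updated here by TD(0).
# The chain MDP, feature map, and step sizes below are illustrative only.

rng = np.random.default_rng(0)

N_STATES = 10          # states 0..9 of a simple chain; state 9 is terminal
T = 3                  # number of features per state


def phi(s):
    """Hypothetical feature map: normalized position, its square, and a bias."""
    x = s / (N_STATES - 1)
    return np.array([x, x * x, 1.0])


def step(s):
    """Random-walk chain: move right with prob 0.7; reward 1 on reaching the end."""
    s_next = min(s + 1, N_STATES - 1) if rng.random() < 0.7 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward


theta = np.zeros(T)    # weights theta_t^(i)
alpha, gamma = 0.05, 0.95

for episode in range(2000):
    s = 0
    while s != N_STATES - 1:
        s_next, r = step(s)
        v, v_next = phi(s) @ theta, phi(s_next) @ theta
        if s_next == N_STATES - 1:
            v_next = 0.0                      # terminal state has value 0
        delta = r + gamma * v_next - v        # TD error
        theta += alpha * delta * phi(s)       # gradient-style update of theta
        s = s_next

print("approximate values:", [round(float(phi(s) @ theta), 2) for s in range(N_STATES)])
```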
Factored Markov Decision Process: In an FMDP the set of states is described via a set of random variables $X = \{X_1, \ldots, X_n\}$, where each $X_i$ takes values in some finite domain $\mathrm{Dom}(X_i)$. A state s defines a value $x_i \in \mathrm{Dom}(X_i)$ for each variable $X_i$. The transition model is encoded using a dynamic Bayesian network. Although the representation is efficient, not only is finding an $\varepsilon$-optimal policy intractable [8], but it cannot be represented succinctly [1]. However, under a few assumptions on the FMDP structure there exist algorithms, such as [5], that have both theoretical guarantees and nice empirical results.
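To make the representation concrete, the following is a small sketch of a factored state and a DBN-style transition model; the variables, parent sets, and conditional probabilities are invented for illustration and do not reproduce the algorithms of [5].

```python
import random

# Minimal sketch of a factored MDP state and a DBN-style transition model.
# The variables, domains, and parent sets below are illustrative; a real
# FMDP solver would exploit this structure rather than enumerate states.

# Each state variable X_i has a small finite domain Dom(X_i).
DOMAINS = {"X1": [0, 1], "X2": [0, 1], "X3": [0, 1, 2]}

# DBN structure: the next value of each variable depends only on a few
# parent variables of the current state (chosen arbitrarily here).
PARENTS = {"X1": ["X1"], "X2": ["X1", "X2"], "X3": ["X2", "X3"]}


def transition_prob(var, parent_values, next_value):
    """Hypothetical conditional probability P(X_i' = next_value | parents)."""
    dom = DOMAINS[var]
    # Toy CPD: bias toward (a projection of) the first parent's value.
    if next_value == parent_values[0] % len(dom):
        return 0.7
    return 0.3 / (len(dom) - 1)


def sample_next_state(state):
    """Sample s' factor by factor, consulting only each variable's parents."""
    next_state = {}
    for var, dom in DOMAINS.items():
        parent_values = [state[p] for p in PARENTS[var]]
        weights = [transition_prob(var, parent_values, v) for v in dom]
        next_state[var] = random.choices(dom, weights=weights)[0]
    return next_state


# A state assigns a value x_i in Dom(X_i) to every variable X_i.
s = {"X1": 1, "X2": 0, "X3": 2}
print(sample_next_state(s))
```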
Cross References
Attribute-Efficient Learning
Learning Automata
Learning Constant-Depth Circuits
Mobile Agents and Exploration
PAC Learning
Recommended Reading
1. Allender, E., Arora, S., Kearns, M., Moore, C., Russell, A.: Note on the representational incompatibility of function approximation and factored dynamics. In: Advances in Neural Information Processing Systems 15, 2002
2. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
3. Brafman, R., Tennenholtz, M.: R-max – a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res. 3, 213–231 (2002)
4. Even-Dar, E., Mansour, Y.: Learning rates for Q-learning. J. Mach. Learn. Res. 5, 1–25 (2003)
5. Guestrin, C., Koller, D., Parr, R., Venkataraman, S.: Efficient solution algorithms for factored MDPs. J. Artif. Intell. Res. 19, 399–468 (2003)
6. Kakade, S.: On the Sample Complexity of Reinforcement Learning. Ph.D. thesis, University College London (2003)
7. Kearns, M., Singh, S.: Near-optimal reinforcement learning in polynomial time. Mach. Learn. 49(2–3), 209–232 (2002)
8. Lusena, C., Goldsmith, J., Mundhenk, M.: Nonapproximability results for partially observable Markov decision processes. J. Artif. Intell. Res. 14, 83–103 (2001)
9. Ng, A.Y., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, E., Liang, E.: Inverted autonomous helicopter flight via reinforcement learning. In: International Symposium on Experimental Robotics, 2004
10. Papadimitriou, C.H., Tsitsiklis, J.N.: The complexity of Markov decision processes. Math. Oper. Res. 12(3), 441–450 (1987)
11. Puterman, M.: Markov Decision Processes. Wiley-Interscience, New York (1994)
12. Sutton, R.: Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988)
13. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
14. Tesauro, G.J.: TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1996)
15. Tsitsiklis, J.N., Van Roy, B.: Feature-based methods for large scale dynamic programming. Mach. Learn. 22, 59–94 (1996)
16. Watkins, C.: Learning from Delayed Rewards. Ph.D. thesis, Cambridge University (1989)
17. Watkins, C., Dayan, P.: Q-learning. Mach. Learn. 8(3/4), 279–292 (1992)
Renaming
1990; Attiya, Bar-Noy, Dolev, Peleg, Reischuk
MAURICE HERLIHY
Department of Computer Science, Brown University,
Providence, RI, USA
Keywords and Synonyms
Wait-free renaming
Problem Definition
Consider a system in which n + 1 processes $P_0, \ldots, P_n$ communicate either by message-passing or by reading and writing a shared memory. Processes are asynchronous: there are no upper or lower bounds on their speeds, and up to t of them may fail undetectably by halting. In the renaming task proposed by Attiya, Bar-Noy, Dolev, Peleg, and Reischuk [1], each process is given a unique input name taken from a range $0, \ldots, N$, and chooses a unique output name taken from a strictly smaller range $0, \ldots, K$. To rule out trivial solutions, a process's decision function must depend only on input names, not its preassigned identifier (so that $P_i$ cannot simply choose output name i). Attiya et al. showed that the task has no solution when K = n, but does have a solution when K = n + t. In 1993, Herlihy and Shavit [2] showed that the task has no solution when K < n + t.
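For concreteness, the following small sketch checks the output conditions of the renaming task (uniqueness of the chosen names and membership in the smaller range); it is only an illustrative specification check, not a renaming protocol.

```python
# Sketch of the renaming task's correctness conditions: given the unique
# input names of n + 1 processes drawn from 0..N, the chosen output names
# must be unique and drawn from the strictly smaller range 0..K.
# This validates a proposed output assignment; it does not compute one.

def valid_renaming(input_names, output_names, N, K):
    assert K < N, "output range 0..K must be strictly smaller than 0..N"
    if len(set(input_names)) != len(input_names):
        raise ValueError("input names must be unique")
    unique_outputs = len(set(output_names)) == len(output_names)
    in_range = all(0 <= name <= K for name in output_names)
    return unique_outputs and in_range


# Example: 4 processes rename from the range 0..100 into 0..7.
print(valid_renaming([3, 41, 77, 100], [0, 5, 2, 7], N=100, K=7))   # True
print(valid_renaming([3, 41, 77, 100], [0, 5, 5, 7], N=100, K=7))   # False: duplicate name
```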
Vertexes, simplexes, and complexes model decision
tasks. (See the companion article entitled Topology Ap-
proach in Distributed Computing). A process’s state at the
start or end of a task is represented as a vertex $\vec{v}$ labeled