Bibliography
[O58] M. Wittmann and G. Hager. A proof of concept for optimizing task
parallelism by locality queues.
http://arxiv.org/abs/0902.1884
[O59] G. Hager, F. Deserno and G. Wellein. Pseudo-vectorization and RISC
optimization techniques for the Hitachi SR8000 architecture. In: S. Wagner
et al. (eds.), High Performance Computing in Science and Engineering
Munich 2002 (Springer-Verlag, Berlin, Heidelberg), 425–442.
[O60] D. Barkai and A. Brandt. Vectorized multigrid Poisson solver for the CDC
Cyber 205. Applied Mathematics and Computation 13, (1983) 217–227.
[O61] M. Kowarschik. Data Locality Optimizations for Iterative Numerical
Algorithms and Cellular Automata on Hierarchical Memory Architectures (SCS
Publishing House), 2004. ISBN 3-936150-39-7.
[O62] K. Datta, S. Kamil, S. Williams, L. Oliker, J. Shalf and K. Yelick.
Optimization and performance modeling of stencil computations on modern
microprocessors. SIAM Review 51, (2009) 129–159.
[O63] J. Treibig, G. Wellein and G. Hager. Efficient multicore-aware
parallelization strategies for iterative stencil computations. Submitted.
http://arxiv.org/abs/1004.1741
[O64] M. Müller. Some simple OpenMP optimization techniques. In: OpenMP
Shared Memory Parallel Programming: International Workshop on
OpenMP Applications and Tools, WOMPAT 2001, West Lafayette, IN, USA,
July 30-31, 2001: Proceedings. 31–39.
[O65] G. Hager, T. Zeiser and G. Wellein. Data access optimizations for highly
threaded multi-core CPUs with multiple memory controllers. In: Workshop
on Large-Scale Parallel Processing 2008 (IPDPS2008), Miami, FL, April
18, 2008.
http://arxiv.org/abs/0712.2302
[O66] S. Williams, L. Oliker, R. W. Vuduc, J. Shalf, K. A. Yelick and J. Demmel.
Optimization of sparse matrix-vector multiplication on emerging multicore
platforms. Parallel Computing 35(3), (2009) 178–194.
[O67] C. Terboven, D. an Mey, D. Schmidl, H. Jin and T. Reichstein. Data and
thread affinity in OpenMP programs. In: MAW ’08: Proceedings of the 2008
workshop on Memory access on future processors (ACM, New York, NY,
USA). ISBN 978-1-60558-091-3, 377–384.
[O68] B. Chapman, F. Bregier, A. Patil and A. Prabhakar. Achieving performance
under OpenMP on ccNUMA and software distributed shared memory systems.
Concurrency Comput.: Pract. Exper. 14, (2002) 713–739.