@@ -100,10 +100,11 @@ comes only in Fortran. It's also slow; for example,
  its matrix-matrix multiply routine uses nearly the same triply nested
  loops that a naïve developer would write. The intent of the BLAS is
  that users who care about performance find optimized implementations,
- either by hardware vendors or by projects like ATLAS (Whaley et
- al. 2001), the
+ either by hardware vendors or by projects like
+ [ATLAS](http://math-atlas.sourceforge.net/) (see also Whaley et al. 2001),
  [GotoBLAS](https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2),
- or [OpenBLAS](http://www.openblas.net).
+ [OpenBLAS](https://github.com/xianyi/OpenBLAS),
+ or [BLIS](https://github.com/flame/blis).

  Suppose that our developer has found an optimized implementation of
  the BLAS, and they want to call some of its routines from C++. Here
@@ -1494,16 +1495,16 @@ Thanks to Damien Lebrun-Grandie for reviewing Revision 1 changes.
   A Portable, High-Performance, ANSI C Coding Methodology and its
   application to Matrix Multiply," LAPACK Working Note 111, 1996.

- * K. Goto and R. A. van de Geijn, "Anatomy of high-performance matrix
- multiplication", ACM Transactions of Mathematical Software (TOMS),
- Vol. 34, No. 3, May 2008.
+ * K. Goto and R. A. van de Geijn,
+ ["Anatomy of high-performance matrix multiplication"](https://doi.org/10.1145/1356052.1356053),
+ *ACM Transactions on Mathematical Software* (TOMS),
+ Vol. 34, No. 3, May 2008. See also

  * M. Hoemmen, D. Hollman, C. Trott, D. Sunderland, N. Liber, A. Klinvex,
   Li-Ta Lo, D. Lebrun-Grandie, G. Lopez, P. Caday, S. Knepper, P. Luszczek,
   and T. Costa,
   "A free function linear algebra interface based on the BLAS,"
- P1673R6,
- Dec. 2021.
+ P1673R7, Apr. 2022.

  * C. Trott, D. Hollman, M. Hoemmen, and D. Sunderland,
   "`mdarray`: An Owning Multidimensional Array Analog of `mdspan`",
@@ -1521,14 +1522,18 @@ Thanks to Damien Lebrun-Grandie for reviewing Revision 1 changes.

  * J. Siek and A. Lumsdaine, "The Matrix Template Library: A Generic
   Programming Approach to High Performance Numerical Linear Algebra,"
- in proceedings of the Second International Symposium on Computing in
+ in Proceedings of the Second International Symposium on Computing in
   Object-Oriented Parallel Environments (ISCOPE) 1998, Santa Fe, NM,
   USA, Dec. 1998.
+
+ * F. G. Van Zee and R. A. van de Geijn,
+ ["BLIS: A Framework for Rapidly Instantiating BLAS Functionality,"](https://doi.org/10.1145/2764454),
+ *ACM Transactions on Mathematical Software* (TOMS), Vol. 41, No. 3, June 2015.

  * R. Vuduc, "Automatic performance tuning of sparse matrix kernels,"
   PhD dissertation, Electrical Engineering and Computer Science,
   University of California Berkeley, 2004.

  * R. C. Whaley, A. Petitet, and J. Dongarra, "Automated Empirical
- Optimization of Software and the ATLAS Project," Parallel Computing,
+ Optimization of Software and the ATLAS Project," *Parallel Computing*,
   Vol. 27, No. 1-2, Jan. 2001, pp. 3-35.