Commit ca1ed29

P1674: Changes based on PR suggestions for P1673
PR ORNL#228 by Jeff Hammond suggests changes to P1673. Some of those can be applied to P1674 as well. This PR does that.
1 parent eaae29c commit ca1ed29

1 file changed

Lines changed: 15 additions & 10 deletions

D1674/evolving-from-blas.md

@@ -100,10 +100,11 @@ comes only in Fortran. It's also slow; for example,
 its matrix-matrix multiply routine uses nearly the same triply nested
 loops that a naïve developer would write. The intent of the BLAS is
 that users who care about performance find optimized implementations,
-either by hardware vendors or by projects like ATLAS (Whaley et
-al. 2001), the
+either by hardware vendors or by projects like
+[ATLAS](http://math-atlas.sourceforge.net/) (see also Whaley et al. 2001),
 [GotoBLAS](https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2),
-or [OpenBLAS](http://www.openblas.net).
+[OpenBLAS](https://github.com/xianyi/OpenBLAS),
+or [BLIS](https://github.com/flame/blis).
 
 Suppose that our developer has found an optimized implementation of
 the BLAS, and they want to call some of its routines from C++. Here
@@ -1494,16 +1495,16 @@ Thanks to Damien Lebrun-Grandie for reviewing Revision 1 changes.
 A Portable, High-Performance, ANSI C Coding Methodology and its
 application to Matrix Multiply," LAPACK Working Note 111, 1996.
 
-* K. Goto and R. A. van de Geijn, "Anatomy of high-performance matrix
-multiplication", ACM Transactions of Mathematical Software (TOMS),
-Vol. 34, No. 3, May 2008.
+* K. Goto and R. A. van de Geijn,
+["Anatomy of high-performance matrix multiplication"](https://doi.org/10.1145/1356052.1356053),
+*ACM Transactions of Mathematical Software* (TOMS),
+Vol. 34, No. 3, May 2008. See also
 
 * M. Hoemmen, D. Hollman, C. Trott, D. Sunderland, N. Liber, A. Klinvex,
 Li-Ta Lo, D. Lebrun-Grandie, G. Lopez, P. Caday, S. Knepper, P. Luszczek,
 and T. Costa,
 "A free function linear algebra interface based on the BLAS,"
-P1673R6,
-Dec. 2021.
+P1673R7, Apr. 2022.
 
 * C. Trott, D. Hollman, M. Hoemmen, and D. Sunderland,
 "`mdarray`: An Owning Multidimensional Array Analog of `mdspan`",
@@ -1521,14 +1522,18 @@ Thanks to Damien Lebrun-Grandie for reviewing Revision 1 changes.
 
 * J. Siek and A. Lumsdaine, "The Matrix Template Library: A Generic
 Programming Approach to High Performance Numerical Linear Algebra,"
-in proceedings of the Second International Symposium on Computing in
+in Proceedings of the Second International Symposium on Computing in
 Object-Oriented Parallel Environments (ISCOPE) 1998, Santa Fe, NM,
 USA, Dec. 1998.
+
+* F. G. Van Zee and R. A. van de Geijn,
+["BLIS: A Framework for Rapidly Instantiating BLAS Functionality,"](https://doi.org/10.1145/2764454),
+*ACM Transactions on Mathematical Software* (TOMS), Vol. 41, No. 3, June 2015.
 
 * R. Vuduc, "Automatic performance tuning of sparse matrix kernels,"
 PhD dissertation, Electrical Engineering and Computer Science,
 University of California Berkeley, 2004.
 
 * R. C. Whaley, A. Petitet, and J. Dongarra, "Automated Empirical
-Optimization of Software and the ATLAS Project," Parallel Computing,
+Optimization of Software and the ATLAS Project," *Parallel Computing*,
 Vol. 27, No. 1-2, Jan. 2001, pp. 3-35.
