TODO (Mar 2024):

    set/get CUDA architectures
    CUDA PreJIT kernels
    GB_cuda_matrix_advise: write it
    dot3: allow iso
    use a stream pool (from RMM)
    can rmm_wrap be thread safe?
    # of threadblocks in reduce
    reduce calls GB_enumify_reduce twice
    set/get which GPU(s) to use
    data types > 32 bytes
    handling nvcc compiler errors
    static device function for computing ks (acts like GB_ek_slice,
        so call it GB_ek_slice_device)

--------------------------------------------------------------------------------

    all the FIXMEs
    clean up comments and code style
    hexadecimal
    stream pool
    test complex
    reduce: do any monoid terminal condition?
    ANY monoid in mxm
    full test suite
    when to use the GPU?
    which GPU?  See the new GxB_Context object
    rmm_init

--------------------------------------------------------------------------------

future:

    cuda: needs source directory
        (1) environment var set.  If so, use it.
        (3) not found so no cuda jit

    all of GrB_mxm?

        (1) DOT
            dot2:     C<#> = A'*B     C is bitmap or full
            dot3:     C<M> = A'*B     C is sparse empty, M is sparse/hyper
            dot4:     C += A'*B       C is full (+ same as semiring monoid)

        (2) colscale  C = A*D

        (3) rowscale  C = D*B

        (4) SAXPY
            saxpy3:   C<#> = A*B      C is sparse or hyper
            saxpy4:   C += A*B        C is full, A hyper or sparse,
                                      B is full or bitmap
            saxpy5:   C += A*B        C is full, B hyper or sparse,
                                      A is full or bitmap
            bitmap:   C<#> = A*B      C bitmap
            NO bitmap: C<#> += A*B    C bitmap
            NO outer: C<#> = A*B'     C sparse? full? bitmap?

    GrB_select?
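The "static device function for computing ks" item could look something like the sketch below. It assumes a sparse matrix stored with a monotone pointer array Ap[0..anvec] (Ap[0] == 0), and that "ks" means, for each entry p, the vector k with Ap[k] <= p < Ap[k+1]. The signature and the helper GB_ek_slice_device_find_k are hypothetical, and the binary-search-then-walk strategy is one plausible approach, not necessarily how GB_ek_slice itself works. The GB_DEVICE macro lets the same source compile as plain C or, under nvcc, as a static __device__ function.

```c
#include <assert.h>
#include <stdint.h>

// Compile as a device function under nvcc, as an ordinary inline function otherwise.
#ifdef __CUDACC__
#define GB_DEVICE static __device__ __forceinline__
#else
#define GB_DEVICE static inline
#endif

// Hypothetical helper: return the vector k that owns entry p, i.e. the largest
// k in [0, anvec-1] with Ap[k] <= p.  Empty vectors (Ap[k] == Ap[k+1]) are
// skipped automatically.  Requires 0 <= p < Ap[anvec].
GB_DEVICE int64_t GB_ek_slice_device_find_k(const int64_t *Ap, int64_t anvec,
                                            int64_t p)
{
    int64_t lo = 0, hi = anvec - 1;
    while (lo < hi)
    {
        // upper midpoint, so lo always advances when Ap[mid] <= p
        int64_t mid = lo + (hi - lo + 1) / 2;
        if (Ap[mid] <= p)
        {
            lo = mid;       // owner is mid or a later vector
        }
        else
        {
            hi = mid - 1;   // owner is before mid
        }
    }
    return lo;
}

// Hypothetical GB_ek_slice_device: for the entries p in [pfirst, plast)
// handled by one thread or task, write ks[p - pfirst] = vector owning p.
// One binary search locates the first vector; the rest is a forward walk.
GB_DEVICE void GB_ek_slice_device(const int64_t *Ap, int64_t anvec,
                                  int64_t pfirst, int64_t plast, int64_t *ks)
{
    if (pfirst >= plast) return;
    assert(pfirst >= 0 && plast <= Ap[anvec]);
    int64_t k = GB_ek_slice_device_find_k(Ap, anvec, pfirst);
    for (int64_t p = pfirst; p < plast; p++)
    {
        // advance past vectors that end at or before p (including empty ones)
        while (Ap[k + 1] <= p) k++;
        ks[p - pfirst] = k;
    }
}
```

For example, with Ap = {0, 2, 2, 5, 6} (4 vectors, vector 1 empty), entries 0..5 map to vectors 0, 0, 2, 2, 2, 3. Doing a single binary search per slice and then walking forward keeps the per-entry cost low when each thread handles a contiguous run of entries.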