# 10/22/2020

### Attending:

* Cary Phillips
* Christina Tempelaar-Lietz
* Eric Enderton
* Joseph Goldstone
* Kimball Thurston
* Larry Gritz
* Nick Porcino
* Owen Thompson
* Peter Hillman
* Steve Parker

### Discussion:

* ASWF project survey: should OpenEXR participate?

  * Larry: we struggle to enumerate what questions to ask.

  * Cary: the generic questions that Larry contributed to the survey
    are interesting to us, but the survey is not likely to reach the
    larger OpenEXR community.

  * Rod: We've asked questions before in email and GitHub Issues and
    not gotten much response, on issues like "should we support
    autotools?"

* GPU acceleration of EXR reading:

  * Steve Parker of NVIDIA, Rod, Kimball met to discuss how to
    efficiently read exrs directly to GPU.

  * Steve: reading compressed files, decompress on the GPU. Not tiled
    and mipmapped already. The right thing to do is read off of disk,
    minimally, only as needed.

  * Steve: The motivation is that compressed files exist.

  * Reading data is fine, all well and good, until you get to the
    point where you need to operate on the data.  Then you need to
    decompress it.  Haven’t touched format conversions.

  * Upshot is, 1 month ago, RTXIO was announced, a combination of
    technologies for decompression and GPU direct storage, CUDA
    technology. For a Windows and DirectX ecosystem.

  * End goal is to have all these pieces available as a part of the
    architecture. Working with Microsoft on the direct storage. Watch
    this space to hear more about it.

  * A lot of the foundations are there in GPU direct storage. For EXR,
    the host would open a file and read its header and offset
    table. This gives the host enough information to deal with
    it. Processing of the remainder of the file would move to GPU
    memory.  This is what we’re in the middle of.

  * Eric: The host reads the header, then never reads the file again.

  * Larry: what’s the programming model? Steve: a combination of a
    memcopy interface.  Posix pread semantics: start at this offset,
    read this number of bytes.

  * Larry: a new library, or built into the CUDA drivers? Steve: the
    bulk is built into the header drivers.
  
  * Larry: what if you don’t have the right kind of network hardware?
    The situation is that we don’t know where people will draw the
    files from, so should just transparently work in the fallback
    case.
  
  * Steve: thelibrary needs to separate the header reading from the file reading.

  * Larry: Tangential thing, we have control over what compression
    methods are supported in the library and used out in the
    world. But if there was different lossless compression, we could
    add a new method.

  * Need two options:

    1. GPU able to handle files that are out there.

    2. We need something that a GPU could process more efficiently.

  * Larry: Nobody wants uncompressed files. 

  * Steve: we’re primarily interested in decompressing. We can
    decompress fast enough on the GPU that it can saturate. So it’s
    faster to move the compressed data.
  
  * Rod: we’re in the process of thinking about changes in the
    library. We should consider them now. We should find a way to
    incorporate them.

  * Steve: the offset table doesn’t really have enough information. If
    you only have it on the host, you have to make some
    assumptions. It would be nice if the offset table has the offset
    and the size.
  
  * Surprise from reading the spec: Some tiles can be uncompressed. If
    some of the tiles are larger compressed, it stores the original
    data.

  * Eric: backwards compatible metadata, could start decompression at
    an offset.

  * Kimball: Should look at encoding some other information about
    compression types in offset tables.

  * Rod: The header requires parsing to find the data you want.  The
    trick at Epic is we’re doing that on the header. And just
    streaming the pixel data to the GPU. Have to go looking for the
    pixel data.  Should be easy to skip to the pixel data quickl, skip
    over attributes you don’t need.

  * Steve: Some numbers: Multiple GB of data/second. One tile per
    microsecond. You have one millisecond to pull a thousand tiles out
    of a file.

  * Rod: getting a sense for what size image, and what frames per
    second. Zip compression is 2:1. 8K latlong, 4K wide.

  * Kimball: the IO core in the writer is built around pread. So that
    should be ready to go, in terms of hooking it into the CUDA IO
    thing.

  * Two sides to the work:

    1. general exr library should be better about it

    2. another addon that knows about the GPU. The OpenEXR library
       should not have GPUisms in it. Has to go somewhere
       else. Ideally the library supports that “somwhere else”

  * Kimball: we have an example at Weta, a custom stream. 

  * Steve: 200 RGB data, 8K image, 2:1 compression, then you could
    read about 200 per second.

  * Rod: At Epic, our experience is getting less than 20.

  * Steve: PCGen4 gives another factor of 4.

  * Should be able to get 25 Gb/second out of them.  We still hae room
    on the GPU to do other things that just read OpenEXR files.  Read
    multiple streams simultaneously. Most of the bottlenecks on are on
    the Windows side.

  * Rod: can read only int data, but OpenEXR is shorts. Have to
    swizzle the data.  We have to reinterpret at as fp16.  Should be
    able to swizzle two shorts simultaneously.

  * Kimball: I hadn’t considered changes to the chunk table. Would
    make the CPU side faster. Would sacrifice some of the corrupt file
    data processing.

  * Peter: Each chunk has the same information in the header. Change
    the chunk table so the chunk headers are duplicate.

  * Peter: A compression format could put padding in it, so the data
    is aligned and easy to skip over.

  * Kimball: I started work on core because of texture access. Don’t
    know that the sparse texture architecture would be. Allow alterate
    ordering in an EXR file?

  * Larry: we depend on the fact that it’s tiled as well as mipmapped.

  * Steve: a huge image, I want to chop out a portion of it. Seek time
    is not a big deal any more.

  * Peter: you can store tiles in any order.

  * Steve: Small optimization, MIP-fail, tiles what are 1x1 or 4x4,
    storing those continuously.

  * Larry: But once you’re using solid state storage, there’s no such
    thing as seek time.

  * Steve: technically there is overhead, but it’s small. But with
    pread, you have only one system call.

* Action items?

  * Kimball: I didn’t finish cleaning up IlmBase out of the RC3
    library. As soon as that’s done, I’ll push up my C library.