# RFC 0001-Common Encryption API

- Feature Name: Common Encryption API
- Author(s): Eric Berry
- Start Date: 2021-06-08
- RFC PR: 0001
- Leader(s): Eric Berry

## Introduction

SecApi needs a mechanism to natively support Common Encryption.

## Motivation / use-cases

SecApi 2 has been used for many years to provide cryptographic operations for DRM libraries on RDK.
SecApi 2 does not natively support ISO/IEC 23001-7 Common Encryption. The library added the
`SecCipher_ProcessCtrWithDataShift` and `SecCipher_ProcessCtrWithOpaqueDataShift` functions to
support Common Encryption, but no SOC vendor ever implemented them correctly, so DRM libraries have
not actively used them. Instead, DRM libraries have adopted a mechanism of gathering all of the
scattered encrypted data into a decryption buffer, decrypting it, then scattering the decrypted data
back into its original location. This functionality works, but the multiple copy operations can
cause slow decryption times on some platforms.

The SecApi 2 library has become complex and the need was identified to create SecApi 3 to simplify
the cryptographic interface. The design of SecApi 3 has been to keep the cryptographic interface as
simple and streamlined as possible so that it could be a general, all-purpose cryptographic library.
The decision was made to continue to use the gather, decrypt, scatter design for Common Encryption
that was used in SecApi 2 to keep its design simple and general purpose. This unfortunately brings
the same inefficient Common Encryption design into SecApi 3.

After much discussion, a desire was expressed to natively support Common Encryption with SecApi 3
even though this would be a specialized cryptographic function.

*ISO/IEC 23001-7 Common encryption in ISO base media file format files* identifies how content is
divided into samples and subsamples with clear and encrypted blocks using the CENC (CTR mode),
CENS (CTR mode with pattern encryption), CBC1 (CBC mode), and CBCS (CBC mode with pattern
encryption) encryption modes. This design takes the definitions and fields defined in that
specification and builds an API that can be used to implement the specification.

This design was compared with the Widevine interface that implements Common Encryption (see Use
Case 3) and with the PlayReady interface that implements Common Encryption (see Use Case 4), and
it was verified to be compatible with both of those implementations.

## Updates/Obsoletes

None

## Affected platforms

All

## Open Source Dependencies

None

## Detailed design

**sa_subsample_length Structure**

```c
typedef struct {
    size_t bytes_of_clear_data;
    size_t bytes_of_protected_data;
} sa_subsample_length;
```

This structure gives the length definition of a subsample. A subsample usually contains a video or
audio NAL (Network Abstraction Layer) unit (or alternatively a NAL unit could be encoded into more
than one subsample). Subsamples are divided into two sections: a clear data section, followed by a
protected data section.

+ `bytes_of_clear_data` identifies the length of the clear data section.
+ `bytes_of_protected_data` identifies the length of the protected data section.

Either one of these values can be 0, but not both.
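
The rule above can be checked before a sample is submitted. The following is a minimal sketch, not part of the proposed API: `subsample_lengths_total` is a hypothetical helper name, and the overflow checks are an added precaution that this RFC does not require.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t bytes_of_clear_data;
    size_t bytes_of_protected_data;
} sa_subsample_length;

/* Hypothetical helper (not part of the proposed API).
 * Returns true when every subsample has at least one non-zero section and the
 * combined length fits in a size_t. On success the sum is stored in *total. */
static bool subsample_lengths_total(const sa_subsample_length* lengths, size_t count,
        size_t* total) {
    size_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        size_t clear_len = lengths[i].bytes_of_clear_data;
        size_t protected_len = lengths[i].bytes_of_protected_data;

        /* Either value may be 0, but not both. */
        if (clear_len == 0 && protected_len == 0)
            return false;

        /* Guard against size_t overflow while accumulating. */
        if (clear_len > SIZE_MAX - sum)
            return false;
        sum += clear_len;
        if (protected_len > SIZE_MAX - sum)
            return false;
        sum += protected_len;
    }

    *total = sum;
    return true;
}
```

The total computed this way is the number of bytes the sample occupies in its input buffer, which a caller could compare against the buffer length.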

**sa_sample Structure**

```c
typedef enum {
  SA_BUFFER_TYPE_CLEAR = 0,
  SA_BUFFER_TYPE_SVP
} sa_buffer_type;

typedef struct {
  sa_buffer_type buffer_type;
  
  union {
    struct {
      void* buffer;
      size_t length;
      size_t offset;
    } clear;
    struct {
      sa_svp_buffer buffer;
      size_t offset;
    } svp;
  } context;
} sa_buffer;

typedef struct {
    void* iv;
    size_t iv_length;
    size_t crypt_byte_block;
    size_t skip_byte_block;
    size_t subsample_count;
    sa_subsample_length *subsample_lengths;
    sa_crypto_cipher_context context;
    sa_buffer* out;
    sa_buffer* in;
} sa_sample;
```

+ `sa_sample` provides the definition of a sample.
+ `iv` identifies the IV used to decrypt the sample. In CBCS mode, the IV is reset to this value at
  the start of each subsample. In all other modes, the IV is applied at the start of the first
  subsample only, and the cipher state carries over into the subsamples that follow.
+ `iv_length` is the length of the `iv` field in bytes.
+ `crypt_byte_block` identifies the number of 16-byte blocks that are encrypted in each repetition
  of the pattern. In CENS mode and CBCS mode the protected data is only partially encrypted, so
  this field is non-zero and `skip_byte_block` identifies the number of 16-byte blocks that are
  skipped. This pattern repeats until the entire protected data section is consumed. Any remaining
  partial block of fewer than 16 bytes is left unencrypted. Setting this field to 0 indicates CENC
  or CBC1 mode, in which the entire protected data section is encrypted.
+ `skip_byte_block` identifies the number of 16-byte blocks to skip in a protected data section.
  In CENC and CBC1 mode, the entire protected data section is encrypted and this field must be set
  to 0. In CENS and CBCS mode, audio tracks can be fully encrypted, so this field should be set to
  0 for those cases.
+ `subsample_count` identifies the number of subsamples in this sample.
+ `subsample_lengths` is an array of `subsample_count` `sa_subsample_length` entries, one per
  subsample.
+ `context` contains a cipher context that was initialized with either the
  SA_CIPHER_ALGORITHM_AES_CTR algorithm for CENC or CENS mode or the SA_CIPHER_ALGORITHM_AES_CBC
  algorithm for CBC1 or CBCS mode. The context must also have been initialized with the decryption
  key and in decrypt mode. Using this function with any other cipher algorithm, or with a context
  in encrypt mode, results in an error.
+ `out.buffer_type` identifies whether the out buffer is svp or clear.
+ `out.context.svp.buffer` or `out.context.clear.buffer` is the output buffer in which to put the
  decrypted data. Widevine specifies an output buffer per sample, whereas PlayReady has one output
  buffer for all samples. This design follows the Widevine model because it can support PlayReady,
  but the reverse is not possible.
+ `out.context.clear.length` identifies the length of the output clear buffer.
+ `out.context.svp.offset` or `out.context.clear.offset` identifies the offset into the buffer.
+ `in.buffer_type` identifies whether the in buffer is svp or clear.
+ `in.context.svp.buffer` or `in.context.clear.buffer` is the input buffer from which the encrypted
  data is retrieved. Widevine specifies an input buffer per sample, whereas PlayReady has one input
  buffer for all samples. This design follows the Widevine model because it can support PlayReady,
  but the reverse is not possible.
+ `in.context.clear.length` identifies the length of the input clear buffer.
+ `in.context.svp.offset` or `in.context.clear.offset` identifies the offset into the buffer.
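
To make the `crypt_byte_block` / `skip_byte_block` pattern concrete, the sketch below counts how many bytes of one protected data section end up encrypted. `encrypted_byte_count` is a hypothetical helper name and is not part of the proposed API. It follows only the description in this section. As an assumption, when the section ends partway through a run of encrypted blocks, the whole 16-byte blocks that remain in that run are still counted as encrypted. ISO/IEC 23001-7 is the authority on that edge case.

```c
#include <stddef.h>

#define CENC_BLOCK_SIZE 16 /* AES block size in bytes */

/* Hypothetical helper (not part of the proposed API).
 * Returns the number of bytes in one protected data section that are
 * encrypted for the given pattern. */
static size_t encrypted_byte_count(size_t protected_len, size_t crypt_byte_block,
        size_t skip_byte_block) {
    /* crypt_byte_block == 0 means CENC or CBC1: the whole section is encrypted. */
    if (crypt_byte_block == 0)
        return protected_len;

    /* Only whole 16-byte blocks take part in the pattern; a trailing partial
     * block is left unencrypted. */
    size_t whole_blocks = protected_len / CENC_BLOCK_SIZE;
    size_t pattern_blocks = crypt_byte_block + skip_byte_block;
    size_t full_patterns = whole_blocks / pattern_blocks;
    size_t leftover_blocks = whole_blocks % pattern_blocks;

    /* Assumption: leftover blocks are encrypted up to crypt_byte_block. */
    size_t leftover_encrypted =
            leftover_blocks < crypt_byte_block ? leftover_blocks : crypt_byte_block;

    return (full_patterns * crypt_byte_block + leftover_encrypted) * CENC_BLOCK_SIZE;
}
```

For a 200-byte protected section with a 1:9 pattern this gives 32 bytes: twelve whole blocks, of which the first and the eleventh are encrypted.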

**Process Common Encryption Function**

```c
sa_status sa_cipher_process_common_encryption(
    size_t samples_length,
    sa_sample *samples);
```

This function is called to decrypt an array of samples either in SVP buffers or in clear buffers.

+ `samples_length` identifies the number of samples in the array.
+ `samples` contains an array of samples to decrypt.
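
The following is a minimal sketch of how a caller might populate one `sa_sample` for CENC content held in clear buffers. The structure definitions are copied from this document. The `sa_svp_buffer` and `sa_crypto_cipher_context` typedefs are stand-ins (assumptions) so that the sketch compiles on its own, and `make_clear_buffer` and `init_cenc_sample` are hypothetical helper names, not part of the proposed API.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs (assumptions) so the sketch compiles on its own.
 * The real definitions come from the SecApi 3 headers and may differ. */
typedef uint64_t sa_svp_buffer;
typedef uint64_t sa_crypto_cipher_context;

typedef struct {
    size_t bytes_of_clear_data;
    size_t bytes_of_protected_data;
} sa_subsample_length;

typedef enum {
    SA_BUFFER_TYPE_CLEAR = 0,
    SA_BUFFER_TYPE_SVP
} sa_buffer_type;

typedef struct {
    sa_buffer_type buffer_type;
    union {
        struct {
            void* buffer;
            size_t length;
            size_t offset;
        } clear;
        struct {
            sa_svp_buffer buffer;
            size_t offset;
        } svp;
    } context;
} sa_buffer;

typedef struct {
    void* iv;
    size_t iv_length;
    size_t crypt_byte_block;
    size_t skip_byte_block;
    size_t subsample_count;
    sa_subsample_length* subsample_lengths;
    sa_crypto_cipher_context context;
    sa_buffer* out;
    sa_buffer* in;
} sa_sample;

/* Hypothetical helper: describe a plain memory region as a clear sa_buffer. */
static sa_buffer make_clear_buffer(void* data, size_t length) {
    sa_buffer buffer;
    buffer.buffer_type = SA_BUFFER_TYPE_CLEAR;
    buffer.context.clear.buffer = data;
    buffer.context.clear.length = length;
    buffer.context.clear.offset = 0;
    return buffer;
}

/* Hypothetical helper: fill in a sample for CENC mode, where no pattern is
 * used and both pattern fields are therefore 0. */
static void init_cenc_sample(sa_sample* sample, void* iv, size_t iv_length,
        sa_subsample_length* lengths, size_t count, sa_crypto_cipher_context context,
        sa_buffer* out, sa_buffer* in) {
    sample->iv = iv;
    sample->iv_length = iv_length;
    sample->crypt_byte_block = 0;
    sample->skip_byte_block = 0;
    sample->subsample_count = count;
    sample->subsample_lengths = lengths;
    sample->context = context;
    sample->out = out;
    sample->in = in;
}
```

An array of samples populated this way would then be passed to `sa_cipher_process_common_encryption` together with its element count, and the returned `sa_status` checked.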

### Use Case 1 - CENC Scheme: Audio Track Sample Followed by Video Track Sample

+ Audio has two subsamples.
+ Video has three subsamples and the third subsample has no clear data.

```
samples_length: 2
sample[0]: // Audio sample no SVP
  iv: 112233445566778899AABBCCDDEEFF00
  iv_length: 16
  crypt_byte_block: 0
  skip_byte_block: 0
  subsample_count: 2
  subsample_lengths[0]:
    bytes_of_clear_data: 50
    bytes_of_protected_data: 100
  subsample_lengths[1]:
    bytes_of_clear_data: 50
    bytes_of_protected_data: 100 // Note that CENC continues block from subsample[0] protected data
  context: CTR algorithm, decrypt mode, decryption key
  out.buffer_type: SA_BUFFER_TYPE_CLEAR
  out.context.clear:
    buffer: *********
    length: 300
    offset: 0
  in.buffer_type: SA_BUFFER_TYPE_CLEAR
  in.context.clear:
    buffer: **********
    length: 300
    offset: 0
sample[1]: // Video sample with SVP
  iv: 112233445566778899AABBCCDDEEFF00
  iv_length: 16
  crypt_byte_block: 0
  skip_byte_block: 0
  subsample_count: 3
  subsample_lengths[0]:
    bytes_of_clear_data: 50
    bytes_of_protected_data: 200
  subsample_lengths[1]:
    bytes_of_clear_data: 50
    bytes_of_protected_data: 200 // Note that CENC continues block from subsample[0] protected data
  subsample_lengths[2]:
    bytes_of_clear_data: 0
    bytes_of_protected_data: 200 // Note that CENC continues block from subsample[1] protected data
  context: CTR algorithm, decrypt mode, decryption key
  out.buffer_type: SA_BUFFER_TYPE_SVP
  out.context.svp:
    buffer: *********
    offset: 0
  in.buffer_type: SA_BUFFER_TYPE_CLEAR
  in.context.clear:
    buffer: **********
    length: 700
    offset: 0
```
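
The comments in the listing note that CENC continues the block from the previous subsample's protected data. In other words, the protected sections of a sample are treated as one contiguous stream for the AES-CTR keystream. The sketch below computes where in that keystream a given subsample's protected data begins. `cenc_keystream_position` is a hypothetical helper name, not part of the proposed API.

```c
#include <stddef.h>

typedef struct {
    size_t bytes_of_clear_data;
    size_t bytes_of_protected_data;
} sa_subsample_length;

/* Hypothetical helper (not part of the proposed API).
 * For CENC mode, reports how far the CTR keystream has advanced when the
 * protected data of subsample `index` starts:
 *   *blocks_consumed  whole 16-byte counter blocks already used
 *   *byte_in_block    offset into the current counter block */
static void cenc_keystream_position(const sa_subsample_length* lengths, size_t index,
        size_t* blocks_consumed, size_t* byte_in_block) {
    size_t protected_before = 0;

    /* Clear bytes do not consume keystream; only protected bytes do. */
    for (size_t i = 0; i < index; i++)
        protected_before += lengths[i].bytes_of_protected_data;

    *blocks_consumed = protected_before / 16;
    *byte_in_block = protected_before % 16;
}
```

For the video sample above, subsample 1 starts after 200 protected bytes, which is 12 whole counter blocks plus 8 bytes into the next one, so decryption resumes in the middle of a keystream block.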

### Use Case 2 - CBCS Scheme: Audio Track Sample Followed by Video Track Sample

+ Audio has two subsamples that are fully encrypted.
+ Video has three subsamples with 1:9 pattern encryption and the third subsample has no clear data.

```
samples_length: 2
sample[0]: // Audio sample no SVP
  iv: 112233445566778899AABBCCDDEEFF00
  iv_length: 16
  crypt_byte_block: 1
  skip_byte_block: 0
  subsample_count: 2
  subsample_lengths[0]:
    bytes_of_clear_data: 50
    bytes_of_protected_data: 100
  subsample_lengths[1]:
    bytes_of_clear_data: 50
    bytes_of_protected_data: 100 // Note that CBCS resets the IV with the new subsample
  context: CBC algorithm, decrypt mode, decryption key
  out.buffer_type: SA_BUFFER_TYPE_CLEAR
  out.context.clear:
    buffer: *********
    length: 300
    offset: 0
  in.buffer_type: SA_BUFFER_TYPE_CLEAR
  in.context.clear:
    buffer: **********
    length: 300
    offset: 0
sample[1]: // Video sample with SVP
  iv: 112233445566778899AABBCCDDEEFF00
  iv_length: 16
  crypt_byte_block: 1
  skip_byte_block: 9
  subsample_count: 3
  subsample_lengths[0]:
    bytes_of_clear_data: 50
    bytes_of_protected_data: 200
  subsample_lengths[1]:
    bytes_of_clear_data: 50
    bytes_of_protected_data: 200 // Note that CBCS resets the IV with the new subsample
  subsample_lengths[2]:
    bytes_of_clear_data: 0
    bytes_of_protected_data: 200 // Note that CBCS resets the IV with the new subsample
  context: CBC algorithm, decrypt mode, decryption key
  out.buffer_type: SA_BUFFER_TYPE_SVP
  out.context.svp:
    buffer: *********
    offset: 0
  in.buffer_type: SA_BUFFER_TYPE_CLEAR
  in.context.clear:
    buffer: **********
    length: 700
    offset: 0
```

### Use Case 3 - Mapping Widevine data structure

This use case describes how to map the Widevine data structure into a call to 
`sa_cipher_process_common_encryption`.  The comments listed after the parameters identify which
Widevine field to use.

**Widevine Common Encryption Interface**

```c
typedef enum OEMCryptoBufferType {
    OEMCrypto_BufferType_Clear,
    OEMCrypto_BufferType_Secure,
    OEMCrypto_BufferType_Direct
} OEMCryptoBufferType;

typedef struct {
    OEMCryptoBufferType type;
    union {
        struct {  // type == OEMCrypto_BufferType_Clear
            OEMCrypto_SharedMemory* address;
            size_t address_length;
        } clear;
        struct {  // type == OEMCrypto_BufferType_Secure
            void* handle;
            size_t handle_length;
            size_t offset;
        } secure;
        struct {  // type == OEMCrypto_BufferType_Direct
            bool is_video;
        } direct;
    } buffer;
} OEMCrypto_DestBufferDesc;

typedef struct {
    const OEMCrypto_SharedMemory* input_data;
    size_t input_data_length;
    OEMCrypto_DestBufferDesc output_descriptor;
} OEMCrypto_InputOutputPair;

typedef struct {
    size_t num_bytes_clear;
    size_t num_bytes_encrypted;
    uint8_t subsample_flags;
    size_t block_offset;
} OEMCrypto_SubSampleDescription;

typedef struct {
    OEMCrypto_InputOutputPair buffers;
    uint8_t iv[16];
    const OEMCrypto_SubSampleDescription* subsamples;
    size_t subsamples_length;
} OEMCrypto_SampleDescription;

typedef struct {
    size_t encrypt;
    size_t skip;
} OEMCrypto_CENCEncryptPatternDesc;

OEMCryptoResult OEMCrypto_DecryptCENC(
    OEMCrypto_SESSION session,
    const OEMCrypto_SampleDescription* samples,
    size_t samples_length,
    const OEMCrypto_CENCEncryptPatternDesc* pattern);
```

```
samples_length: 2                      // samples_length
sample[0]: // Audio sample no SVP
  iv: 112233445566778899AABBCCDDEEFF00 // samples[0].iv
  iv_length: 16
  crypt_byte_block: 0                  // pattern.encrypt
  skip_byte_block: 0                   // pattern.skip
  subsample_count: 2                   // samples[0].subsamples_length
  subsample_lengths[0]:
    bytes_of_clear_data: 50            // samples[0].subsamples[0].num_bytes_clear
    bytes_of_protected_data: 100       // samples[0].subsamples[0].num_bytes_encrypted
  subsample_lengths[1]:
    bytes_of_clear_data: 50            // samples[0].subsamples[1].num_bytes_clear
    bytes_of_protected_data: 100       // samples[0].subsamples[1].num_bytes_encrypted
  context: CTR algorithm, decrypt mode, decryption key
  out.buffer_type: SA_BUFFER_TYPE_CLEAR // samples[0].buffers.output_descriptor.type == OEMCrypto_BufferType_Clear
  out.context.clear:
    buffer: *********                  // samples[0].buffers.output_descriptor.buffer.clear.address
    length: 300                        // samples[0].buffers.output_descriptor.buffer.clear.address_length
    offset: 0
  in.buffer_type: SA_BUFFER_TYPE_CLEAR
  in.context.clear:
    buffer: **********                 // samples[0].buffers.input_data
    length: 300                        // samples[0].buffers.input_data_length
    offset: 0
sample[1]: // Video sample with SVP
  iv: 112233445566778899AABBCCDDEEFF00 // samples[1].iv
  iv_length: 16
  crypt_byte_block: 0                  // pattern.encrypt
  skip_byte_block: 0                   // pattern.skip
  subsample_count: 3                   // samples[1].subsamples_length
  subsample_lengths[0]:
    bytes_of_clear_data: 50            // samples[1].subsamples[0].num_bytes_clear
    bytes_of_protected_data: 200       // samples[1].subsamples[0].num_bytes_encrypted
  subsample_lengths[1]:
    bytes_of_clear_data: 50            // samples[1].subsamples[1].num_bytes_clear
    bytes_of_protected_data: 200       // samples[1].subsamples[1].num_bytes_encrypted
  subsample_lengths[2]:
    bytes_of_clear_data: 0             // samples[1].subsamples[2].num_bytes_clear
    bytes_of_protected_data: 200       // samples[1].subsamples[2].num_bytes_encrypted
  context: CTR algorithm, decrypt mode, decryption key
  out.buffer_type: SA_BUFFER_TYPE_SVP  // samples[1].buffers.output_descriptor.type == OEMCrypto_BufferType_Secure
  out.context.svp:
    buffer: *********                  // samples[1].buffers.output_descriptor.buffer.secure.handle
    offset: 0                          // samples[1].buffers.output_descriptor.buffer.secure.offset
  in.buffer_type: SA_BUFFER_TYPE_CLEAR
  in.context.clear:
    buffer: **********                 // samples[1].buffers.input_data
    length: 700                        // samples[1].buffers.input_data_length
    offset: 0
```
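
The subsample and pattern portion of this mapping can be sketched in C as follows. The Widevine structures are copied from the interface listed above. `map_widevine_subsamples` and `map_widevine_pattern` are hypothetical helper names, not part of either API.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t bytes_of_clear_data;
    size_t bytes_of_protected_data;
} sa_subsample_length;

/* Copied from the Widevine interface shown in this use case. */
typedef struct {
    size_t num_bytes_clear;
    size_t num_bytes_encrypted;
    uint8_t subsample_flags;
    size_t block_offset;
} OEMCrypto_SubSampleDescription;

typedef struct {
    size_t encrypt;
    size_t skip;
} OEMCrypto_CENCEncryptPatternDesc;

/* Hypothetical helper: copy the clear/encrypted lengths of each Widevine
 * subsample into the corresponding sa_subsample_length entry. The caller
 * provides `lengths` with room for `subsamples_length` entries. */
static void map_widevine_subsamples(const OEMCrypto_SubSampleDescription* subsamples,
        size_t subsamples_length, sa_subsample_length* lengths) {
    for (size_t i = 0; i < subsamples_length; i++) {
        lengths[i].bytes_of_clear_data = subsamples[i].num_bytes_clear;
        lengths[i].bytes_of_protected_data = subsamples[i].num_bytes_encrypted;
    }
}

/* Hypothetical helper: the single Widevine pattern applies to every sample
 * in the call, so each sa_sample receives the same two values. */
static void map_widevine_pattern(const OEMCrypto_CENCEncryptPatternDesc* pattern,
        size_t* crypt_byte_block, size_t* skip_byte_block) {
    *crypt_byte_block = pattern->encrypt;
    *skip_byte_block = pattern->skip;
}
```

The `subsample_flags` and `block_offset` fields have no counterpart in `sa_subsample_length`, so this sketch does not carry them over.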

### Use Case 4 - Mapping PlayReady data structure

This use case describes how to map the PlayReady data structure into a call to
`sa_cipher_process_common_encryption`.  The comments listed after the parameters identify which
PlayReady field to use.

**PlayReady Common Encryption Interface**

```c
DRM_API DRM_RESULT DRM_CALL Drm_Reader_DecryptMultipleOpaque(
    __in
                DRM_DECRYPT_CONTEXT      *f_pDecryptContext,
    __in
                DRM_DWORD                 f_cEncryptedRegionInitializationVectors,
    __in_ecount( f_cEncryptedRegionInitializationVectors )
        const   DRM_UINT64               *f_pEncryptedRegionInitializationVectorsHigh,
    __in_ecount_opt( f_cEncryptedRegionInitializationVectors )
        const   DRM_UINT64               *f_pEncryptedRegionInitializationVectorsLow,
    __in_ecount( f_cEncryptedRegionInitializationVectors )
        const   DRM_DWORD                *f_pEncryptedRegionCounts,
    __in
                DRM_DWORD                 f_cEncryptedRegionMappings,
    __in_ecount( f_cEncryptedRegionMappings )
        const   DRM_DWORD                *f_pEncryptedRegionMappings,
    __in
                DRM_DWORD                 f_cEncryptedRegionSkip,
    __in_ecount_opt( f_cEncryptedRegionSkip )
        const   DRM_DWORD                *f_pEncryptedRegionSkip,
    __in
                DRM_DWORD                 f_cbEncryptedContent,
    __in_bcount( f_cbEncryptedContent )
        const   DRM_BYTE                 *f_pbEncryptedContent,
    __out
                DRM_DWORD                *f_pcbOpaqueClearContent,
    __deref_out_bcount( *f_pcbOpaqueClearContent )
                DRM_BYTE                **f_ppbOpaqueClearContent
     );
```

```
samples_length: 2                      // samples_length
sample[0]: // Audio sample no SVP
  iv: 112233445566778899AABBCCDDEEFF00 // f_pEncryptedRegionInitializationVectorsHigh[0] +
                                       // f_pEncryptedRegionInitializationVectorsLow[0]
  iv_length: 16
  crypt_byte_block: 0                  // if(f_cEncryptedRegionSkip == 0) 0
                                       // if (f_cEncryptedRegionSkip == 2) f_pEncryptedRegionSkip[0]
  skip_byte_block: 0                   // if(f_cEncryptedRegionSkip == 0) 0
                                       // if (f_cEncryptedRegionSkip == 2) f_pEncryptedRegionSkip[1]
  subsample_count: 2                   // f_pEncryptedRegionCounts[0] / 2
  subsample_lengths[0]:
    bytes_of_clear_data: 50            // f_pEncryptedRegionMappings[0]
    bytes_of_protected_data: 100       // f_pEncryptedRegionMappings[1]
  subsample_lengths[1]:
    bytes_of_clear_data: 50            // f_pEncryptedRegionMappings[2]
    bytes_of_protected_data: 100       // f_pEncryptedRegionMappings[3]
  context: CTR algorithm, decrypt mode, decryption key
  out.buffer_type: SA_BUFFER_TYPE_CLEAR // Drm_Content_SetProperty(DRM_CSP_DECRYPTION_OUTPUT_MODE)
  out.context.clear:
    buffer: *********                  // f_ppbOpaqueClearContent
    length: 300
    offset: 0                          // 0
  in.buffer_type: SA_BUFFER_TYPE_CLEAR
  in.context.clear:
    buffer: **********                 // f_pbEncryptedContent
    length: 300                        // calculated from f_pEncryptedRegionMappings[0..3]
    offset: 0
sample[1]: // Video sample with SVP
  iv: 112233445566778899AABBCCDDEEFF00 // f_pEncryptedRegionInitializationVectorsHigh[1] +
                                       // f_pEncryptedRegionInitializationVectorsLow[1]
  iv_length: 16
  crypt_byte_block: 0                  // if(f_cEncryptedRegionSkip == 0) 0
                                       // if (f_cEncryptedRegionSkip == 2) f_pEncryptedRegionSkip[0]
  skip_byte_block: 0                   // if(f_cEncryptedRegionSkip == 0) 0
                                       // if (f_cEncryptedRegionSkip == 2) f_pEncryptedRegionSkip[1]
  subsample_count: 3                   // f_pEncryptedRegionCounts[1] / 2
  subsample_lengths[0]:
    bytes_of_clear_data: 50            // f_pEncryptedRegionMappings[4]
    bytes_of_protected_data: 200       // f_pEncryptedRegionMappings[5]
  subsample_lengths[1]:
    bytes_of_clear_data: 50            // f_pEncryptedRegionMappings[6]
    bytes_of_protected_data: 200       // f_pEncryptedRegionMappings[7]
  subsample_lengths[2]:
    bytes_of_clear_data: 0             // f_pEncryptedRegionMappings[8]
    bytes_of_protected_data: 200       // f_pEncryptedRegionMappings[9]
  context: CTR algorithm, decrypt mode, decryption key
  out.buffer_type: SA_BUFFER_TYPE_SVP  // Drm_Content_SetProperty(DRM_CSP_DECRYPTION_OUTPUT_MODE)
  out.context.svp:
    buffer: *********                  // f_ppbOpaqueClearContent
    offset: 0                          // 0
  in.buffer_type: SA_BUFFER_TYPE_CLEAR
  in.context.clear:
    buffer: **********                 // f_pbEncryptedContent + 300
    length: 700                        // calculated from f_pEncryptedRegionMappings[4..9]
    offset: 0
```
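
PlayReady passes the region lengths for all samples in one flat array, so the mapping above has to slice that array per sample. The sketch below shows one way to do so under stated assumptions: `uint32_t` stands in for `DRM_DWORD`, the `region_counts` and `region_mappings` parameters stand in for `f_pEncryptedRegionCounts` and `f_pEncryptedRegionMappings`, and `map_playready_sample` is a hypothetical helper name, not part of either API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t bytes_of_clear_data;
    size_t bytes_of_protected_data;
} sa_subsample_length;

/* Hypothetical helper (not part of either API).
 * Extracts the subsample lengths of sample `sample_index` from the flat
 * PlayReady mapping array, which holds clear/protected length pairs.
 * On success stores the number of subsamples and the total sample length. */
static bool map_playready_sample(const uint32_t* region_counts,
        const uint32_t* region_mappings, size_t sample_index,
        sa_subsample_length* lengths, size_t max_lengths,
        size_t* subsample_count, size_t* sample_length) {
    /* Skip the mapping entries that belong to earlier samples. */
    size_t first = 0;
    for (size_t i = 0; i < sample_index; i++)
        first += region_counts[i];

    /* Each subsample uses two entries: clear length, then protected length. */
    size_t entries = region_counts[sample_index];
    if (entries % 2 != 0 || entries / 2 > max_lengths)
        return false;

    size_t total = 0;
    for (size_t i = 0; i < entries / 2; i++) {
        lengths[i].bytes_of_clear_data = region_mappings[first + 2 * i];
        lengths[i].bytes_of_protected_data = region_mappings[first + 2 * i + 1];
        total += lengths[i].bytes_of_clear_data + lengths[i].bytes_of_protected_data;
    }

    *subsample_count = entries / 2;
    *sample_length = total;
    return true;
}
```

The total length returned for one sample is also the amount by which the input pointer advances for the next sample, which is how the `f_pbEncryptedContent + 300` value for the second sample in the listing is obtained.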

## Drawbacks

SecApi 3 as originally designed is meant to be a general, all-purpose cryptographic library. Adding
support for Common Encryption with SecApi 3 would add a specialized cryptographic function to this
library.

## Alternatives considered

None

## Unresolved questions

None

## Future possibilities

None

## References

*ISO/IEC 23001-7 Common encryption in ISO base media file format files*