# BLS Signature Aggregation

## 1. Hierarchies

We have roughly four aggregation-like techniques that apply to BLS signatures. We briefly explain them and classify their possible layerings according to several criteria:

- performance or priority,
- mutual compatibility and natural layering,
- whether signature-side aggregation can be done securely outside the verifier, or whether the scheme is really just a local batching technique,
- and possible future work.

We have two linear aggregation strategies:

_Distinct message_ aggregation requires that verifiers compute one pairing for each message, which favours distinct messages being among the final/outermost aggregation techniques, and sounds compatible with the other aggregation techniques. We may aggregate the signature side for distinct messages early, meaning outside the verifier. In principle, we might aggregate the message hashes if an identical set of messages is signed by several signers, improving performance, but this requires some care. We might consider whether any threshold schemes support distinct message aggregation like this.

_Proof-of-possession_ requires that key material be previously established and verified by the protocol, but requires no masking with expensive exponentiations. See the security arguments in "The Power of Proofs-of-Possession: Securing Multiparty Signatures against Rogue-Key Attacks" by Thomas Ristenpart and Scott Yilek at https://eprint.iacr.org/2007/264.pdf

We expect this proof-of-possession assumption makes delinearization aggregation techniques unnecessary, although this might not be true if different security goals apply to different aggregation layers, maybe even in our para-chain vs relay chain situation. In particular, verifiers might either batch verifications, or prove their own correct behavior using ideas around random oracle delinearization.

We believe proof-of-possession must occur "below" other aggregation techniques because the first/lowest aggregator should itself validate the proof-of-possession, with later/higher aggregators avoiding this only by attributing all responsibility to the first/lowest.

We also have two delinearization strategies:

_Random oracle delinearization_ is the most robust and flexible aggregation technique, and also the slowest. It delinearizes by exponentiating both signatures and public keys with the result of a query to a random oracle that depends upon both the signer's public key and the set of all signing keys included in the signature. See the security arguments in "Compact Multi-Signatures for Smaller Blockchains" by Dan Boneh, Manu Drijvers, and Gregory Neven at https://eprint.iacr.org/2018/483.pdf

We envision random oracle delinearization coming "below" other aggregation techniques because doing it higher up requires access to the separate signatures, and we'd expect to simply pay the high cost throughout. That said, an initial validation layer might aggregate differently due to some trade-off in performance vs threat model, maybe with local small random maskings.

_Local small random delinearization_, aka _batching_, should normally occur entirely within each verifier, making it likely the final or outermost aggregation technique. Any such batching scheme takes a collection of signature triples (public key, message, signature) and considers them all simultaneously. We know batching is superfluous when combined with proof-of-possession, unless the proofs-of-possession come with different security assurances. We envision batching being superfluous when combined with random oracle delinearization too, provided all keys live in one key set. In principle, one might delinearize between different key sets locally though, which protects the signatures in different key sets from one another.
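To make random oracle delinearization concrete, here is a minimal sketch of how the per-signer exponents could be derived in the Boneh–Drijvers–Neven style, with each exponent a random-oracle output over the signer's public key together with the whole key set. The `PublicKey` and `Scalar` types and `hash_to_scalar` are hypothetical stand-ins: real code needs curve types and a cryptographic hash-to-field, not std's `DefaultHasher`.

```
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-ins for real curve types: a public key is opaque bytes here, and a
// delinearization exponent is a u64 rather than a scalar field element.
type PublicKey = Vec<u8>;
type Scalar = u64;

// Placeholder random oracle.  Real code must use a cryptographic
// hash-to-field, not std's non-cryptographic DefaultHasher.
fn hash_to_scalar(pk: &PublicKey, key_set: &[PublicKey]) -> Scalar {
    let mut h = DefaultHasher::new();
    pk.hash(&mut h);
    key_set.hash(&mut h); // the exponent depends on the *whole* key set
    h.finish()
}

// One exponent per signer.  The aggregate public key and aggregate signature
// are then formed as sum_i a_i * pk_i and sum_i a_i * sigma_i on their
// respective curves (not shown, as we have no curve arithmetic here).
fn delinearization_exponents(key_set: &[PublicKey]) -> Vec<Scalar> {
    key_set.iter().map(|pk| hash_to_scalar(pk, key_set)).collect()
}

fn main() {
    let keys: Vec<PublicKey> =
        vec![b"alice".to_vec(), b"bob".to_vec(), b"carol".to_vec()];
    println!("exponents: {:?}", delinearization_exponents(&keys));
}
```

Local batching would instead pick such exponents as small random values inside the verifier, rather than from a random oracle over the key set, which is why it cannot be delegated outside the verifier.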
We now see four likely hierarchies that yield nice APIs:

- If we have proofs-of-possession, then we simply aggregate freely and support distinct messages, but avoid delinearization by requiring that all proofs-of-possession have similar security assurances. Polkadot will use this for validators.
- If we pay the cost of random oracle delinearization, then we do so early, and support distinct message aggregation too, but require identical key sets only for signatures corresponding to the same message. Fog of Trust shall use roughly this.
- If we merely batch, then we do so at verification time, and delinearize only between keys corresponding to the same message. We can make the combined points cacheable using `BuildHasher`.
- If we do not want any duplicate messages, then callers can simply detect duplicates themselves and give an error. In this case, both the first and third options provide equivalent performance. Polkadot could use this for payments, as messages should mostly be distinct, but EdDSA batching sounds faster.

## 2. Performance

In a normal BLS signature, one verification equation is either `e(H_1(m), P_2) = e(sigma_1, G_2)` where `P_2 = G_2^p`, or `e(P_1, H_2(m)) = e(G_1, sigma_2)` where `P_1 = G_1^p`. We have `G_1` living in a curve defined over the base field `F_q`, while `G_2` lives in a curve defined over an extension field `F_{q^e}`, which makes `G_2` much bigger.

There are no exponentiations in a single BLS signature verification, so only the `H_1` vs `H_2` speeds influence verifiers' preferences, and verifiers prefer `H_1`. Signers prefer `H_1` too, assuming they sign many messages but cache their public key.

Aggregating distinct messages requires a pairing and an addition in the target group for each message: with `H_1`, the verifier checks that the product of `e(H_1(m_i), P_2^(i))` over all messages equals `e(sigma_1, G_2)`, where `sigma_1` sums the individual signatures. As the public keys are not aggregated here, their prepared point form may be cached, so presumably the `H_i` speed dominates for verifiers, magnifying the preference for `H_1` from the no-aggregation case.

Aggregating identical messages with proof-of-possession requires verifiers do additions in `P_{3-i}` when the signature uses `H_i`, so this eventually favours `H_2` with enough signatures per message, but the number is large. If the signing key sets are commonly identical, then again their combined prepared point form may be cached.

Aggregating identical messages with repeated signers under any delinearization scheme requires verifiers do exponentiations in `P_{3-i}` when the signature uses `H_i`, so verifiers favour `H_2` much sooner, although less so with batching than with random oracle delinearization. If the signer list is fixed, then perhaps one could cache the delinearized signers too, which delays the switch to `H_2`.

If we do switch to `H_2`, so that verifier exponentiations happen in `P_1`, then caching affine or prepared points for messages actually becomes more important, assuming the same messages get verified in different batches. In all this, proof-of-possession becomes much more tempting of course.

We expect threshold BLS signatures favour `H_2` much sooner for signers because they compute [commitments on the public key curve](https://github.com/poanetwork/threshold_crypto/blob/master/src/poly.rs#L578), which requires numerous exponentiations. In principle, verifiers might still favour `H_1`, so if the verifiers outnumber the signers by a wide margin then the system may become quite slow.
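Since identical signer sets recur, the combined, pairing-ready public key can be cached, keyed by the key set, as suggested above. Here is a minimal sketch of such a cache; `AggregateCache`, the byte-vector key type, and the dummy aggregation closure are hypothetical stand-ins, and a custom `BuildHasher` could be supplied to the `HashMap` to tune how key sets are hashed.

```
use std::collections::HashMap;

// Stand-ins: a compressed public key is opaque bytes, and the cached value
// could be an aggregated or "prepared" form of the whole key set.
type PublicKeyBytes = Vec<u8>;

struct AggregateCache<T> {
    cache: HashMap<Vec<PublicKeyBytes>, T>,
}

impl<T: Clone> AggregateCache<T> {
    fn new() -> Self {
        AggregateCache { cache: HashMap::new() }
    }

    // Return the cached aggregate for this key set, computing it with
    // `aggregate` (e.g. sum the keys on the curve and run point preparation)
    // on a miss.  We sort the keys so the cache key is order independent.
    fn get_or_insert<F>(&mut self, mut key_set: Vec<PublicKeyBytes>, aggregate: F) -> T
    where
        F: FnOnce(&[PublicKeyBytes]) -> T,
    {
        key_set.sort();
        self.cache
            .entry(key_set)
            .or_insert_with_key(|ks| aggregate(ks))
            .clone()
    }
}

fn main() {
    let mut cache: AggregateCache<usize> = AggregateCache::new();
    let validators = vec![b"alice".to_vec(), b"bob".to_vec()];
    // A dummy "aggregate": just count the keys.  Real code would add the
    // public keys on the curve and cache the prepared (pairing-ready) point.
    let a = cache.get_or_insert(validators.clone(), |ks| ks.len());
    let b = cache.get_or_insert(validators, |_| unreachable!("cache hit expected"));
    assert_eq!(a, b);
}
```

A real implementation would store the prepared aggregate public key in place of the `usize` used here.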
## 3. Operations

Serialization and deserialization go from and to affine elements, but deserialization requires an expensive scalar-multiplication (subgroup) check. Anything algebraic, like scalar multiplication, results in projective points, but may or may not operate faster when given affine points. Hashing produces projective points too. Affine's `mul_bits` function has less short-circuiting logic, but still remains non-constant-time. If affine is faster, then we should support cached affine points for delinearization.

Pairing requires prepared points, which one obtains from affine points. It appears the preparation step is expensive only on the big curve.

We think conversion to affine always benefits from being batched, so currently we batch them whenever possible. We could cache affine or prepared points, but only if they'll actually be repeated, as we cannot do algebra on them. We need to check benchmarks for all these operations.

Pippenger's algorithm: https://github.com/rust-lang/rfcs/pull/2289#issuecomment-409337644

```
// Proposed signature for batched conversion to affine (normalization).
fn batch_normalization<II>(v: &mut II)
where
    for<'a> &'a mut II: IntoIterator,
    for<'a> <&'a mut II as IntoIterator>::IntoIter: DoubleEndedIterator + ExactSizeIterator,
```

- ECDSA: https://cr.yp.to/badbatch/badbatch-20120919.pdf
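Batched conversion to affine is typically implemented with Montgomery's simultaneous-inversion trick, which replaces one field inversion per point by a single inversion plus a few multiplications per point. Below is a minimal self-contained sketch of the trick over a toy prime field rather than real curve coordinates; the backwards pass over the batch is presumably what motivates the `DoubleEndedIterator + ExactSizeIterator` bounds in the signature above.

```
// Minimal sketch of Montgomery's simultaneous-inversion trick, the core of
// batch normalization: inverting n field elements naively costs n inversions,
// but only one inversion plus ~3n multiplications when batched.  This uses a
// toy prime field, not curve points, and assumes every element is nonzero.

const P: u128 = 2_147_483_647; // 2^31 - 1, a convenient toy prime

fn mul(a: u128, b: u128) -> u128 { (a * b) % P }

// Single inversion via Fermat's little theorem: a^(p-2) mod p.
fn inv(a: u128) -> u128 {
    let mut base = a % P;
    let mut exp = P - 2;
    let mut acc = 1u128;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

// Invert every element of `v` in place using one call to `inv`.
fn batch_inversion(v: &mut [u128]) {
    // prefix[i] = v[0] * v[1] * ... * v[i-1]
    let mut prefix = Vec::with_capacity(v.len());
    let mut acc = 1u128;
    for x in v.iter() {
        prefix.push(acc);
        acc = mul(acc, *x);
    }
    // One expensive inversion of the full product.
    let mut acc_inv = inv(acc);
    // Walk backwards, peeling off one inverse at a time.
    for (x, pre) in v.iter_mut().zip(prefix.into_iter()).rev() {
        let x_inv = mul(acc_inv, pre);
        acc_inv = mul(acc_inv, *x);
        *x = x_inv;
    }
}

fn main() {
    let mut zs = vec![3u128, 7, 12_345, 999_983];
    let original = zs.clone();
    batch_inversion(&mut zs);
    for (z, z_inv) in original.iter().zip(zs.iter()) {
        assert_eq!(mul(*z, *z_inv), 1);
    }
    println!("batched inverses verified: {:?}", zs);
}
```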