created_at: 2023-01-22 16:41:31.583298
updated_at: 2023-07-25 12:42:18.906061
description: Easy-to-check Base58 encoding for identities
Dr. Maxim Orlovsky (dr-orlovsky)



# Baid58: an easy-to-check Base58 encoding for identities

[Apache-2 licensed](./LICENSE)

## TL;DR

_**Baid58 is Base58 equipped with an optional checksum (which is easy to see and verify) and human-readable information about the value.**_

## Overview

A lot of [binary-to-text encoding formats][b2t] exist today, each designed for its own specific cases. Why another one? Because we need to encode short unique identifiers, such as file or data structure hashes, cryptographic public keys, digital identities and certificates.

`Baid58` is a format for representing unique identities based on Base58 encoding ("baid" is a combination of "base" and "identity"). It is designed to match the following criteria:

* be as short as possible;
* but still be copyable with a single mouse click;
* work well with URLs;
* be usable as a file or directory name;
* optionally carry checksum information that is easy to verify visually;
* optionally contain a simple human-readable prefix explaining the meaning of the value;
* rely on an existing widespread binary-to-text encoding.

We have chosen Base58 as the most concise and widespread encoding which can be copied with a single click.
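For readers unfamiliar with Base58, here is a minimal sketch of the Bitcoin-flavoured encoding written against the standard library only. It is an illustration of the general technique, not this crate's implementation; the function name `base58_encode` is hypothetical.

```rust
/// Bitcoin Base58 alphabet: digits and letters without `0`, `O`, `I` and `l`.
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Hypothetical helper: encodes bytes as a Base58 string.
fn base58_encode(input: &[u8]) -> String {
    // Each leading zero byte is represented by a leading '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();

    // Base-58 digits of the remaining bytes, least significant digit first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in &input[zeros..] {
        // Multiply the number accumulated so far by 256 and add the new byte.
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }

    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
    out
}

fn main() {
    println!("{}", base58_encode(b"Hello World!"));
}
```

The output uses only alphanumeric characters, which is why a double click typically selects the whole string and why it is safe in URLs and file names.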
We designed a way to extend it with prefix and suffix information representing a human-readable identifier (HRI) and a checksum in different ways depending on the use case, for example:

- **file name**: `tommy-fuel-pagoda-7EnUZgFtu28sWqqH3womkTopXCkgAGsCLvLnYvNcPLRt.stl`
- **single-click address**: `stlTommyFuelPagoda07EnUZgFtu28sWqqH3womkTopXCkgAGsCLvLnYvNcPLRt`
- **visually clear address**: `stl_tommy_fuel_pagoda_7EnUZgFtu28sWqqH3womkTopXCkgAGsCLvLnYvNcPLRt`
- **URI or a part of URL**: `stl:7EnUZgFtu28sWqqH3womkTopXCkgAGsCLvLnYvNcPLRt#tommy_fuel_pagoda`

As you see, a `Baid58` encoded value is composed of the following components:

* The actual *value* encoded with Base58 (using the Bitcoin flavour of it);
* An optional *human-readable identifier (HRI)*, which can prefix or follow the main value;
* An optional checksum *mnemonic*, representing the 32 least-significant bits of the BLAKE3 hash of the value, created using the HRI as the hashing key. The mnemonic is created using [] dictionary and consists of three easy-to-distinguish words.

## Why not...

### Why not Base64

Because it contains characters which can't be used in URLs and file names, and the encoded string can't always be selected with a single click.

### Why not Base58

Baid58 is in fact Base58 equipped with an optional checksum (which is easy to see and verify) and human-readable information about the value.

### Why not Bech32

Bech32 strings are usually too long, while offering no real advantages:

* it is said they do not contain characters which can be confused - but this is not a problem when a checksum is used and checked both visually and by a computer, while ...
* the Bech32 "checksum" is not visually distinguishable and most people do not even know where it is. As a result, one may craft a string which is still visually similar even though it has a correct and different checksum - and both humans and computers will miss the attack.
* Bech32 is stuffed with ECC, but if the string is broken we probably shouldn't use it at all (instead of trying to correct errors automatically). And we can detect broken values when the mnemonic checksum doesn't match;
* it is said that Bech32 can result in shorter QR codes, but this is not true: for instance, the QR codes for a Base58-encoded 160-bit bitcoin P2SH address and a Bech32-encoded 160-bit P2WPKH address have exactly the same size if the user has not forgotten to uppercase the address value, and the Bech32 QR code is larger if the uppercasing was not done!

As a result, we get longer strings to read, a non-standard weird encoding and a false feeling of safety, with no advantages over Base58, which only needs an efficient and clearly distinguishable checksum and value type information - and this is exactly what Baid58 provides.

### Why not multiformats

[Multiformats] by Protocol Labs are a way to represent different binary-to-text encodings and values in a consistent way. However, there are some reasons to avoid their use for the case we need:

- they do not provide any checksum information;
- they do not provide any human-readable information;
- they introduce support for multiple encodings, while we need just a single one which matches our criteria;
- they are not widely adopted;
- they use a non-fixed-length integer encoding, which is a bad practice from the point of view of deterministic computing and [strict types] (where Baid58 is used).

## Using crate

Both the HRI part and the mnemonic checksum may be omitted - in this case we have just an unmodified `Base58` string. Alternatively, they can be formatted with this crate using the rich functionality of the Rust display formatting language in the following ways:

## HRI, checksum and chunking

The presence of the human-readable identifier and checksum is controlled by the precision flag and by the alignment flags. All the options can be combined with the mnemonic flags from the next section; if a specific mnemonic flag is present then the checksum is not provided.

| Flag | HRI      | Checksum | Mnemonic               | Separators | Example             |
|------|----------|----------|------------------------|------------|---------------------|
| none | absent   | absent   | defined by other flags | n/a        |                     |
| `.0` | suffix   | absent   | defined by other flags | `.`        | `ID.hri`            |
| `.1` | absent   | added    | absent                 | n/a        | `IDchecksum`        |
| `.2` | absent   | added    | absent                 | n/a        | chunk(`ID`)         |
| `.3` | absent   | added    | absent                 | n/a        | chunk(`IDchecksum`) |
| `.`N | reserved |          |                        |            |                     |
| A`<` | prefix   | absent   | defined by other flags | `A`        | `hri`A`ID`          |
| A`^` | prefix   | added*   | defined by other flags | `A`        | `hri`A`IDchecksum`  |
| A`>` | suffix   | added*   | defined by other flags | `A`        | `IDchecksum`A`hri`  |

_* added if no mnemonic flags are given_

## Mnemonic representation of the checksum

The presence and position of the mnemonic are defined by the alternate and sign flags:

| Flag | HRI | Checksum             | Mnemonic             | Separators | Example                |
|------|-----|----------------------|----------------------|------------|------------------------|
| none | -   | defined by HRI flags | absent               | n/a        |                        |
| `#`  | -   | -                    | suffix (dashed)      | `#`        | `ID#solo-lemur-wishes` |
| `0`  | -   | -                    | prefix (capitalized) | `0`        | `SoloLemurWishes0ID`   |
| `-`  | -   | -                    | prefix (dashes)      | `-`        | `solo-lemur-wishes-ID` |
| `+`  | -   | -                    | prefix (underscored) | `_`        | `solo_lemur_wishes_ID` |

If width is given, it is used to place multiple fill characters between the value and HRI.
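
All the flags in the two tables are ordinary parts of the Rust format specification, which a `Display` implementation can read from `std::fmt::Formatter`. The sketch below shows only that mechanism: `FlagProbe` is a hypothetical type that reports what it received, and it is not how this crate is implemented.

```rust
use std::fmt::{self, Alignment, Display, Formatter};

/// Hypothetical stand-in type: prints the format flags it was given.
struct FlagProbe;

impl Display for FlagProbe {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // Read every flag first, then write the report.
        let fill = f.fill();
        let align = match f.align() {
            Some(Alignment::Left) => "<",
            Some(Alignment::Center) => "^",
            Some(Alignment::Right) => ">",
            None => "none",
        };
        let precision = f.precision();
        let alternate = f.alternate(); // `#`
        let zero = f.sign_aware_zero_pad(); // `0`
        let minus = f.sign_minus(); // `-`
        let plus = f.sign_plus(); // `+`
        write!(
            f,
            "fill={fill:?} align={align} precision={precision:?} \
             alternate={alternate} zero={zero} minus={minus} plus={plus}"
        )
    }
}

fn main() {
    println!("{}", format!("{:-.1}", FlagProbe));
    println!("{}", format!("{:<0}", FlagProbe));
    println!("{}", format!("{:_<+}", FlagProbe));
    println!("{}", format!("{::^#}", FlagProbe));
}
```

In a spec such as `{:_<+}`, the character before the alignment sign is the fill character, which is what the tables call `A`.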

Example formatting strings from the above:

- **file name**: `{:-.1}` -> `tommy-fuel-pagoda-7EnUZgFtu28sWqqH3womkTopXCkgAGsCLvLnYvNcPLRt.stl`
- **single-click address**: `{:<0}` -> `stlTommyFuelPagoda07EnUZgFtu28sWqqH3womkTopXCkgAGsCLvLnYvNcPLRt`
- **visually clear address**: `{:_<+}` -> `stl_tommy_fuel_pagoda_7EnUZgFtu28sWqqH3womkTopXCkgAGsCLvLnYvNcPLRt`
- **URI or a part of URL**: `{::^#}` -> `stl:7EnUZgFtu28sWqqH3womkTopXCkgAGsCLvLnYvNcPLRt#tommy_fuel_pagoda`
- **Using checksum embedded into the ID**: `{::^}` -> `stl:2dzcCoX9c65gi1GoJ1LFzb5FcQ9pAc8o3Pj8TpcH2mkAdMLCpP`

[b2t]:
[]:
[multiformats]:
[strict types]:
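
The first four example strings are plain concatenations of three parts: the HRI, the three mnemonic words and the Base58 value. The following sketch rebuilds them from those parts exactly as they are printed in this README. The `Parts` struct and its methods are hypothetical names used for illustration only, and the words are taken as given rather than computed from a checksum.

```rust
/// Hypothetical container for the three parts of an encoded identity.
struct Parts<'a> {
    hri: &'a str,
    words: [&'a str; 3],
    id: &'a str,
}

/// Uppercases the first character of a word.
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

impl Parts<'_> {
    /// Dashed mnemonic prefix, HRI as a file extension.
    fn file_name(&self) -> String {
        format!("{}-{}.{}", self.words.join("-"), self.id, self.hri)
    }

    /// HRI, capitalized mnemonic, `0` separator, value: no punctuation at all.
    fn single_click(&self) -> String {
        format!("{}{}0{}", self.hri, self.words.map(capitalize).concat(), self.id)
    }

    /// Everything separated with underscores.
    fn visually_clear(&self) -> String {
        format!("{}_{}_{}", self.hri, self.words.join("_"), self.id)
    }

    /// HRI as a URI scheme, mnemonic as the fragment.
    fn uri(&self) -> String {
        format!("{}:{}#{}", self.hri, self.id, self.words.join("_"))
    }
}

fn main() {
    let parts = Parts {
        hri: "stl",
        words: ["tommy", "fuel", "pagoda"],
        id: "7EnUZgFtu28sWqqH3womkTopXCkgAGsCLvLnYvNcPLRt",
    };
    println!("{}", parts.file_name());
    println!("{}", parts.single_click());
    println!("{}", parts.visually_clear());
    println!("{}", parts.uri());
}
```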
Commit count: 51
