![logo](logo.png)
# Moleco
Moleco stands for **mole**cule to **co**lor. It generates a unique color swatch for a given substance based on its InChI notation. It can also generate a color identification for a mixture using MInChI notation.
## How to run
```bash
moleco generate "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3" --print
```
That will generate a color swatch for caffeine.
![caffeine](readme/caffeine.png)
## Installation
For now, you can only install it with [cargo, the Rust package manager](https://doc.rust-lang.org/cargo/getting-started/installation.html).
```bash
cargo install moleco
```
## Support for mixtures
Of course, in nature you are much more likely to encounter mixtures than single substances, so MInChI is supported as well. You can generate **toothpaste**:
```bash
moleco generate "MInChI=0.00.1S/C12H26O4S.Na/c1-2-3-4-5-6-7-8-9-10-11-12-16-17(13,14)15;/h2-12H2,1H3,(H,13,14,15);/q;+1/p-1&C3H8O3/c4-1-3(6)2-5/h3-6H,1-2H2&C7H5NO3S.Na/c9-7-5-3-1-2-4-6(5)12(10,11)8-7;/h1-4H,(H,8,9);/q;+1/p-1&Ca.H3O4P.2H2O/c;1-5(2,3)4;;/h;(H3,1,2,3,4);2*1H2/q+2;;;/p-2&FH2O3P.2Na/c1-5(2,3)4;;/h(H2,2,3,4);;/q;2*+1/p-2&H2O/h1H2/n{6&2&&5&3&4&1}/g{215wf-3&25wf-2&1wf-2&8wf-3&2wf-3&5wf-1&15wf-3}" --print
```
![toothpaste](readme/toothpaste.png)
NOTE: Printing may not be supported (well) in all terminals, so results may vary, but saved images will be correct.
or **dishwashing liquid**:
```bash
moleco generate "MInChI=0.00.1S/C12H26O4S.Na/c1-2-3-4-5-6-7-8-9-10-11-12-16-17(13,14)15;/h2-12H2,1H3,(H,13,14,15);/q;+1/p-1&C18H30O3S.Na/c1-2-3-4-5-6-7-8-9-10-11-12-17-13-15-18(16-14-17)22(19,20)21;/h13-16H,2-12H2,1H3,(H,19,20,21);/q;+1/p-1&ClH.Na/h1H;/q;+1/p-1&H2O/h1H2/n{4&{2&4}&&{1&4}&3}/g{807wf-3&{6pp1&4pp1}117wf-3&1wf-2&{27pp0&73pp0}66wf-3&}" --print
```
![dishwashing liquid](readme/dishwashingliquid.png)
or a solution of **9-Borabicyclo[3.3.1]nonane in undefined amounts of hexanes**:
```bash
moleco generate "MInChI=0.00.1S/C6H12/c1-6-4-2-3-5-6/h6H,2-5H2,1H3&C6H14/c1-3-5-6-4-2/h3-6H2,1-2H3&C6H14/c1-4-5-6(2)3/h6H,4-5H2,1-3H3&C6H14/c1-4-6(3)5-2/h6H,4-5H2,1-3H3&C8H15B/c1-3-7-5-2-6-8(4-1)9-7/h7-9H,1-6H2/n{5&{2&3&4&1}}/g{4mr-1&{&&&}}" --print
```
![borabicyclononane in hexanes](readme/boarabicyclononaneinhexanes.png)
or, if you are a fan, you can generate **bechamel sauce**:
```bash
moleco generate "MInChI=0.00.1S/C12H17N4OS.ClH/c1-8-11(3-4-17)18-7-16(8)6-10-5-14-9(2)15-12(10)13;/h5,7,17H,3-4,6H2,1-2H3,(H2,13,14,15);1H/q+1;/p-1&C17H20N4O6/c1-7-3-9-10(4-8(7)2)21(5-11(23)14(25)12(24)6-22)15-13(18-9)16(26)20-17(27)19-15/h3-4,11-12,14,22-25H,5-6H2,1-2H3,(H,20,26,27)/t11-,12+,14-/m0/s1&C19H19N7O6/c20-19-25-15-14(17(30)26-19)23-11(8-22-15)7-21-10-3-1-9(2-4-10)16(29)24-12(18(31)32)5-6-13(27)28/h1-4,8,12,21H,5-7H2,(H,24,29)(H,27,28)(H,31,32)(H3,20,22,25,26,30)/t12-/m0/s1&C20H30O/c1-16(8-6-9-17(2)13-15-21)11-12-19-18(3)10-7-14-20(19,4)5/h6,8-9,11-13,21H,7,10,14-15H2,1-5H3/b9-6+,12-11+,16-8+,17-13+&C27H44O/c1-19(2)8-6-9-21(4)25-15-16-26-22(10-7-17-27(25,26)5)12-13-23-18-24(28)14-11-20(23)3/h12-13,19,21,24-26,28H,3,6-11,14-18H2,1-2,4-5H3/b22-12+,23-13-/t21-,24+,25-,26+,27-/m1/s1&C27H46O/c1-18(2)7-6-8-19(3)23-11-12-24-22-10-9-20-17-21(28)13-15-26(20,4)25(22)14-16-27(23,24)5/h9,18-19,21-25,28H,6-8,10-17H2,1-5H3/t19-,21+,22+,23-,24+,25+,26+,27-/m1/s1&C6H5NO2/c8-6(9)5-2-1-3-7-4-5/h1-4H,(H,8,9)&C8H10NO6P/c1-5-8(11)7(3-10)6(2-9-5)4-15-16(12,13)14/h2-3,11H,4H2,1H3,(H2,12,13,14)&C9H17NO5/c1-9(2,5-11)7(14)8(15)10-4-3-6(12)13/h7,11,14H,3-5H2,1-2H3,(H,10,15)(H,12,13)/t7-/m0/s1&Ca/q+2&Na/q+1/n{{{{&}&6&11&&4}&{{&}&&&1&2&7&9&8&3&}}&{{&}&6&11&&&4&5&10}}/g{{{{56wf-2&25wf-3}8wf-1&3wf-3&1wf-2&125wf-4&}466wf-3&{{56wf-4&168wf-3}725wf-3&187wf-4&137wf-3&447wf-8&215wf-8&6365wf-8&1008wf-8&341wf-8&49wf-8&9wf-3}534wf-3}1pp1&{{6wv-1&2wv-2}2wv-2&8wv-5&48wv-5&48wv-3&36wv-3&&&}9pp1}" --print
```
![bechamel sauce](readme/bechamelsauce.png)
## Motivation
The idea was to create a color code for containers holding specific substances, so that they are easy to tell apart:
![Cylinders with technical gases](readme/concept/moleco-cylinders.png)
...and if you change the form factor, it is still easy, as long as you know the color codes:
![Cans with technical gases](readme/concept/moleco-cans.png)
(As you may notice, oxygen and argon have similar swatches - primary and complementary - so you must be careful with those two. Such collisions are inevitable, so be creative with the design: create patterns and use accents to avoid introducing confusion.)
## How to generate InChI or MInChI?
For simple substances you can use [PubChem](https://pubchem.ncbi.nlm.nih.gov/). You can also try searching for "substance name InChI" and you should find it. For mixtures you can use the [MInChI demo](http://molmatinf.com/minchidemo/).
## How mixture bar sizes are calculated
First of all, values on the mixture bar (shown at the bottom for mixtures) are on a **logarithmic** scale. This may be problematic: if you consider two solutions of ethanol, one 40% and the other 70%, it is hard to tell which is which:
![ethanol 40%](readme/ethanolwater4060.png)
![ethanol 70%](readme/ethanolwater7030.png)
Not much of a difference.
But that was not the goal. The goal was to quickly distinguish between solutions with small amounts of potentially harmful chemicals. Consider again a solution of ethanol: one is 40% ethanol in water, the other is 40% ethanol and 0.1% bitrex (denatonium benzoate) in water.
![ethanol 40%](readme/ethanolwater4060.png)
![ethanol 40% with bitrex](readme/ethanolwaterbitrex.png)
Now it is easy to see the difference, even when there are only trace amounts of extra substances.
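To illustrate why a logarithmic scale behaves this way, here is a minimal Python sketch. It is not moleco's actual sizing code: the `bar_shares` function, the `floor` parameter and the exact formula are assumptions made only for this illustration.

```python
import math


def bar_shares(fractions, floor=1e-6):
    """Return each component's share of the bar width on a log scale.

    `fractions` are plain fractions of the whole (0.4 means 40%).
    `floor` is a hypothetical smallest amount that still gets any width.
    """
    if not fractions:
        return []
    # log10(f / floor) is 0 at the floor and grows by 1 per decade above it.
    weights = [math.log10(max(f, floor) / floor) for f in fractions]
    total = sum(weights)
    if total == 0:
        # Everything is at or below the floor: fall back to equal widths.
        return [1 / len(fractions)] * len(fractions)
    return [w / total for w in weights]


ethanol_40 = bar_shares([0.40, 0.60])
ethanol_70 = bar_shares([0.70, 0.30])
with_bitrex = bar_shares([0.40, 0.599, 0.001])

for name, shares in [
    ("40% ethanol", ethanol_40),
    ("70% ethanol", ethanol_70),
    ("40% ethanol + 0.1% bitrex", with_bitrex),
]:
    print(name, [round(s, 3) for s in shares])
```

Under these assumptions the 40% and 70% solutions get almost the same split, while the 0.1% additive takes a clearly visible part of the bar instead of a thousandth of it.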
### Order of color swatches
Order is not guaranteed. Moleco will try to **keep the original order** of substances in the mixture, the one given in the command (the MInChI demo (see links below) has a specific order for substances). It may happen, though, that one of the substances in the middle of the notation has a missing or unestimated concentration. In such a case its swatch **will be moved to the end** of the bar, so that **the primary colors of substances visibly match the bar colors**.
A good example of such behavior is the image of **dishwashing liquid**: if you decipher the notation, you will see that the third substance (sodium chloride) has a missing concentration, so it is moved to the end of the bar, behind the water swatch. (You can find the full notation in the examples above.)
![dishwashing liquid](readme/dishwashingliquid.png)
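The reordering described above can be pictured as a stable partition. The Python sketch below is only an illustration: `order_swatches` and the sample values are hypothetical and are not taken from moleco's source or from the dishwashing liquid notation.

```python
def order_swatches(components):
    """Keep the given order, but move components without a concentration to the end.

    `components` is a list of (name, concentration) pairs; `None` means the
    concentration is missing or could not be estimated.
    """
    # sorted() is stable, so items with equal keys keep their original order.
    return sorted(components, key=lambda item: item[1] is None)


# Hypothetical three-component mixture; the middle entry has no concentration.
mixture = [("surfactant", 0.117), ("sodium chloride", None), ("water", 0.807)]
print([name for name, _ in order_swatches(mixture)])
```

Because the sort is stable, components that do have a concentration never change places relative to each other.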
### Unknown and unestimated capacity
Sometimes you will not pass every concentration in a mixture, as in this 37% solution of formaldehyde in water:
```bash
moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37wf-2&}" --print
```
![37% formaldehyde in water](readme/formaldehydewater.png)
Here it is easy to calculate the remaining amount of water (not precisely, and not in a molar sense, but since sizes are logarithmic we can skip small uncertainties): it is ~63%. But what if there are two solvents, such as water and methanol, without concentrations given? Then it is possible to estimate the total remaining amount, but not the exact amount of each solvent. In such a case the remaining part is marked as **unknown**.
```bash
moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&CH4O/c1-2/h2H,1H3&H2O/h1H2/n{1&3&2}/g{37wf-2&&}" --print
```
![37% formaldehyde in water and methanol](readme/formaldehydemethanolwater.png)
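The arithmetic behind the two formaldehyde examples can be sketched in Python as follows. This is a simplified illustration that assumes all values are weight fractions of a total of 1.0; `fill_remainder` is a hypothetical helper, not a moleco function.

```python
def fill_remainder(fractions):
    """Resolve missing fractions (None) against a total of 1.0.

    Returns (resolved, unknown): `resolved` maps the index of every component
    with a usable value to its fraction, `unknown` is the share of the
    mixture that cannot be attributed to a single component.
    """
    resolved = {i: f for i, f in enumerate(fractions) if f is not None}
    missing = [i for i, f in enumerate(fractions) if f is None]
    remainder = max(0.0, 1.0 - sum(resolved.values()))
    if not missing:
        # Nothing was left out, so nothing is reported as unknown.
        return resolved, 0.0
    if len(missing) == 1:
        # Exactly one gap: the whole remainder belongs to that component.
        resolved[missing[0]] = remainder
        return resolved, 0.0
    # Several gaps: only their combined amount is known.
    return resolved, remainder


print(fill_remainder([0.37, None]))        # formaldehyde in water
print(fill_remainder([0.37, None, None]))  # formaldehyde in methanol and water
```

In the first call the single missing value receives the whole remainder. In the second call the remainder is returned separately, which corresponds to the **unknown** part of the bar.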
Furthermore, if you use **ratio** (`VP`) in the notation and omit the concentration of **at least one** ingredient, then the remaining amount is marked as **unestimated**.
```bash
moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37vp0&}" --print
```
![37% formaldehyde in water unestimated](readme/formaldehydewater2.png)
The same applies to the molar per liter/kilogram notations, `MB` and `MR`: if you use them **at all**, the bar will show an extra **unestimated** and **unknown** compound. This is because moleco does not calculate molar masses and volumes (it doesn't contain any internal database of substances), so it assumes that there is something extra as a result. You may wonder why this is not treated like range notation (see the next section) and left in the hands of the user: `MB` and `MR` are currently **always wrong**, that's why. If you want a quick workaround, simply replace them with `VP` notation or, to take the longer route, actually convert those notations to another one that is fully supported.
```bash
moleco generate "MInChI=0.00.1S/CH2O/c1-2/h1H2&H2O/h1H2/n{1&2}/g{37mb0&63mb0}" --print
```
![37% formaldehyde in water molar](readme/formaldehydewater3.png)
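The rules from this section can be summarized as a small decision function. The Python sketch below is one reading of the text above, not moleco's implementation: `extra_segments`, the unit sets and the treatment of every other unit as fraction-like are assumptions.

```python
MOLAR_UNITS = {"mb", "mr"}  # molar notations, never fully resolved
RATIO_UNITS = {"vp"}        # ratio notation


def extra_segments(entries):
    """Return the labels of extra parts expected on the mixture bar.

    `entries` is a list of (unit, has_value) pairs, one per component.
    `unit` is the lowercase unit code, or None when the entry is blank.
    """
    units = {unit for unit, _ in entries if unit}
    missing = sum(1 for _, has_value in entries if not has_value)
    if units & MOLAR_UNITS:
        # MB/MR used at all: something extra is always assumed.
        return ["unestimated", "unknown"]
    if units & RATIO_UNITS and missing >= 1:
        # A ratio with a gap cannot be completed.
        return ["unestimated"]
    if missing >= 2:
        # The remainder is known, but not how it splits between components.
        return ["unknown"]
    return []


print(extra_segments([("wf", True), (None, False)]))                 # g{37wf-2&}
print(extra_segments([("wf", True), (None, False), (None, False)]))  # g{37wf-2&&}
print(extra_segments([("vp", True), (None, False)]))                 # g{37vp0&}
print(extra_segments([("mb", True), ("mb", True)]))                  # g{37mb0&63mb0}
```

The four calls mirror the four formaldehyde commands in this section. Mixtures that combine several unit kinds are not covered by the text, so the sketch does not try to model them precisely.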
### Extra concentration notes
With range notation, like "10:20", only the higher amount is taken into account. This is because moleco tries to estimate unknown/unestimated substances, and if the maximum possible amounts exceed the available capacity, it is assumed that **the user knows what they are doing**. If you want to show an extra substance because you know there is some, you can always add it as a separate, unmarked substance. See the examples below: the second one shows an extra substance because one extra group is added to the indexing and concentration notation.
```bash
moleco generate "MInChI=0.00.1S/C2H6O/c1-2-3/h3H,2H2,1H3&H2O/h1H2/n{1&2}/g{4vp1&6vp1}" --print-only
```
vs
```bash
moleco generate "MInChI=0.00.1S/C2H6O/c1-2-3/h3H,2H2,1H3&H2O/h1H2/n{1&2&}/g{4vp1&6vp1&}" --print-only
```
results look like
![ethanol 40%](readme/ethanolwater4060.png)
![ethanol 40% open bar](readme/ethanolwater4060open.png)
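For readers decoding the concentration part by hand, here is a hedged Python sketch of how a single token could be read. It assumes the layout seen in the examples (integer mantissa, two-letter unit, power-of-ten exponent, so `37wf-2` is 37 × 10⁻² = 0.37) and assumes that a range is written as `low:high` in place of the mantissa. `parse_concentration` is a hypothetical helper, not moleco's parser.

```python
import re

# Assumed token layout: mantissa, optional ":upper" bound, unit, exponent.
TOKEN = re.compile(r"^(\d+)(?::(\d+))?([a-z]{2})(-?\d+)$")


def parse_concentration(token):
    """Split a token such as '37wf-2' into (value, unit).

    Returns None for a blank token. For a range only the higher bound
    is used, as described in the text above.
    """
    if token == "":
        return None
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognized concentration token: {token!r}")
    low, high, unit, exponent = match.groups()
    mantissa = max(int(low), int(high)) if high else int(low)
    return mantissa * 10.0 ** int(exponent), unit


for token in ["37wf-2", "4vp1", "10:20wf-2", ""]:
    print(repr(token), "->", parse_concentration(token))
```

With this reading, `4vp1` and `6vp1` from the commands above stand for 40 and 60 parts, and a range such as `10:20wf-2` contributes only its upper bound.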
## Questions
### Why no support for molar mass and volume?
That would require incorporating a database of substances and their properties. This is well beyond the initial scope of this project, but could be considered in the future.
### Are there collisions?
Yes, a lot. Out of over 117 million InChI strings that you can fetch, only a little more than 80 million produce unique swatches. And those are exact collisions only: if a hue in two swatches differs by just 1 degree, the difference is too small to be detected by the human eye, even though technically there is no collision. Be warned, and if you want to differentiate between two substances with similar swatches, be creative with the design.
### Why no support for InChIKey?
Initially the idea was to create a system that is unique for every substance, and InChIKey already had some confirmed collisions, so it was not considered. Reality turned out to be more brutal (see above), but by then it was too late to include InChIKey.
### Why the shape?
A diamond divided into four parts was the initial idea. A color swatch usually has 4 or 5 colors, but 4 is easy to generate with a nice complementary hue, and the diamond shape looks nice. To avoid confusion with the [NFPA 704 marking](https://en.wikipedia.org/wiki/NFPA_704), cutouts were introduced, hence the "flower" shape.
An orientation mark was introduced as well, to avoid confusion in the case of a single-compound mark.
### How to recognize the substance?
It may be challenging to recognize the substance from the color swatch alone after some time, so be sure to keep the name of the substance or its InChI notation somewhere close if you are using just the swatch. If you have the original image file, though, the original substance is saved in its EXIF metadata.
## References
### InChI and MInChI
*
*
*
*
### Color spaces
*
*
### PubChem resources
*
*