# acetylene-parser

A string parser for several chemical nomenclature systems.

## functions

- `tokenize(string, type="formula") -> Substance`

  Tokenizes a string describing a chemical and returns a `Substance`, optionally
  with functional groups that correspond to its more fundamental components.

  - "formula" expects a simple "secondary school" formula built from element
    symbols, for example `C2H2`.
  - "smiles" expects a string in
    [SMILES notation](https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system).
  - **TODO**: "iupac" expects a chemical name following
    [IUPAC nomenclature](https://en.wikipedia.org/wiki/IUPAC_nomenclature_of_organic_chemistry).
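
To illustrate what tokenizing a "formula" string can involve, here is a minimal Python sketch. It is not this library's implementation and does not use its API: `count_atoms` is a hypothetical helper that returns plain atom counts rather than a `Substance`, and it assumes a flat formula with no parentheses, charges, or hydrate notation.

```python
import re
from collections import Counter

# Hypothetical helper for illustration only; not part of acetylene-parser.
# One token = an element symbol (uppercase letter, optional lowercase letter)
# followed by an optional count.
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms(formula: str) -> Counter:
    """Count atoms in a flat formula such as "C2H6O"."""
    counts = Counter()
    pos = 0
    while pos < len(formula):
        match = _ELEMENT.match(formula, pos)
        if match is None:
            # Anything that is not an element token is rejected outright.
            raise ValueError(f"unexpected character {formula[pos]!r} at index {pos}")
        symbol, digits = match.groups()
        counts[symbol] += int(digits) if digits else 1  # a missing count means 1
        pos = match.end()
    return counts


print(count_atoms("C2H2"))  # Counter({'C': 2, 'H': 2})
```

The sketch reads symbols greedily, so `Co` is taken as cobalt while `CO` is taken as carbon plus oxygen. It also does not check symbols against the periodic table, which a real parser would likely need to do.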

## roadmap

- [x] implement SMILES parsing
- [ ] improve struct based on SMILES findings
- [ ] decide whether InChI implementation is worth it

## references

### SMILES

- https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system
- http://opensmiles.org/opensmiles.html
- http://www.dalkescientific.com/writings/diary/archive/2004/01/05/tokens.html

### IUPAC

- http://www.chem.uiuc.edu/GenChemReferences/nomenclature_rules.html
- https://web.archive.org/web/20100626004648/http://www.acdlabs.co.uk/iupac/nomenclature/93/r93_125.htm
- https://en.wikipedia.org/wiki/IUPAC_nomenclature_of_organic_chemistry
- https://bitbucket.org/dan2097/opsin/src

### InChI

- http://iupac.org/who-we-are/divisions/division-details/inchi/
- http://www.inchi-trust.org/downloads/

### data

- http://www.chemicalize.org/blog/
- http://www.chemspider.com/