Crates.io | c0sv |
lib.rs | c0sv |
version | 0.2.0 |
source | src |
created_at | 2020-11-28 07:33:19.948367 |
updated_at | 2020-12-04 20:19:46.040595 |
description | Binary CSV, using C0 ASCII control codes |
id | 317372 |
size | 24,113 |
This is a binary CSV format: incredibly simple, with fields and records separated by ASCII control characters. It uses SOH (Start of Heading), STX (Start of Text), ETX (End of Text), ESC (Escape), US (Unit Separator), and RS (Record Separator).
The stream is expressed in the following faux-EBNF (where * represents any single byte):
stream = [header], STX, records, ETX
header = SOH, units
units = unit, { US, unit}
unit = { (* - control) | (ESC, *) }
control = SOH | STX | ETX | ESC | US | RS
records = units, { RS, units}
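The grammar above can be sketched as a small decoder. This is a minimal illustration, not the crate's actual API: `Document`, `parse_units`, and `parse` are hypothetical names, and the choice to reject trailing bytes after ETX is an assumption the grammar does not spell out. The control-code values are the standard ASCII ones.

```rust
// ASCII C0 control codes used by the grammar.
const SOH: u8 = 0x01; // Start of Heading
const STX: u8 = 0x02; // Start of Text
const ETX: u8 = 0x03; // End of Text
const ESC: u8 = 0x1B; // Escape
const RS: u8 = 0x1E; // Record Separator
const US: u8 = 0x1F; // Unit Separator

/// One row of units, each an arbitrary byte string.
type Units = Vec<Vec<u8>>;

/// Hypothetical container for a decoded stream (not the crate's own type).
#[derive(Debug, PartialEq)]
struct Document {
    header: Option<Units>,
    records: Vec<Units>,
}

/// Parses `unit, { US, unit }` starting at `pos`. Returns the units, the
/// unescaped control byte that ended them, and the index just past it.
fn parse_units(input: &[u8], mut pos: usize) -> Result<(Units, u8, usize), String> {
    let mut units = Vec::new();
    let mut unit = Vec::new();
    loop {
        let byte = *input.get(pos).ok_or("unexpected end of input")?;
        pos += 1;
        match byte {
            // ESC makes the following byte literal, whatever it is.
            ESC => {
                unit.push(*input.get(pos).ok_or("dangling ESC")?);
                pos += 1;
            }
            US => units.push(std::mem::take(&mut unit)),
            SOH | STX | ETX | RS => {
                units.push(unit);
                return Ok((units, byte, pos));
            }
            other => unit.push(other),
        }
    }
}

/// Parses `stream = [header], STX, records, ETX`.
fn parse(input: &[u8]) -> Result<Document, String> {
    let (header, mut pos) = match input.first() {
        Some(&SOH) => {
            let (units, end, next) = parse_units(input, 1)?;
            if end != STX {
                return Err("header must be terminated by STX".into());
            }
            (Some(units), next)
        }
        Some(&STX) => (None, 1),
        _ => return Err("stream must start with SOH or STX".into()),
    };
    let mut records = Vec::new();
    loop {
        let (units, end, next) = parse_units(input, pos)?;
        records.push(units);
        pos = next;
        match end {
            RS => continue,
            ETX => break,
            _ => return Err("unexpected SOH or STX inside records".into()),
        }
    }
    if pos != input.len() {
        // Assumption: anything after ETX is treated as an error.
        return Err("trailing bytes after ETX".into());
    }
    Ok(Document { header, records })
}
```

Because `unit` may be empty, every run of units contains at least one unit, so `parse(&[STX, ETX])` yields one record holding one empty unit.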
This is mostly a simple experiment to see how feasible it would be to create a very CSV-like format using these ASCII control characters for delimitation (and particularly to use the control characters in the way they are intended to be used). It's probably not extraordinarily useful, because the only real purpose of CSV is exchange where manual readability and/or writability is important. If you want good binary flexibility, you're probably better off using an established binary format, like bincode or MessagePack.
Still, this does have some convenient aspects. It is rather easily streamable, so records can be processed while holding only one in memory at a time. At the expense of being slower to parse, this format can also be slightly smaller than most other binary formats, as there are no length prefixes.
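To illustrate the streaming property, here is a rough sketch of a reader that pulls bytes from any `std::io::Read` and yields one record at a time. `RecordReader` is a hypothetical type, not the crate's API. For brevity it assumes the stream has no header and it ignores anything after ETX.

```rust
use std::io::{self, Read};

const SOH: u8 = 0x01; // Start of Heading
const STX: u8 = 0x02; // Start of Text
const ETX: u8 = 0x03; // End of Text
const ESC: u8 = 0x1B; // Escape
const RS: u8 = 0x1E; // Record Separator
const US: u8 = 0x1F; // Unit Separator

/// Hypothetical streaming reader: only the current record is buffered.
struct RecordReader<R: Read> {
    bytes: io::Bytes<R>,
    done: bool,
}

impl<R: Read> RecordReader<R> {
    /// Assumes a headerless stream, so the first byte must be STX.
    fn new(reader: R) -> io::Result<Self> {
        let mut bytes = reader.bytes();
        match bytes.next().transpose()? {
            Some(STX) => Ok(Self { bytes, done: false }),
            _ => Err(io::Error::new(io::ErrorKind::InvalidData, "expected STX")),
        }
    }

    fn next_byte(&mut self) -> io::Result<u8> {
        match self.bytes.next() {
            Some(result) => result,
            None => Err(io::ErrorKind::UnexpectedEof.into()),
        }
    }

    /// Reads units up to the next unescaped RS or ETX.
    fn read_record(&mut self) -> io::Result<Vec<Vec<u8>>> {
        let mut record = Vec::new();
        let mut unit = Vec::new();
        loop {
            match self.next_byte()? {
                ESC => unit.push(self.next_byte()?),
                US => record.push(std::mem::take(&mut unit)),
                RS => break,
                ETX => {
                    self.done = true;
                    break;
                }
                SOH | STX => {
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "unescaped SOH or STX inside records",
                    ));
                }
                other => unit.push(other),
            }
        }
        record.push(unit);
        Ok(record)
    }
}

impl<R: Read> Iterator for RecordReader<R> {
    type Item = io::Result<Vec<Vec<u8>>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let result = self.read_record();
        if result.is_err() {
            self.done = true; // stop iterating after the first error
        }
        Some(result)
    }
}
```

`Read::bytes` issues one read per byte, so for a file or socket it is worth wrapping the source in a `std::io::BufReader` first.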
Over CSV:
[STX][ETX]
is a document with a single record consisting of a single empty field. This means it is impossible to represent a record without any fields (it will be a single empty field, rather than having no fields), or a document without any records (it will be a document with a single record of a single empty field).
That last limitation could feasibly be solved by using the record separator and unit separator as prefixes instead of separators, but that's less fun, and doesn't fit the semantics of the control characters. It also increases the document size by one extra byte per record.
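The ambiguity is easy to see from the writing side. The following is a minimal encoder sketch, assuming a headerless stream with US and RS used as separators; `write_unit` and `encode` are hypothetical names, not the crate's API.

```rust
const SOH: u8 = 0x01; // Start of Heading
const STX: u8 = 0x02; // Start of Text
const ETX: u8 = 0x03; // End of Text
const ESC: u8 = 0x1B; // Escape
const RS: u8 = 0x1E; // Record Separator
const US: u8 = 0x1F; // Unit Separator

/// Appends one unit, prefixing each of the six control codes with ESC.
fn write_unit(out: &mut Vec<u8>, unit: &[u8]) {
    for &byte in unit {
        if matches!(byte, SOH | STX | ETX | ESC | RS | US) {
            out.push(ESC);
        }
        out.push(byte);
    }
}

/// Encodes records between STX and ETX, with RS between records and
/// US between units (separators, not prefixes).
fn encode(records: &[Vec<Vec<u8>>]) -> Vec<u8> {
    let mut out = vec![STX];
    for (i, record) in records.iter().enumerate() {
        if i > 0 {
            out.push(RS);
        }
        for (j, unit) in record.iter().enumerate() {
            if j > 0 {
                out.push(US);
            }
            write_unit(&mut out, unit);
        }
    }
    out.push(ETX);
    out
}
```

With this encoder, an empty list of records, a single record with no fields, and a single record with one empty field all produce the same two bytes, `[STX, ETX]`, so a decoder cannot tell them apart.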