# CSV Uploader

A custom CSV -> DB uploader program.

## Speed

Trust me, you'll need speed when uploading 5M records.

Parallelized in a two-step process (looped):

1. We buffer records in an array as we read and parse (ex. 1000 records). This is the reader (main thread).
1. Once that array fills up, we push the asynchronous upload future/task to a stack to be executed (ex. 4 uploader threads).

**Warning!**: the parallelization between threads (step 2) is still being worked on. I'm still reading up on the `tokio` library lol. :)

## _Custom_ Data

As a secondary goal, we **normalize** the data while we parse it. This is highly variable and dependent on two things:

1. The DB and the data types it uses.
1. The datasets we're uploading and the type of data we've seen so far.

So our current process is:

- Parse to JSON data types
- Drop any empty String values
- Parse "False" -> false, "True" -> true
- Replace ' inside Strings with " and try parsing again (because some datasets have needed that)

## Supported DBs (for now)

- RethinkDB

# Data Tested

- https://catalogueoflife.org [ColDP Format Download](https://api.checklistbank.org/dataset/9842/export.zip?format=ColDP).
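
For reference, the normalization steps listed under _Custom_ Data could look roughly like the sketch below. It is not the uploader's actual code: `Value`, `normalize_field`, and `parse_with_quote_fallback` are hypothetical names made up for illustration, it only uses the standard library, and it assumes that "JSON data types" means booleans, numbers, and strings for a single field.

```rust
/// A normalized field value. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum Value {
    Bool(bool),
    Number(f64),
    Text(String),
}

/// Normalize one raw CSV field.
/// Returns `None` when the field is empty and should be dropped.
pub fn normalize_field(raw: &str) -> Option<Value> {
    // Drop empty String values.
    if raw.is_empty() {
        return None;
    }
    // Parse "True" -> true, "False" -> false.
    match raw {
        "True" => return Some(Value::Bool(true)),
        "False" => return Some(Value::Bool(false)),
        _ => {}
    }
    // Assumption: finite numeric fields become JSON numbers.
    // Non-finite results (e.g. "NaN", "inf") are kept as text.
    if let Ok(n) = raw.parse::<f64>() {
        if n.is_finite() {
            return Some(Value::Number(n));
        }
    }
    // Everything else stays a string.
    Some(Value::Text(raw.to_string()))
}

/// Try `parse` on the raw text; if that fails, replace every ' with "
/// and try once more. `parse` is whatever parser the caller supplies.
pub fn parse_with_quote_fallback<T>(
    raw: &str,
    parse: impl Fn(&str) -> Option<T>,
) -> Option<T> {
    parse(raw).or_else(|| parse(&raw.replace('\'', "\"")))
}
```

In a real build the `parse` argument would probably wrap a JSON parser (for example `serde_json::from_str`, assuming the `serde_json` crate is used), so a field like `{'a': 1}` that fails on the first try gets a second chance as `{"a": 1}`.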