csv_uploader

Crates.io: csv_uploader
lib.rs: csv_uploader
version: 0.1.3
source: src
created_at: 2023-02-10 21:35:32.166478
updated_at: 2023-02-23 08:36:36.193913
description: A custom TSV/CSV -> DB uploader.
id: 782162
size: 44,991
RubenD (rubend056)

CSV Uploader

A custom CSV -> DB uploader program.

Speed

Trust me, you'll need speed when uploading 5M records.

Uploading is parallelized in a two-step process that loops:

  1. The reader (main thread) buffers records in an array as it reads and parses them (e.g. 1000 records).
  2. Once that array fills up, we push an asynchronous upload future/task onto a stack to be executed by the uploaders (e.g. 4 uploader threads).

Warning: the parallelization between threads (step 2) is still being worked on. I'm still reading up on the tokio library lol. :)
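
The two steps above can be sketched roughly as follows. This is only an illustration and not the crate's actual code: it swaps tokio tasks for plain std threads and uses a channel (a queue) instead of a stack, so that it runs without extra dependencies. The names run_pipeline and upload_batch are hypothetical, and upload_batch only counts records instead of talking to a DB.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a DB insert; returns how many records it "uploaded".
fn upload_batch(batch: Vec<String>) -> usize {
    batch.len()
}

/// Groups `records` into batches of `batch_size` on the calling thread and
/// hands each batch to one of `workers` uploader threads.
/// Returns the total number of records uploaded.
fn run_pipeline<I>(records: I, batch_size: usize, workers: usize) -> usize
where
    I: IntoIterator<Item = String>,
{
    assert!(batch_size > 0 && workers > 0);
    let (tx, rx) = mpsc::channel::<Vec<String>>();
    // The receiving end is shared, so any idle worker can take the next batch.
    let rx = Arc::new(Mutex::new(rx));

    // Step 2: uploader threads pull batches until the channel is closed.
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut uploaded = 0;
                loop {
                    // The lock is released at the end of this statement.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(batch) => uploaded += upload_batch(batch),
                        Err(_) => break, // sender dropped: the reader is done
                    }
                }
                uploaded
            })
        })
        .collect();

    // Step 1: the reader buffers records and sends each full buffer.
    let mut buffer = Vec::with_capacity(batch_size);
    for record in records {
        buffer.push(record);
        if buffer.len() == batch_size {
            let full = std::mem::replace(&mut buffer, Vec::with_capacity(batch_size));
            tx.send(full).unwrap();
        }
    }
    if !buffer.is_empty() {
        tx.send(buffer).unwrap(); // final, partially filled batch
    }
    drop(tx); // closing the channel lets the workers exit their loops

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Calling run_pipeline(records, 1000, 4) would match the example numbers above: batches of 1000 records and 4 uploader threads.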

Custom Data

As a secondary goal, we normalize the data while we parse it.

This is highly variable and dependent on two things:

  1. The DB and the Data Types it uses.
  2. The datasets we're uploading and the type of data we've seen so far.

So our current process is:

  • Parse to JSON data types
  • Drop any empty String values
  • Parse "False" -> false, "True" -> true
  • Replace ' inside Strings with " and try parsing again (because some datasets have had values quoted that way)
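
As a rough illustration of the steps above, here is a minimal std-only sketch for a single field. It is not the crate's implementation: the Value enum, parse_scalar and normalize_field are hypothetical names, and parse_scalar only understands JSON scalars (booleans, numbers, and double-quoted strings without escapes), whereas a real uploader would more likely use a full JSON parser.

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Bool(bool),
    Number(f64),
    Text(String),
}

/// Tries to read `s` as a JSON scalar. Returns None if it is not one.
/// Note: Rust's f64 parser is slightly more lenient than JSON (e.g. "+1").
fn parse_scalar(s: &str) -> Option<Value> {
    match s {
        "true" => return Some(Value::Bool(true)),
        "false" => return Some(Value::Bool(false)),
        _ => {}
    }
    if let Ok(n) = s.parse::<f64>() {
        // Reject "inf" and "NaN", which Rust accepts but JSON does not.
        if n.is_finite() {
            return Some(Value::Number(n));
        }
    }
    if s.len() >= 2 && s.starts_with('"') && s.ends_with('"') {
        let inner = &s[1..s.len() - 1];
        if !inner.contains('"') && !inner.contains('\\') {
            return Some(Value::Text(inner.to_string()));
        }
    }
    None
}

/// Normalizes one raw CSV field. None means "drop this value".
fn normalize_field(raw: &str) -> Option<Value> {
    // Drop empty String values.
    if raw.is_empty() {
        return None;
    }
    // "True" -> true, "False" -> false.
    match raw {
        "True" => return Some(Value::Bool(true)),
        "False" => return Some(Value::Bool(false)),
        _ => {}
    }
    // Parse to a JSON data type where possible.
    if let Some(value) = parse_scalar(raw) {
        return Some(value);
    }
    // Replace ' with " and try parsing again.
    let requoted = raw.replace('\'', "\"");
    if requoted != raw {
        if let Some(value) = parse_scalar(&requoted) {
            return Some(value);
        }
    }
    // Fall back to keeping the raw text.
    Some(Value::Text(raw.to_string()))
}
```

With this sketch, a field such as 'hello' fails the first parse, gets rewritten to "hello", and is then stored as a plain string.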

Supported DBs (for now)

  • RethinkDB

Data Tested
