Crates.io | kvc |
lib.rs | kvc |
version | 1.1.3 |
source | src |
created_at | 2021-03-27 15:41:31.835957 |
updated_at | 2023-06-21 03:18:16.199052 |
description | Very simple key-value-count tools to go from / to pandas data frames or streaming formats |
homepage | https://github.com/jodavaho/kvc |
id | 374260 |
size | 1,370,491 |
This crate (package) is a Rust module that handles streaming input and output. Its purpose is to tally or accumulate values for a streaming set of keys. It is designed to be stupidly simple and to consume and produce whitespace-separated values.
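To illustrate the tallying idea, here is a minimal Python sketch. It is not the crate's Rust code. It reads a stream lazily and sums counts per key. It assumes that a token of the form key:N adds N and that any other token adds 1. Dates and comments get no special treatment here, and the names tokens and tally are hypothetical.

```python
import io
from collections import Counter
from typing import Iterator, TextIO, Tuple


def tokens(stream: TextIO) -> Iterator[Tuple[str, int]]:
    """Yield (key, count) pairs one whitespace-separated token at a time.

    Assumption for this sketch: "key:N" contributes N, any other token contributes 1.
    """
    for line in stream:  # only one line is held in memory at a time
        for tok in line.split():
            key, sep, num = tok.rpartition(":")
            if sep and key and num.isdecimal():
                yield key, int(num)
            else:
                yield tok, 1


def tally(stream: TextIO) -> Counter:
    """Accumulate the running total for every key seen in the stream."""
    totals: Counter = Counter()
    for key, n in tokens(stream):
        totals[key] += n
    return totals


totals = tally(io.StringIO("warnings:3 error\nwarnings error error\n"))
print(totals["warnings"], totals["error"])  # 4 3
```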
I use this library to parse simple journal-like logs where each line is of the form:
2021-03-01 warnings:3 error ... (other items with optional counts)
Suppose I want to do some processing on this data. The format is very easy to read and write, but it is not standard.
We can use kvc-stream to convert it into something more like a stream of key-value pairs, or kvc-df to convert it to a pandas data frame.
The kvc journal format is very simple.
These are valid frames, one per line:
a
event event
2021-04-01 april_fools_pranks:4
2021-03-01 key another_key a-third-key <weird-symbols_ar_ok!> this_has_occured_three_times:3 this_twice this_twice
2021-04-02 # Nothing happened that day
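The rules these frames suggest can be sketched in Python as follows. This is a reading of the examples, not the crate's parser. The assumptions are:

- a leading YYYY-MM-DD token is the date;
- # starts a comment that runs to the end of the line;
- key:N adds N;
- a repeated bare key counts once per occurrence.

The name parse_frame is hypothetical.

```python
import re
from collections import Counter
from typing import Optional, Tuple

# Assumption: a date is a leading token shaped like YYYY-MM-DD (inferred from the examples).
DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_frame(line: str) -> Tuple[Optional[str], Counter]:
    """Parse one journal line into (date or None, per-key counts)."""
    body = line.split("#", 1)[0]  # drop a trailing "# comment"
    tokens = body.split()
    date = None
    if tokens and DATE.fullmatch(tokens[0]):
        date, tokens = tokens[0], tokens[1:]
    counts: Counter = Counter()
    for tok in tokens:
        key, sep, num = tok.rpartition(":")
        if sep and key and num.isdecimal():
            counts[key] += int(num)  # "key:N" adds N
        else:
            counts[tok] += 1  # a bare key adds 1 each time it appears
    return date, counts


print(parse_frame("event event"))  # (None, Counter({'event': 2}))
print(parse_frame("2021-04-02 # Nothing happened that day"))  # ('2021-04-02', Counter())
```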
Suppose that's stored in data.txt (try it!).
Running <data.txt kvc-stream produces:
1 a 1
2 event 2
3 Date 2021-04-01
3 april_fools_pranks 4
4 Date 2021-03-01
4 <weird-symbols_ar_ok!> 1
4 a-third-key 1
4 this_has_occured_three_times 3
4 this_twice 2
4 key 1
4 another_key 1
5 Date 2021-04-02
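The stream output is in long format (line, key, value), which makes it easy to load into a database. Here is a sketch using Python's built-in sqlite3 module. It loads the rows above and attaches each count to the date recorded on the same line. The table name kv, its columns, and the query are illustrative choices, not something kvc defines.

```python
import sqlite3

# The kvc-stream output shown above: "<line> <key> <value>" per row.
STREAM = """\
1 a 1
2 event 2
3 Date 2021-04-01
3 april_fools_pranks 4
4 Date 2021-03-01
4 <weird-symbols_ar_ok!> 1
4 a-third-key 1
4 this_has_occured_three_times 3
4 this_twice 2
4 key 1
4 another_key 1
5 Date 2021-04-02
"""


def load(conn: sqlite3.Connection, text: str) -> None:
    """Create a hypothetical kv table and insert one row per output line."""
    conn.execute("CREATE TABLE kv (line INTEGER, key TEXT, value TEXT)")
    rows = [tuple(r.split(maxsplit=2)) for r in text.splitlines() if r.strip()]
    conn.executemany("INSERT INTO kv VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, STREAM)

# Self-join: pair every non-Date entry with the Date entry on the same line.
# Lines without a Date (lines 1 and 2 here) drop out of the inner join.
query = """
SELECT d.value AS day, k.key, CAST(k.value AS INTEGER) AS n
FROM kv AS k JOIN kv AS d ON d.line = k.line AND d.key = 'Date'
WHERE k.key != 'Date'
ORDER BY day, k.key
"""
for row in conn.execute(query):
    print(row)
```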
Running cat data.txt | kvc-df (or <data.txt kvc-df) produces:
index  april_fools_pranks  this_twice  a  <weird-symbols_ar_ok!>  event  Date        a-third-key  key  this_has_occured_three_times  another_key
1      0                   0           1  0                       0      0           0            0    0                             0
2      0                   0           0  0                       2      0           0            0    0                             0
3      4                   0           0  0                       0      2021-04-01  0            0    0                             0
4      0                   2           0  1                       0      2021-03-01  1            1    3                             1
5      0                   0           0  0                       0      2021-04-02  0            0    0                             0
OK, so to get aligned text and an index column, I actually ran cat data.txt | kvc-df -i | column -t.
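The data-frame layout holds the same information as the stream, pivoted. There is one row per input line and one column per key, with 0 wherever a key did not appear on that line. Here is a small Python sketch of that pivot, starting from (line, key, value) rows like kvc-stream's. It sorts columns alphabetically, whereas kvc-df's column order above differs. It is not the crate's implementation, and the name pivot is hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[int, str, str]  # (line number, key, value) as in kvc-stream output


def pivot(triples: List[Triple]) -> Tuple[List[str], List[List[str]]]:
    """Turn long rows into one wide row per line; absent keys become "0"."""
    rows: Dict[int, Dict[str, str]] = {}
    for line, key, value in triples:
        rows.setdefault(line, {})[key] = value
    columns = sorted({key for _, key, _ in triples})  # union of all keys
    table = [[str(line)] + [rows[line].get(c, "0") for c in columns] for line in sorted(rows)]
    return ["index"] + columns, table


triples = [(1, "a", "1"), (2, "event", "2"), (3, "Date", "2021-04-01"), (3, "april_fools_pranks", "4")]
header, table = pivot(triples)
print(" ".join(header))
for row in table:
    print(" ".join(row))
```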
I use this to keep a journal of events and easily scrape it for analysis in other programs or databases.