Crates.io | pcompress |
lib.rs | pcompress |
version | 1.0.7 |
source | src |
created_at | 2021-08-07 02:11:27.472805 |
updated_at | 2022-04-01 20:23:02.337534 |
description | Experimental, efficient, and performant binary representation of districting plans |
repository | https://github.com/InnovativeInventor/pcompress |
id | 432645 |
size | 157,911 |
Previously, it was hard to store the state of every single step of a Markov Chain Monte Carlo run from GerryChain Python or GerryChain Julia. This repo produces an efficient, streamable intermediate binary representation of partitions/districting assignments that enables generated plans to be saved (and analyzed) on-the-fly. Each step is represented as the diff from the previous step, enabling a significant reduction in disk usage per step. The intermediate representation is then compressed with LZMA2 (via XZ).
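To make the diff-then-compress idea concrete, here is a minimal sketch in Python. It is not pcompress's actual on-disk format: the JSON encoding and the helper names (`diff_step`, `apply_diff`, `encode_run`, `decode_run`) are assumptions made purely for illustration. Each plan is modeled as a list mapping node index to district, only the changed nodes are stored for each step after the first, and the result is compressed with Python's standard `lzma` module, which writes XZ (LZMA2) by default.

```python
import json
import lzma


def diff_step(prev, curr):
    """Return {node_index: new_district} for every node whose district changed."""
    return {i: d for i, (p, d) in enumerate(zip(prev, curr)) if p != d}


def apply_diff(prev, delta):
    """Rebuild the next assignment from the previous one plus a diff."""
    nxt = list(prev)
    for i, d in delta.items():
        nxt[i] = d
    return nxt


def encode_run(steps):
    """Store the first step in full and later steps as diffs, then XZ-compress."""
    records = [steps[0]] + [diff_step(a, b) for a, b in zip(steps, steps[1:])]
    return lzma.compress(json.dumps(records).encode("utf-8"))


def decode_run(blob):
    """Invert encode_run: decompress, then replay the diffs in order."""
    records = json.loads(lzma.decompress(blob))
    steps = [records[0]]
    for delta in records[1:]:
        # JSON turns integer keys into strings, so convert them back.
        steps.append(apply_diff(steps[-1], {int(k): v for k, v in delta.items()}))
    return steps
```

Because a typical MCMC step reassigns only a small fraction of nodes, each diff is much smaller than a full assignment, which is where the per-step savings come from.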
With pcompress, you can save and replay MCMC runs in a common, portable format that supports our current use cases.
pcompress is currently used within MGGG to power nearly all of our MCMC/ensemble analysis in order to provide quick analysis turnaround times and ensure reproducibility.
These stats are from the initial announcement of pcompress at a lab meeting.
Note that these metrics may be slightly outdated -- you may see better real-world performance.
Additionally, these metrics do not take into account updaters/scoring overhead (as this is dependent on the user's code).
The upper bounds given are intended to estimate how fast pcompress could go if we optimized further and implemented sharding.
cargo install pcompress
pip install pcompress
Note that chain is a normal MarkovChain object and graph is a normal GerryChain graph.
from pcompress import Record

# Recording: iterate the chain as usual while each step is saved to disk.
for partition in Record(chain, "pa-run.chain"):
    # normal chain stuff here
    pass
from pcompress import Replay

# Replaying: read the saved run back against the original graph.
for partition in Replay(graph, "pa-run.chain", updaters=my_updaters):
    # normal chain stuff here
    pass
For more examples with GerryChain Python, look here.
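The record/replay usage pattern above can be illustrated with a small pass-through generator. This is only a sketch of the general idea and does not reflect how pcompress implements `Record` or `Replay`: the function names `record` and `replay`, the one-JSON-line-per-step layout, and the use of Python's `lzma.open` are all assumptions for illustration.

```python
import json
import lzma


def record(steps, path):
    """Yield each step unchanged while streaming it to an XZ file at `path`."""
    with lzma.open(path, "wt", encoding="utf-8") as out:
        for step in steps:
            out.write(json.dumps(step) + "\n")  # one step per line
            yield step


def replay(path):
    """Yield the steps previously written by record(), in order."""
    with lzma.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            yield json.loads(line)
```

Wrapping the iterable this way means the caller's loop body does not change between a live run and a replayed one; only the object being iterated differs.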
pcompress is written and maintained by Max Fan and is licensed under the AGPLv3 license.
If you want to contribute, PRs are always welcome.