Crates.io | conda_curation |
lib.rs | conda_curation |
version | 0.7.0 |
source | src |
created_at | 2024-05-07 21:11:30.351147 |
updated_at | 2024-07-14 19:51:41.259151 |
description | Reduce conda repodata to enforce policy and speed up solves. Alpha software. |
repository | https://github.com/AaronOpfer/conda_curation |
id | 1233020 |
size | 160,453 |
conda_curation
conda_curation is a tool that filters conda repositories, especially Conda Forge, removing packages based on a variety of criteria.
- [...] (matchspecs/secure_python.yaml)
- Superseded builds (e.g. python-3.9.18-h12345678_0 is superseded by python-3.9.18-h12345678_1, and so the former package is removed)
- dev and rc packages (e.g. 2.0.0.dev0 or 2.0.0.rc0).
- [...] (pypy, etc.)
- Packages that can no longer be solved: if the user keeps only python >=3.12 and specifies -C python, then older openssl such as openssl 1.1.1n will be removed, since mamba create -n ... openssl==1.1.1n python>=3.12 cannot be solved.
Supports CEP-15 base_url: if the source repository (as specified by the --channel-alias flag) does not already have an info.base_url set, then the output repodata.json will have its info.base_url set to the --channel-alias. If it was set in the original repodata.json, then it will be preserved.
If all clients support CEP-15, then this obviates the need for a proxy server configured to 30x redirect all package requests to the --channel-alias destination.
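The superseded-build rule and the base_url behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual (Rust) implementation. The function names are hypothetical, and the sketch assumes that a build is "superseded" when another record has the same name, version, and build-string prefix but a higher build_number; the real tool may use different rules.

```python
import copy


def drop_superseded_builds(packages: dict) -> dict:
    """Keep only the highest build_number per (name, version, build prefix).

    `packages` maps filenames to repodata records. The grouping key is an
    assumption made for this sketch, not the tool's documented behaviour.
    """
    newest = {}
    for filename, record in packages.items():
        # A build string such as "h12345678_1" is a variant hash followed
        # by the build number; strip the trailing number to get the prefix.
        prefix = record["build"].rsplit("_", 1)[0]
        key = (record["name"], record["version"], prefix)
        best = newest.get(key)
        if best is None or record["build_number"] > packages[best]["build_number"]:
            newest[key] = filename
    kept = set(newest.values())
    return {fn: rec for fn, rec in packages.items() if fn in kept}


def apply_base_url(repodata: dict, channel_alias: str) -> dict:
    """Return a copy of `repodata` with info.base_url filled in if missing."""
    result = copy.deepcopy(repodata)
    info = result.setdefault("info", {})
    # A base_url already present in the source repodata is preserved.
    info.setdefault("base_url", channel_alias)
    return result
```

Calling drop_superseded_builds on records for python-3.9.18-h12345678_0 and python-3.9.18-h12345678_1 keeps only the latter, and apply_base_url leaves an existing info.base_url untouched.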
conda_curation serves small-to-medium sized enterprises that want to begin using Conda internally and want to leverage the rich Conda Forge package ecosystem rather than create their own packages or hand-curate.
The main reason conda_curation was created was performance: by reducing the Conda Forge repodata to a smaller size, substantial Conda client performance improvements may be observed. At Chicago Trading Company, a prototype of this repodata-filtering system applied to Conda Forge reduced mamba mambabuild runtimes by about two minutes across a wide variety of pipelines. mamba create --dry-run commands were seen to take 10 seconds instead of 20 seconds. Solve failures also surfaced much faster (and more cleanly).
A security team may demand that insecure packages, such as older Python interpreters, CA certificate bundles, and OpenSSL versions, be completely unavailable from within the enterprise. conda_curation is capable of enforcing these kinds of policies.
There are significant feature limitations of this software, as it initially targeted only a Minimum Viable Product (MVP) fitting into a specific point in Chicago Trading Company's artifact delivery. As such, the user must bring their own HTTP proxy / cache proxy system for serving packages, and that system must also contain a diversion for .*repodata.*\.json.* URLs that redirects to the rendered output of conda_curation. We have successfully done this using nginx with 301 redirects for asset downloads to the artifact server during the testing phase, and by putting nginx directly in front of the artifact server in the deployment phase.
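The diversion rule amounts to a single pattern match on the request URL. The deployment described above used nginx; the Python sketch below only illustrates the matching logic. The function name and the two backend labels are hypothetical, and the pattern is applied to the whole request path as an assumption.

```python
import re

# The URL pattern quoted above for repodata requests.
REPODATA_URL = re.compile(r".*repodata.*\.json.*")


def choose_backend(path: str) -> str:
    """Decide which backend should answer a request path.

    Returns "curated-repodata" for repodata requests (served from the
    rendered output of conda_curation) and "artifact-server" for
    everything else, such as package downloads. Both labels are
    placeholders for this sketch.
    """
    if REPODATA_URL.fullmatch(path):
        return "curated-repodata"
    return "artifact-server"
```

Under this sketch, compressed variants such as repodata.json.zst and alternate indexes such as current_repodata.json are diverted as well, while package files fall through to the artifact server.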
The original prototype of this tool was developed by me (@AaronOpfer) at Chicago Trading Company, based on observations from my colleague Bozhao Jiang that hand-crafted "curated" channels caused conda builds to finish several minutes faster than before. The original version was written in Python and, due to its performance issues, reached a hard limit on feature development as the development cycle time lengthened. I rewrote the project in Rust in my free time to create this version, and have received permission to release it to the community under the MIT License.