| Crates.io | fetcher |
| lib.rs | fetcher |
| version | 0.15.3 |
| created_at | 2022-02-07 03:08:16.833897+00 |
| updated_at | 2025-06-09 00:38:43.55421+00 |
| description | Data automation and pipelining framework |
| homepage | |
| repository | https://github.com/SergeyKasmy/fetcher |
| max_upload_size | |
| id | 528150 |
| size | 481,035 |
fetcher is a flexible async framework for building robust data pipelines that extract, transform, and deliver data from various sources to diverse destinations. Put simply, it makes it easy to create an app that periodically checks a source (for example, a website) for some data, makes it pretty, and sends it to users.
fetcher is made to be easily extensible to support as many use-cases as possible while providing tools to support most of the common ones out of the box.
At the heart of fetcher is the Task. It represents a specific instance of a data pipeline which consists of 2 main stages:
Source: Fetches data from an external source (e.g. HTTP endpoint, email inbox).
Action: Applies transformations (filters, modifications, parsing) to the fetched data. The most notable action is Sink, which sends the transformed data somewhere (e.g. Discord channel, Telegram chat, another program's stdin).
An Entry is the unit of data flowing through the pipeline. It most notably contains:
id: A unique identifier for the entry, used for tracking read/unread status and replies.
raw_contents: The raw, untransformed data fetched from the source.
msg: A Message that contains the formatted and structured data, like title, body, and link, that will end up sent to a sink.
A Job is a collection of one or more tasks that are executed together, potentially on a schedule.
Jobs can also be run either concurrently or in parallel (depending on the "send" feature) as a part of a JobGroup.
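To make the relationship between these concepts concrete, here is a minimal, synchronous sketch using only the standard library. Every name in it (`Entry`, `Source`, `Action`, `Task`, `StaticSource`, `Uppercase`, `StdoutSink`, `run_demo`) is a hypothetical stand-in chosen for illustration; fetcher's real traits are async and have different signatures, so treat this as a mental model rather than the crate's API.

```rust
// Simplified, hypothetical model of the concepts above. These are NOT
// fetcher's real types: the real traits are async and considerably richer.

/// The unit of data flowing through the pipeline.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    id: String,
    raw_contents: String,
    /// Stands in for the structured `Message` (title, body, link, ...).
    msg: Option<String>,
}

/// Stage 1: fetches entries from somewhere.
trait Source {
    fn fetch(&mut self) -> Vec<Entry>;
}

/// Stage 2: transforms, filters, or delivers entries.
trait Action {
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry>;
}

/// A source that yields a fixed list of strings.
struct StaticSource(Vec<&'static str>);

impl Source for StaticSource {
    fn fetch(&mut self) -> Vec<Entry> {
        self.0
            .iter()
            .enumerate()
            .map(|(i, raw)| Entry {
                id: i.to_string(),
                raw_contents: raw.to_string(),
                msg: None,
            })
            .collect()
    }
}

/// A transform that builds the message from the raw contents.
struct Uppercase;

impl Action for Uppercase {
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry> {
        entries
            .into_iter()
            .map(|mut e| {
                e.msg = Some(e.raw_contents.to_uppercase());
                e
            })
            .collect()
    }
}

/// A sink is just an action that delivers entries; this one prints them.
struct StdoutSink;

impl Action for StdoutSink {
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry> {
        for e in &entries {
            println!("[{}] {}", e.id, e.msg.as_deref().unwrap_or(&e.raw_contents));
        }
        entries
    }
}

/// One pipeline instance: a source followed by a chain of actions.
struct Task<S> {
    source: S,
    actions: Vec<Box<dyn Action>>,
}

impl<S: Source> Task<S> {
    fn run(&mut self) -> Vec<Entry> {
        let mut entries = self.source.fetch();
        for action in &mut self.actions {
            entries = action.apply(entries);
        }
        entries
    }
}

/// Builds and runs one task: static source -> uppercase -> stdout.
fn run_demo() -> Vec<Entry> {
    let actions: Vec<Box<dyn Action>> = vec![Box::new(Uppercase), Box::new(StdoutSink)];
    let mut task = Task {
        source: StaticSource(vec!["hello", "world"]),
        actions,
    };
    task.run()
}

fn main() {
    run_demo();
}
```

In this model a job would simply be a collection of such tasks run together.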
Everything in fetcher is defined and used via traits, including but not limited to:
Jobs, Tasks,
Sources, Actions,
JobGroups.
This allows you to define and use anything you might be missing in fetcher by default without having to modify any fetcher code whatsoever.
The easiest way to extend fetcher's parsing capabilities is to use transform_fn, which lets you pass in an async closure that modifies entries in whatever way you want.
Want to deserialize with serde to get better error reporting and more flexibility than using Json? Easy-peasy: use transform_fn to wrap an async closure in which you call let deserialized: Foo = serde_json::from_str(&entry.raw_contents) and use it however you want.
Need more than Replace's & Extract's can offer? transform_fn has got your back, too.
Implement the Sink trait on your type.
Implement MarkAsRead and Filter on your type.
Implement ExternalSave yourself and do whatever you want.
If anything is not extensible, this is a bug and it should be reported.
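The idea of wrapping a closure so it can act as a pipeline step can be illustrated with the standard library alone. The sketch below is synchronous and uses hypothetical names (`Entry`, `Transform`, `FnTransform`, `closure_transform`, `title_from_raw`); it is not fetcher's transform_fn, which takes an async closure and works on fetcher's own entry type. Hand-rolled prefix parsing stands in for a serde call here, to keep the example dependency-free.

```rust
// Hypothetical stand-ins; fetcher's real Entry and transform_fn differ
// (they are async and use richer types).

#[derive(Debug, Clone, PartialEq)]
struct Entry {
    raw_contents: String,
    title: Option<String>,
}

/// Anything that can turn one entry into another, or fail with a message.
trait Transform {
    fn transform(&self, entry: Entry) -> Result<Entry, String>;
}

/// Adapter that lets a plain closure be used wherever a Transform is expected.
struct FnTransform<F>(F);

impl<F> Transform for FnTransform<F>
where
    F: Fn(Entry) -> Result<Entry, String>,
{
    fn transform(&self, entry: Entry) -> Result<Entry, String> {
        (self.0)(entry)
    }
}

/// Wraps a closure into a Transform (the role transform_fn plays in spirit).
fn closure_transform<F>(f: F) -> FnTransform<F>
where
    F: Fn(Entry) -> Result<Entry, String>,
{
    FnTransform(f)
}

/// A custom parsing step: expects raw contents of the form `title=<text>`
/// and reports a descriptive error otherwise.
fn title_from_raw() -> impl Transform {
    closure_transform(|mut entry: Entry| {
        let title = entry
            .raw_contents
            .strip_prefix("title=")
            .ok_or_else(|| format!("missing `title=` prefix in {:?}", entry.raw_contents))?
            .trim()
            .to_owned();
        entry.title = Some(title);
        Ok(entry)
    })
}

fn main() {
    let step = title_from_raw();
    let parsed = step.transform(Entry {
        raw_contents: "title= Hello ".into(),
        title: None,
    });
    println!("{parsed:?}");
    let failed = step.transform(Entry {
        raw_contents: "garbage".into(),
        title: None,
    });
    println!("{failed:?}");
}
```

Because the closure returns a `Result`, the caller decides what a parse failure means, which is where the better error reporting comes from.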
To use fetcher, you need to add it as a dependency to your Cargo.toml file:
[dependencies]
fetcher = { version = "0.15", features = ["full"] }
tokio = { version = "1", features = ["full"] }
For the smallest example on how to use fetcher, please see examples/simple_website_to_stdout.rs.
More complete examples can be found in the examples/ directory.
Use the (enabled by default) send feature to enable tokio multithreading support.
If send is disabled, the Send + Sync bounds are dropped from most types,
but job groups no longer run jobs in parallel and use [tokio::task::spawn_local] instead of [tokio::spawn].
Please note that this requires you to wrap your calls to JobGroup::run in a [tokio::task::LocalSet] to work.
Please see tests/non_send.rs for an example.
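The trade-off behind the send feature can be seen without tokio at all. The following std-only sketch uses OS threads as a stand-in for a multithreaded runtime: work that may move to another thread must be `Send`, while work that stays on the current thread may hold non-`Send` values such as `Rc`. The function names (`run_parallel`, `run_local`, `demo_parallel`, `demo_local`) are hypothetical and unrelated to fetcher's API; for the real tokio setup with a LocalSet, tests/non_send.rs remains the reference.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

/// Parallel execution: each job may run on another thread, so both the job
/// and its result must be Send (analogous to spawning on a multithreaded runtime).
fn run_parallel<F, T>(jobs: Vec<F>) -> Vec<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handles: Vec<_> = jobs.into_iter().map(|job| thread::spawn(job)).collect();
    // Joining in spawn order keeps the results in the same order as the jobs.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("job panicked"))
        .collect()
}

/// Local execution: everything stays on the calling thread, so no Send bound
/// is needed (analogous to a local, single-threaded spawn).
fn run_local<F, T>(jobs: Vec<F>) -> Vec<T>
where
    F: FnOnce() -> T,
{
    jobs.into_iter().map(|job| job()).collect()
}

fn demo_parallel() -> Vec<i32> {
    // Arc is Send + Sync, so it can be shared across threads.
    let shared = Arc::new(10);
    let jobs: Vec<_> = (0..3)
        .map(|i| {
            let shared = Arc::clone(&shared);
            move || *shared + i
        })
        .collect();
    run_parallel(jobs)
}

fn demo_local() -> Vec<i32> {
    // Rc is not Send: passing these jobs to run_parallel would not compile.
    let shared = Rc::new(10);
    let jobs: Vec<_> = (0..3)
        .map(|i| {
            let shared = Rc::clone(&shared);
            move || *shared + i
        })
        .collect();
    run_local(jobs)
}

fn main() {
    println!("parallel: {:?}", demo_parallel());
    println!("local:    {:?}", demo_local());
}
```

Both variants compute the same results; the difference is which types the jobs are allowed to capture and whether they can use more than one thread.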
The nightly feature enables trait implementations for some nightly-only Rust types, such as ! (the never type).
Each source, action, and sink (which is also an action, but different enough to warrant being separate) is gated behind a feature to help with the already pretty bad build times of apps using fetcher.
A feature is usually named using "(source|action|sink)-(name)" format.
Not only that, all sources, actions, and sinks (and misc features like google-oauth2) are also grouped into "all-(sources|actions|sinks|misc)" features
to enable every source, action, sink, or misc respectively.
Every feature can be enabled with the feature full.
This is the preferred way to use fetcher for the first time, as it lets you use everything you might need before you actually know what you need.
Later on full can be replaced with the actual features you use to get some easy compile time gains.
For example, an app fetching RSS feeds and sending them to a Telegram channel might use the features source-http, action-feed, and sink-telegram.
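As a sketch of what that replacement could look like in Cargo.toml, using only the feature names mentioned above (whether this exact set is sufficient for a given app is an assumption to verify against the crate's feature list):

```toml
[dependencies]
# Default features are left on, so the default-enabled send feature stays active.
fetcher = { version = "0.15", features = ["source-http", "action-feed", "sink-telegram"] }
tokio = { version = "1", features = ["full"] }
```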
fetcher was completely rewritten in v0.15.0. It changed from an application with a config file to an application framework.
This was mostly done to make using fetcher correctly as easy and bug-free as possible.
Not to mention the huge config file was getting unwieldy and difficult to write and extend to your needs.
To make the config file more flexible would require integrating an actual programming language into it (like Lua).
I actually considered integrating Lua into the config file (a-la the Astral web framework) before I remembered that
we already have a properly integrated programming language: the one fetcher has been written in all along.
I decided to double down on the fact that fetcher is written in Rust,
instead making fetcher a highly-extensible easy-to-use generic automation and data pipelining framework
which can be used to build apps, including apps similar to what fetcher originally was.
Since then fetcher-core and fetcher-config crates are no longer used (or needed),
so if anybody needs these on crates.io, hit me up!
Contributions are very welcome! Please feel free to submit a pull request or open issues for any bugs, feature requests, or general feedback.
License: MPL-2.0