[fav_core](https://crates.io/crates/fav_core) is the core library of [fav_cli](https://github.com/kingwingfly/fav), a CLI tool that downloads remote resources and keeps a local state in protobuf.

In short, `fav_core` is a helper for building a stateful crawler.

# Usage

[fav_utils](https://crates.io/crates/fav_utils) provides the utilities for [fav_cli](https://crates.io/crates/fav_cli), which currently supports only [BiliBili](https://www.bilibili.com) (similar to a Chinese YouTube). You can treat it as an example of how to use this crate.

To save status, this crate uses `protobuf` instead of JSON because it is faster. You need to define your data structures with [protobuf](https://protobuf.dev), as in [this example](https://github.com/kingwingfly/fav/blob/dev/fav_utils/proto/bili.proto). (To derive traits for the code generated by protobuf, see [this example](https://github.com/kingwingfly/fav/blob/dev/fav_utils/build.rs).)

`Sets` contains `Set`s, and a `Set` contains `Res`s (resources). The workflow is:

1. fetch `Sets` to refresh `Set`s
2. fetch `Set` to refresh `Res`s
3. fetch and pull `Res` to download

To implement this workflow and maintain a local state, `fav_core` provides the following traits:

1. Network helpers
   - `Api`: helps define the APIs
   - `ApiProvider`: lets the app provide an API based on the `ApiKind` enum
   - `Net`: lets the app use the Internet
2. Config
   - `Config`: `Config: HttpConfig + ProtoLocal`, marks the app as configurable and persistable
   - `HttpConfig`: defines the default headers and cookies
3. Status and attributes
   - `Sets`: iterate over the sets and get a subset of them
   - `Set`: iterate over the resources and get a subset of them
   - `Res`: a resource, `Res: Meta`
   - `Meta`: the metadata of a resource, `Meta: Attr + Status`
   - `Attr`: provides the resource's id and title
   - `Status`: the status of a resource, such as saved, fetched, tracked and expired
4. Operations
   - `Ops`: `Ops: AuthOps + SetsOps + SetOps + ResOps`, meaning the **app** can perform all needed operations
   - `AuthOps`: used to log in and log out
   - `SetsOps`: used to `fetch_sets` info, for example, to add `English`, `Chinese` and `Japanese` as new movie collections to the `Sets` defined in protobuf
   - `SetOps`: used to `fetch_set` info, for example, to add 《Oliver Twist》, 《Roman Holiday》 and 《Twelve Angry Men》 to the `English` collection
   - `ResOps`: used to `fetch` and `pull`, for example, to `fetch` the id of 《Oliver Twist》 on the target website, then `pull` the resource to local disk based on the fetched id
5. Persistence
   - `PathInfo`: defines where to store status and config
   - `ProtoLocal`: `ProtoLocal: PathInfo + MessageFull`, used to read and write status and config
   - `SaveLocal`: lets the app download a `Res` and modify the local status
6. Visualization (optional): show status as a table
7. Ext methods
   - `SetOpsExt: SetOps`: batch fetch the sets in `Sets`
   - `ResOpsExt: ResOps`: batch fetch the resources in a `Set`
   - `XXStatusExt`: batch modify the children's `StatusFlags`

In conclusion, this crate contains all the traits you need to build a stateful crawler. You can define data structures with `protobuf` for fast reads and writes, and make them stateful, configurable, and persistable. Several network helpers are provided, so you can call `request_json` and `request_protobuf` directly. `Ext` traits are also provided so that you can batch fetch and pull data or modify the resources' `StatusFlags`.

An example can be found in the [fav](https://github.com/kingwingfly/fav) repo.

# CHANGELOG

- 0.1.1 -> 0.1.2: `XXOpsExt` methods now take a `batch_size` argument so that users can set the number of concurrent jobs.
- 0.0.X -> 0.1.X: The methods of the `Ops`-related traits now take a `Fut: Future<...>`. When that future becomes ready, the method can clean up, shut down gracefully, and return `FavCoreError::Cancel`. The `OpsExt` methods handle SIGINT this way, which keeps things reliable.
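
# Workflow sketch

The three-step workflow described under Usage (fetch `Sets`, fetch `Set`, then fetch and pull `Res`) can be sketched in plain Rust with only the standard library. This is a minimal illustration under stated assumptions, not `fav_core`'s actual API: the structs, the `StatusFlags` bit layout, and the free functions `fetch_sets`, `fetch_set` and `fetch_and_pull` are all hypothetical stand-ins. Network calls are replaced by in-memory slices, and `batch_size` only groups the work into chunks rather than running jobs concurrently.

```rust
// Hypothetical sketch: none of these items are fav_core's real API.
// They only mirror the Sets -> Set -> Res workflow described in this README.

/// Status of a resource stored as bit flags (flag names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct StatusFlags(u8);

impl StatusFlags {
    const TRACK: u8 = 1 << 0;
    const FETCHED: u8 = 1 << 1;
    const SAVED: u8 = 1 << 2;

    /// Turn a flag on.
    fn on(&mut self, flag: u8) {
        self.0 |= flag;
    }

    /// Check whether a flag is set.
    fn check(&self, flag: u8) -> bool {
        self.0 & flag != 0
    }
}

#[derive(Debug, Default)]
struct Res {
    id: String,
    title: String,
    status: StatusFlags,
}

#[derive(Debug, Default)]
struct Set {
    title: String,
    resources: Vec<Res>,
}

#[derive(Debug, Default)]
struct Sets {
    sets: Vec<Set>,
}

/// Step 1: refresh the list of sets. `remote_titles` stands in for a remote API response.
fn fetch_sets(sets: &mut Sets, remote_titles: &[&str]) {
    for title in remote_titles {
        // Keep the local state: only add sets that are not known yet.
        if !sets.sets.iter().any(|s| s.title == *title) {
            sets.sets.push(Set { title: title.to_string(), resources: Vec::new() });
        }
    }
}

/// Step 2: refresh the resources of one set. `remote` holds (id, title) pairs.
fn fetch_set(set: &mut Set, remote: &[(&str, &str)]) {
    for (id, title) in remote {
        if !set.resources.iter().any(|r| r.id == *id) {
            let mut res = Res {
                id: id.to_string(),
                title: title.to_string(),
                status: StatusFlags::default(),
            };
            res.status.on(StatusFlags::TRACK);
            set.resources.push(res);
        }
    }
}

/// Step 3: fetch and pull every tracked, not yet saved resource,
/// `batch_size` resources at a time. Returns how many were saved.
fn fetch_and_pull(set: &mut Set, batch_size: usize) -> usize {
    let mut saved = 0;
    // `chunks_mut` panics on a chunk size of 0, so clamp it to at least 1.
    for batch in set.resources.chunks_mut(batch_size.max(1)) {
        for res in batch.iter_mut() {
            if !res.status.check(StatusFlags::TRACK) || res.status.check(StatusFlags::SAVED) {
                continue;
            }
            res.status.on(StatusFlags::FETCHED); // a real app would request metadata here
            res.status.on(StatusFlags::SAVED); // a real app would download to disk here
            saved += 1;
        }
    }
    saved
}
```

Because each resource carries its own flags, running `fetch_and_pull` a second time on the same `Set` skips everything that is already saved. That is the point of keeping a local state: a repeated run only does the work that is still outstanding.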