Crates.io | rove |
lib.rs | rove |
version | 0.1.1 |
source | src |
created_at | 2023-10-19 15:16:46.843498 |
updated_at | 2023-10-19 15:55:30.167215 |
description | System for real time spatial and timeseries quality control of weather data |
repository | https://github.com/metno/rove |
id | 1007915 |
size | 115,545 |
ROVE is a system for performing real-time quality control (spatial and temporal) on weather data at scale. It was created to meet Met Norway's internal QC needs under the CONFIDENT project, and to replace legacy systems. However, it was designed to be modular and generic enough to fit others' needs, and we hope it will see wider use.
In alpha testing.
Benchmarking code is available here.
There are three benchmarks: single_benchmark, series_benchmark, and spatial_benchmark.
Here are the results from a run on an M1 Mac:
single_benchmark thrpt: 53.036 Kelem/s
series_benchmark thrpt: 5.4106 Melem/s
spatial_benchmark thrpt: 194.01 Kelem/s
Kelem/s means thousand data points per second; Melem/s means million data points per second.
It is worth noting that ROVE scales horizontally. If you need more throughput than one node can provide, you can set up as many nodes as you need behind a load balancer, though in most cases the bottleneck is likely to be your data source.
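As a rough illustration of the sizing arithmetic, the sketch below takes the benchmark figures above as data points per second and estimates how many nodes a target load would need. It assumes throughput scales linearly with node count and that the data source keeps up, neither of which the benchmarks establish; the function and constant names are made up for this example.

```python
import math

# Benchmark throughputs quoted above, in data points per second (M1 Mac).
THROUGHPUT = {
    "single_benchmark": 53.036e3,
    "series_benchmark": 5.4106e6,
    "spatial_benchmark": 194.01e3,
}


def nodes_needed(target_per_second: float, per_node_per_second: float) -> int:
    """Estimate node count, assuming throughput scales linearly with nodes."""
    if target_per_second <= 0 or per_node_per_second <= 0:
        raise ValueError("throughputs must be positive")
    return math.ceil(target_per_second / per_node_per_second)


if __name__ == "__main__":
    # Hypothetical target load: one million data points per second.
    for name, per_node in THROUGHPUT.items():
        print(name, nodes_needed(1_000_000, per_node))
```

Treat the result as a lower bound: real workloads rarely match a benchmark's request shape, and load balancer overhead is ignored.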
To use ROVE you need to generate bindings for the API in the language you want to use. The API definition can be found here.
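For Python, gRPC bindings are commonly generated with the `grpcio-tools` package. The sketch below builds the corresponding command line. It assumes the API definition has been saved locally as `proto/rove.proto`, a hypothetical path chosen so that the generated modules would be importable as `proto.rove_pb2` and `proto.rove_pb2_grpc`; adjust it to wherever you keep the file.

```python
import shlex
import subprocess
import sys


def protoc_command(
    proto_file: str = "proto/rove.proto",  # assumed local copy of the API definition
    include_dir: str = ".",
    out_dir: str = ".",
) -> list[str]:
    """Build a grpc_tools.protoc call that emits *_pb2.py and *_pb2_grpc.py."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{include_dir}",
        f"--python_out={out_dir}",
        f"--grpc_python_out={out_dir}",
        proto_file,
    ]


def generate_bindings() -> None:
    """Run the generator; requires `pip install grpcio-tools` beforehand."""
    subprocess.run(protoc_command(), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(protoc_command()))
```

Calling `generate_bindings()` from the directory that contains `proto/` would write the generated modules next to the `.proto` file.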
The API has two endpoints, ValidateSeries and ValidateSpatial.
Once you've set up bindings, here's an example of how to use them to make a request to Met Norway's ROVE server in Python:
import grpc
import proto.rove_pb2 as rove
import proto.rove_pb2_grpc as rove_grpc
from proto.rove_pb2 import google_dot_protobuf_dot_timestamp__pb2 as ts
from datetime import datetime, timezone


def send_series(stub):
    request = rove.ValidateSeriesRequest(
        series_id="frost:18700/air_temperature",
        start_time=ts.Timestamp(
            seconds=int(datetime(2023, 6, 26, hour=14, tzinfo=timezone.utc).timestamp())
        ),
        end_time=ts.Timestamp(
            seconds=int(datetime(2023, 6, 26, hour=16, tzinfo=timezone.utc).timestamp())
        ),
        tests=["dip_check", "step_check"],
    )

    print("Sending ValidateSeries request")
    responses = stub.ValidateSeries(request)

    print("Response:\n")
    for response in responses:
        print("Test name: ", response.test, "\n")
        for result in response.results:
            print(
                " Time: ",
                datetime.fromtimestamp(result.time.seconds, tz=timezone.utc),
            )
            print(" Flag: ", rove.Flag.Name(result.flag), "\n")


def send_spatial(stub):
    request = rove.ValidateSpatialRequest(
        spatial_id="frost:air_temperature",
        time=ts.Timestamp(
            seconds=int(datetime(2023, 6, 26, hour=14, tzinfo=timezone.utc).timestamp())
        ),
        tests=["buddy_check", "sct"],
        polygon=[
            rove.GeoPoint(lat=59.93, lon=10.05),
            rove.GeoPoint(lat=59.93, lon=11.0),
            rove.GeoPoint(lat=60.25, lon=10.77),
        ],
    )

    print("Sending ValidateSpatial request")
    responses = stub.ValidateSpatial(request)

    print("Response:\n")
    for response in responses:
        print("Test name: ", response.test, "\n")
        for result in response.results:
            print(
                " location: (lat: ",
                result.location.lat,
                " lon: ",
                result.location.lon,
                ")",
            )
            print(" Flag: ", rove.Flag.Name(result.flag), "\n")


def main():
    channel = grpc.insecure_channel("157.249.77.242:1337")
    stub = rove_grpc.RoveStub(channel)

    send_series(stub)
    send_spatial(stub)


if __name__ == "__main__":
    main()
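The requests in this example build each timestamp from a timezone-aware datetime. A small helper along the following lines can guard against accidentally passing a naive datetime, which Python would interpret in local time. `to_epoch_seconds` is a hypothetical name for illustration and is not part of the ROVE bindings.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt: datetime) -> int:
    """Return whole seconds since the Unix epoch for a timezone-aware datetime."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp())


if __name__ == "__main__":
    start = datetime(2023, 6, 26, hour=14, tzinfo=timezone.utc)
    end = datetime(2023, 6, 26, hour=16, tzinfo=timezone.utc)
    # The two request bounds used in the series example are two hours apart.
    print(to_epoch_seconds(end) - to_epoch_seconds(start))
```

The returned value could then be passed as the `seconds` field of a `Timestamp`, as the example does inline.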
Warning: ROVE is not yet production-ready.
You can set up your own ROVE instance connected to your own data source. It can work as either a gRPC server that receives requests over the network, or a component within a larger service, where QC runs are triggered by function calls. Examples of both are available in the documentation.
Of particular note, you will need to provide implementations of the DataConnector trait so that ROVE knows how to talk to your data sources. Some real-world examples of DataConnector implementations can be found under met_connectors, where frost talks to an HTTP REST API, and lustre_netatmo reads data from CSV files over Network File System.
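The idea of a pluggable data connector can be illustrated outside Rust as well. The Python sketch below is purely conceptual: `SeriesConnector`, `CsvConnector`, `get_series`, and `mean_value` are hypothetical names and do not mirror the actual DataConnector trait, whose signature is defined in the crate documentation. It shows a single interface with a CSV-backed implementation, loosely analogous to a connector that reads CSV data.

```python
import csv
import io
from typing import Protocol


class SeriesConnector(Protocol):
    """Hypothetical interface: anything that can return values for a series ID."""

    def get_series(self, series_id: str) -> list[float]: ...


class CsvConnector:
    """Hypothetical connector reading `series_id,value` rows from CSV text."""

    def __init__(self, csv_text: str) -> None:
        # Skip blank lines so that trailing newlines do not produce empty rows.
        self._rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]

    def get_series(self, series_id: str) -> list[float]:
        return [float(value) for sid, value in self._rows if sid == series_id]


def mean_value(connector: SeriesConnector, series_id: str) -> float:
    """Stand-in for a QC step: it depends only on the interface, not the source."""
    values = connector.get_series(series_id)
    if not values:
        raise ValueError(f"no data for series {series_id!r}")
    return sum(values) / len(values)


if __name__ == "__main__":
    connector = CsvConnector("a,1.0\na,3.0\nb,5.0\n")
    print(mean_value(connector, "a"))
```

The point of the design is that the consuming code never learns where the data came from, so a REST-backed or file-backed source can be swapped in without touching it.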
Crate documentation is available here.
Contributions are welcome; contact Ingrid (ingridra@met.no).