# statsdproxy

A proxy for transforming, pre-aggregating and routing statsd metrics, like
[Veneur](https://github.com/stripe/veneur), [Vector](https://vector.dev/) or
[Brubeck](https://github.com/github/brubeck).

Currently supports the following transformations:

* Deny- or allow-listing of specific tag keys or metric names
* Adding hardcoded tags to all metrics
* Basic cardinality limiting, tracking the number of distinct tag values per
  key or the number of overall timeseries (=combinations of metrics and tags).

See `example.yml` for details.

A major goal is minimal overhead and **no loss of information** due to
unnecessarily strict parsing. Statsdproxy intends to orient itself around
[dogstatsd](https://docs.datadoghq.com/developers/dogstatsd/datagram_shell/?tab=metrics)
protocol but should gracefully degrade for other statsd dialects, in that those
metrics and otherwise unparseable bytes will be forwarded as-is.

**This is not a Sentry product**, not deployed in any sort of production
environment, but a side-project done during Hackweek.


## Basic usage

1. Run a "statsd server" on port 8081 that just prints metrics

   ```
   socat -u UDP-RECVFROM:8081,fork SYSTEM:"cat; echo"
   ```

2. Copy `example.yaml` to `config.yaml` and edit it
3. Run statsdproxy to read metrics from port 8080, transform them using the
   middleware in `config.yaml` and forward the new metrics to port 8081:

   ```
   cargo run --release -- --listen 127.0.0.1:8080 --upstream 127.0.0.1:8081 -c config.yaml
   ```

5. Send metrics to statsdproxy:

   ```
   yes 'users.online:1|c|@0.5' | nc -u 127.0.0.1 8080
   ```

4. You should see new metrics in `socat` with your middlewares applied.

## Usage with Snuba

Patch the following settings in `snuba/settings/__init__.py`:

```python
DOGSTATSD_HOST = "127.0.0.1"
DOGSTATSD_PORT = "8080"
```

This will send metrics to port 8080.

## Processing model

This is the processing model used by the provided server. It should be respected
by any usage of this software as a library.

* The server receives metrics as bytes over udp, either singly or several joined
  with `\n`.
* For every metric received, the server invokes the `poll` method of the topmost
  middleware.
    * The middleware may use this invocation to do any needed internal
      bookkeeping.
    * The middleware should then invoke the `poll` method of the next
      middleware, if any.
* Once `poll` returns, the server invokes the `submit` method of the topmost
  middleware with a mutable reference to the current metric.
    * The middleware should process the metric.
        * If processing was successful, and if appropriate to its function
          (eg. a metric aggregator might hold onto metrics), the middleware
          should `submit` the processed metric to the next middleware, returning
          the result of this call.
        * If processing was unsuccessful (eg. unknown StatsD dialect), the
          unchanged metric should be treated as the processed metric, and passed
          on or held as above.
        * If a middleware becomes unable to handle more metrics during
          processing, such that it cannot handle the current metric, it should
          return `Overloaded`.
    * If an overload is indicated, the server shall pause (TODO: how long)
      before calling `submit` again with the same metric. (If an overload is
      indicated too many times, maybe drop the metric?)
* Separately, if no metric is received by the server for 1 second, it will
  invoke the `poll` method of the topmost middleware. This invocation of `poll`
  should be handled the same as above.